5
$\begingroup$

I have a collection of a collection of numbers that I need to find the smallest number of groups to put them into whereas the distinct set of numbers in each set does not exceed a threshold. For example:

Collection A: 1,2,3 Collection B: 1,2,4,5 Collection C: 1,3,5,7,9 Collection D: 6,7,8,9,10,11 Threshold: 6 

A valid solution would be:

Grouping 1: A,B {1,2,3,4,5} Grouping 2: C   {1,3,5,7,9} Grouping 3: D   {6,7,8,9,10,11} 

Or

Grouping 1: B   {1,2,4,5} Grouping 2: A,C {1,2,3,5,7,9} Grouping 3: D   {6,7,8,9,10,11} 

In my real problem the number of collections is 1000+ each with 50-300 numbers and my Threshold is 600. I am looking for the following.

  1. Is there a common mathematical name for this type of problem I can research.
  2. Is there a good approach to solving such a problem which results in the smallest number of Groupings.

Determining the affinity of 2 Collections is trivial (Union and compare vs each individual collection), but when I start to think about an arbitrary number of collections being grouped I can't seem to think of any non-brute force way to attacking the problem which would take a tremendous amount of time for a computer to solve after even 100 collections.

  • 0
    This problem (the optimization problem) is NP-Hard. So the best algorithms you will get that run in polynomial time are approximations. Dynamic programming is usually a good place to start for problems like this.2014-04-09

1 Answers 1

1

If your sets were disjoint, you would have the classic bin packing problem. It appears when you group two sets you delete any duplicate elements. For the basic bin packing problem, there are heuristics (largest first, first fit) that have been proven not too far from optimal. You might start by looking for greatest overlap between your sets, as that reduces the number of items to put into bins.