
I have a problem that might be a well-known/researched issue in mathematics and set-theory.

Basically, I want to efficiently combine pairs of sets into single sets, and keep doing that (with an additional constraint that I will explain in my example). I'm trying to come up with an efficient algorithm to implement this.

Example:

Let's say I start with five one-element sets: {a}, {b}, {c}, {d}, {e}. I run my constraint/condition test, which will tell me which set(s) I will keep and which set(s) I will remove.

Think of it like this: myConstraint({a}) = Yes (meaning I keep it) or No (meaning I delete it).

Let's assume that after running my constraint I am left with:

{a}, {b}, {c}, {e}: Set 1. ({d} did not satisfy my constraint).

Now I want to form pairs from the sets that I already have, which will give me:

{a, b}, {a, c}, {a, e}, {b, c}, {b, e}, {c, e}

  • I can simply do that by starting from the first set {a} and pairing it with the remaining sets {b}, {c}, and {e}, then going to the second set {b} and pairing it with the remaining sets {c} and {e}, and finally going to the third set {c} and pairing it with the remaining set {e}. Let's call this Method 1 (see the sketch below).
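
For concreteness, here is a rough sketch of Method 1 in Python (the language and the frozenset representation are just my choices for illustration):

```python
from itertools import combinations

# Set 1: the sets that survived the constraint in the first round.
survivors = [frozenset("a"), frozenset("b"), frozenset("c"), frozenset("e")]

# Method 1: pair each set with every set that comes after it and take the union.
pairs = [s | t for s, t in combinations(survivors, 2)]

for p in pairs:
    print(sorted(p))   # {a,b}, {a,c}, {a,e}, {b,c}, {b,e}, {c,e}
```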

I run myConstraint on each set, and let's say I'm left with:

{a, b}, {a, c}, {b, c}, {b, e}, {c, e}: Set 2. ({a, e} did not satisfy my constraint)

In total, the sets that satisfy my constraint so far are the following (Set 1 + Set 2):

{a}, {b}, {c}, {e}, {a, b}, {a, c}, {b, c}, {b, e}, {c, e}

Now I want to form pairs from those sets. The problem is that if I follow Method 1, I will get many duplicates, and duplicates are expensive to detect: each new set would have to be checked against all previously generated sets.

Here's a sample result of using Method 1:

{a, b}, {a, c}, {a, e}, {a} ∪ {a, b} = {a, b} (duplicate), {a} ∪ {a, c} = {a, c} (duplicate), . . .: Set 3

And finally, I will need to compute $\text{Set } 1 \cup \text{Set } 3$, which introduces some duplicates in this last step too.
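
To show the scale of the problem, here is a rough continuation of the sketch above (again just illustrative Python, using the sets from my example):

```python
from itertools import combinations

# Set 1 + Set 2 from the example above.
current = [frozenset(s) for s in
           ("a", "b", "c", "e", "ab", "ac", "bc", "be", "ce")]

# Method 1 on the combined collection: many of the unions coincide.
unions = [s | t for s, t in combinations(current, 2)]
print(len(unions))         # 36 pairs are formed ...
print(len(set(unions)))    # ... but far fewer distinct sets remain
```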

I thought of a couple of algorithms based on finding mutually exclusive sets, but I always end up with many duplicates. Is there a neat way to solve this problem?

  • @Aryabhata: Your answer helped. A lot. I'm currently thinking through all my options. I want to absorb all of this knowledge. I might have some questions, and I will comment on the answers when I'm sure I fully understand them, or when I give up and need more help. I will work on this problem tonight. Thanks a bunch. (2012-04-19)

2 Answers


One thing you can possibly do is to use something similar to Tries.

You will have one trie for each possible set size.

Maintain each set in a sorted fashion, i.e. you represent set $\{d,a,b\}$ as $a\to b\to d$ (say using a linked list).

Now to combine two given sets $A$ and $B$, perform a merge sort of the corresponding linked lists, eliminating dupes.

You can find the length of $A \cup B$ (during the merge itself) and then perform a lookup on the corresponding Trie to see if it has appeared before. If it has not, you insert it into the trie.

Using a trie, a lookup of a given set would be dependent on the size of the set you are looking up, and not the number of existing sets.

Basically, a trie is a form of hashing, where you can do the lookup with time dependent only on the item you are looking up.
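
Here is a rough Python sketch of this approach (the class and function names are mine, purely for illustration): one nested-dictionary trie per set size, a merge of two sorted lists, and a lookup-or-insert that reports whether the merged set is new.

```python
class SetTrie:
    """Deduplicates sets: one trie (nested dicts) per set size,
    keyed by the set's elements in sorted order."""

    def __init__(self):
        self.tries = {}  # set size -> root node of that trie

    def add_if_new(self, elements):
        """Insert the set; return True if it had not been seen before."""
        key = sorted(set(elements))                  # canonical sorted form
        node = self.tries.setdefault(len(key), {})
        is_new = False
        for x in key:
            if x not in node:
                node[x] = {}
                is_new = True
            node = node[x]
        return is_new


def merge(a, b):
    """Union of two sorted lists, eliminating duplicates (the merge step)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:]); out.extend(b[j:])
    return out


trie = SetTrie()
print(trie.add_if_new(merge(["a"], ["b"])))          # True: {a, b} is new
print(trie.add_if_new(merge(["a", "b"], ["b"])))     # False: {a, b} again
```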

For instance, if your universe was smaller (say $\le 64$ elements), then on a $64$-bit machine, you could represent each set as a bitvector, which would correspond to a $64$-bit integer.

To merge two sets, all you do is a bitwise OR.

To make lookups of existing sets faster, you maintain a hashtable of the corresponding integers.
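
A rough sketch of this variant in Python, assuming a universe small enough to fit in one machine word (the element names are taken from the question; the rest is just illustrative):

```python
universe = ["a", "b", "c", "d", "e"]
bit = {x: 1 << i for i, x in enumerate(universe)}    # element -> its bit

def encode(elements):
    """Represent a set as an integer bitmask."""
    mask = 0
    for x in elements:
        mask |= bit[x]
    return mask

seen = set()            # hash set of bitmasks produced so far

def add_if_new(mask):
    if mask in seen:
        return False
    seen.add(mask)
    return True

ab, ce = encode(["a", "b"]), encode(["c", "e"])
print(add_if_new(ab | ce))                        # merging is a single bitwise OR
print(add_if_new(encode(["a", "b", "c", "e"])))   # same set again -> False
```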

Of course, you could possibly use the bit vector approach when your universe is larger, but then computing the hash itself might take time, there might be more collisions etc. You can replace the hash table with a trie in that case.

  • @RoronoaZoro: 1) You can define any ordering; for instance, you can use the row number in the database table, or just the index in an array where you store the elements. 2) That was just an attempt at an optimization; you can do it the way you say. In fact, you might save some space doing it your way. (2012-04-20)

OK, here is a solution using BDDs (binary decision diagrams).

Assume a set $U := \{ a_1, \ldots, a_n \}$ of finitely many elements is given. We wish to represent a collection $S \subseteq 2^U$ of subsets of $U$, where adding a new subset to the collection and querying whether some subset is contained should be fast.

Associate to each $a_i$ a propositional variable $p_i$. Then a subset $T \subseteq U$ can be represented by the propositional formula $\varphi_T = \bigwedge_{i: a_i \in T} p_i \wedge \bigwedge_{j: a_j \notin T} \neg p_j$. For example, if $U = \{ a_1, a_2, a_3, a_4, a_5\}$, the subset $T = \{ a_1, a_2, a_5\}$ is represented as $p_1 \wedge p_2 \wedge \neg p_3 \wedge \neg p_4 \wedge p_5$.

If you have a family $T_1, \ldots, T_k$ of subsets, this family can be represented by the propositional formula $\psi_{\{T_1,\ldots,T_k\}} := \varphi_{T_1} \vee \ldots \vee \varphi_{T_k}$.

In other words, building the formula that represents a collection of sets is straightforward, and adding a subset $T$ to the collection $V$ of subsets is implemented as $\psi_{V \cup \{ T \}} = \psi_V \vee \varphi_T$.

To find out whether a subset $T$ is contained in a family $V$, we need to test whether $\varphi_T \models \psi_V$, or equivalently, whether $\varphi_T \wedge \neg \psi_V$ is unsatisfiable.

Now, to implement this in software, the easiest solution is to use an existing BDD library. Such a library provides all the required operations, and BDD operations on formulas of the kind occurring in these set representations tend to be extremely fast.
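
For illustration only, here is the encoding spelled out in Python with sympy's boolean algebra instead of a dedicated BDD package, so the containment test is a plain satisfiability check rather than a BDD operation; the class and its names are mine, not from any particular library.

```python
from sympy import symbols, And, Or, Not, false
from sympy.logic.inference import satisfiable

class SetFamily:
    def __init__(self, universe):
        self.universe = list(universe)
        # One propositional variable p_i per element a_i of U.
        self.p = dict(zip(self.universe, symbols(" ".join(self.universe))))
        self.psi = false                     # psi_V for the empty family

    def phi(self, subset):
        """phi_T: p_i for members, ~p_j for non-members."""
        return And(*[self.p[x] if x in subset else Not(self.p[x])
                     for x in self.universe])

    def add(self, subset):
        """psi_{V + T} = psi_V or phi_T."""
        self.psi = Or(self.psi, self.phi(subset))

    def contains(self, subset):
        """T is in V  iff  phi_T and not-psi_V is unsatisfiable."""
        return not satisfiable(And(self.phi(subset), Not(self.psi)))

family = SetFamily(["a1", "a2", "a3", "a4", "a5"])
family.add({"a1", "a2", "a5"})
print(family.contains({"a1", "a2", "a5"}))   # True
print(family.contains({"a1", "a3"}))         # False
```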

  • No, this is not how it works. $\varphi_{\{a_1\}} \vee \varphi_{\{a_2\}}$ corresponds to the family $\{ \{ a_1 \}, \{ a_2 \} \}$, not to the set $\{ a_1, a_2 \}$. (2012-04-21)