1
$\begingroup$

Consider the following problem:

There are two arrays A and B with equal number of elements (say each element is 32-bit signed integer). Now I have to estimate if arrays A and B are exact duplicates of each other (notwithstanding the order of the elements). Example: A={1,3,4,4} is the exact duplicate of B={4,3,1,4}

For sake of my question, assume that we cannot solve this exhaustively or by using a hashtable. Now, I define some operator (say R - which is both commutative and associative) which when applied to the elements of A and the elements of B give the same output V. However, V should be obtained only by this combination of elements.

As a side, I want to know if a combination of XOR and OR satisfies criteria for operator R. Essentially, I find XOR_A and XOR_B (A and B being the arrays). Also I find SUM_A and SUM_B. If XOR_A = XOR_B and SUM_A = SUM_B, can I say that the two arrays are exact duplicates.

If above is not true, is there any such operator R ?

  • 0
    Nope.$R$is dependent on the number of occurrences of each integer. In your example it is incorrect since, "2" occurs only once in A but twice in B. Also "4" occurs twice in A and only once in B.2012-03-07

1 Answers 1

3

Assuming (as the problem seems to imply) that your operator takes 32-bit integers and returns 32-bit integers, then there can't be such an operator R, by a counting argument. It's easiest to show for the simplest case where there are two elements in each array: every such array is equivalent to exactly one array $A$ where $A[0]\leq A[1]$. But for each value $n$ of $A[1]$ there are $n$ values of $A[0]$ that satisfy the 'sorted' condition, and so there are $\displaystyle{\sum_{i=0}^{2^{32}-1}}i = 2^{31}(2^{32}-1)$ different equivalence classes of arrays $A$. Since this is larger than $2^{32}$, it's impossible to map each array uniquely down to a single value no matter what operator you use.

In the more general case, there are ${2^{32}+n-1}\choose{n}$ or (very roughly) $\frac{1}{n!}2^{32n}$ different sorted arrays, and every array is equivalent to exactly one of these; since there are $2^{32n}$ different 'unsorted arrays', for small $n$ it's impossible to create any set of values substantially smaller than the original arrays that can represent them.

Furthermore, if I'm remembering right (and I haven't had a good chance to check my references, so take this with a grain of salt), in general testing if two arrays have the same values is known to require $\Theta(n\log n)$ comparisons (nb: comparisons only; no arithmetic operations are allowed in this model) in the worst case, so while it's possible that there's some set of $O(n)$-computable arithmetic values that uniquely identify each 'sorted list', it would be a surprising result.

  • 0
    (Note that the above argument implicitly assumes a variant of the problem where neither array may contain an element more than once. That can of course be assumed without loss of generality, since a worst-case lower bound for a restricted set of instances is still a worst-case lower bound for the full problem).2012-03-10