1
$\begingroup$

Consider the following problem:

There are two arrays A and B with equal number of elements (say each element is 32-bit signed integer). Now I have to estimate if arrays A and B are exact duplicates of each other (notwithstanding the order of the elements). Example: A={1,3,4,4} is the exact duplicate of B={4,3,1,4}

For sake of my question, assume that we cannot solve this exhaustively or by using a hashtable. Now, I define some operator (say R - which is both commutative and associative) which when applied to the elements of A and the elements of B give the same output V. However, V should be obtained only by this combination of elements.

As a side, I want to know if a combination of XOR and OR satisfies criteria for operator R. Essentially, I find XOR_A and XOR_B (A and B being the arrays). Also I find SUM_A and SUM_B. If XOR_A = XOR_B and SUM_A = SUM_B, can I say that the two arrays are exact duplicates.

If above is not true, is there any such operator R ?

  • 0
    Are you trying to come up with a linear time algorithm? You might be interested in this: http://www.cs.uiuc.edu/~jeffe/teaching/497/06-algebraic-tree.pdf2012-03-07
  • 0
    I have removed the operator-theory tag as it is not relevant2012-03-07
  • 0
    As I read the problem, $R\{1,2,4,4\} = R\{1,2,2,4\}$, since $R$ depends on the combination of elements. Is this correct, or is $R$ also dependent on the number of occurences of each element, as your notion of exact duplicates suggests?2012-03-07
  • 0
    Nope. R is dependent on the number of occurrences of each integer. In your example it is incorrect since, "2" occurs only once in A but twice in B. Also "4" occurs twice in A and only once in B.2012-03-07

1 Answers 1

3

Assuming (as the problem seems to imply) that your operator takes 32-bit integers and returns 32-bit integers, then there can't be such an operator R, by a counting argument. It's easiest to show for the simplest case where there are two elements in each array: every such array is equivalent to exactly one array $A$ where $A[0]\leq A[1]$. But for each value $n$ of $A[1]$ there are $n$ values of $A[0]$ that satisfy the 'sorted' condition, and so there are $\displaystyle{\sum_{i=0}^{2^{32}-1}}i = 2^{31}(2^{32}-1)$ different equivalence classes of arrays $A$. Since this is larger than $2^{32}$, it's impossible to map each array uniquely down to a single value no matter what operator you use.

In the more general case, there are ${2^{32}+n-1}\choose{n}$ or (very roughly) $\frac{1}{n!}2^{32n}$ different sorted arrays, and every array is equivalent to exactly one of these; since there are $2^{32n}$ different 'unsorted arrays', for small $n$ it's impossible to create any set of values substantially smaller than the original arrays that can represent them.

Furthermore, if I'm remembering right (and I haven't had a good chance to check my references, so take this with a grain of salt), in general testing if two arrays have the same values is known to require $\Theta(n\log n)$ comparisons (nb: comparisons only; no arithmetic operations are allowed in this model) in the worst case, so while it's possible that there's some set of $O(n)$-computable arithmetic values that uniquely identify each 'sorted list', it would be a surprising result.

  • 1
    Equivalent to sorting, in which sense? For example, it is not clear that having a fast on-line oracle for recognizing pairs of arrays whose elements are the same would allow one to build a fast sorting algorithm. (In fact, I'm almost sure it is clear that it doesn't).2012-03-08
  • 0
    @HenningMakholm I'm having a hard time finding the reference for that specific claim - obviously with a 'sorting subroutine' one can solve this problem in $O(n)$ time, but I agree that it doesn't clearly run the other way. It's been a while since I saw the reference; I may be conflating some information-theoretic result that it requires $\Omega(n\log n)$ time (e.g. by a connected-components argument).2012-03-09
  • 0
    Yes, there's a $\Omega(n\log n)$ worst-case bound for this problem in the comparison model. A correct algorithm cannot answer "yes" without having collected enough information to infer exactly what the correspondence between one array and the other is -- if the evidence admits two _different_ correspondences, there are always arrays that _don't_ match that could have produced it. But there are $n!$ such possible correspondence, and each comparison gives you at most $\log_2 3$ bits of information. Stirling's approximation then says there must be at least $\Omega(n\log n)$ comparisons.2012-03-10
  • 0
    (Note that the above argument implicitly assumes a variant of the problem where neither array may contain an element more than once. That can of course be assumed without loss of generality, since a worst-case lower bound for a restricted set of instances is still a worst-case lower bound for the full problem).2012-03-10