3

Let $X \in \mathbb{Z}^n$. I need to find the number of pairs $ (i, j)$ with $ 0 < i < j \le n$ such that $ X_i>X_j$ as well as the number of pairs $ (i, j),\ 0 < i < j \le n$, such that $ X_i=X_j$.

I know how to compute these in $O(n \log n)$ time. How can one prove that no comparison-based algorithm with better complexity exists (check the "update" section at the bottom)?


An algorithm with $O(n \log n)$ time complexity (one of the known methods):

We store:

1) Two accumulators, one for each of the two counts (both initially 0).

2) A balanced search tree (AVL, for example) augmented with the size of the subtree at each vertex (so the operation "add element" costs $O(\log \text{size})$). Elements of this tree are 2-tuples from $(\{-\infty\} \cup \mathbb{Z} \cup \{\infty\}) \times \mathbb{Z}$; comparison is by the first element and, if the first elements are equal, by the second. Initially this tree is empty.

Subprocedure: find the number of elements in the tree that lie in the interval $[X, Y]$, with time cost $O(\log n)$ (straightforward).

Call this procedure as count$(X, Y)$

So, for $i$ from $1$ to $n$:

  1. add to the first accumulator count$((X_i + 1, 1), (\infty, n))$ — the number of earlier elements strictly greater than $X_i$
  2. add to the second accumulator count$((X_i, 1), (X_i, i-1))$ — the number of earlier elements equal to $X_i$
  3. add $(X_i, i)$ to the balanced tree

So, $n$ steps with $O(\log n)$ cost each, for $O(n \log n)$ total.
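A minimal Python sketch of this scheme (my own illustration, not the poster's code): it replaces the AVL tree with a Fenwick tree over rank-compressed values, which supports the same "count elements below a rank" query in $O(\log n)$; `count_pairs` is a hypothetical name.

```python
def count_pairs(xs):
    """Return (#pairs i<j with X_i > X_j, #pairs i<j with X_i == X_j)."""
    # Rank-compress values so they index a Fenwick (binary indexed) tree.
    ranks = {v: r for r, v in enumerate(sorted(set(xs)), start=1)}
    n = len(ranks)
    tree = [0] * (n + 1)

    def add(r):                    # insert one occurrence of rank r
        while r <= n:
            tree[r] += 1
            r += r & -r

    def prefix(r):                 # number of inserted values with rank <= r
        s = 0
        while r > 0:
            s += tree[r]
            r -= r & -r
        return s

    inversions = equal = seen = 0
    counts = {}
    for x in xs:
        r = ranks[x]
        # earlier elements strictly greater than x form inversions with x
        inversions += seen - prefix(r)
        equal += counts.get(x, 0)  # earlier elements equal to x
        counts[x] = counts.get(x, 0) + 1
        add(r)
        seen += 1
    return inversions, equal
```

For example, `count_pairs([3, 1, 2, 3])` gives 2 inversions (the leading 3 beats 1 and 2) and 1 equal pair.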

Update: Clearly it cannot be computed faster than $\Omega(n)$. The second count can be computed in $O(n)$ time (with a simple hash table), so I need to prove the time-complexity lower bound only for counting the pairs $(i, j),\ 0 < i < j \le n$, such that $X_i > X_j$.
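The hash-table computation of the second count is essentially one line in Python (a sketch; `equal_pairs` is my name for it): each value occurring $c$ times contributes $\binom{c}{2}$ equal pairs.

```python
from collections import Counter

def equal_pairs(xs):
    """Number of pairs (i, j), i < j, with X_i == X_j, in expected O(n) time."""
    return sum(c * (c - 1) // 2 for c in Counter(xs).values())
```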

  • 1
    Out of curiosity, how would one do it in $O(n\log n)$? (2017-01-19)
  • 0
    I'll edit the post now. (2017-01-19)
  • 0
    It would be simple to send you code, but not in this community :) (2017-01-19)
  • 1
    Thanks for taking the time to edit the post; proving there is no better algorithm may be related to proving that a comparison-based sorting algorithm cannot perform better than $n \log n$. (2017-01-19)
  • 0
    I thought about that, but I cannot build a fast sort on top of this task. (2017-01-19)
  • 1
    I don't mean that you should sort this in some way. What I mean is that the proofs may be related, or they may be similar. (2017-01-19)
  • 0
    Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/52100/discussion-between-kotomord-and-rserrao). (2017-01-19)

2 Answers

1

Inversion counting is a problem with a long history, and of great importance even today (particularly in e.g. streaming models).

It is natural to see its connection to sorting, since sorting effectively "erases" inversions; and in fact, it is straightforward to see how one can "piggyback" inversion counting on MergeSort without asymptotically increasing its complexity: what happens when I have already sorted two halves of an array, and I am merging them, and I pick an element from the second half while $j$ elements from the first still remain unmerged?
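Concretely, the piggyback looks roughly like this (my sketch, assuming the usual top-down MergeSort): each element taken from the second half contributes one inversion per still-unmerged element of the first half.

```python
def sort_and_count(a):
    """MergeSort that also returns the number of inversions in a."""
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, inv_l = sort_and_count(a[:mid])
    right, inv_r = sort_and_count(a[mid:])
    merged, inv = [], inv_l + inv_r
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:        # ties are not inversions (strict >)
            merged.append(left[i]); i += 1
        else:
            # right[j] jumps ahead of the len(left) - i remaining elements
            inv += len(left) - i
            merged.append(right[j]); j += 1
    merged += left[i:] + right[j:]
    return merged, inv
```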

In fact, if we restrict ourselves to comparison-based counting, i.e. the only way to access our array $x_1,\dots,x_n$ is by asking, for any given pair $(i,j)$, whether $x_i < x_j$, then inversion counting requires at least as many comparisons as sorting, making its complexity $\Omega(n\lg(n))$. To see it, note that any set of comparisons implicitly defines a directed acyclic graph $G$ whose nodes correspond to $x_1,\dots,x_n$, and whose arcs correspond to comparisons made; and any in-order visit of $G$ (one that does not visit $x_i$ before $x_j$ if there is an arc from $x_j$ to $x_i$, i.e. if $x_j < x_i$) yields a total ordering compatible with all comparisons made. If the comparisons made so far do not suffice to sort, then at some point during such a visit I can pick either of two distinct elements $x_i$ and $x_j$ as the following element, and then continue my visit of $G$ independently of my choice. This induces two total orderings that are both compatible with all comparisons made so far, and differ by exactly one inversion ($x_i$ with $x_j$), meaning that the parity of their inversion counts is different.

NOTE: I believe this is the same argument given above by kotomord, but I hope it's clearer written this way, because I for one was confused.

This connection with sorting is double-edged, however, in the sense that just like the complexity of sorting varies between $\Theta(n)$ and $\Theta(n \lg(n))$ depending on the machine model one is using, so does the complexity of inversion counting. For example, in the popular and reasonable Word RAM model (in which sorting can be done in time $O(n\lg\lg(n))$), inversion counting can be done in time $O(n\sqrt{\lg(n)})$ (check the tech report here, which also sports a fair bibliography on the topic).

Note that in the Word RAM model the bound for inversion counting is worse than that for sorting, showing that in general you can't reduce inversion counting to sorting. Nor is the converse true: note that the argument above, on inversion counting requiring at least as many comparisons as sorting, does not hold in general if one considers cases where both sorting and inversion counting have the same side information - e.g. if one knows that the permutation contains exactly $1$ inversion! One can, however, always reduce inversion counting to (2D) dominance counting, and that's how many people approach it.

  • 0
    In my proof I reduced sorting to inversion counting, not inversion counting to sorting. (2017-01-24)
  • 1
    No, in your answer you reduce sorting to what I'd call "inversion finding", i.e. finding all the individual inversions: if you can do that, you can sort. But "inversion finding" is harder than inversion counting, because in the former case your output is a set, in the latter just the cardinality of that set. So you have only shown that sorting is "easier" than a problem that's "harder" than inversion **counting**, which tells you nothing about the relative difficulty of sorting and inversion **counting**. (2017-01-24)
  • 0
    Sorry, I'll edit my answer. (2017-01-24)
  • 1
    In my answer I was not referring to yours. I was pointing out that since it's so easy to modify MergeSort to do inversion counting with essentially no overhead, one might be tempted to assume that one can easily modify *any* sorting algorithm to do so, meaning that if one could sort in time $T(n)$ in some machine model, one would be able to count inversions in time $O(T(n))$. **But it's just not the case**, as far as we know (that's what I wrote), because in the Word RAM model we know how to sort in time $O(n \lg\lg(n))$, but we only know how to count inversions in time $O(n\sqrt{\lg(n)})$. (2017-01-24)
  • 0
    In the Word RAM model, can we sort any comparable objects, or only doubles? (2017-01-24)
  • 0
    There is an information-theoretic result giving $O(n \log n)$ as the minimal possible sorting time complexity; how does the Word RAM model get around it? (2017-01-24)
  • 1
    Remember that $O(n\lg(n))$ is the limit on *comparison-based* sorting, where the only piece of information you can obtain about two elements is which is larger. That's a fairly stringent (and unrealistic, as most people who work on this stuff will tell you) model. Just search for the Word RAM model and you'll find a ton of references about it. Note that I'm not saying Word RAM is the "right" model; but I'm saying that it's a reasonable model that many people use, and that if you want more precise information about that $\lg(n)$ factor you have to be more precise about the machine model you are using. (2017-01-24)
  • 0
    Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/52407/discussion-between-kotomord-and-anonymous). (2017-01-24)
3

I found a proof

Consider the subcase of this task where all $X_i$ are distinct.

Suppose there exists a comparison-based algorithm $F$ with time complexity $O(f(n))$, where $n \le f(n) \le n\log n$.

Now we create a new array $Y$ of pairs $(X_i, i)$ and run the algorithm on $Y$. At every comparison made by $F$, we write the pair $(i, j)$, where $X_i < X_j$, into a log.

There are $O(f(n))$ entries in the log. View them as a directed graph $G$ with $n$ vertices, where the pairs are directed edges. $G$ contains no cycles.

Lemma 1. The time complexity of topologically sorting a graph $G$ with $n$ vertices and $m$ edges is at most $O(n+m)$.

Lemma 2. If a graph $G$ admits two different topological orders, then it admits two topological orders whose difference is a transposition of two adjacent elements.

Both lemmas are proved by the obvious algorithm; http://codepad.org/fLDdQjpt has a simple Java implementation.
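The linked codepad is Java; a Python analogue of the idea behind both lemmas (Kahn's linear-time topological sort, flagging the moment two vertices are simultaneously ready and could therefore be transposed) might look like this - a sketch, with `topo_order` as a hypothetical name:

```python
from collections import deque

def topo_order(n, edges):
    """Topologically sort vertices 0..n-1 in O(n + m) time.

    Returns (order, unique): `unique` is False exactly when, at some step,
    more than one vertex had in-degree 0, i.e. two adjacent elements of a
    valid order could be transposed (the situation in Lemma 2).
    """
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:            # edge u -> v means u must precede v
        adj[u].append(v)
        indeg[v] += 1
    ready = deque(v for v in range(n) if indeg[v] == 0)
    order, unique = [], True
    while ready:
        if len(ready) > 1:
            unique = False        # two candidates: the order is not forced
        u = ready.popleft()
        order.append(u)
        for w in adj[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return order, unique
```

A chain of edges forces a unique order; any pair of incomparable vertices makes `unique` come back `False`.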

So suppose $G$ admitted two such orders $Y_1$ and $Y_2$. From them, create two input arrays $X_1$ and $X_2$ whose difference is a transposition of two adjacent elements. Then $X_1$ and $X_2$ have different inversion counts, but $F$ must behave identically on both. Hence the topological order of $G$ is unique, and it is the sorted order.

So we can sort $X$ in $O(n) + O(f(n)) + O(f(n) + n) = O(f(n))$ time (the third term is the cost of topologically sorting $G$).

Since comparison-based sorting requires $n\log n$ comparisons asymptotically, $f(n) = \Omega(n\log n)$, and hence $f(n) = \Theta(n\log n)$.

  • 1
    Note that what you have proved is that if you know the set of all *individual* inversions, then you have effectively sorted. But you are not looking for the actual set, only for its cardinality (far less information!); so you can't conclude that inversion *counting* is no easier than sorting. (2017-01-24)
  • 0
    Sorry, my English is bad. If $G$ is not uniquely topologically sortable, we can create two $X$'s on which $F$ must have the same behaviour (if the algorithm is comparison-based). (2017-01-24)
  • 1
    Note that in Lemma 2 you have proved that the two topological orders have *at least* one inversion *relative to each other*. But they could have two (or more) that "cancel out", so that they really have the same inversion count: e.g. 1243 and 2134 have the same inversion count (1) and 2 inversions relative to each other. An algorithm that finds the inversion count might do so without ascertaining whether you have 2134 or 1243, in which case you know the inversion count, but you don't know which of the two situations you are in! (2017-01-24)
  • 0
    Another update: now the two orders have only one inversion relative to each other. (2017-01-25)
  • 1
    I believe I understand now what you mean, and I agree. I have typed it up in my own answer (with a reference to yours) in a way that I hope is a little clearer for the reader. Clever idea, by the way! (2017-01-25)