4
$\begingroup$

Given some finite sequence $a_n$ for $n \in \{1..N\}$, is there a more efficient method to solve (P1) than the O($N^2$) brute force method?

($P1$) maximise $n_2-n_1$ such that $a_{n_1}>a_{n_2}$

  • 0
    Should that be "maximize $n_2 - n_1$"? Your title doesn't match your equation; maximizing $n_1 - n_2$ implies that $n_1 \gt n_2$, which means that the *later* number in the sequence should be greater than the earlier number—the opposite of your title.2017-01-04
  • 0
    Also, there may not be a "most separated pair" if two equally separated pairs fit the condition. In other words there could be a tie. What should the algorithm return, precisely? The difference only, or the pair? Or the indices of an arbitrarily chosen pair of numbers amongst those pairs "tied" for "most separated" that fit the condition?2017-01-04
  • 0
    Correct on the first point, although the solution to either is a mirror of the other. Any maximizing pair will suffice for a tie. Any such algorithm capable of maximizing this difference will be easy to return whatever quantity is of interest, which in this case is the indices themselves ($n_1$ and $n_2$).2017-01-04
  • 0
    Hmmm. Intuitively, I don't believe you can do better than a brute force method, other than the obvious approach of optimizing your sequence. For example, your first comparison should be between $a_1$ and $a_N$, because if $a_1 \gt a_N$, then no further comparisons are needed. This is provably the optimal algorithm on this *best case.*2017-01-04
  • 0
    @willem I have added an answer that pseudocodes but if it is not understandable, I can write it up fully as a program too. Just let me know in a comment + which language you would prefer (c++/python preferably).2017-01-04

3 Answers 3

3

This is $O(N)$.

If $a_n$ is greater than all the elements before it, call it a peak; and if $a_n$ is less than all elements after it, call it a trough.

Then if $(a_i,a_j)$ is a maximally separated pair satisfying $a_i>a_j$, we know that $a_i$ must be a peak (otherwise there would be an earlier element $a_k$ such that $a_k>a_j$); and similarly, $a_j$ must be a trough.

So here is the algorithm:

  1. Scan forwards through the sequence, marking all peaks ($N$ steps).
  2. Scan backwards through the sequence, marking all troughs ($N$ steps).
  3. Set $p = 1$ (this is the peak index). Note that $a_1$ is a peak by definition.
  4. Set $t$ (the trough index) to be the index of the last trough such that $a_t < a_p$.
  5. Now $t-p$ is a candidate for the maximal separation; if it is the best yet, remember it.
  6. If we have scanned all peaks, we are done. Otherwise, set $p$ to be the index of the next peak and go to Step 4.

Note that we can start the scan in Step 4 from the value of $t$ in the previous iteration, so the total number of operations used by that step in the whole algorithm is $O(N)$.

Edited to add: In fact we can omit Step 1, and just scan for the next peak in Step 6. (We can't omit Step 2, because scanning for troughs has to be done backwards.)

  • 0
    Yes this is the sort of approach I was imagining for an $O(N)$ answer. Nicely done (y)2017-01-05
  • 0
    @gowrath: Is that a two-hands-clapping emoji? I've never seen it before.2017-01-05
  • 0
    Haha its a thumbs up from the early blackberry days2017-01-05
2

I believe you can do this in $O(N \log N)$, but there may be a faster way. Here is a high level description of the algorithm:

  1. Store the elements as tuples of $(value, index)$ where $value$ is the value of the element and index is its position in the original array. So for ex. if your input was [3, 1, 4, 2], you would store it as [(3,0), (1,1), (4,2), (2,3)]. This will be $O(N)$
  2. Sort this new array by the first element (value field) in the tuple in decreasing order. Thus previous example would now look like: [(4, 2), (3, 0), (2, 3), (1, 1)]. This can be done in $O(N\log N)$
  3. Find the maximum difference between second elements (index fields) of the tuples such that the larger one comes after the smaller one in the new array. This can be done in $O(N)$.

Total complexity = $O(N \log N)$.

I think steps $1$ and $2$ are pretty self explanatory. If you are curious about how step $3$ can be done in $O(N)$, here is some code:

def max_diff_indexes(arr):
    """
    @params:

        arr - This is just an array of the SECOND parts of the tuple in 
        the algorithm described above. That is, arr contains only ints 
        representing the second values of the tuple (index field) after 
        the sorting in step 2. 

    @return:

        max_pair - Returns a pair of indexes n_1 and n_2 that maximise 
        n_2 - n_1 such that a_{n_1} > a_{n_2}. The first property is 
        ensured by this algorithm while the second property in ensured
        by the fact that the array was initally sorted in decresing order
        of value

    """
    if not arr: return None
    min_val = arr[0]
    max_diff = 0
    for elem in arr:
        max_diff = max(elem - min_val, max_diff)
        min_val = min(elem, min_val)

    max_val = max_diff - min_val
    return min_val, max_val         
  • 0
    Yes, I think that's it. I had started down this path, but couldn't think of an efficient method for step #3. I think you've got it, thanks.2017-01-04
  • 0
    I believe my answer gives an $O(N)$ algorithm.2017-01-05
0

Alright, I took a crack at it myself and came up with the following approach which seems to be $O\big( (N/M)^2 + N \log(N) \big)$, where $M$ is the optimal value of (P1).

Loop the following over $k$ from $1,2,\dots$:

  • Partition the sequence into $2^k$ contiguous sequences of size at most $\lceil N 2^{-k} \rceil $.
  • Find each group's local minima and maxima, saving the indices as well.
  • Perform this brute-force search among the local maxima / minima to find the best $n_1$ and $n_2$ belonging to different groups
  • if $n_2-n_1 > N 2^{-k}$, break out of this loop and return this pair, since no better pair may be found from within the same group from further subdivision

Each $k$th iteration requires: $O\big( N+ 4^k\big)$ operations, so summing over all necessary $k$ gives the total complexity $O\big((N/M)^2 + N \log(N) \big)$, where the former term dominates for small $M$ and the latter terms dominates for $M \gtrsim \sqrt{N}$.