3
$\begingroup$

I have a sequence of motions from images and I need to store all these values. For example:

  • a) 1, 2, 3, 4, 3, 2, 1
  • b) 1, 2, 3, 2, 3, 4, 5, 4, 3, 2, 1

In order to save space, it would be necessary to have some function that could generate unique numbers to represent each of these sequences. But also, there is another problem: I need representations which could be compared, so, suppose I have

  • a) 1, 2, 3, 4, 4, 3, 2, 1
  • b) 1, 2, 3, 4, 3, 2, 1

The motion in a) is very similar to b), so I would need some way to represent similarity. There is two patterns for increment/decrement in these sequences, either by 1 or 0, or any other integer value.

I am starting to read about pairing functions and some other similar topics here but I am not sure if I can apply for this (like this topic).

  • 0
    @Lubin Edited the question, thanks.2012-04-01

2 Answers 2

3

If you have some upper bound $n$ on the length of the sequences, then you could store them using the Cantor pairing function as follows. Let $\langle x,y\rangle$ denote the output of the pairing function given $x,y$. Then let $f(x_1,x_2,\ldots,x_k)=\langle \underset{n+1\text{ times}}{\cdots} \langle k,x_1\rangle,x_2\rangle\cdots,x_k\rangle,0\rangle,\cdots,0\rangle$ which allows you to code for both the length of the sequence and its terms by iterating the standard pairing function until it has coded for all the information. However, if you don't have such an upper bound $n$, the only approach I know of is to let $f(x_1,x_2,\ldots,x_k)=p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}$ where $p_i$ denotes the $i$th prime number. Obviously this is much less efficient.

  • 0
    @Heijden That is what [chat](http://chat.stackexchange.com/) is for.2012-04-02
2

For a unique number to represent the sequence, you could consider a hash function. However, hash functions will give you ugly output that won't be readable and may not be easily reversible. It doesn't seem like that's what you're looking for. If you can't say for sure that your function will have integers, a hash table is a good way to do this. But, the hash values still won't be directly comparable in the way that you seem to want.

For the sequence 1, 2, 3, 4, 3, 2, 1, you could simply consider the unique number 1234321. This might depend on the range of your numbers - it would help/may only be reasonable if you can put some bound on the numbers in your sequence. (if you know all the numbers are integers between 0 and 9, this is fine - we can extend this to numbers between 0 and N (or a and a+N) by representing the sequence in base N+1). This may not save you any space, though - certainly not compared to a hash table.

If you know for sure that your terms will only go up and down by 0 or 1, you could represent them by the string $+++0−−0−$, for example, or equivalently as the number $22201101$ (where 2 is up, 1 is down). As an aside, this could be stored very efficiently by working in base 3. This seems like a good solution for you.

Similarity can be measured in several ways (e.g Hamming distance). In the case of your example, where you have a "deletion" (a 4 has disappeared), it seems sensible to consider the Levenshtein distance. Levenshtein distance applies to strings of text characters, really, but you can easily represent your sequence this way as described in the paragraph above.

It's difficult to understand exactly what you want without more context, but I hope this will help you to research what you need! Further comments welcomed.