Let $A$ and $B$ two finite ordered sets where $A\subseteq B$. How do I count the number of consecutive and non-consecutive ocurrences of $A$ in $B$? For instance, I have nine ocurrences of set $A=\{0,1\}$ in set $B=\{0,0,1,2,3,0,1,2,0,1\}$.
Ordered subsets summation
-
0You mean finite sequences, right? As sets $A = \{0,1\}$, $B = \{0,1,2,3\}$ and there is nothing like "number of occurences" ... And: How do you want to count? – 2012-08-06
-
0Are you looking for an algorithm? – 2012-08-06
-
0@martini, I mean finite sequences. For instance, in $B\{0, 1, 2, 0, 1\}$, the ordered subset $A\{0,1\}$ appears twice. I'd like to count this. – 2012-08-06
-
0Yes, @ArthurFischer. I'm looking for an algorithm. – 2012-08-06
-
0Apparently you mean appearances only of **A** as consecutive entries in **B**, what might best be called consecutive subsequences. – 2012-08-06
-
0If **A** is [0,0] and **B** is [0,0,0], how many appearances of **A** in **B** do you count? That is, are appearances of **A** allowed to overlap? – 2012-08-06
-
0Do you also want to count overlapping subsequences? For example, take $A=(0,1,0)$, $B=(0,1,0,1,0)$, should the result be $1$ (because the second $0$ is already "used up" by the first subsequence) or $2$ (because both at the first and the third item there starts a subsequence $(0,1,0)$)? – 2012-08-06
-
0@hardmath, I want to count overlapping subsequences. I count 3 appearances of $A\{0,0\}$ in $B\{0,0,0\}$ – 2012-08-06
-
0@celtschk, I want to count overlapping subsequences. I count 5 appearances of $A\{0,1,0\}$ in $B\{0,1,0,1,0\}$. – 2012-08-06
-
0@MarcosdaSilvaSampaio: Five? then which one of the two $1$'s in $0,1,0,1,0$ gets the most hits? – 2012-08-06
-
0The proper term to use is that you want to count the number of occurences of a given _word_ $A$ as a _subsequence_ of another word $B$. The term "ordered set" is horribly ambiguous and certainly not appropriate here. Also I've often wondered why people would want to write something like $A\{0,1,0\}$ without the necessary equals sign after $A$. – 2012-08-06
-
1@Marcos: Then how do you find only $3$ occurrences of `01` in `0012301201`? If you're counting non-consecutive subsequences, there would be $9$. – 2012-08-06
-
0@HenningMakholm. It was my mistake. There are indeed 9 non-consecutive subsequences. – 2012-08-06
-
0@MarcvanLeeuwen. Thank you for the advise about $A=\{0,1,0\}$ instead of $A\{0,1,0\}$, once I'm a beginner studying math. Can I count also non-consecutive occurences of _word_ $A$ in _subsequence_ of another word $B$? – 2012-08-06
2 Answers
I think the following algorithm should do what you want (there are obvious optimizations which could be done):
define count(a, b): if a = [] return 1 else if b = [] return 0 else if first(a) = first(b) return count(rest(a), rest(b)) + count(a, rest(b)) else return count(a, rest(b))
Here first(s)
gives the first element of the sequence s
(e.g. first([1,2,2,1,3])
gives 1
), rest(s)
gives the sequence with the first sequence removed (e.g. rest([1,2,2,1,3])
gives [2,2,1,3]
) and []
is the empty sequence. I've used the convention that the empty sequence occurs exactly once in any sequence (this simplified the algorithm).
For example, evaluating count([0,1],[0,1,0,1])
gives in sequence
count([0,1],[0,1,0,1]) → count([1],[1,0,1]) + count([0,1],[1,0,1]) count([1],[1,0,1]) → count([],[0,1]) + count([1], [0,1]) count([],[0,1]) → 1 count([1],[0,1]) → count([1],[1]) count([1],[1]) → count([],[]) + count([1],[]) count([],[]) → 1 count([1],[]) → 0 1 + 0 → 1 1 + 1 → 2 count([0,1],[1,0,1]) → count([0,1],[0,1]) count([0,1],[0,1]) → count([1],[1]) + count([0,1],[1]) count([1],[1]) → count([],[]) + count([1],[]) count([],[]) → 1 count([1],[]) → 0 1 + 0 → 1 count([0,1],[1]) → count([1],[]) count([1],[]) → 0 1 + 0 → 1 2 + 1 → 3
So the algorithm correctly finds 3 different subsequences of [0,1]
in [0,1,0,1]
.
-
1One of the obvious optimizations one has to apply is to memoize the results of `count`. Otherwise, the algorithm will produce the result by adding $1$s together, and since the result can be as large as $\binom{|B|}{|A|}$, that can be vey inefficient. – 2012-08-06
What you're looking for is a string searching algorithm. (From an algorithmic point of view there's no relevant difference between sequences of numbers and sequences of characters).
Such algorithms come in a wide variety of choices, providing different tradeoffs between ease of implementation and speed of search.
The simplest is just to look at every position in $B$ sequentially and check whether a copy of $A$ begins there or not. Count the number of successes.
If, on the other hand, you want to count non-contiguous occurrences of $A$ in $B$, I suggest a dynamic-programming approach. For $1\le p\le|A|$ and $1\le q\le |B|$, let $N(p,q)$ be the number of district non-contiguous appearances of the first $p$ elements of $A$ within the first $q$ elements of $B$. Then you can compute $N(p,q)$ recursively $$ N(p,q) = \begin{cases} N(p,q-1)+N(p-1,q-1) & \text{if }A_p=B_q \\ N(p,q-1) & \text{otherwise} \end{cases} $$ plus some easy base cases that I'll let you imagine. If you fill in a table of $N(p,q)$s starting with low $q$s, the entire computation up to $N(|A|,|B|)$ takes time $O(|A|\times|B|)$.
-
0It does not suffice to count the number of initial positions, as the counting requested is of subsequences, not of subwords. – 2012-08-06
-
0@Marc: But the example in the question is clearly counting subwords, not subsequences. – 2012-08-06