0
$\begingroup$

Let $A$ and $B$ be two finite ordered sets where $A\subseteq B$. How do I count the number of consecutive and non-consecutive occurrences of $A$ in $B$?

For instance, I have nine occurrences of set $A=\{0,1\}$ in the set $B=\{0,0,1,2,3,0,1,2,0,1\}$.

  • 0
    @MarcvanLeeuwen. Thank you for the advise about $A=\{0,1,0\}$ instead of $A\{0,1,0\}$, once I'm a beginner studying math. Can I count also non-consecutive occurences of _word_ $A$ in _subsequence_ of another word $B$?2012-08-06

2 Answers 2

0

I think the following algorithm should do what you want (there are obvious optimizations which could be done):

define count(a, b):   if a = []     return 1   else if b = []     return 0   else if first(a) = first(b)     return count(rest(a), rest(b)) + count(a, rest(b))   else     return count(a, rest(b)) 

Here first(s) gives the first element of the sequence s (e.g. first([1,2,2,1,3]) gives 1), rest(s) gives the sequence with the first sequence removed (e.g. rest([1,2,2,1,3]) gives [2,2,1,3]) and [] is the empty sequence. I've used the convention that the empty sequence occurs exactly once in any sequence (this simplified the algorithm).

For example, evaluating count([0,1],[0,1,0,1]) gives in sequence

count([0,1],[0,1,0,1]) → count([1],[1,0,1]) + count([0,1],[1,0,1])   count([1],[1,0,1]) → count([],[0,1]) + count([1], [0,1])     count([],[0,1]) → 1     count([1],[0,1]) → count([1],[1])       count([1],[1]) → count([],[]) + count([1],[])         count([],[]) → 1         count([1],[]) → 0         1 + 0 → 1     1 + 1 → 2   count([0,1],[1,0,1]) → count([0,1],[0,1])     count([0,1],[0,1]) → count([1],[1]) + count([0,1],[1])       count([1],[1]) → count([],[]) + count([1],[])         count([],[]) → 1         count([1],[]) → 0         1 + 0 → 1       count([0,1],[1]) → count([1],[])         count([1],[]) → 0       1 + 0 → 1   2 + 1 → 3 

So the algorithm correctly finds 3 different subsequences of [0,1] in [0,1,0,1].

  • 1
    One of the obvious optimizations one has to apply is to memoize the results of `count`. Otherwise, the algorithm will produce the result by adding $1$s together, and since the result can be as large as $\binom{|B|}{|A|}$, that can be vey inefficient.2012-08-06
3

What you're looking for is a string searching algorithm. (From an algorithmic point of view there's no relevant difference between sequences of numbers and sequences of characters).

Such algorithms come in a wide variety of choices, providing different tradeoffs between ease of implementation and speed of search.

The simplest is just to look at every position in $B$ sequentially and check whether a copy of $A$ begins there or not. Count the number of successes.


If, on the other hand, you want to count non-contiguous occurrences of $A$ in $B$, I suggest a dynamic-programming approach. For $1\le p\le|A|$ and $1\le q\le |B|$, let $N(p,q)$ be the number of district non-contiguous appearances of the first $p$ elements of $A$ within the first $q$ elements of $B$. Then you can compute $N(p,q)$ recursively $ N(p,q) = \begin{cases} N(p,q-1)+N(p-1,q-1) & \text{if }A_p=B_q \\ N(p,q-1) & \text{otherwise} \end{cases} $ plus some easy base cases that I'll let you imagine. If you fill in a table of $N(p,q)$s starting with low $q$s, the entire computation up to $N(|A|,|B|)$ takes time $O(|A|\times|B|)$.

  • 0
    @Marc: But the example in the question is clearly counting subwords, not subsequences.2012-08-06