0
$\begingroup$

Let $A$ and $B$ two finite ordered sets where $A\subseteq B$. How do I count the number of consecutive and non-consecutive ocurrences of $A$ in $B$? For instance, I have nine ocurrences of set $A=\{0,1\}$ in set $B=\{0,0,1,2,3,0,1,2,0,1\}$.

  • 0
    You mean finite sequences, right? As sets $A = \{0,1\}$, $B = \{0,1,2,3\}$ and there is nothing like "number of occurences" ... And: How do you want to count?2012-08-06
  • 0
    Are you looking for an algorithm?2012-08-06
  • 0
    @martini, I mean finite sequences. For instance, in $B\{0, 1, 2, 0, 1\}$, the ordered subset $A\{0,1\}$ appears twice. I'd like to count this.2012-08-06
  • 0
    Yes, @ArthurFischer. I'm looking for an algorithm.2012-08-06
  • 0
    Apparently you mean appearances only of **A** as consecutive entries in **B**, what might best be called consecutive subsequences.2012-08-06
  • 0
    If **A** is [0,0] and **B** is [0,0,0], how many appearances of **A** in **B** do you count? That is, are appearances of **A** allowed to overlap?2012-08-06
  • 0
    Do you also want to count overlapping subsequences? For example, take $A=(0,1,0)$, $B=(0,1,0,1,0)$, should the result be $1$ (because the second $0$ is already "used up" by the first subsequence) or $2$ (because both at the first and the third item there starts a subsequence $(0,1,0)$)?2012-08-06
  • 0
    @hardmath, I want to count overlapping subsequences. I count 3 appearances of $A\{0,0\}$ in $B\{0,0,0\}$2012-08-06
  • 0
    @celtschk, I want to count overlapping subsequences. I count 5 appearances of $A\{0,1,0\}$ in $B\{0,1,0,1,0\}$.2012-08-06
  • 0
    @MarcosdaSilvaSampaio: Five? then which one of the two $1$'s in $0,1,0,1,0$ gets the most hits?2012-08-06
  • 0
    The proper term to use is that you want to count the number of occurences of a given _word_ $A$ as a _subsequence_ of another word $B$. The term "ordered set" is horribly ambiguous and certainly not appropriate here. Also I've often wondered why people would want to write something like $A\{0,1,0\}$ without the necessary equals sign after $A$.2012-08-06
  • 1
    @Marcos: Then how do you find only $3$ occurrences of `01` in `0012301201`? If you're counting non-consecutive subsequences, there would be $9$.2012-08-06
  • 0
    @HenningMakholm. It was my mistake. There are indeed 9 non-consecutive subsequences.2012-08-06
  • 0
    @MarcvanLeeuwen. Thank you for the advise about $A=\{0,1,0\}$ instead of $A\{0,1,0\}$, once I'm a beginner studying math. Can I count also non-consecutive occurences of _word_ $A$ in _subsequence_ of another word $B$?2012-08-06

2 Answers 2

0

I think the following algorithm should do what you want (there are obvious optimizations which could be done):

define count(a, b):   if a = []     return 1   else if b = []     return 0   else if first(a) = first(b)     return count(rest(a), rest(b)) + count(a, rest(b))   else     return count(a, rest(b)) 

Here first(s) gives the first element of the sequence s (e.g. first([1,2,2,1,3]) gives 1), rest(s) gives the sequence with the first sequence removed (e.g. rest([1,2,2,1,3]) gives [2,2,1,3]) and [] is the empty sequence. I've used the convention that the empty sequence occurs exactly once in any sequence (this simplified the algorithm).

For example, evaluating count([0,1],[0,1,0,1]) gives in sequence

count([0,1],[0,1,0,1]) → count([1],[1,0,1]) + count([0,1],[1,0,1])   count([1],[1,0,1]) → count([],[0,1]) + count([1], [0,1])     count([],[0,1]) → 1     count([1],[0,1]) → count([1],[1])       count([1],[1]) → count([],[]) + count([1],[])         count([],[]) → 1         count([1],[]) → 0         1 + 0 → 1     1 + 1 → 2   count([0,1],[1,0,1]) → count([0,1],[0,1])     count([0,1],[0,1]) → count([1],[1]) + count([0,1],[1])       count([1],[1]) → count([],[]) + count([1],[])         count([],[]) → 1         count([1],[]) → 0         1 + 0 → 1       count([0,1],[1]) → count([1],[])         count([1],[]) → 0       1 + 0 → 1   2 + 1 → 3 

So the algorithm correctly finds 3 different subsequences of [0,1] in [0,1,0,1].

  • 1
    One of the obvious optimizations one has to apply is to memoize the results of `count`. Otherwise, the algorithm will produce the result by adding $1$s together, and since the result can be as large as $\binom{|B|}{|A|}$, that can be vey inefficient.2012-08-06
3

What you're looking for is a string searching algorithm. (From an algorithmic point of view there's no relevant difference between sequences of numbers and sequences of characters).

Such algorithms come in a wide variety of choices, providing different tradeoffs between ease of implementation and speed of search.

The simplest is just to look at every position in $B$ sequentially and check whether a copy of $A$ begins there or not. Count the number of successes.


If, on the other hand, you want to count non-contiguous occurrences of $A$ in $B$, I suggest a dynamic-programming approach. For $1\le p\le|A|$ and $1\le q\le |B|$, let $N(p,q)$ be the number of district non-contiguous appearances of the first $p$ elements of $A$ within the first $q$ elements of $B$. Then you can compute $N(p,q)$ recursively $$ N(p,q) = \begin{cases} N(p,q-1)+N(p-1,q-1) & \text{if }A_p=B_q \\ N(p,q-1) & \text{otherwise} \end{cases} $$ plus some easy base cases that I'll let you imagine. If you fill in a table of $N(p,q)$s starting with low $q$s, the entire computation up to $N(|A|,|B|)$ takes time $O(|A|\times|B|)$.

  • 0
    It does not suffice to count the number of initial positions, as the counting requested is of subsequences, not of subwords.2012-08-06
  • 0
    @Marc: But the example in the question is clearly counting subwords, not subsequences.2012-08-06