10
$\begingroup$

There's a famous interview question variously credited to Microsoft, Google and Yahoo:

Suppose you have given N companies, and we want to eventually merge them into one big company. How many ways are there to merge them?

Assuming you can merge as many companies as you like in a single step, I thought this boils down to "find the number of partitions of a set with N elements", in which case the answer is the Bell number $B_{n}$. This can be computed with this handy recursion cribbed shamelessly from Wikipedia:

$B_{n+1}=\sum_{k=0}^{n}{{n \choose k}B_k}$

$1, 1, 2, 5, 15, 52, 203...$

And you have to substract one since you're already starting from one of the possible sets: $B_{2}=2$, but there's only one way to combine A and B into AB.

However, there are a lot of sources on the net which claim that the correct solution is the Catalan number:

$C_n = \frac{1}{n+1}{2n\choose n} = \frac{(2n)!}{(n+1)!\,n!} = \prod\limits_{k=2}^{n}\frac{n+k}{k} \qquad\mbox{ for }n\ge 0$

$1, 1, 2, 5, 14, 42, 132...$

Which is correct, and why? Or are they both correct depending on the assumptions you make about the somewhat vague problem statement?

  • 1
    Reading your question, I only see one solution: There's only one way to do it, and that's to put them all together. There's definitely something vague about the problem formulation.2012-10-15
  • 0
    I think this is ambiguous as well. Does "eventually" mean that you can take multiple steps? If so, at least how many companies do I have to merge in each step? I could simply choose not to do anything for an arbitrary number of steps, giving me an answer of $\infty$.2012-10-15
  • 0
    Yes, my understanding is that you can take multiple steps, and you can merge as many as you like (including all) in a single step. So for N=3, the valid ways to merge them would be A,B,C -> (AB,C | AC,B | BC,A) -> ABC plus A,B,C -> ABC, for a total of 4 paths.2012-10-15
  • 0
    @jpatokal: How about A,B,C -> A,B,C -> ... ($n$ times) ... -> A,B,C -> ABC where $n$ varies from $1$ to $\infty$, giving me an infinite number of ways?2012-10-15
  • 0
    Presumably the intention is that each step has to merge at least one company with another, and that once merged companies can no longer be broken up.2012-10-15
  • 1
    Given the explanation, it would seem that the number of partitions of a set _does not_ give the right answer: it just counts the possibilities do merge any of them into a partition on the first step (including the possibility to keep them singletons), and then merge them all to one company in one fell sweep on the second step (including the possibility that they _were_ already made into one company in the first step).2012-10-15
  • 1
    Note that $4$ (apparently the answer for $N=3$) is neither a Bell number nor a Catalan number...2012-10-15
  • 0
    As stated in the question, you need to substract one, because you're starting from one of the possible states.2012-10-15
  • 0
    Here is an awesome explanation: http://www.wired.co.uk/magazine/google-answers2014-10-21

4 Answers 4

6

My first interpretation is that a way of merging the $n$ companies corresponds to a sequence $\langle P_k:k=0,\dots m\rangle$ such that:

  1. $P_0=\{1,\dots,n\}$;
  2. $P_{k+1}$ is a partition of $P_k$ for $k=0,\dots,m-1$;
  3. $|P_{k+1}|<|P_k|$ for $k=0,\dots,m-1$; and
  4. $|P_m|=1$.

These sequences evidently correspond to rooted trees with $n$ labelled leaves. The immediate correspondence between the partition sequences and the trees allows the latter to have internal vertices with only one child, so that each $P_k$ corresponds to a level of the tree, and requires that the leaves all be on the same level, but we can clearly collapse the non-branching branches and instead count rooted trees with $n$ labelled leaves in which every internal vertex has at least two children.

Let $M(n)$ be the number of such partition sequences or trees. Clearly $M(1)=M(2)=1$, and $M(3)=\binom32+\binom33=4$.

Edit2: In fact the correct numbers are $1,1,4,26,236,2752$, OEIS A000311, the number of phylogenetic trees with $n$ vertices. $M(n)$ is the sum over all partitions $\sum_{i=1}^mn_ip_i=n$, where the $p_i$ are distinct positive integers, of the terms $$\frac{n!}{\prod_{i=1}^mp_i!^{n_i}n_i!}\cdot\prod_{i=1}^mM(p_i)^{n_i}=\frac{n!}{n_1!n_2!\dots n_m!}\prod_{i=1}^m\left(\frac{M(p_i)}{p_i!}\right)^{n_i}\;;$$ verifying this is a matter of fairly straightforward counting. OEIS gives the much nicer recurrence $$M(n+1)=(n+2)M(n)+2\sum_{k=2}^{n-1}\binom{n}kM(k)M(n-k+1)\;,$$ among several others.

Restricting to binary mergers yields the sequence $1,1,3,15,105$, which appears to be A001147, $(2n-3)!!$. The OEIS entry says that $a_n$ is the number of distinct products of $n=1$ variables with commutative, non-associative multiplication, and those would appear to correspond to the binary merger trees.

  • 0
    I suspect you are missing a large number: what about $(ab)c(de)$? i.e. merge $a$ and $b$, merge $d$ and $e$ aand then merge the three remianing?2012-10-15
  • 0
    @Henry: You’re right: I seem to have shifted to binary mergers part way through. That’s may be a good thing, since I really didn’t like the numbers that I was getting.2012-10-15
  • 0
    Could someone please explain the first `(n+2)*M(n)` term in second equation of Edit2? `M(n+1) = (n+2)M(n) + ...` Shouldn't it be simply `M(n)` as it corrresponds to the case when firstly all `n` elements are merged together then the `n+1th` element is merged in possibly only one way.2012-11-15
  • 0
    @srbh.kmr: That’s simply taken from the OEIS entry; I didn’t try to analyze it, though I did have to adjust the indices to match my starting point.2012-11-15
  • 0
    The partition model in the first paragraph is not the same as the phylogenetic tree model. See my answer.2016-10-28
  • 1
    @Will: I did not say that it was. The phylogenetic tree model corresponds to the series-reduced trees.2016-10-28
  • 0
    My apologies. I should have read more carefully.2016-10-29
  • 0
    @Will: That’s okay; I’m glad that someone has tackled the harder version.2016-10-29
5

So turns out the question probably originates from Steven Skiena's The Algorithms Design Manual, whose answer wiki gives the solution as:

$\prod_{i=2}^{n} \frac{i(i-1)}{2} = \frac{n! (n-1)!}{2^{n-1}}$

Unlike Christian's answer, this assumes "pairwise" (only two companies at a time) non-simultaneous mergers.

4

Assume that any number of companies may be merged at a time. We must decide how much to care about timing. For example, do $$ \begin{aligned} &a,b,c,d\rightarrow\{a,b\},\{c,d\}\rightarrow\{a,b,c,d\}\\ &a,b,c,d\rightarrow\{a,b\},c,d\rightarrow\{a,b\},\{c,d\}\rightarrow\{a,b,c,d\}\\ &a,b,c,d\rightarrow a,b,\{c,d\}\rightarrow\{a,b\},\{c,d\}\rightarrow\{a,b,c,d\}\\ \end{aligned} $$ count as one outcome, two outcomes, or three outcomes? If we care only about which groups get merged, but not about the timing, counting these as one outcome makes sense. But if timing is important, counting these as three outcomes makes sense. One might also disallow the first of the three, assuming that two mergers never take place exactly simultaneously, in which case only the last two would count.

If we assume that timing is unimportant, so that the scenarios above count as one outcome, then for four companies we have $$ \begin{aligned} &a,b,c,d\rightarrow\{a,b,c,d\}\\ &a,b,c,d\rightarrow\{a,b,c\},d\rightarrow\{a,b,c,d\}\quad\text{(plus three others)}\\ &a,b,c,d\rightarrow\{a,b\},c,d\rightarrow\{a,b,c,d\}\quad\text{(plus five others)}\\ &a,b,c,d\rightarrow\{a,b\},c,d\rightarrow\{a,b,c\},d\rightarrow\{a,b,c,d\}\quad\text{(plus 11 others)}\\ &a,b,c,d\rightarrow\{a,b\},\{c,d\}\rightarrow\{a,b,c,d\}\quad\text{(plus two others),} \end{aligned} $$ which is a total of $26$ outcomes. This is the result in the OEIS link relating to phylogenetic trees in Brian Scott's answer. If we consider timing to be important, then the three outcomes in the last line turn into nine outcomes, and the result is $32$ outcomes total. This is the correct answer for the partition model described, but not actually implemented, in Brian Scott's answer. (For a full discussion of the partition model, see Lengyel's constant; for the sequence, see A005121.) If we forbid simultaneous mergers, then we reduce the total by $3$, leaving $29$ total outcomes, as in Christian Blatter's answer.

Added: There was a question in the comments to Brian Scott's answer about the derivation of the recurrence $$ \begin{aligned} M(n+1)=&(n+2)M(n)+2\sum_{k=2}^{n-1}\binom{n}{k}M(k)M(n-k+1)\\ =&nM(n)+2\sum_{k=2}^n\binom{n}{k}M(k)M(n-k+1) \end{aligned} $$ for the number of outcomes in the phylogenetic tree model. The idea is to partition the set of merger histories according to the size, which we will denote by $k+1$, of the first conglomerate to contain firm $n+1$. Note that $k$ ranges from $1$ to $n$.

If $k\ge2$ then there are two scenarios, distinguished by the manner in which firm $n+1$ gets merged with the other $k$ firms. In Scenario I, the other $k$ firms first merge into a single conglomerate, and then firm $n+1$ merges with it. In Scenario II, the other $k$ firms first merge into two or more conglomerates, and then firm $n+1$ and these conglomerates all merge in a single step. Either scenario can take place in $M(k)$ ways, for a total of $2M(k)$ ways.

Once firm $n+1$ has joined a conglomerate, that conglomerate must then be merged, along with the $n-k$ remaining firms, into a single conglomerate. This can take place in $M(n-k+1)$ ways. As there are $\binom{n}{k}$ ways to choose the $k$ firms with which firm $n+1$ forms its first conglomerate, there are a total of $2\binom{n}{k}M(k)M(n-k+1)$ outcomes for a given $k\ge2$. The only difference when $k=1$ is that Scenario II cannot occur. Hence the coefficient $2$ becomes $1$.

The term $(n+2)M(n)$ is a combination of the $k=1$ and $k=n$ terms of the sum.

2nd addition: For completeness, I record the recurrence for the partition model here as well. Let $Z(n)$ be the number of ways in which $n$ firms may be merged. The initial set of $n$ firms may be partitioned into $k$ non-empty parts in $S(n,k)$ ways, where $S(n,k)$ is the Stirling number of the second kind. The set of $k$ parts represent the set of firms present after the first round of mergers. Since at least one merger must occur in each round, we have $1\le k\le n-1$. The new set of firms may be merged in $Z(k)$ ways. From this we get $$ Z(n)=\sum_{k=1}^{n-1}S(n,k)Z(k). $$

3

According to jpatokal's comment there are two notions:

Given a set $S$ of $n\geq1$ distinguishable objects a merger consists in selecting an arbitrary $k$-subset $A\subset S$, $2\leq k\leq n$, and forming the new set $S':=(S\setminus A)\cup\{A\}$ containing $n-k+1

Given a finite set $S_0$ a merger history is a sequence of mergers $S_0\to S_1\to S_2\to \ldots\to S_r$ whereby $|S_r|=1$. (We do not consider the case that at the same instant two disjoint subsets $A_1$, $A_2\subset S$ are merged into two different amalgams $\{A_1\}, \ \{A_2\}$.)

The problem is to determine the number of possible merger histories for a starting set $S$ containing $n\geq1$ elements. Denote this number by $H(n)$. Then $H(1)=H(2)=1$, and the $H(n)$ satisfy the following recursion:

$$H(n)=\sum_{k=2}^n {n\choose k} H(n-k+1)\qquad (n\geq3)\ .$$

The first few values are $$1,1,4,29,336,5687,132294,4047969,157601068, \ldots\quad\ .$$ OEIS does not list such a sequence.

  • 0
    Interesting, but Brian M. Scott's answer, particularly the parallel to phylogenetic trees, seems more convincing. Can you spot a flaw in his argument... or in yours?2012-10-16
  • 4
    @jpatokal: The difference is that in Brian M. Scott's setup there can be simultaneous mergers of disjoint subsets (that's why he is talking about partitions), whereas in my setup mergers happen one at a time.2012-10-16
  • 0
    This [sequence](https://oeis.org/A256006) was added to OEIS in 2015. I've proposed an edit to the sequence linking to this answer.2016-10-28
  • 0
    I think it is better to say that in the phylogenetic tree model of Brian Scott's post the merger histories of two firms prior to being merged with each other are asynchronous. In other words, if $a$ and $b$ merge, and $c$ and $d$ merge, and then the combined firms $\{a,b\}$ and $\{c,d\}$ merge, the phylogenetic tree model does not distinguish the cases where $a$ and $b$ merged before, at the same time as, or after $c$ and $d$ merged. The original partition model in Brian Scott's post does, however, distinguish these three cases.2016-10-29