13
$\begingroup$

I'm trying to reproduce Excel's MULTINOMIAL function in C# so

$${MULTINOMIAL(a,b,..,n)} = \frac{(a+b +\cdots +n)!}{a!b! \cdots n!}$$

How can I do this without causing an overflow due to the factorials?

  • 0
    I googled the first thing that came to my mind, "computing multinomial coefficients efficiently" and found a recursion formula for multinomial coefficients: http://home.comcast.net/~tamivox/dave/multinomial/index.html2012-09-28
  • 0
    @Graphth That formula isn't as efficient as it looks - even with caching of smaller values it still requires you to know $M(a', b', \ldots, n')$ for all $a\lt a', b\lt b', \ldots$; in other words, the running time (and storage) is $O(abc\ldots n)$.2012-09-28

4 Answers 4

4

Another implementation (in pari/GP) avoids logarithms and works completely in integer mode. It is an extension of the method to compute the binomial sequentially : [update] $\binom {a+b}{a} = {a+1\over 1} \cdot{a+2\over 2} \cdot{a+3\over 3}\cdot{a+4\over 4}\cdots{a+b\over b} = (((((a+1)/1 \cdot (a+2) )/2 \cdot (a+3) )/ 3 \cdots(a+b) / b$. (This is also in the answer of Steven) [/update]
Here is the code:

{mulbin(v)=local(pr,g,maxf);
 pr = vector(#v,r,1); 
 g  = vector(#v); g[1]=1;for(k=2,#v,g[k]=g[k-1]+v[k-1]);
 maxf=1; for(k=2,#v,maxf=max(maxf,v[k]));
for(f=1,maxf,
 for(k=2,#v , if(f>v[k],next());
    pr[k] *= g[k]; pr[k] /=f;
    g[k] ++ ;
  );
 );
 return(prod(k = 2,#v,pr[k])); }

 gp> mulbin([8,2,3])
 %293 = 12870

It is an optimization to provide the elements of the vector with the highest number first; because the outer loop (maxf) is the maximum of the second to last entries in the vector only.

The method has the further advantage that when one decides to use logs anyway: then because of the organization of the outer and the inner loop it needs not compute the log of a number twice (which is a costly operation). The code would simply change to:

{logmulbin(v)=local(logpr,g,maxf,logf);
 logpr = vector(#v,r,0); 
 g  = vector(#v); g[1]=1;for(k=2,#v,g[k]=g[k-1]+v[k-1]);
 maxf=1; for(k=2,#v,maxf=max(maxf,v[k]));
for(f=1,maxf, logf=log(f);
 for(k=2,#v , if(f>v[k],next());
    logpr[k] += log(g[k]); logpr[k] -= logf;
    g[k] ++ ;
  );
 );
 return(sum(k = 2,#v,logpr[k])); }

gp> exp(logmulbin([8,3,2]))
%294 = 12870.0000000
9

Another, more efficient way of computing the coefficients exactly that generally shouldn't overflow unless the result does is by using the characterization of the multinomial coefficient as a product of binomial coefficients: $${a+b+c+\cdots+n\choose a\;b\;c\cdots\;n} = {a+b\choose b}{a+b+c\choose c}\cdots{a+b+c+\cdots +n\choose n}$$ This is easy to prove by multiplying out the right hand side and noting that factors of $(a+b)!,\;(a+b+c)!,\;\ldots$ cancel out. This simplifies the problem of computing a multinomial coefficient to just computing a binary one - and there are several straightforward ways of doing that without overflow. The easiest may be use the recursive formula ${n+1\choose k+1} = \frac{n+1}{k+1}{n\choose k}$ along with some careful manipulation of the (exact) results to make sure you're always dividing out GCDs; for more information on this approach, see http://blog.plover.com/math/choose.html . Alternately, you can use Kummer's theorem that the power of a prime $p$ dividing ${a+b\choose b}$ is the number of carries when $a$ and $b$ are written as base-$p$ numbers and added. By doing this for a suitable range of primes, you can find the prime factorization of the result and thus the result itself (indeed, you could just accumulate the result by finding the dividing exponent for all the primes less than the top of your binomial coefficient in turn).

  • 0
    An outsider to this field says: Don’t you mean the number of carries when $a$ and $b$ are added in $p$-ary notation?2012-09-28
  • 0
    I do - thanks for the catch!2012-09-28
  • 0
    Also, rather than prime factorization, one could use the Chinese remainder theorem.2012-09-28
  • 0
    @DouglasS.Stones I was thinking that as well, but that can get a little complicated w/ managing prime powers, and it's tricky to stay out of overflow range in intermediate computations.2012-09-28
  • 0
    @Ivan Kochurkin This is indeed a programming problem. But some math always helps in optimising the computations!! As proven by the recursive solution with binomial coefficients.2018-01-15
6

If your computer can handle \logarithms and exponentials of numbers, you could calculate the following: $$\log(\frac{(a+b +\cdots +n)!}{a!b! \cdots n!})=\log(a+b +\cdots +n)!)-\log(a!b! \cdots n!)=$$ $$\log(2)+\log(3)+\dots+\log(a+b +\cdots +n)-\log(a!)-\log(b!)-\ldots -\log(x!)=$$ $$\log(2)+\log(3)+\dots+\log(a+b +\cdots +n)-\log(2)-\ldots-\log(a)-\log(2)\ldots-\log(b)\ldots \log(2)-\ldots \log(x)$$

When you do all of these additions, you exponentiate the result using whatever base you started with (as written the base is ten, but you could choose another base).

As a historical note, this is essentially what people did before computers. They used $\log$ tables. I am not a computer programer, so I do not really know how feasible this is in practice or if their are better methods. You can also multiply huge numbers more quickly with a fast fourier transform, but that might be overkill.

  • 0
    It's great! I updated my answer with this formula.2012-09-28
  • 1
    This is interesting and clever. Unfortunately, computers have a fixed finite precision (floating point numbers). This means that this method of computation will quickly yield incorrect results as the numbers grow larger.2016-04-23
5

It's question more about programming, but not mathematics :)

There are four solutions (also see code on github):

  1. Use BigInteger type for large numbers. Also see this question on SO.
  2. Use my code below. It's not most efficient, but pretty good and appears to be better then sledge-hammer implementation.
  3. Calculate values for certain arguments in excel and store their to table for further using. But perhaps not all cases can be calculated by reason of their huge quantity.
  4. Use logarithms how it's described in @Baby Dragon answer (my logarithms code also placed below).

My implementation:

public static ulong Mutinomonal(params uint[] numbers)
{
    uint numbersSum = 0;
    foreach (var number in numbers)
        numbersSum += number;

    uint maxNumber = numbers.Max();
    var denomFactorPowers = new uint[numbers.Max() + 1];
    foreach (var number in numbers)
        for (int i = 2; i <= number; i++)
            denomFactorPowers[i]++;
    for (int i = 2; i < denomFactorPowers.Length; i++)
        denomFactorPowers[i]--; // reduce with nominator;

    uint currentFactor = 2;
    uint currentPower = 1;
    double result = 1;
    for (uint i = maxNumber + 1; i <= numbersSum; i++)
    {
        uint tempDenom = 1;
        while (tempDenom < result && currentFactor < denomFactorPowers.Length)
        {
            if (currentPower > denomFactorPowers[currentFactor])
            {
                currentFactor++;
                currentPower = 1;
            }
            else
            {
                tempDenom *= currentFactor;
                currentPower++;
            }
        }
        result = result / tempDenom * i;
    }

    return (ulong)result;
}

Naive implementation:

public static ulong Mutinomonal(params uint[] numbers)
{
    uint numbersSum = 0;
    foreach (var number in numbers)
        numbersSum += number;

    ulong nominator = Factorial(numbersSum);

    ulong denominator = 1;
    foreach (var number in numbers)
        denominator *= Factorial(number);

    return nominator / denominator;
}

public static ulong Factorial(ulong n)
{
    ulong result = 1;
    for (ulong i = 1; i <= n; i++)
        result = result * i;
    return result;
}

I've checked both implementations on $1, 2, 3, 4, 5, 6$ numbers and in last implementation overflow has been occurred, unlike the first.

UPDATE

I estimed @Baby Dragon answer and wrote this logarithm implemetation on C#:

public static List Logarithms = new List();

public static ulong Multinominal(params uint[] numbers)
{
    uint numbersSum = 0;
    foreach (var number in numbers)
        numbersSum += number;

    double sum = 0;
    for (int i = 2; i <= numbersSum; i++)
    {
        if (i - 2 < Logarithms.Count)
            sum += Logarithms[i - 2];
        else
        {
            var log = Math.Log(i);
            Logarithms.Add(log);
            sum += log;
        }
    }

    foreach (var number in numbers)
        for (int i = 2; i <= number; i++)
            sum -= Logarithms[i - 2];

    return (ulong)Math.Exp(sum);
}
  • 0
    I had to use doubles with your function to deal with big numbers (ulong doesn't cut it).2012-09-28