A recent discussion, which may be found here, examined the problem of counting the number of acyclic digraphs on $n$ labelled nodes and having $k$ edges and indegree and outdegree at most one. It was established that the bivariate mixed generating function of this class $\mathcal{G}$ of graphs on $n$ nodes and with $k$ edges is $$ G(z, u) = \exp \left(\frac{z}{1-uz} \right).$$ This immediately implies that the expected number of edges of a random graph from $\mathcal{G}$ is $$\epsilon_n = [z^n]\left. \frac{d}{du} G(z,u) \right|_{u=1}.$$ Evaluating this quantity we obtain $$ \left. \exp \left(\frac{z}{1-uz} \right) z (-1)\frac{1}{(1-uz)^2} (-z)\right|_{u=1} = \left. \exp \left(\frac{z}{1-uz} \right) \frac{z^2}{(1-uz)^2} \right|_{u=1} = \exp \left(\frac{z}{1-z}\right) \left(\frac{z}{1-z}\right)^2$$ Continuing with this calculation we find $$ \epsilon_n = [z^n] \sum_{m\ge 0} \frac{1}{m!} \left(\frac{z}{1-z}\right)^{m+2} = \sum_{m=0}^{n-2} \frac{1}{m!} [z^n] \left(\frac{z}{1-z}\right)^{m+2} = \sum_{m=0}^{n-2} \frac{1}{m!} [z^{n-m-2}] \left(\frac{1}{1-z}\right)^{m+2} = \sum_{m=0}^{n-2} \frac{1}{m!} \binom{n-m-2+m+1}{m+1} = \sum_{m=0}^{n-2} \frac{1}{m!} \binom{n-1}{m+1}.$$ This closed form is actually quite nice, but it does not answer the question that is the most obvious one for this problem, namely Is there an asymptotic expansion of $\epsilon_n$ and if yes, what is the first term?
asymptotics of the expected number of edges of a random acyclic digraph with indegree and outdegree at most one
-
0+1, nice question. I think your first equation for $\epsilon_n$ is missing $[z^n]$ in front? – 2012-12-24
-
0Thanks, fixed it just now. Season's greetings. – 2012-12-24
-
0@Marko Riedel It is preferred to keep the title short. That is why I edited it. But you have re-edited it. Let me know if you would like to keep this long title. – 2012-12-26
-
0@Marvis. That would be kind indeed. I am relatively new here, and I will remember your advice for the future. For now I prefer the title that represents 100% of the problem definition. I would have asked your advice before doing the rollback but I have not figured out the messaging on this site yet. – 2012-12-26
-
0It is also important to note that Joriki pointed out a mistake in the above analysis and you should read his post as well as my reply correcting the mistake. – 2012-12-26
5 Answers
What you calculated is not the expected number of edges but the total number of edges of all graphs in $\mathcal G$ on $n$ nodes. To get the expected number of edges you'd have to divide by the number of these graphs.
Another approach for approximating the expected number of edges is to find the mode of the number of graphs as a function of the number of edges and estimate by how much the mean might deviate from the mode. The number
$$ q_{nk}=\binom nk\binom{n-1}kk!=\frac{n!(n-1)!}{k!(n-k)!(n-1-k)!} $$
of graphs on $n$ nodes with $k$ edges is at a maximum when the new factor $k$ being added to the denominator is equal to the factors $(n-k)(n-1-k)$ being removed:
$$k=(n-k)(n-1-k)\;,$$ $$k=n\pm\sqrt n\;.$$
To see how close the mean is to the mode at $k=n-\sqrt n$, we can approximate the logarithm of the number of graphs using Stirling's approximation:
$$ \begin{align} \log q_{nk}\approx&n\log n+(n-1)\log(n-1)-k\log k \\ &-(n-k)\log(n-k)-(n-1-k)\log(n-1-k)-k \end{align} $$
and thus
$$ \begin{align} \frac{\mathrm d\log q_{nk}}{\mathrm dk} &\approx-\log k+\log(n-k)+\log(n-1-k)\;,\\ \frac{\mathrm d^2\log q_{nk}}{\mathrm dk^2} &\approx-\frac1k-\frac1{n-k}-\frac1{n-1-k}\;. \end{align} $$
Setting the first derivative to zero yields $k=(n-k)(n-1-k)$ again as expected, and substituting $k=n-\sqrt n$ into the second derivative yields approximately $-2/\sqrt n$. Thus the number of graphs as a function of $k$ can be approximated by a Gaussian with width of order $\sqrt[4]n$, so the difference between the maximum and the mean should be at most $O(\sqrt[4]n)$. In fact numerical results suggest a much smaller difference, apparently $O(\log\log n)$ and perhaps even $O(1)$:
$$ \begin{array}{r|r|r|c} \log_2n&\langle k\rangle&n-\sqrt n&\langle k\rangle-(n-\sqrt n)\\\hline 0&0.0000&0.0000&0.0000\\ 1&0.6667&0.5858&0.0809\\ 2&2.1370&2.0000&0.1370\\ 3&5.3441&5.1716&0.1725\\ 4&12.1957&12.0000&0.1957\\ 5&26.5548&26.3431&0.2116\\ 6&56.2228&56.0000&0.2228\\ 7&116.9171&116.6863&0.2308\\ 8&240.2364&240.0000&0.2364\\ 9&489.6129&489.3726&0.2404\\ 10&992.2432&992.0000&0.2432\\ 11&2002.9903&2002.7452&0.2452\\ 12&4032.2466&4032.0000&0.2466\\ 13&8101.7379&8101.4903&0.2476 \end{array} $$
Here's the code to produce those numbers. It might be interesting to find the next term in the expansion analytically.
-
0I congratulate you on this calculation and I am glad to see that the value $n-\sqrt{n} + \frac{1}{4}$ confirms my results for the closed-form expression. – 2012-12-26
Thanks Joriki for this important pointer correcting my mistake. In fact the number of graphs in $\mathcal{G}$ is given by $$ n! [z^n] G(z, 1) .$$ But we have $$[z^n] \exp \left( \frac{z}{1-z} \right) = [z^n] \sum_{m\ge 0} \frac{1}{m!} \left( \frac{z}{1-z} \right)^m$$ which is $$ \sum_{m=1}^n \frac{1}{m!} [z^n] \left( \frac{z}{1-z} \right)^m = \sum_{m=1}^n \frac{1}{m!} [z^{n-m}] \left( \frac{1}{1-z} \right)^m = \sum_{m=1}^n \frac{1}{m!} \binom{n-m+m-1}{m-1} = \sum_{m=1}^n \frac{1}{m!} \binom{n-1}{m-1}.$$ It follows that the value $\epsilon_n$ is given by $$\epsilon_n = \frac{n! \sum_{m=0}^{n-2} \frac{1}{m!} \binom{n-1}{m+1}}{n! \sum_{m=1}^n \frac{1}{m!} \binom{n-1}{m-1}} = \frac{\sum_{m=0}^{n-2} \frac{1}{m!} \binom{n-1}{m+1}}{\sum_{m=1}^n \frac{1}{m!} \binom{n-1}{m-1}} = \frac{\sum_{m=2}^n \frac{1}{(m-2)!} \binom{n-1}{m-1}}{\sum_{m=1}^n \frac{1}{m!} \binom{n-1}{m-1}}.$$ It would appear that this implies $$\epsilon_n \sim n,$$ answering my original question. This even includes the coefficient on $n$ and it means that small components having $o(n)$ edges do not contribute in the limit. I will wait a few hours to give you a chance to prove this last asymptotic result, which does not look all that difficult.
-
0Actually the above does not seem quite right as the maximum number of edges in a graph from $\mathcal{G}$ is $n-1$, so it looks like there must be another term in the expansion. To get $n$ edges the graph would have to be circular. – 2012-12-25
-
0I'm not sure I understand what you're doing here -- the numerical results show quite clearly that the leading terms are $n-\sqrt n$, as expected from my analysis -- that implies $\epsilon_n\sim n$, and it's not in contradiction with the fact that there can be at most $n-1$ edges. – 2012-12-25
-
0I just ran a numeric experiment and it confirms that $$\epsilon_n \sim n -\sqrt{n}.$$ Very beautiful! I also get as a conjecture $$\epsilon_n \sim n - \sqrt{n} + 0.248.$$ In order to confirm this expand the quotient of the two sums. – 2012-12-25
-
0What I am getting is that $$\frac{\sum_{m=2}^n \frac{1}{(m-2)!} \binom{n-1}{m-1}}{\sum_{m=1}^n \frac{1}{m!} \binom{n-1}{m-1}} = \epsilon_n \sim n - \sqrt{n} + 0.248$$ so this confirms your result. We just need to expand the quotient of the two sums to get the whole asymptotic expansion. We could even compute the variance from this. – 2012-12-25
-
0I don't understand -- what sort of numerical experiment did you do that confirmed this more than it had already been confirmed by my results? – 2012-12-25
-
0Also I don't understand how you're using $\sim$ -- it seems you don't mean "asymptotic to" (which is what that symbol is usually used for), since in that case $\epsilon_n\sim n$ already implies $\epsilon_n\sim n-\sqrt n$ and $\epsilon_n\sim n-\sqrt n+0.248$. – 2012-12-25
-
0I am not questioning your results. I read them and they make sense. As for numerics, I computed the quotient for large $n$ and that helped me confirm your work. Do you have a method to get the closed form of the constant term? We just need to do the asymptotic expansions to conclude. – 2012-12-25
-
0Sorry, perhaps I had misunderstood some of what you wrote; you're right, it did sound to me like you hadn't read them. Regarding the "constant term", I'm not convinced yet that it's constant; my results also seem consistent with it being $O(\log\log n)$ -- up to which $n$ have you calculated? – 2012-12-25
-
0I calculated up to $n=2^{15}$. I will reread everything you have written (reading my material carefully you will see that it confirms your work) and check back in a few hours. My point is, given that we now have closed formulas for the expectation and the variance we should be able to get the first few terms of the asymptotic expansions for these two statistics. The generating function encodes information about all factorial moments, therefore, while being more difficult to handle, they are usually superior to approximations of the first term only. – 2012-12-25
We can expand this calculation to the variance in addition to the expected value. To do so we compute $$E[X(X-1)] = \frac{ [z^n] \left. \left(\frac{d}{du}\right)^2 G(z, u) \right|_{u=1}}{[z^n] G(z, 1)}$$ where $X$ is the RV representing the number of edges. Now $$\left. \left(\frac{d}{du}\right)^2 G(z, u) \right|_{u=1} = \left.\left( \frac{d}{du}\right)\exp\left(\frac{z}{1-uz}\right) \left(\frac{z}{1-uz}\right)^2 \right|_{u=1} = \left.\exp\left(\frac{z}{1-uz}\right) \left(\frac{z}{1-uz}\right)^4 + 2 \exp\left(\frac{z}{1-uz}\right) \left(\frac{z}{1-uz}\right)^3 \right|_{u=1} = \exp\left(\frac{z}{1-z}\right) \left(\frac{z}{1-z}\right)^4 + 2 \exp\left(\frac{z}{1-z}\right) \left(\frac{z}{1-z}\right)^3$$ Now expand the two terms in turn. Left term, $$[z^n] \exp\left(\frac{z}{1-z}\right) \left(\frac{z}{1-z}\right)^4 = [z^n] \sum_{m\ge 0} \frac{1}{m!} \left( \frac{z}{1-z}\right)^{m+4} = [z^n] \sum_{m=0}^{n-4} \frac{1}{m!} \left( \frac{z}{1-z}\right)^{m+4} $$ or $$ \sum_{m=0}^{n-4} \frac{1}{m!} [z^{n-m-4}] \left( \frac{1}{1-z}\right)^{m+4} = \sum_{m=0}^{n-4} \frac{1}{m!} \binom{n-m-4+m+3}{m+3} = \sum_{m=0}^{n-4} \frac{1}{m!} \binom{n-1}{m+3}.$$ Right term, $$2 [z^n] \exp\left(\frac{z}{1-z}\right) \left(\frac{z}{1-z}\right)^3 = 2 [z^n] \sum_{m\ge 0} \frac{1}{m!} \left( \frac{z}{1-z}\right)^{m+3} = 2 [z^n] \sum_{m=0}^{n-3} \frac{1}{m!} \left( \frac{z}{1-z}\right)^{m+3} $$ or $$ 2 \sum_{m=0}^{n-3} \frac{1}{m!} [z^{n-m-3}] \left( \frac{1}{1-z}\right)^{m+3} = 2 \sum_{m=0}^{n-3} \frac{1}{m!} \binom{n-m-3+m+2}{m+2} = 2 \sum_{m=0}^{n-3} \frac{1}{m!} \binom{n-1}{m+2}.$$ It follows that $$E[X(X-1)] = \frac{\sum_{m=0}^{n-4} \frac{1}{m!} \binom{n-1}{m+3} + 2 \sum_{m=0}^{n-3} \frac{1}{m!} \binom{n-1}{m+2}}{\sum_{m=1}^n \frac{1}{m!} \binom{n-1}{m-1}}.$$ The question now becomes, what is the asymptotic expansion of this quotient of sums? That would put the variance within reach, given that $E[X^2] = E[X(X-1)] + E[X].$
The following admittedly quite naive program makes it possible to calculate $\epsilon_n$ for $n$ as large as $300000.$ The value that we obtain for the constant term of the expansion is $0.2496.$ This would suggest the conjecture that $$\epsilon_n \sim n - \sqrt{n} + \frac{1}{4}.$$
The code follows.
#include#include #include long double term(int n, int m) { long double v = (long double)1; while(m>1){ v *= (long double)((n-1)-(m-2)); v /= (long double)(m-1); v /= (long double)(m); m--; } return v; } int main(int argc, char **argv) { int n; if(argc!=2 || sscanf(argv[1], "%d", &n)!=1 || n<1){ fprintf(stderr, "expecting a positive integer\n"); exit(-1); } long double p = 0, q = 1; int m; for(m=2; m<=n; m++){ if(m%(n/10)==0) printf("%d%%\n", m*100/n); long double t = term(n, m); p += t*(long double)(m)*(long double)(m-1); q += t; } long double eps = p/q; printf("%.18Le %.18Le %.18Le\n", p, q, eps-((long double)n-sqrtl((long double)n))); exit(0); }
-
0It would be much more efficient to re-code this without the binomial coefficients, because each term in the two sums for some $m$ is a multiple of the one that precedes it. This gives a constant number of multiplications per term as opposed to $m$ multiplications per term and might be useful in the study of the asymptotics of the variance. – 2012-12-26
The following program can be used to calculate $E[X(X-1)]$ from the closed form as a quotient of sums. It would appear to support the conjecture that $$ E[X(X-1)] \sim n^2 - 2 n^{3/2} + \frac{1}{2} n.$$
#include#include #include long double term(int n, int m) { long double v = (long double)1; while(m>1){ v *= (long double)((n-1)-(m-2)); v /= (long double)(m-1); v /= (long double)(m); m--; } return v; } int main(int argc, char **argv) { int n; if(argc!=2 || sscanf(argv[1], "%d", &n)!=1 || n<1){ fprintf(stderr, "expecting a positive integer\n"); exit(-1); } long double p = 0, q = 0, r = 0; int m; for(m=1; m<=n; m++){ if(m%(n/10)==0) printf("%d%%\n", m*100/n); long double t = term(n, m); p += t* (long double)(m)* (long double)(m-1)* (long double)(m-2)* (long double)(m-3); q += 2*t* (long double)(m)* (long double)(m-1)* (long double)(m-2); r += t; } long double eps = (p+q)/r; printf("%.18Le %.18Le %.18Le\n", p, q, (eps-(((long double)n)*((long double)n) -(long double)2* (long double)n*sqrtl((long double)n)))/ (long double)n); exit(0); }
-
0This value can be confirmed by calculating the asymptotic expansion of $$ (n-\sqrt{n}+1/4)(n-1-\sqrt{n-1}+1/4) \sim n^2 - 2 n^{3/2} + \frac{1}{2} n. $$ – 2012-12-27
-
0This gives for the variance $$ Var[X] = E[X(X-1)] + E[X] - E[X]^2 \sim \frac{1}{2} \sqrt{n} - \frac{1}{2}.$$ – 2012-12-27