My background is not really in mathematics, so please bear with me!
Also: I'm fairly sure this has been studied in the literature, but I have no idea how to find it, or even what it is called. It shouldn't be too far from Gaussian binomial coefficients, I guess...
Problem statement:
I have a few computer nodes in a network that share information between them.
Every few seconds, a new round starts: each node randomly selects another node in the network and asks it for what it knows (the information is pulled from that node).
I am interested in how long it takes for a single piece of information to spread through the whole network. This could be an estimate of the average number of rounds, the 95th percentile, etc.
A few assumptions:
- For simplicity, I assume a synchronous model: everyone starts the round at exactly the same time, and can only pull information acquired from previous rounds.
- Nodes do not forget! Once they pull the information, they will keep it.
- The set of nodes doesn't change, and every node can contact every other node with equal probability.
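The model described by these assumptions can also be simulated directly, which gives a rough empirical answer for the average and the 95th percentile. The following is a minimal sketch in Python; the function names `simulate_rounds` and `summarize`, the trial count, and the nearest-rank percentile are illustrative choices, not part of the problem statement.

```python
import random

def simulate_rounds(n, k0, rng):
    """Rounds until all n nodes are informed, starting from k0 informed nodes."""
    if not 1 <= k0 <= n:
        raise ValueError("need 1 <= k0 <= n")
    informed = [i < k0 for i in range(n)]
    count = k0
    rounds = 0
    while count < n:
        # Synchronous model: pulls only see what was known before this round.
        snapshot = informed[:]
        for node in range(n):
            if snapshot[node]:
                continue  # informed nodes never forget
            # Pick one of the other n-1 nodes uniformly at random.
            peer = rng.randrange(n - 1)
            if peer >= node:
                peer += 1
            if snapshot[peer]:
                informed[node] = True
                count += 1
        rounds += 1
    return rounds

def summarize(n, k0, trials=10_000, seed=0):
    """Empirical mean and nearest-rank 95th percentile of the number of rounds."""
    rng = random.Random(seed)
    samples = sorted(simulate_rounds(n, k0, rng) for _ in range(trials))
    mean = sum(samples) / trials
    p95 = samples[(95 * trials + 99) // 100 - 1]
    return mean, p95
```

Calling `summarize(100, 1)` would then estimate both quantities for a 100-node network with a single initial source; the result is only as good as the number of trials.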
Notation:
- $N$ is the total number of nodes in the network
- $N-1$ is the number of nodes that a given node can talk to in a given round.
- $k_0$ is the number of nodes that initially know about the piece of information.
- $k_i$ is the number of nodes that know about the piece of information after $i$ rounds.
- $N-k_i$ is the number of nodes that remain clueless about the piece of information after $i$ rounds.
Rationale:
Things start off quite easily: we can see the problem from the point of view of unknowledgeable nodes randomly choosing to pull information from a knowledgeable node.
- $P($choosing a knowledgeable node$)=\frac{k_i}{N-1}$
- $P($choosing a clueless node$)=\frac{N-k_i-1}{N-1}$
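These two probabilities determine what a single round does. Assuming each of the $N-k_i$ clueless nodes chooses independently, the number of newly informed nodes in a round is binomial with success probability $\frac{k_i}{N-1}$. A small sketch of that one-round distribution follows; `transition_pmf` is a made-up helper name, and it assumes $N \ge 2$.

```python
from math import comb

def transition_pmf(n, k):
    """P(k_next = j | k informed now) for j = k..n, as a dict {j: probability}."""
    p = k / (n - 1)      # chance that a clueless node picks an informed peer
    clueless = n - k
    return {
        k + m: comb(clueless, m) * p**m * (1 - p)**(clueless - m)
        for m in range(clueless + 1)
    }
```

For $N=3$ and one informed node, the probability of jumping straight to everyone knowing is `transition_pmf(3, 1)[3]`, which is $(1/2)^2 = 0.25$.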
Then I started checking probabilities by changing the number of rounds, and it resembles a binomial distribution:
Probability of everyone knowing after 1 round: $\left(\frac{k_0}{N-1}\right)^{N-k_0}$
Probability of everyone knowing after 2 rounds: $\sum_{k_1=k_0}^{N}\left[\binom{N-k_0}{k_1-k_0}\left(\frac{k_0}{N-1}\right)^{k_1-k_0}\left(\frac{N-k_0-1}{N-1}\right)^{N-k_1}\left(\frac{k_1}{N-1}\right)^{N-k_1}\right]$, where the binomial coefficient counts which of the $N-k_0$ clueless nodes learn the information in round 1, and the last factor is the probability that all $N-k_1$ remaining clueless nodes learn it in round 2.
Generalizing to $r$ rounds: $\sum_{k_1=k_0}^{N}\left[\binom{N-k_0}{k_1-k_0}\left(\frac{k_0}{N-1}\right)^{k_1-k_0}\left(\frac{N-k_0-1}{N-1}\right)^{N-k_1}\sum_{k_2=k_1}^{N}\left[\binom{N-k_1}{k_2-k_1}\left(\frac{k_1}{N-1}\right)^{k_2-k_1}\left(\frac{N-k_1-1}{N-1}\right)^{N-k_2}\cdots\sum_{k_{r-1}=k_{r-2}}^{N}\left[\binom{N-k_{r-2}}{k_{r-1}-k_{r-2}}\left(\frac{k_{r-2}}{N-1}\right)^{k_{r-1}-k_{r-2}}\left(\frac{N-k_{r-2}-1}{N-1}\right)^{N-k_{r-1}}\left(\frac{k_{r-1}}{N-1}\right)^{N-k_{r-1}}\right]\cdots\right]\right]$
Pulling the sums out front, you get:
$\sum_{k_1=k_0}^{N}\sum_{k_2=k_1}^{N}\sum_{k_3=k_2}^{N}\cdots\sum_{k_{r-1}=k_{r-2}}^{N}\left(\prod_{i=1}^{r-1}\left[\binom{N-k_{i-1}}{k_i-k_{i-1}}\left(\frac{k_{i-1}}{N-1}\right)^{k_{i}-k_{i-1}}\left(\frac{N-k_{i-1}-1}{N-1}\right)^{N-k_{i}}\right]\right)\left(\frac{k_{r-1}}{N-1}\right)^{N-k_{r-1}}$
I would then like to be able to answer questions such as: what is the probability that everyone knows within $r$ rounds?
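Questions of this kind can be answered numerically without expanding the nested sums, by tracking the distribution of $k_i$ round by round as a Markov chain on $\{k_0, \ldots, N\}$. The sketch below assumes each of the $N-k$ clueless nodes independently pulls from an informed node with probability $\frac{k}{N-1}$, so the number of newly informed nodes per round is binomial. The names `prob_all_informed_by_round` and `mean_and_p95` and the truncation at `max_rounds` are illustrative assumptions.

```python
from math import comb

def prob_all_informed_by_round(n, k0, max_rounds):
    """Return [P(k_r = n) for r = 0..max_rounds] by propagating the law of k_r."""
    dist = [0.0] * (n + 1)   # dist[k] = P(exactly k nodes informed so far)
    dist[k0] = 1.0
    cdf = [dist[n]]
    for _ in range(max_rounds):
        nxt = [0.0] * (n + 1)
        for k, mass in enumerate(dist):
            if mass == 0.0:
                continue
            if k == n:           # absorbing state: everyone already knows
                nxt[n] += mass
                continue
            p = k / (n - 1)      # a clueless node hits an informed peer
            clueless = n - k
            for m in range(clueless + 1):
                nxt[k + m] += mass * comb(clueless, m) * p**m * (1 - p)**(clueless - m)
        dist = nxt
        cdf.append(dist[n])
    return cdf

def mean_and_p95(n, k0, max_rounds=200):
    """Truncated mean and 95th percentile of the number of rounds T."""
    cdf = prob_all_informed_by_round(n, k0, max_rounds)
    mean = sum(1.0 - c for c in cdf)   # E[T] = sum over r >= 0 of P(T > r)
    p95 = next((r for r, c in enumerate(cdf) if c >= 0.95), None)
    return mean, p95
```

For $N=3$, $k_0=1$ this gives $0.25$ after one round, matching $\left(\frac{k_0}{N-1}\right)^{N-k_0}$, and $0.8125$ after two. The cost is roughly $O(N^2)$ per round, so it stays cheap for networks of a few thousand nodes; the mean is slightly underestimated if `max_rounds` is too small for the tail to vanish.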
Questions
- Any alternate way to tackle the problem that makes things easier?
- Any useful identities/etc that I should know about? Currently it looks intractable to me!
- Is there any theory behind this? This looks generic enough, and looks like it has been studied tons of times before, appearing in all sorts of problems. I would really like to know the name!
- I've been told this is a hypergeometric distribution rather than a binomial one, but I would argue it is not quite that either! Happy to be proven wrong, though.