2
$\begingroup$

I have a Poisson-based distribution as follows:

$P(1)=0.1708$;

$P(2)=0.138$;

$P(3)=0.092$;

...

...

$P(10)=0.000034$;

I pick numbers between $1$ and $10$ according to this distribution, but if a number has already been picked, I repeat the draw until I get a number in this interval that has not been picked yet.

Initially, none of the numbers has been picked.

I have to pick $5$ numbers among these $10$ numbers.

How can I find the expected value of the maximum number picked?

Thanks...

  • 0
    You could calculate the probabilities of each of the $10\times 9 \times 8 \times 7 \times 6= 30240$ possible pick patterns. (2017-01-18)
  • 0
    This looks quite like the Coupon Collector Problem. (2017-01-18)
  • 0
    @Henry We can reduce the number of patterns to $\sum_{n=5}^{10} {n-1\choose 4} = 252$. (2017-01-18)
  • 0
    Among these 252 patterns, the most probable would be the 1-2-3-4-5 pattern, wouldn't it? So what strategy should I follow? (2017-01-18)
  • 0
    I think you mean that, for each pattern, I note the picking probability and the maximum number picked. For the 252 patterns, I then consider all these (maximum value, picking probability) pairs and find the most probable maximum number. Is this the proposed method? (2017-01-18)
  • 0
    @JaroslawMatlak: Note that the order of picks changes probabilities: the probability the first three picks are $(1,2,3)$ in that order is about $0.0038$ while the probability of $(3,2,1)$ is about $0.0031$. (2017-01-18)
  • 0
    @Henry Yes, but in all cases $(1,2,3), (3,2,1), (1,3,2),...$ the greatest number is $3$ and their probabilities sum up to $P(\{1,2,3\})=P(1)+P(2)+P(3)$. (2017-01-18)
  • 0
    @JaroslawMatlak: sadly their probabilities do not add up to anything simple. The probabilities of the six possible patterns for the first three choices being $\{1,2,3\}$ add up to $0.020794$, which is neither $P(1)+P(2)+P(3)$ nor $6 P(1)P(2)P(3)$. (2017-01-18)
  • 0
    @Henry - well, I meant they sum up to $P(1)P(2)P(3)$; I don't know why I posted pluses. But if they don't sum up to the product of the probabilities (which seemed intuitive), then you're right: there are 30240 patterns. (2017-01-18)
  • 0
    OK, I will evaluate all these data and see whether I can calculate this probability using nested loops in Matlab. Thanks so much Henry and @JaroslawMatlak. If I can obtain the expected maximum, I will know how far away a node connects to its neighbor nodes in a circular network, and so I will obtain the average node distance in such a semi-regular network. The initial distribution defines the probability of connecting to the nodes at distance 1, 2, ..., 10. (2017-01-18)

2 Answers

0

Because my previous answer was wrong, I will use the method suggested by @Henry: brute-force checking of all possible pick schemes.

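Note that, for an ordered sequence of distinct picks $a,b,c,d,e$, $$P(a,b,c,d,e)=\frac{P(a)\cdot P(b)\cdot P(c)\cdot P(d)\cdot P(e)}{Q(a)\cdot Q(a,b)\cdot Q(a,b,c)\cdot Q(a,b,c,d)}$$ where $$Q(a,b,...)=1-P(a)-P(b)-...$$ is the probability mass still available after $a,b,...$ have been picked.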
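As a minimal sketch of this formula (the helper name `seq_prob` and the dictionary layout are illustrative choices, not part of the original script), the snippet below evaluates the probability of one ordered pick sequence, using the question's values normalized to sum to 1. It shows that the same set of numbers picked in a different order gets a different probability, and that the probabilities of all ordered 5-tuples add up to 1.

```python
from itertools import permutations

# Raw values from the question, normalized so they sum to 1.
raw = [0.1708, 0.138, 0.092, 0.0509, 0.0234,
       0.009, 0.0029, 0.00078, 0.00018, 0.000034]
total = sum(raw)
P = {n: w / total for n, w in enumerate(raw, start=1)}


def seq_prob(order, P):
    """Probability of picking the numbers in `order`, one after another,
    re-drawing whenever an already-picked number comes up."""
    prob = 1.0
    remaining = 1.0  # probability mass of the numbers not yet picked (the Q terms)
    for n in order:
        prob *= P[n] / remaining
        remaining -= P[n]
    return prob


forward = seq_prob((1, 2, 3, 4, 5), P)
backward = seq_prob((5, 4, 3, 2, 1), P)
print(forward, backward)  # same set, different order, different probability

# All ordered 5-tuples of distinct numbers together carry the whole mass.
total_mass = sum(seq_prob(t, P) for t in permutations(range(1, 11), 5))
print(total_mass)
```

The running `remaining` variable plays the role of $Q(a)$, $Q(a,b)$, and so on, so the product built in the loop is the fraction above.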

Let $R(x)$ be the probability that the greatest number picked is $x$. Since the maximum can appear at any position in the pick order, we sum over all ordered 5-tuples of distinct numbers whose maximum is $x$:

$$R(x)=\sum_{\substack{a,b,c,d,e\ \text{distinct}\\ \max(a,b,c,d,e)=x}} P(a,b,c,d,e)$$

The expected value $E$ is then equal to: $$E=\sum_{n=5}^{10}nR(n)$$

My python script:

# Index 0 is unused, so P[n] is the probability of number n
P=[0]*11
# R[x] will hold the probability that the greatest picked number is x
R=[0]*11

P[1]=0.1708
P[2]=0.138
P[3]=0.092
P[4]=0.0509
P[5]=0.0234
P[6]=0.009
P[7]=0.0029
P[8]=0.00078
P[9]=0.00018
P[10]=0.000034

# Probabilities don't sum up to 1, so we will normalize them
s=sum(P)
for i in range(1,11):
  P[i]=P[i]/s


# Enumerate every ordered choice (i,j,k,l,m) of five distinct numbers
for i in range(1,11):
  for j in range(1,11):
    if j != i:
      for k in range(1,11):
        if k!=i and k!=j:
          for l in range(1,11):
            if l!= i and l!=j and l!=k:
              for m in range(1,11):
                if m!= i and m!=j and m!=k and m!=l:
                  maxSelected=max(i,j,k,l,m)
                  # Probability of this exact pick order (formula above)
                  r = P[i]*P[j]*P[k]*P[l]*P[m]/((1-P[i])*(1-P[i]-P[j])*(1-P[i]-P[j]-P[k])*(1-P[i]-P[j]-P[k]-P[l]))
                  R[maxSelected]+=r

# Expected value of the maximum
E=0
for i in range(1,11):
  E+=i*R[i]
print('E = '+str(E))

Out: E = 5.60641877293
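One way to sanity-check a brute-force figure like this is to simulate the picking procedure from the question directly. The sketch below is an assumption-laden illustration rather than part of the original answer: the function names, the number of trials, and the seed are arbitrary, and the estimate carries sampling noise.

```python
import random
from itertools import accumulate

# Raw values from the question; random.choices does not need them normalized.
raw = [0.1708, 0.138, 0.092, 0.0509, 0.0234,
       0.009, 0.0029, 0.00078, 0.00018, 0.000034]
numbers = list(range(1, 11))
cum = list(accumulate(raw))  # cumulative weights, computed once


def pick_five(rng):
    """Draw until five distinct numbers are collected, discarding repeats."""
    picked = set()
    while len(picked) < 5:
        n = rng.choices(numbers, cum_weights=cum)[0]
        picked.add(n)  # a repeat leaves the set unchanged, i.e. a re-draw
    return picked


def estimate_expected_max(trials=50_000, seed=0):
    """Monte Carlo estimate of the expected maximum of the five picks."""
    rng = random.Random(seed)
    return sum(max(pick_five(rng)) for _ in range(trials)) / trials


estimate = estimate_expected_max()
print(estimate)
```

If the enumeration is right, the simulated average should land close to the value of $E$ printed above, up to Monte Carlo error.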

  • 0
    Yes, this seems more accurate. Thanks very much for spending time on it. (2017-01-19)
  • 0
    On the other hand, if I assume that the first four numbers have already been picked (because of their high probabilities and the repeated draws) and find the expected "last item" among the numbers >=5 by normalizing their probabilities to sum to 1, I get 5.497 as the expected value with this "rough" approximation. Thanks again for your effort ;) (2017-01-19)
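The rough approximation from the last comment can be reproduced in a few lines. This is only a sketch of that shortcut (variable names are illustrative): it assumes $1,2,3,4$ are always picked and takes the mean of the numbers $\ge 5$ under their renormalized probabilities.

```python
# Probabilities of the numbers >= 5, as given in the question.
tail = {5: 0.0234, 6: 0.009, 7: 0.0029, 8: 0.00078, 9: 0.00018, 10: 0.000034}

tail_mass = sum(tail.values())  # used to renormalize the tail to sum to 1
approx = sum(n * w for n, w in tail.items()) / tail_mass
print(round(approx, 3))  # 5.497
```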
0

According to @JaroslawMatlak's suggestion, I wrote the following Matlab code and found the probabilities of each number being the maximum:

X=(combnk(1:10,5));  % Finds all the 5-combinations of the 10 numbers
Y=sortrows(X,5);  % Sorts them by the last column, the greatest one.
P(1)=0.1708;
P(2)=0.138;
P(3)=0.092;
P(4)=0.0509;
P(5)=0.0234;
P(6)=0.009;
P(7)=0.0029;
P(8)=0.00078;
P(9)=0.00018;
P(10)=0.000034;
Prob=zeros(10,1);

for i=5:10    % Start from 5 because the minimum max-number is 5.
    ind=Y(:,5)==i;   % Find the indices that are equal to i.
    A = Y(ind,:);  % Copy those rows to array A
    for j=1:size(A,1)
        Prob(i)=Prob(i)+P(A(j,1))*P(A(j,2))*P(A(j,3))*P(A(j,4))*P(A(j,5));
    end
end

And I found the (unnormalized) probabilities for the numbers to be the maximum one as follows:

Prob(5)=2.58E-06;
Prob(6)=2.01E-06;
Prob(7)=8.17E-07;
Prob(8)=2.36E-07;
Prob(9)=5.55E-08;
Prob(10)=1.05E-08;

Evaluating the expectation (the sum of $x\cdot p(x)$), after normalizing the probabilities for the numbers >=5 so that their sum is 1, I find 5.81 as the expected value of the maximum number picked. Please comment on the reliability of the result.
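For comparison, here is a short Python sketch of the same calculation (names are illustrative; `math.prod` needs Python 3.8 or later). It weights each unordered 5-combination by the plain product of its probabilities and then normalizes. That weighting does not depend on the pick order, so it leaves out the order-dependent denominators $Q(\cdot)$ used in the other answer, which is a plausible reason why the two results differ.

```python
from itertools import combinations
from math import prod

raw = [0.1708, 0.138, 0.092, 0.0509, 0.0234,
       0.009, 0.0029, 0.00078, 0.00018, 0.000034]
P = dict(enumerate(raw, start=1))  # P[n] is the probability of number n

# weight[m]: sum of probability products over combinations whose maximum is m
weight = {m: 0.0 for m in range(5, 11)}
for combo in combinations(range(1, 11), 5):
    weight[max(combo)] += prod(P[n] for n in combo)

total = sum(weight.values())
expected_max = sum(m * w for m, w in weight.items()) / total
print(round(expected_max, 2))  # 5.81
```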