2
$\begingroup$

(A) Suppose we choose n people at random from the population, and the heights of the people within the population are normally distributed with average height m and standard deviation s. What would be the expected height of the shortest person within the sample?

(B) Then, suppose we remove the shortest person from the original sample as well as k other people chosen at random (these cannot include the shortest person, who has already been removed). That is, there are now (n - k - 1) people in the group, where (n - k - 1) > 0. What would be the expected height of the shortest person within this new group?

(C) If step (B) were repeated i times, such that each time the new group contains [n - i(k + 1)] people, what would be the expected height of the shortest person within the group at the ith step? Again, assume here that [n - i(k + 1)] > 0.

From my understanding, the answer to (A) is quite straightforward and may be calculated based on our knowledge of the normal distribution. However, I have no way of answering (B) and (C) - at least without a computer simulation. Any help would be appreciated. Thanks.

  • 0
    Even part (A) is a difficult question for a normal distribution. See [this paper](http://www.untruth.org/~josh/math/normal-min.pdf) about the minimum of independent normal variables. 2017-02-16
  • 0
    This is the same question you already asked. What was wrong with my answer? 2017-02-16
  • 0
    The original question was flagged for being unclear, so I have re-worded it. 2017-02-16
  • 0
    @AngusMurray Why didn't you simply edit the original question to reword it? 2017-02-16
  • 0
    Please answer my question: what was wrong with my answer? Was the answer not clear? Please undelete your previous question so I can save my answer. It disappeared with your question. 2017-02-16
  • 0
    I apologise for deleting the question. I am relatively new to this forum and I was unaware of the normal protocol. I will try to recover the deleted answer. 2017-02-16
  • 0
    Just for a second, please. Then you can delete it again. 2017-02-16
  • 0
    ????????? I worked on that answer for more than one hour. You ignored and deleted my answer. This is not nice. 2017-02-16
  • 0
    @zoli Please find the original question undeleted. 2017-02-16
  • 0
    Now, delete it. I copied my answer here. Please consider my answer now. 2017-02-16

1 Answer

1

Question A

So, we have $n$ random measurements and we are interested in the distribution (and the mean) of the smallest element. This element is usually denoted by $X_{(1)}$.

In order to get the pdf of $X_{(1)}$ at a point $a$, we have to calculate

$$f_{(1)}(a)=\lim_{\Delta a\to 0}\frac1{\Delta a} P(X_{(1)}\in[a,a+\Delta a)).$$

So, let's proceed as follows:

$$P(X_{(1)}\in[a,a+\Delta a))=P(\Phi(a)\le \Phi(X_{(1)})<\Phi(a+\Delta a))=P(\Phi(a)\le U_{(1)}<\Phi(a+\Delta a)),$$ where $\Phi$ is the Gaussian cdf with parameters $m$ and $s$, and $U_{(1)}=\Phi(X_{(1)})$ is the smallest element of a sample of $n$ independent random variables that are uniformly distributed over $[0,1]$.

It is known that, for the smallest element of a sample of size $n$ of i.i.d. uniform random variables, the probability of falling in a short interval is, to first order in $\Delta a$,

$$P(\Phi(a)\le U_{(1)}<\Phi(a+\Delta a))\approx n(1-\Phi(a))^{n-1}(\Phi(a+\Delta a)-\Phi(a)).$$
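A short justification of this "known" fact may help. Write $u=\Phi(a)$ and $\delta=\Phi(a+\Delta a)-\Phi(a)$ (shorthand introduced only for this remark). The smallest of $n$ independent uniform variables is at least $u$ exactly when all $n$ of them are, so

$$P(U_{(1)}\ge u)=(1-u)^n,$$

and therefore

$$P(u\le U_{(1)}<u+\delta)=(1-u)^n-(1-u-\delta)^n=n(1-u)^{n-1}\delta+O(\delta^2).$$

The $O(\delta^2)$ term vanishes after dividing by $\Delta a$ and taking the limit, which is all that the pdf calculation needs.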

As a result, the pdf of the smallest element is

$$f_{(1)}(a)=\lim_{\Delta a\to 0}\frac1{\Delta a}n(1-\Phi(a))^{n-1}(\Phi(a+\Delta a)-\Phi(a))=n(1-\Phi(a))^{n-1}\phi(a),$$ where $\phi$ is the pdf belonging to $\Phi$.

In the case of the standard normal distribution ($m=0$ and $s=1$) the pdf looks like this:

[figure not reproduced]

The mean of this distribution is

$$n\int_{-\infty}^{\infty}a(1-\Phi(a))^{n-1}\phi(a)\ da.$$

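In general this integral has to be evaluated numerically. For $n=100$, $m=0$ and $s=1$ (values that are of course not realistic for height measurements) the mean is $\approx -2.5$. For general $m$ and $s$ the mean is $m+s\mu_n$, where $\mu_n$ is the mean obtained in the standard normal case.

As for the largest element, the solution is symmetrical...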
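The numerical evaluation can be sketched in a few lines of Python. This is only an illustration, assuming a plain trapezoid rule over $m\pm 10s$ is accurate enough; the function name `expected_min` and the parameters `steps` and `width` are made up for this sketch and are not from the answer.

```python
from statistics import NormalDist


def expected_min(n, m=0.0, s=1.0, steps=20000, width=10.0):
    """Trapezoid-rule estimate of n * integral of a (1 - Phi(a))^(n-1) phi(a) da.

    The integration range [m - width*s, m + width*s] is an assumption of this
    sketch; the integrand is negligible outside it for moderate n.
    """
    dist = NormalDist(m, s)
    lo = m - width * s
    h = 2 * width * s / steps
    total = 0.0
    for i in range(steps + 1):
        a = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * a * n * (1.0 - dist.cdf(a)) ** (n - 1) * dist.pdf(a)
    return total * h


if __name__ == "__main__":
    print(expected_min(100))             # standard normal, n = 100
    print(expected_min(100, 170, 10))    # hypothetical heights in cm
```

For $n=100$, $m=0$, $s=1$ this reproduces the value of about $-2.5$ quoted above; the second call uses invented height parameters purely as an example.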


Question B

Let's assume that we omit the smallest number and one more randomly selected element from our set of height measurements. The probability that we omit the second smallest element is $\frac{1}{n-1}$, and the probability that we don't remove that element is then $\frac{n-2}{n-1}$.

So, the pdf of the random variable we seek is

$$f^1=\frac{n-2}{n-1}f_{(2)}+\frac1{n-1}f_{(3)}$$

where $f_{(2)}$ and $f_{(3)}$ are the densities belonging to the order statistics of the original sample of $n$ elements. For both of these densities we can use the argument above, except that we have to use the formulas given here.
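For reference, the textbook density of the $r$-th order statistic of $n$ i.i.d. variables with cdf $\Phi$ and pdf $\phi$ is given below (this is the standard formula, not necessarily in the same form as at the link above):

$$f_{(r)}(a)=\frac{n!}{(r-1)!\,(n-r)!}\,\Phi(a)^{r-1}\,(1-\Phi(a))^{n-r}\,\phi(a).$$

For $r=1$ this reduces to $n(1-\Phi(a))^{n-1}\phi(a)$, the density obtained in part A.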

If we omit $k$ further samples, then the first step is to calculate the probabilities that the smallest remaining element will be $X_{(2)}$, $X_{(3)}$, $X_{(4)}$, and so on.

Having these probabilities, we can use the argument given in part A. Again, the necessary formulas can be found here.
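The steps above can be sketched as follows. The counting argument is an assumption of this sketch rather than something spelled out in the answer: $X_{(j+2)}$ is the smallest survivor exactly when $X_{(2)},\dots,X_{(j+1)}$ are all among the $k$ removed and $X_{(j+2)}$ is not, which gives the weight $\binom{n-2-j}{k-j}/\binom{n-1}{k}$ for $j=0,\dots,k$. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb
from statistics import NormalDist


def remaining_min_weights(n, k):
    """P(smallest survivor is X_(r)) after removing X_(1) and k random others."""
    if not 0 <= k <= n - 2:
        raise ValueError("need 0 <= k <= n - 2 so that someone remains")
    total = comb(n - 1, k)  # ways to choose the k extra removals
    return {j + 2: Fraction(comb(n - 2 - j, k - j), total) for j in range(k + 1)}


def order_stat_mean(n, r, m=0.0, s=1.0, steps=20000, width=10.0):
    """Trapezoid-rule estimate of E[X_(r)] for n i.i.d. N(m, s^2) draws."""
    dist = NormalDist(m, s)
    lo = m - width * s
    h = 2 * width * s / steps
    coeff = r * comb(n, r)  # equals n! / ((r-1)! (n-r)!)
    total = 0.0
    for i in range(steps + 1):
        a = lo + i * h
        p = dist.cdf(a)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * a * coeff * p ** (r - 1) * (1 - p) ** (n - r) * dist.pdf(a)
    return total * h


def expected_min_after_removal(n, k, m=0.0, s=1.0):
    """Mix the order-statistic means with the removal weights."""
    return sum(float(w) * order_stat_mean(n, r, m, s)
               for r, w in remaining_min_weights(n, k).items())
```

For $k=1$ the weights are $\frac{n-2}{n-1}$ and $\frac{1}{n-1}$, matching the mixture $f^1$ above.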

Question C

Suppose that we have $n=\ell (k+1)+1$ numbered balls sitting on the table in the order of their numbers. Now, remove the first ball and $k$ other balls at random. The balls are still in order, but $k+1$ of them are missing. Remove the ball with the smallest number again and $k$ other balls, and so on. Do this until only one ball remains. Since we had to repeat the operation $\ell$ times, the first $\ell$ balls will disappear for sure. Now, we have to calculate the probabilities that the number on the remaining ball is $\ell+1$, $\ell+2$, and so on. If you know these probabilities, then you know the probabilities that $X_{(\ell+1)}$, $X_{(\ell+2)}$, ... will be the order statistic whose pdf has to be calculated.

The calculation is, again, based on answer A and the formulas given here.
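If a closed form for these probabilities is hard to find, they can be estimated by simulating the ball process directly. The following is a minimal Monte Carlo sketch, not an exact calculation; the function name, the `trials` and `seed` parameters, and the choice to stop after a given number of rounds (rather than when one ball remains) are assumptions made here.

```python
import random
from collections import Counter


def simulate_smallest_rank(n, k, rounds, trials=20000, seed=0):
    """Estimate P(smallest remaining ball has number r) after `rounds` rounds.

    Each round removes the lowest-numbered ball and then k others at random.
    """
    if n - rounds * (k + 1) <= 0:
        raise ValueError("need n - rounds*(k+1) > 0 so that a ball remains")
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        balls = list(range(1, n + 1))  # kept in increasing order throughout
        for _ in range(rounds):
            balls.pop(0)  # remove the smallest remaining ball
            removed = set(rng.sample(range(len(balls)), k))
            balls = [b for idx, b in enumerate(balls) if idx not in removed]
        counts[balls[0]] += 1
    return {rank: c / trials for rank, c in sorted(counts.items())}


if __name__ == "__main__":
    # Example with l = 2, k = 2, so n = l*(k+1) + 1 = 7 and one ball remains.
    print(simulate_smallest_rank(7, 2, 2))
```

The estimated probabilities can then be used as mixture weights for the order-statistic densities, in the same way as in part B.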

  • 0
    I appreciate the time you've taken to write this out. You have basically answered part (A) for me. Thank you. 2017-02-16
  • 0
    OK, I gave further hints to help... 2017-02-16