5

I am trying to generate a set of N random numbers where the set has a normal distribution.

I'm currently using a brute force approach:

  1. Randomly select N numbers from a normal distribution.
  2. Check the set's standard deviation (more important than mean).
  3. If it is the best set so far, keep it.
  4. Repeat 10000 times, and use the best set.
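
For reference, the brute-force loop described above might look roughly like the following in Python. This is only a sketch: the function name `best_of_many`, the use of the standard library's `random.gauss`, and the reading of "best" as "sample standard deviation closest to a target of 1" are assumptions, since the steps do not spell them out.

```python
import random
import statistics


def best_of_many(n, trials=10000, target_sd=1.0, seed=None):
    """Draw `trials` normal samples of size n and keep the one whose
    sample standard deviation is closest to `target_sd` (assumed criterion)."""
    rng = random.Random(seed)
    best, best_err = None, float("inf")
    for _ in range(trials):
        # Step 1: N independent draws from a standard normal.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Step 2: check the set's standard deviation.
        err = abs(statistics.stdev(sample) - target_sd)
        # Step 3: keep the set if it is the best so far.
        if err < best_err:
            best, best_err = sample, err
    return best
```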

Is there any better approach? Anyone know where I could look?

Thanks in advance!

  • 1
    If you stop after step 1, then you already have a _sample_ from a normal distribution. I don't think it makes any sense to say that any particular set has a normal distribution. I guess you are trying to maximize the likelihood that this particular sample came from a normal distribution, given some meta-distribution of possible distributions? 2011-10-11
  • 3
    Look at [this](http://en.wikipedia.org/wiki/Normal_distribution) article for ideas. The usual pseudo-random number generators give you, more or less, random variables uniformly distributed on $[0,1]$. Given such a generator, you can fiddle with it to get a standard normal. The Box-Muller method is good, not hard to implement. To simulate a normal with mean $\mu$, standard deviation $\sigma$, multiply by $\sigma$, add $\mu$. Repeat $5000$ times to get your simulated sample. 2011-10-11
  • 0
    André: That's currently what I'm doing. I was just curious if there was a way to avoid the "repeat 5000 times" step. 2011-10-11
  • 0
    Dan: I suppose I should phrase it as "the final set should appear to be normally distributed to a naive observer" (if that makes any sense). 2011-10-11
  • 0
    @sharoz: We want our simulated sample to be roughly indistinguishable from sampling (independently) $10000$ times from a normal. We should feel lucky to get them two at a time! Anyway, on a computer, even a primitive one, the process is blazing fast. Maybe there is a variant of Box-Muller that does better, but I don't know of it. Speed is really not an issue. And something close to independence is very important. 2011-10-11
  • 0
    @sharoz: How naive should the observer be? You can presumably fool enough of the people enough of the time, but I would want our simulated sample to pass sophisticated randomness tests. 2011-10-11
  • 0
    You might find the [Marsaglia Polar Method](http://en.wikipedia.org/wiki/Marsaglia_polar_method) useful. See [my answer to Transform uniform distribution to normal distribution using Lindeberg–Lévy CLT](http://math.stackexchange.com/questions/69245/transform-uniform-distribution-to-normal-distribution-using-lindeberglevy-clt/69284#69284) for a comparison with the [Box-Muller Transform](http://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform). 2011-10-11
  • 2
    @sharoz, I don't think it makes sense. Using Box-Muller in step 1, you have a sample from a normal distribution. Rejecting and iterating gives you a sample from some _other_ distribution no matter what criteria are used. What good is it for it to appear normal if it is actually not? (Although now I am curious exactly what distribution this procedure produces.) 2011-10-11
  • 0
    Like @Dan said. But to know what the method actually produces, one needs a definition of what it means that a sample *appear(s) to be normally distributed to a naive observer*. 2011-10-11
  • 0
    @Didier, I think the procedure is specified well enough: it generates a sample with a sample variance very close to 1 and presumably a sample mean very close to 0 as well. But it's _too_ normal, i.e. not normal, and I imagine the naive observer would be able to recognize it as such. 2011-10-11
  • 0
    @Dan, if I read you correctly, you think each new sample replaces the old one if its empirical variance is closer to $1$ than the old sample's empirical variance is? (The empirical means do not enter the picture since there are no ties between empirical variances.) Hmmm... It is too early in the day for me here to see what the distribution of the sample after a large number of steps is. For a sample of size $2$ this should more or less impose the value of $|x_1-x_2|$ to be close to $2$, and something similar for larger sizes, but apart from that... 2011-10-11
  • 0
    @Didier, yes that is how I am reading the question, although it's not clear how or if the empirical means themselves are checked. It is late here but I turned my comments into an answer just now. 2011-10-11
  • 0
    I should say that this is for a perception experiment. If you sample 10 values from a normal distribution and get all 0s, it is technically a random set (albeit unlikely). However, someone looking at it would say that it appears to have low variance. See figure 1 [here](http://www.journalofvision.org/content/8/11/9.full?sid=36832cae-6a78-4081-834e-893fc18da5d1). 2011-10-11
  • 1
    *If you sample 10 values... and get all 0s*, the normality is dubious. // Since we are considering sample variances, I might refer you to [this](http://en.wikipedia.org/wiki/Variance#Population_variance_and_sample_variance), if only to stress that the empirical variance of a sample is itself random, and liable, in principle, to take any positive value whatsoever. 2011-10-11
  • 0
    The step involving checking the standard deviation is revealing. Somehow it seems desired to achieve a certain standard deviation (although sharoz never says that!). But that's really easy: divide every observation by their standard deviation, then multiply by whatever standard deviation you want to get. You're done in one step; there's no need to do it 10000 times. 2011-10-11
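
The one-step rescaling suggested in the last comment could be sketched as follows. The helper name `rescale` and its parameters are hypothetical, and subtracting the sample mean first is an extra assumption on my part (the comment only mentions dividing by the standard deviation).

```python
import statistics


def rescale(sample, target_mean=0.0, target_sd=1.0):
    """Shift and scale `sample` so its sample mean and sample standard
    deviation match the targets exactly (hypothetical helper)."""
    m = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD, n - 1 in the denominator
    # Centering is an added assumption; dividing by s fixes the spread.
    return [target_mean + target_sd * (x - m) / s for x in sample]
```

The result has exactly the requested sample standard deviation, but, as other comments point out for the reject-and-retry procedure, a set forced to a fixed sample variance is no longer a set of independent normal draws.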

1 Answer

4

Assume we are talking about a standard normal distribution with zero mean and unit variance. For the observer to be able to answer the question "is this sample from a standard normal distribution?" with a high probability of correctness, he needs to know the distribution of distributions from which the sample may have been generated. The probability that the observer will guess "yes" is maximized when the sample is generated from a standard normal distribution, assuming that is possible. So according to my interpretation, you should use the values generated by Box-Muller in step 1 without inspecting them.
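
As a rough illustration of "step 1 only", here is a minimal Box-Muller sketch in Python. The function name `box_muller` and its parameters are my own choices for illustration, and it assumes the standard library's `random.random()` as the uniform generator; it is not meant as the definitive implementation.

```python
import math
import random


def box_muller(n, mu=0.0, sigma=1.0, seed=None):
    """Return n draws from a normal(mu, sigma) via the Box-Muller transform."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u1 = 1.0 - rng.random()  # lies in (0, 1], so log(u1) is finite
        u2 = rng.random()
        r = math.sqrt(-2.0 * math.log(u1))
        theta = 2.0 * math.pi * u2
        # Each pair of uniforms yields two independent standard normals;
        # multiply by sigma and add mu to get the desired normal.
        out.append(mu + sigma * r * math.cos(theta))
        out.append(mu + sigma * r * math.sin(theta))
    return out[:n]
```

Each pass through the loop produces two values, which is why a sample of 10000 needs 5000 passes. The sample is used as generated, with no inspection or rejection.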

  • 0
    Repeating comment from above: I should say that this is for a perception experiment. If you sample 10 values from a normal distribution and get all 0s, it is technically a random set (albeit unlikely). However, someone looking at it would say that it appears to have low variance. See figure 1 [here](http://www.journalofvision.org/content/8/11/9.full?sid=36832cae-6a78-4081-834e-893fc18da5d1). 2011-10-11
  • 0
    I skimmed the paper and I think my answer still applies. If you are trying to determine the perceptual threshold for normally-distributed orientation variance, then you should stop at step 1. Otherwise you are using a distribution that is _not_ normal. Also, you should generate a new random image for _each_ perception experiment to maximize the SNR. 2011-10-11
  • 0
    Very well. One step it is. Thanks for the help and info! 2011-10-11
  • 0
    You are welcome, sir. 2011-10-11