
I was trying to implement sampling from a normal distribution and came across this answer (a Java implementation). It was very simple to understand. Essentially, we start with a list of probabilities (decreasing, with sum = 1) and then build an increasing list from it such that the value of the last element in the list is 1. Then I read about the normal distribution and its probability density function (PDF), and thought I could modify the code so that it uses the normal PDF to generate the list of probabilities. The following is my Sage implementation:

```
import random

sigma = 8/sqrt(2 * pi)  # standard deviation
variance = sigma^2      # variance
mu = 0                  # mean (center of the distribution)

# probability density function of the normal distribution
# (sigma is passed for readability; the formula only uses variance)
def pdf(x, sigma, variance, mu):
    return N((1 / sqrt(2 * variance * pi)) * e^(-(x - mu)^2 / (2 * variance)))

# note: the probabilities this function generates sum to roughly 1/2,
# because the probabilities of negative numbers are ignored
def create_positive_domain_probability():
    size = ceil(variance)
    probability_list = [0] * size

    for x in range(size):
        probability_list[x] = pdf(x, sigma, variance, mu)

    return probability_list

# normalize the probability list so that its sum equals 1
def normalize_probabilities(probability):
    s = sum(probability)
    return [x / s for x in probability]  # a list, so len() and indexing work

# build the increasing list of running sums; use it for sampling
def make_increasing_distribution(probability_list):
    distribution = [0] * len(probability_list)
    total = 0  # running sum (avoids shadowing the built-in sum)

    for i, p in enumerate(probability_list):
        total = total + p
        distribution[i] = total

    return distribution

# sample from the (increasing) distribution: return the first index
# whose running sum is at least rand
def sample():
    rand = random.uniform(0, 1)
    index = 0

    while rand > distribution[index]:
        index = index + 1

    return index


probability_list = create_positive_domain_probability()
probability_list = normalize_probabilities(probability_list)
print("1. bar chart of probability list")
bar_chart(probability_list).show()

distribution = make_increasing_distribution(probability_list)
print("2. bar chart of increasing distribution list")
bar_chart(distribution).show()

arr = [0] * len(probability_list)
for i in range(10000):  # number of samples
    s = sample()
    arr[s] = arr[s] + 1

print("3. bar chart (histogram) of samples generated")
bar_chart(arr).show()
```

My questions are:

  1. What is this method called? Is it truly rejection sampling?
  2. What is the increasing sequence that is generated from the list of probabilities called?

Comments:

  • The increasing sequence is called the cumulative distribution function (CDF). (2017-01-15)
  • @JohnHughes So am I doing inversion sampling or rejection sampling? (2017-01-15)

1 Answer


Your program is definitely not doing rejection sampling, because it never rejects any output of random.uniform.
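For contrast, here is a minimal sketch of what rejection sampling of the same discrete target might look like, in plain Python rather than Sage. It assumes the target is the normal density shape (mean 0) on the integers 0 to n - 1; the names `weight` and `rejection_sample` are hypothetical. A candidate is drawn uniformly and is thrown away (rejected) with a probability that depends on its weight, which is the step the question's code never performs.

```python
import math
import random

def weight(k, variance):
    # Unnormalized target weight: the normal density shape at integer k (mean 0).
    return math.exp(-k * k / (2 * variance))

def rejection_sample(n, variance, rng=random):
    # Propose k uniformly from {0, ..., n-1} and accept it with probability
    # weight(k) / max_weight. On k >= 0 the largest weight is weight(0) == 1,
    # so the acceptance probability is simply weight(k).
    while True:
        k = rng.randrange(n)
        if rng.random() < weight(k, variance):
            return k  # accepted
        # otherwise the candidate is rejected and a new one is drawn

variance = 32 / math.pi            # same variance as the question's code
n = math.ceil(variance)            # same list length: 11
draws = [rejection_sample(n, variance) for _ in range(10000)]
```

The accepted values follow the same discrete distribution as the question's inversion approach, but each sample may cost several uniform draws.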

As noted in a comment, the array distribution is a cumulative distribution function, abbreviated CDF. Assuming it works, the program performs inversion sampling (also called inverse transform sampling) on this CDF.
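The while loop in sample() is a linear search for the first CDF entry that is at least the uniform draw. A minimal sketch of the same inversion step in plain Python, assuming the CDF is an already-built increasing list, can use the standard-library `bisect` module instead; `inversion_sample` is a hypothetical name.

```python
import bisect
import random

def inversion_sample(cdf, u=None):
    # Inversion (inverse transform) sampling on a discrete CDF:
    # return the first index i with u <= cdf[i].
    # Same rule as the question's while loop, found by binary search.
    if u is None:
        u = random.random()  # uniform in [0.0, 1.0)
    return bisect.bisect_left(cdf, u)

cdf = [0.5, 0.8, 1.0]
print(inversion_sample(cdf, 0.3))   # 0
print(inversion_sample(cdf, 0.6))   # 1
print(inversion_sample(cdf, 0.95))  # 2
```

Binary search makes each sample O(log n) instead of O(n), which only matters for long lists.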

You might want to consider simply setting the last value of distribution to $1$ rather than relying on the sum to come out exactly. If it does not (because of round-off error or some other effect) and happens to be slightly less than $1,$ there is a very, very small chance that random.uniform will return a number greater than the last entry in distribution. The while loop in sample() would then step past the last element and try to read distribution beyond its end, which in Python raises an IndexError.
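A minimal sketch of that suggestion in plain Python; `make_cdf` is a hypothetical helper, and summing ten copies of 0.1 is just a convenient illustration of how round-off can leave the total slightly below 1.

```python
import itertools

def make_cdf(probabilities):
    # Running sums, as in the question's make_increasing_distribution().
    cdf = list(itertools.accumulate(probabilities))
    # Force the last entry to exactly 1.0 instead of trusting the
    # floating-point sum, so a uniform draw in [0, 1) can never exceed it.
    cdf[-1] = 1.0
    return cdf

probs = [0.1] * 10
print(sum(probs))            # 0.9999999999999999, not 1.0
print(make_cdf(probs)[-1])   # 1.0
```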

You're not really using a normal distribution here. For one thing, all your sampled values are integers, whereas the normal distribution is continuous. You're not even getting a histogram of the normal distribution, since you truncate it just below its mean and somewhere further above. You're just getting some unnamed discrete distribution that is very heavily biased toward low numbers.
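If the goal is an actual (continuous) normal sample, a different technique is needed. One standard option is the Box-Muller transform, sketched below in plain Python; `box_muller` is a hypothetical name. Python's standard library also provides `random.gauss(mu, sigma)` and `random.normalvariate(mu, sigma)`, which would normally be preferred over a hand-rolled version.

```python
import math
import random

def box_muller(mu=0.0, sigma=1.0):
    # Box-Muller transform: two independent uniform draws give one
    # standard normal draw z, which is then shifted and scaled.
    u1 = 1.0 - random.random()   # in (0.0, 1.0], so log(u1) is finite
    u2 = random.random()
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return mu + sigma * z

# Continuous samples with the question's mu and sigma, negative values included.
samples = [box_muller(0.0, 8 / math.sqrt(2 * math.pi)) for _ in range(10000)]
```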

Notice what happens if you simply double the value of sigma at the start of the program. Since the list length is ceil(variance) and doubling sigma quadruples the variance (from about $10.19$ to about $40.74$), the random values range from $0$ through $40$ inclusive ($41$ values) instead of $0$ through $10$ ($11$ values), and about $90\%$ of the samples fall in the first one-quarter of the range ($0$ to $10$). Values in the second half of the range (above $20$) have almost no chance of occurring. I wonder if that is what you wanted.
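These figures can be checked numerically. The sketch below (plain Python, hypothetical helper `discrete_probs`) rebuilds the question's normalized probability list for a given sigma, using only the exponential factor, since the constant in front cancels on normalization.

```python
import math

def discrete_probs(sigma):
    # Same construction as the question: weights at x = 0, 1, ..., ceil(sigma^2) - 1,
    # normalized to sum to 1 (the 1/sqrt(2*pi*sigma^2) factor cancels).
    variance = sigma ** 2
    n = math.ceil(variance)
    w = [math.exp(-x * x / (2 * variance)) for x in range(n)]
    total = sum(w)
    return [v / total for v in w]

sigma = 2 * 8 / math.sqrt(2 * math.pi)   # doubled sigma
p = discrete_probs(sigma)
print(len(p))        # 41 possible values (0 through 40)
print(sum(p[:11]))   # share of probability on 0..10
print(sum(p[21:]))   # share of probability above 20
```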