
I need code for the following function: I pass in a mean and a standard deviation, plus a value to evaluate against that distribution, and I should get back a probability from 0 to 1.

float getProbabilityFromNormal(float mean, float std, float value)

I have already figured out how to apply the normal distribution formula:

Normal Probability Density Function $$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$$

However, the value I am getting is not between 0 and 1. I know there is an additional step involved to get an actual 0 to 1 value (you cannot just normalize or scale it; you have to take the area under it or something). If someone could provide a code sample showing how to do this, I would really appreciate it!

I have already looked at several examples on this site, but they mostly consist of mathematical symbols, without much focus on code implementation.

1 Answer


The formula you have linked to at http://sites.nicholas.duke.edu/statsreview/files/2013/06/normpdf1.jpg is the probability density function (pdf), usually written $f(x)$. Its value is a density, not a probability, so it can exceed 1 when $\sigma$ is small. This function must be integrated to get a probability from 0 to 1. You are looking for

$F(x) = \int_{-\infty}^xf(\tau)d\tau$

and this type of function is called the Cumulative Distribution Function (CDF).

So, since you are not looking for more theory and you just want code, the short answer is that $F(x)$ has no closed-form expression in terms of elementary functions. The practical options are tabulated values (for example of the Q-function), the error function, or your own numerical approximation of the integral.

https://en.wikipedia.org/wiki/Q-function

If you do not have a library implementation of one of these functions, you must either store a table in your code and interpolate between the points you explicitly define, or approximate the integral of $f(x)$ numerically by summing over small steps of $dx$.
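
The table route could look roughly like the sketch below. It stores the standard normal CDF at $z = 0, 0.5, \dots, 4$ (values rounded to five decimals), converts the input to $z = (x-\mu)/\sigma$, interpolates linearly, and uses the symmetry $F(-z) = 1 - F(z)$ for negative $z$. The names `PHI_TABLE`, `PHI_STEP`, `PHI_LAST`, and `getProbabilityFromTable` are hypothetical, and the 0.5 spacing is a deliberately coarse assumption; expect errors of a few thousandths between table points.

```c
#include <assert.h>
#include <math.h>

/* Standard normal CDF at z = 0.0, 0.5, ..., 4.0, rounded to 5 decimals. */
static const float PHI_TABLE[] = {
    0.50000f, 0.69146f, 0.84134f, 0.93319f, 0.97725f,
    0.99379f, 0.99865f, 0.99977f, 0.99997f
};
#define PHI_STEP 0.5f /* spacing between table entries */
#define PHI_LAST 8    /* index of the last entry (z = 4.0) */

float getProbabilityFromTable(float mean, float std, float value)
{
    assert(std > 0.0f);
    float z = (value - mean) / std; /* standardize the input */
    float az = fabsf(z);
    float p;
    if (az >= PHI_STEP * PHI_LAST) {
        p = 1.0f; /* beyond the table: treat the tail as negligible */
    } else {
        int i = (int)(az / PHI_STEP);       /* entry at or below az */
        float t = az / PHI_STEP - (float)i; /* fraction toward the next entry */
        p = PHI_TABLE[i] + t * (PHI_TABLE[i + 1] - PHI_TABLE[i]);
    }
    return (z >= 0.0f) ? p : 1.0f - p; /* symmetry for negative z */
}
```

A finer table reduces the interpolation error at the cost of more stored values.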

For example, say you want a granularity of $dx = 0.01$. A practical lower bound is $\mu - 10\sigma$, because the density below that point is negligible.

$F(x) \approx \sum_{\tau = \mu - 10\sigma}^{x} f(\tau) \cdot 0.01$

where $\tau$ advances in steps of 0.01.

The sum above translates directly into a loop, so it is easy to implement in code. The hard part is probably finding a granularity and lower bound that work for all of the values of $\sigma$, $\mu$, and $x$ that can be passed to your function, but you can write some simple code to tune this as well.

EDIT: Also, take note of the interpretation of $F(x)$. It represents the probability that a sample from the distribution lies between $-\infty$ and $x$. The probability of hitting an exact number in a continuous distribution, like the normal, is zero. Usually, people talk about the probability of hitting a number within a range, such as between $x_1$ and $x_2$. For this, the approximate solution is usually written as

$F(x_2) - F(x_1) \approx \sum_{\tau = x_1}^{x_2} f(\tau) \cdot 0.01$