
I am trying to understand the Beta distribution, but I cannot figure out how to select the values of $\alpha$ and $\beta$.

To my understanding, a higher value of $\alpha$ means a higher success probability.

Another issue: what exactly does the value on the y-axis depict in the Beta distribution?

  • When you ask about the graph, are you talking about the probability density function, $x^{\alpha-1} (1-x)^{\beta-1} / B(\alpha, \beta)$? And when you say "this purpose", what purpose do you have in mind? (2017-02-16)
  • As a general rule, the numerical values of the ordinates of a pdf are of no interest... (there may be small exceptions). (2017-02-16)

1 Answer


The value on the y-axis of a probability density function, like that of the Beta distribution, is called a probability density. It is defined so that if $p(x)$ is your probability density, then the probability that the random variable lies in the interval $[a,b]$ is $$\int_a^b p(x)\,dx. $$ So the y-value $p(x)$ loosely describes the probability of finding the random variable in a tiny interval near $x$, divided by the length of that interval (in the limit as the length gets small).
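This relationship between density and probability can be checked numerically. The sketch below (using only the standard library; `beta_pdf` is a helper name chosen here, not a standard function) integrates the Beta$(2,2)$ density over $[0, 0.5]$, which by symmetry should give probability $1/2$:

```python
import math

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x in (0, 1)."""
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)  # beta function B(a, b)
    return x**(a - 1) * (1 - x)**(b - 1) / B

# Probability that a Beta(2, 2) variable lies in [0, 0.5]:
# approximate the integral of the density with a midpoint Riemann sum.
n = 100_000
width = 0.5 / n
prob = sum(beta_pdf((i + 0.5) * width, 2, 2) * width for i in range(n))
print(round(prob, 4))  # by symmetry this should be 0.5
```

Note that the density itself can exceed $1$ (for a sharply peaked distribution); only its integrals are probabilities.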

The beta distribution has pdf $$ \beta_{a,b}(x) = \frac{1}{\beta(a,b)}x^{a-1}(1-x)^{b-1}$$ where $\beta(a,b)$ is the beta function. The random variable described by this distribution only takes values in $[0,1].$ If you sketch this for different values of the parameters $a$ and $b$ you will see some trends:

  1. If $a=b$ the distribution is symmetric about $x=1/2$. As $a$ and $b$ increase (while keeping $a=b$) the distribution becomes more and more sharply peaked around $x=1/2$. $a=b=1$ gives the uniform distribution, and if $a=b<1$ the density is peaked near the edges at $x=0$ and $x=1$.
  2. If $a>b$ the distribution is peaked more on the right-hand side and the variable is more likely to be greater than $1/2.$ If $a<b$ it is peaked more on the left-hand side and the variable is more likely to be less than $1/2.$
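These trends are easy to verify from the pdf directly. A small sketch (again with a hand-rolled `beta_pdf` helper): symmetry when $a=b$, and the known fact that the mean of Beta$(a,b)$ is $a/(a+b)$, which exceeds $1/2$ whenever $a>b$:

```python
import math

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x in (0, 1)."""
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x**(a - 1) * (1 - x)**(b - 1) / B

# Trend 1: a = b gives a density symmetric about 1/2, so pdf(x) == pdf(1 - x).
print(beta_pdf(0.2, 3, 3), beta_pdf(0.8, 3, 3))  # equal values

# Trend 2: a > b puts more mass on the right; the mean a/(a+b) is above 1/2.
a, b = 5, 2
print(a / (a + b))  # about 0.714, so values above 1/2 are more likely
```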

Your comment about $a$ being related to the number of successes is probably connected to the beta distribution being the conjugate prior for the binomial distribution in Bayesian statistics. This pertains to situations where you're flipping a coin and trying to tell whether it is biased (or some analogous experiment).

Say we have a coin with an unknown probability of heads $p$. In Bayesian statistics, we start with a prior distribution for $p$ (an initial guess), which is a probability density function (let's call it $f(p)$). Then, after we collect data, we revise our prior distribution into a posterior distribution based on the data combined with our prior. It is derived via Bayes' rule $$ f(p|D) = \frac{f(D|p)f(p)}{\int_0^1f(D|p)f(p)\,dp}$$ where $D$ is the data and $f(D|p)$ is the probability of the data at a fixed value of $p.$

The nice thing about choosing a beta prior for $p$ is that when you do the math, the posterior is also beta-distributed. In fact, if the data consists of $n_s$ successes and $n_f$ failures and the prior is $\beta_{a,b},$ then the posterior is $$f(p|D) =\beta_{a+n_s,b+n_f}(p).$$
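This conjugacy can be checked pointwise: the numerator of Bayes' rule, $\beta_{a,b}(p)\,p^{n_s}(1-p)^{n_f}$, must be proportional to $\beta_{a+n_s,\,b+n_f}(p)$, i.e. their ratio is the same constant at every $p$. A sketch (values of $a$, $b$, $n_s$, $n_f$ chosen arbitrarily for illustration):

```python
import math

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x in (0, 1)."""
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x**(a - 1) * (1 - x)**(b - 1) / B

a, b = 2.0, 3.0        # prior Beta(a, b)
ns, nf = 7, 4          # observed successes and failures

# prior(p) * p^ns * (1-p)^nf should be proportional to Beta(a+ns, b+nf)(p),
# so the ratio of the two must be the same constant at every p.
ratios = [beta_pdf(p, a, b) * p**ns * (1 - p)**nf / beta_pdf(p, a + ns, b + nf)
          for p in (0.2, 0.5, 0.8)]
print(ratios)  # three (nearly) identical numbers
```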

This makes sense given the shape of the beta distribution described above. Adding to $a$ is like adding successes, which makes $p$ more likely to be greater than $1/2$ and adding to $b$ is like adding failures which makes it more likely that $p<1/2.$ Adding to both $a$ and $b$ is like adding a lot of successes and failures, and if they're roughly equal in amount, then it's likely that $p$ is close to $1/2,$ and this accords with the distribution tightening.

So if you're using a beta prior, and you choose $a=50$ and $b=30,$ it is as if you were initially very unsure about the value of $p$ but had seen an experiment with (about) $50$ successes and $30$ failures and revised your viewpoint.
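To make that concrete with the $a=50$, $b=30$ example: the mean of a Beta$(a,b)$ variable is $a/(a+b)$, so this prior encodes a point estimate of $p$ near $50/80$, and balanced new data pulls the posterior estimate toward $1/2$ (the specific extra counts below are illustrative, not from the question):

```python
a, b = 50, 30          # prior as if ~50 successes and ~30 failures had been seen
print(a / (a + b))     # prior mean estimate of p: 0.625

# After observing, say, 10 more successes and 10 more failures:
a2, b2 = a + 10, b + 10
print(a2 / (a2 + b2))  # 0.6, pulled toward 1/2 by the balanced new data
```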

So that's a basic overview of the beta distribution. Yes, a higher $a$ generally means a higher success probability, and hopefully what I said above helps quantify that further. As for how to select the parameters for your application, that depends on exactly what you're doing, but hopefully this gives you a better feel.