I'm having trouble understanding the definition of variance of a Bernoulli distribution. I thought that the variance was the sum of each data point's squared distance from the mean, divided by the number of data points. However, I see a different definition of the variance for Bernoulli distributions:
In general, it is useful to think about a Bernoulli random variable as a random process with only two outcomes: a success or failure. Then we build our mathematical framework using the numerical labels 1 and 0 for successes and failures, respectively. If p is the true probability of a success, then the mean of a Bernoulli random variable X is given by:
$$ \mu = E[X] = P(X = 0) \cdot 0 + P(X = 1) \cdot 1 $$ $$ = (1 - p) \cdot 0 + p \cdot 1 = 0 + p = p $$
Similarly, the variance of X can be computed:
$$ \sigma ^2 = P(X = 0)(0 - p)^2 + P(X = 1)(1 - p)^2 $$
$$ = (1 - p)p^2 + p(1 - p)^2 $$
$$ = p(1 - p) $$
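The two-outcome sums above can be checked numerically. A minimal sketch, assuming an arbitrary example value of p = 0.3:

```python
# Bernoulli(p): outcomes are 0 (failure) and 1 (success).
p = 0.3

# Mean: sum over outcomes of P(X = x) * x
mean = (1 - p) * 0 + p * 1

# Variance: sum over outcomes of P(X = x) * (x - mean)^2
var = (1 - p) * (0 - mean) ** 2 + p * (1 - mean) ** 2

print(mean)  # 0.3, i.e. p
print(var)   # 0.21, i.e. p * (1 - p)
```

The key point is that each squared deviation is weighted by the probability of its outcome, rather than divided by a count of data points.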
What is going on above? That doesn't seem like the standard definition of variance. Why are we taking the probability of $X = 0$ and multiplying it by the squared difference of p from 0?