4

Let S be a set, and let $2^S$ be the power set of S. From my current understanding (which is very limited, especially since I haven't seen any measure theoretic approaches to probability), the axioms of probability tell us that there exists a function $p: 2^S \rightarrow [0,1]$ with certain properties.

But this seems a bit unsatisfying for two reasons:

  • Why do we need to make such an assumption at all? For a given set $S$, if we explicitly construct a function $p:2^S \rightarrow [0,1]$ and show that it has the desired properties, then we didn't need to assume the existence of such a function; conversely, if we can't find such a function, then it seems that we can't say anything about probabilities in the first place!

  • How do we know that such a function has anything to do with whatever "probability" means "in the real world"? All we've done is identify real numbers in $[0,1]$ with subsets of the set $S$.

  • The other thing you get wrong is that there might be different probability functions, $p$. You can see this when $S$ is finite.
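As that comment points out, many different functions can satisfy the axioms on the same set, and for a finite $S$ one can construct such a function explicitly and check the desired properties directly, which is what the first bullet asks about. Here is a minimal sketch in Python; the three-point set, the weight dictionaries, and the helper names (power_set, make_p) are illustrative choices of mine, not anything from the question:

    # Construct probability functions on the power set of a small finite set
    # and verify the desired properties directly.
    from itertools import chain, combinations

    S = frozenset({1, 2, 3})

    def power_set(s):
        """All subsets of s, as frozensets."""
        return [frozenset(c) for c in chain.from_iterable(
            combinations(sorted(s), r) for r in range(len(s) + 1))]

    def make_p(weights):
        """Build P(E) = sum of the weights of the outcomes in E."""
        return lambda E: sum(weights[x] for x in E)

    # Two different functions, both of which satisfy the properties:
    p_uniform = make_p({1: 1/3, 2: 1/3, 3: 1/3})
    p_biased  = make_p({1: 1/2, 2: 1/4, 3: 1/4})

    for p in (p_uniform, p_biased):
        events = power_set(S)
        assert all(0 <= p(E) <= 1 for E in events)      # values lie in [0, 1]
        assert abs(p(S) - 1) < 1e-12                    # the whole space gets probability 1
        # additivity on disjoint events
        assert all(abs(p(A | B) - (p(A) + p(B))) < 1e-12
                   for A in events for B in events if not (A & B))
    print("both functions satisfy the properties")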

3 Answers

7

I got my hands on a copy of an older edition of Ross's A First Course in Probability. By the axioms of probability, he means the following:

Let $S$ be a set. Then a subset $E \subset S$ is called an event, and the axioms are axioms for a function $P$ which assigns to each event $E \subset S$ a real number $P(E)$. They are:

(1) For every event $E \subset S$, $0 \leq P(E) \leq 1$. (Thus, the probability of an event lies between $0$ and $1$.)

(2) $P(S) = 1$.

(3) For any sequence $\{E_n\}_{n=1}^{\infty}$ of events which are pairwise disjoint (that is, $E_i \cap E_j = \varnothing$ whenever $i \neq j$), we have

$P\left(\bigcup_{n=1}^{\infty} E_n\right) = \sum_{n=1}^{\infty} P(E_n)$.

(This is the famous "countable additivity" axiom.)

This is literally what he says. Of course, as various people here have already pointed out, this is wrong. He addresses this in a paragraph at the end of the section:

"Technical Remark. We have supposed that $P(E)$ is defined for all events of the sample space. Actually, when the sample space is an uncountably infinite set $P(E)$ is defined only for a class of events called measurable. However, this restriction need not concern us as all events of practical interest are measurable."

Well, as a pure mathematician this is something of a slap in the face, but never mind.

Maybe the OP's confusion is coming from something in the nature of an "axiom"? In high school geometry, an axiom seems to mean "something you will just have to assume because you won't be able to prove it". But this is not what the term means in modern mathematics. Rather, an axiom is a property that a given structure might or might not satisfy. Often we want to study all structures satisfying some family of axioms -- e.g., groups, rings, topological spaces -- and the merit of the axioms is that one can prove results which hold for any structure satisfying them. And indeed, in the next section the author proves some simple properties that must hold in any probability space, i.e., for any set $S$ and any function $P$ from subsets of $S$ to $\mathbb{R}$ satisfying the axioms above: e.g., that if $E_1 \subset E_2$ then $P(E_1) \leq P(E_2)$.
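For instance, that monotonicity property follows in one line from additivity and nonnegativity: if $E_1 \subset E_2$, then $E_2$ is the disjoint union of $E_1$ and $E_2 \setminus E_1$, so

$$P(E_2) = P(E_1) + P(E_2 \setminus E_1) \geq P(E_1).$$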

Finally, if I may: I don't think that any student of probability has to take this "frequentist" stuff very seriously. To my mind it sounds like applied mathematics but it is actually philosophy: i.e., it is much harder to develop a coherent and satisfactory theory of $P(E)$ as a certain limit defined via a ratio and repeated experimentation than it is either to develop the theory of probability as a branch of pure mathematics or to apply it to solve actual problems. For instance, many professional poker and bridge players can and do solve certain not entirely trivial probability problems in real time, and they don't do so by philosophizing on the nature of frequency...

  • When I saw, for example, the "group axioms", I didn't really think of them as "axioms", but more as the definition of a group. I guess in this case, it makes sense to think of the "probability axioms" more as the definition of a probability.
4

The function $p$ is the probability. The axioms attempt to capture the intuitive notion of probability.

Suppose you had some (frequentist) probability $p$ that you can assign to any "event" (subset of $S$). Under the frequentist interpretation, such a function always takes values between $0$ and $1$ and is additive: if $A$ and $B$ are disjoint events, then the probability that either of them happens should be the sum of the individual probabilities. When $S$ is infinite, things get slightly more complicated.

Furthermore, you can construct the function $p$ by asking yourself what the probability should be. If your notion of probability is reasonable, then the resulting $p$ will satisfy all the axioms.

The axioms are important so that we can prove things like the law of large numbers (which justifies the frequentist interpretation) or the central limit theorem. In order to prove a general theorem, we need to describe the objects to which it applies.

It turns out that the axioms mentioned are enough to get some interesting results, like the law of large numbers, but not enough to get others, such as the central limit theorem (for which you also need assumptions like finite moments).

In other words, you could say that the axioms of probability capture the notion for which the law of large numbers applies.
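A quick way to see what the law of large numbers buys us is to simulate it: the running frequency of an event approaches the probability assigned by $p$. A rough sketch, where the bias $0.3$ and the trial counts are arbitrary choices of mine rather than anything from the answer:

    # Simulate a biased coin and track the running empirical frequency,
    # which the law of large numbers says converges to the true probability.
    import random

    p_heads = 0.3          # modelled probability of "heads"
    random.seed(0)

    heads = 0
    for n in range(1, 100_001):
        heads += random.random() < p_heads
        if n in (10, 100, 1_000, 10_000, 100_000):
            print(f"after {n:>6} tosses: empirical frequency = {heads / n:.4f}")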

1

To answer your first question, the "axioms of probability" help us articulate our intuition about what properties a probability function should have.

For instance, if you toss a coin and you know that a tail has probability $\frac1{5}$ of occurring, then the probability of a head is fixed at $\frac4{5}$. You have only one degree of freedom in this case.

Similarly, if you are rolling a die with $6$ faces displaying the numbers $1$ to $6$, and you know that $2$ occurs with probability $\frac5{21}$ and $5$ occurs with probability $\frac6{97}$, then this fixes the probability of some of the other events. For instance, the probability of getting neither $2$ nor $5$ is fixed at $1 - \frac5{21} - \frac6{97}$.
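Written out using complements and additivity, that computation is

$$P(\text{neither } 2 \text{ nor } 5) = 1 - P(\{2\} \cup \{5\}) = 1 - P(\{2\}) - P(\{5\}) = 1 - \tfrac{5}{21} - \tfrac{6}{97} = \tfrac{1426}{2037} \approx 0.70.$$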

The collection of sets whose probabilities get fixed once you specify the probabilities of a few sets is the motivation for the definition of an algebra/$\sigma$-algebra.

Further, note that your definition of probability need not be on the entire $2^S$. In fact, in many interesting cases you cannot define a function on the entire $2^S$ that matches our intuitive notion of probability (for instance, there is no non-negative, translation-invariant, countably additive function on all of $2^{\mathbb{R}}$ that assigns each interval its length).

It is enough to define the function on a $\sigma$-algebra of subsets of $S$, not on the entire $2^S$ ($2^S$ is itself a $\sigma$-algebra). A very simple $\sigma$-algebra is something like $\{\emptyset, A, A^c, S\}$, which allows us to talk about the probability of the event $A$; this is called the $\sigma$-algebra generated by the set $A$. Defining the probability of $A$ fixes the probability of the rest of the elements in the $\sigma$-algebra generated by $A$. More generally, if you are interested in some collection of events, say $\mathcal{A} = \{A_{\alpha}\}_{\alpha \in \Gamma}$, it is enough to restrict our definition of probability to the $\sigma$-algebra generated by $\mathcal{A}$, denoted by $\sigma(\mathcal{A})$.
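For a finite $S$ you can even compute a generated $\sigma$-algebra mechanically, by closing the collection of events you care about under complements and unions until nothing new appears. A small sketch; the four-point $S$, the set $A$, and the function name are my own illustrative choices:

    # Close a collection of "interesting" events under complement and union;
    # for a finite S this yields the sigma-algebra the collection generates.
    S = frozenset({1, 2, 3, 4})

    def generated_sigma_algebra(generators, space):
        family = {frozenset(), frozenset(space)} | {frozenset(g) for g in generators}
        while True:
            new = ({space - A for A in family}
                   | {A | B for A in family for B in family})
            if new <= family:          # nothing new: closure reached
                return family
            family |= new

    A = frozenset({1, 2})
    sigma_A = generated_sigma_algebra([A], S)
    print(sorted(map(set, sigma_A), key=len))   # empty set, {1,2}, {3,4}, and S

    # Specifying P(A) = 0.3 fixes P on every element of sigma({A}):
    P = {frozenset(): 0.0, A: 0.3, S - A: 0.7, S: 1.0}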

For the second question, "How do we know that such a function has anything to do with whatever 'probability' means 'in the real world'? All we've done is identify real numbers in $[0,1]$ with subsets of the set $S$":

All we have done, as you said, is assign to each set a number between $0$ and $1$ satisfying certain axioms. It is up to you to identify this with probability in the "real world".

For instance, let $S = \{0,1\}$ and define $\mathbb{P}(\{0\}) = \frac1{147}$. You may or may not choose this to model the toss of a coin. If you believe the coin is fair, you will not choose this model since this says the coin is biased.

The theory sets up the framework, and it is up to the modeler to check whether the model fits "reality" or the problem he is interested in.