8

Suppose we are picking points uniformly at random from the surface of the Earth. I want to compute the probability that I pick a point in the Western hemisphere, given that I pick a point on the equator. The answer should clearly be $1/2$.

From the definition, we have

$$P(A|B)=\frac{P(A\cap B)}{P(B)},$$

where event $A$ is choosing a point in the Western hemisphere and $B$ is choosing a point on the equator. As the equator is a $1$-dimensional smooth line on a $2$-dimensional surface, it has measure $0$. So I compute $P(B)=0$. But using the conditional probability formula requires $P(B)>0$. In fact, this is the definition of conditional probability! So how do I make sense of $P(A|B)$? Clearly, it should work out to be $1/2$, but what is the rigorous way to compute it?

  • 0
    Sorry, I forget how the equator is located, but why is the probability $1/2$? 2012-11-14
  • 0
    See http://en.wikipedia.org/wiki/Equator. I reason that the answer is $1/2$ because precisely $1/2$ of the equator lies in the Western hemisphere. 2012-11-14
  • 0
    Then yes, if the equator is given first, then it would be $1/2$. 2012-11-14
  • 0
    You can count the probability of $B$ as $1$, because if you take a point from the equator, it does not matter from which part you take it; the probability is just $1$. 2012-11-14
  • 0
    But clearly the probability of selecting a point on the equator from a uniform distribution on the Earth is $0$. 2012-11-14
  • 0
    Why? The equator divides the Earth into two parts, so it would be $1/3$. 2012-11-14
  • 0
    We don't care about the area of each hemisphere. Let us assume that each area is one, so it is $1/3$; in short: north part, south part, and equator. 2012-11-14

3 Answers

7

This is a surprisingly philosophical question, and as such, here is a link to a philosophical paper about it: What Conditional Probability Could Not Be

Practically speaking though, you're absolutely correct: this probability is $1/2$. However, it is difficult to describe this fact using conditional probability the way it is usually understood.

The way I would "rigorously" approach this problem is the following: let's say you have a probability space $(X,\Sigma,P)$ and a subspace $Y\subset X$ such that $P(Y)=0$. How do we 'condition' on this space? Well, the same way we consider a "line" integral in $\mathbb{R}^2$: $Y$ becomes your new universe, so you have to define a new probability space $(Y,\Sigma_2,P_2)$ where you can answer questions such as this. Computing $P(A\cap B)/P(B)$ is somewhat like trying to measure the length of a line segment using a bathroom scale: the scale ignores the line segment, so you have to get a ruler instead!
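
As a minimal sketch of what such a $(Y,\Sigma_2,P_2)$ could look like here (this particular choice is an assumption, not something forced by $P$): take $Y$ to be the equator, parametrize it by longitude $\lambda\in[-\pi,\pi)$ with the Western hemisphere corresponding to $\lambda\in(-\pi,0)$, and let $P_2$ be normalized arc length. Then

$$P_2(E)=\frac{1}{2\pi}\int_E d\lambda,\qquad P_2(A\cap Y)=\frac{1}{2\pi}\int_{-\pi}^{0}d\lambda=\frac{1}{2}.$$

The "ruler" in this sketch is arc length; a different choice of $P_2$ on the same set $Y$ would give a different number.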

  • 1
    Thank you very much for the link, $+1$. 2012-11-14
1

Not an answer, but for those who might have approached it in a different way and got an answer different from $\frac12$, a comment on why that might be ok.

To see why several different answers might be ok in some sense, let's start with the purpose of conditioning. Even when we condition on an event of positive probability, it is usually in order to later invoke the law of total probability or the law of total expectation, so that some probability or expected value can be computed via case-by-case analysis. That is, we divide into cases $B_1,\dots, B_n$ (forming a partition of the probability space), restrict our attention to each smaller probability space $B_i$, observe that some random variables become constant in the smaller space, and exploit that to compute $P(A|B_i)$. Then we move back to the original big probability space to obtain $P(A)$ by invoking the law of total probability $P(A) = \sum_i P(A|B_i) P(B_i)$. Enabling such case-by-case analysis is why we define and use conditional probabilities. That is, the purpose of conditioning is to enable the trick of double counting in probability theory.

Since such case-by-case analysis has been shown to be extremely useful at least in discrete situations and since there are many situations where one would want to divide the probability space into continuum-many cases, one is bound to ask the following questions:

(1) can we have a good probability theory with probability spaces which are not discrete at all?

(2) If so, can we have a good theory of dividing such a space into uncountably many parts in a way that enables most case-by-case analysis that we might want to employ?

Now as you know, Question (1) is answered yes at the inevitable cost of abandoning the requirement that every subset be assigned a probability.

Question (2) is also answered yes, this time at the inevitable cost of abandoning the requirement that $P(A|B)$, for events $A$, $B$ with $P(B)=0$, be assigned a unique number independent of how we partition the ambient probability space (and independent of how we approximate $B$ by positive-probability events). Seeing that this cost is necessary is easy, as the Borel–Kolmogorov paradox demonstrates. What's slightly less easy to see is whether this cost (plus some other minor costs) is sufficient, say, when you are dealing with continuous random variables instead of discrete ones. But as you know, the notion of a conditional probability density function works quite well in assigning numbers to $P(A|B)$ in a way that enables double counting arguments, whenever a partition is fixed and the partition comes from a continuous random variable.

Even when we are working with random variables that are neither continuous nor discrete (random walk, Brownian motion, etc.), we can say yes to Question (2): if the probability space is in some sense approximated by discrete ones (the limiting probability space is called a standard probability space) and if we are partitioning the space using a measurable function (intuitively, this means that the way we partition the space is also approximated by discrete ones), then we can assign numbers to $P(A|B)$ for all events $A$ and partition elements $B$ in such a way that it enables double counting arguments. (For those who want to google precise results, this is known as disintegration of measures.)
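
To make the Borel–Kolmogorov paradox mentioned above concrete, here is a small Monte Carlo sketch (Python with NumPy; the function names, the half-width `eps`, and the choice of test arcs are illustrative assumptions, not part of the original answer). It approximates two great circles by thin positive-probability sets: the equator by a latitude band, and a circle through the poles by a pair of longitude wedges. On each circle it asks what fraction of the conditioned points fall on arcs that make up one third of the circle's length.

```python
import numpy as np


def sample_sphere(n, rng):
    """Uniform points on the unit sphere as (latitude, longitude) in radians."""
    # For a uniform point on the sphere, z = sin(latitude) is uniform on [-1, 1].
    z = rng.uniform(-1.0, 1.0, n)
    lat = np.arcsin(z)
    lon = rng.uniform(-np.pi, np.pi, n)
    return lat, lon


def borel_kolmogorov_demo(n=1_000_000, eps=0.02, seed=0):
    """Return (fraction on equator arcs, fraction on meridian arcs)."""
    rng = np.random.default_rng(seed)
    lat, lon = sample_sphere(n, rng)

    # Equator approximated by the latitude band |lat| < eps.
    near_equator = np.abs(lat) < eps
    # Two arcs of angular length pi/3 each, centred at longitude 0 and pi:
    # together one third of the equator's length.
    lon_e = lon[near_equator]
    on_equator_arcs = (np.abs(lon_e) < np.pi / 6) | (np.abs(lon_e) > 5 * np.pi / 6)

    # Great circle through the poles (meridians 0 and 180 degrees),
    # approximated by two longitude wedges of half-width eps.
    near_meridian = (np.abs(lon) < eps) | (np.abs(lon) > np.pi - eps)
    # Two arcs |lat| < pi/6, again one third of this circle's length.
    on_meridian_arcs = np.abs(lat[near_meridian]) < np.pi / 6

    return on_equator_arcs.mean(), on_meridian_arcs.mean()


if __name__ == "__main__":
    frac_equator, frac_meridian = borel_kolmogorov_demo()
    print("latitude-band conditioning:  ", frac_equator)
    print("longitude-wedge conditioning:", frac_meridian)
```

Under these assumptions the band-based fraction should come out near $1/3$, while the wedge-based fraction should come out near $\sin(\pi/6)=1/2$, even though both sets of arcs cover one third of a great circle. The two approximation schemes correspond to two different partitions of the sphere, which is exactly the dependence on the partition described above.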

0

This post is quite old, but I'm surprised that no one pointed out that in the problem given by @Potato,

$$P(A\cap B)=\frac{1}{2}P(B)$$

so

$$P(A|B)=\frac{P(A\cap B)}{P(B)}=\frac{\frac{1}{2}P(B)}{P(B)}=\frac{1}{2} $$

One can see this by thinking about the sets of points involved. $B$ is the set of points on the equator. $A\cap B$ is the intersection of the set of points in the Western hemisphere with the set of points on the equator. In other words, $A\cap B$ is the set of points on the equator in the Western hemisphere, which is half the set of points on the equator. It doesn't matter that the chances of picking a point exactly on the equator are infinitesimally small. The chance of picking a point on the equator in the Western hemisphere is always exactly half that of picking a point anywhere on the equator. Basically, you have to think about what the sets of points involved mean and how they are related, and not get so caught up on the equator being an infinitesimally small portion of the globe. The ratio of probabilities associated with halves and wholes remains the same, even in the presence of infinitesimals (or infinities). Assuming a uniform distribution of global point choices, one has to think in terms of "the probability of X for this half is the same as the probability of X for the other half."

It's also important to realize that this line of thinking is simple in this case because the equator is orthogonal to the meridians that define the East/West hemispheres, and both bisect the surface area into exact halves. If one were to use a different great circle, such as one at an angle to the equator, or a circle of latitude other than the equator, it gets more complicated.

As an aside, Bayes' Rule works to get the same answer:

$$P(A|B)=\frac{P(B|A)P(A)}{P(B)} $$

Then observe that

$$P(B)=P(B|A)P(A)+ P(B|\neg A)P(\neg A) $$

That is, the probability of picking a point somewhere on the equator is the sum of two terms: the probability of picking a point on the equator from the Western hemisphere, weighted by the probability of picking a point in the Western hemisphere from the whole globe, plus the probability of picking a point on the equator from outside the Western hemisphere, weighted by the probability of picking a point outside the Western hemisphere from the whole globe. Making that substitution gives

$$P(A|B)=\frac{P(B|A)P(A)}{P(B|A)P(A)+ P(B|\neg A)P(\neg A)} $$

Next, note that $P(A)$ is just the probability of picking a point in the Western hemisphere from the global set of all points. Since the Western hemisphere is, by definition, half of the globe,

$$P(A)=\frac{1}{2}$$

and since the part of the globe that is not in the Western hemisphere is simply the other half,

$$P(\neg A)=\frac{1}{2}$$

$P(B|A)$ is the probability of a point being on the equator given that it is in the Western hemisphere. This probability is infinitesimally small. But since, as we have stated, the Western hemisphere is half the globe, and exactly half of the equator lies in the Western hemisphere and the other half in the Eastern, we can say that this infinitesimally small probability is the same for both hemispheres, so

$$P(B|A)=P(B|\neg A)$$

Making these substitutions gives

$$P(A|B)=\frac{P(B|A)\frac{1}{2}}{P(B|A)\frac{1}{2}+ P(B|A)\frac{1}{2}}=\frac{1}{2} $$

  • 0
    Unfortunately that all rests on sand -- because $P(A\cap B)=0$ and $P(B)=0$ we have $P(A\cap B)=\frac12 P(B)$ but we also have $P(A\cap B)=\frac13 P(B)$ and $P(A\cap B)=42 P(B)$. Likewise, the fraction in your last formula comes out as $\frac00$. 2018-10-03
  • 1
    It's a case where infinitesimals can be considered differently from actual zero. It probably should be stated more rigorously, but the thinking is not too dissimilar from taking the limit of some expression as a variable goes to zero, which can have a value, even while the expression is undefined when an actual zero is substituted. 2018-10-03
  • 0
    Let's look at it this way: consider a band centered on the equator with a width $w$, $w \le 1$, such that the area of the equatorial band is $$E=wS$$ where $S$ is the area of the sphere. Then $$P(B)=\frac{E}{S}=\frac{wS}{S}=w$$ Likewise $$P(A\cap B)=\frac{w}{2}$$ $$\lim \limits_{w \to 0}P(A|B)=\lim \limits_{w \to 0}\frac{P(A\cap B)}{P(B)}=\lim \limits_{w \to 0}\frac{w}{2 w}=\frac{1}{2}$$ 2018-10-03
  • 0
    Sorry, the $w$ is not the width, but a multiplier of $S$ as a consequence of the width defining the area of the band. Apparently I didn't catch that I typed it incorrectly until past the allowed edit time. 2018-10-03
  • 0
    On the logic of infinitesimals and zero: It is implicitly assumed that one can randomly pick a point on the surface of a sphere, which implies the probability of picking a point cannot be identically zero despite being infinitely small, or it would be impossible to pick any point at all. Furthermore, the probability of picking one point from a set containing multiple points must be more than the probability of picking any one specific point. So $P(B)$ is also not identically zero. 2018-10-03
  • 0
    The problem with your "band of land around the equator" argument is that there are other sequences of bands that are all fatter in, say, the western hemisphere, but where the _limit_ of the sequence is exactly the same equator as that of your sequence of uniform bands. And it doesn't look like you have any principled way to prefer one sequence over another, based on features that are visible in the probability space abstraction. 2018-10-03
  • 0
    If you have a way of reasoning about "infinitesimal probabilities" that is _not_ based on the Kolmogorov formalization of probabilities, I'm sure the world would be interested to know. But you would have a lot of work ahead of you proving that it _actually works_ as well as the Kolmogorov axioms do. 2018-10-03
  • 0
    Well, it's simply an application of ideas that underpin calculus. I'm actually glad you're taking the time to reply because frankly I thought an answer on such an old post might never even be noticed :-) 2018-10-03
  • 0
    Regarding the equatorial band approach, I didn't explicitly state that it was a uniformly wide band (I'm not sure why one would choose some other band). If not uniformly wide, it has to at least be symmetric in the regions of both hemispheres or it would bias the resulting probability. It's the same reason one might find a surface of revolution by taking uniform cylinders rather than some strangely wavy surface. 2018-10-03
  • 0
    A probability space does not come with any concept of "symmetric in the regions of both hemispheres". 2018-10-03
  • 0
    I'll grant that's true in a more general problem space, but the probabilities in this particular problem are tightly coupled to spherical geometry which embeds all sorts of symmetry. The original poster made clear in comments that the probability distribution over the whole sphere is uniform, so probabilities of symmetric regions must be the same. If one takes into account a non-uniform distribution, given a global probability function, one can still take the probabilities as a continuous scalar field and use the tools of calculus to calculate the probabilities of sub-regions. 2018-10-03
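
The disagreement in the comments above about uniform versus lopsided bands can be checked with exact band areas. The following is only a sketch under stated assumptions: a unit sphere, bands defined by a latitude half-width, and a hypothetical parameter `west_factor` that makes the band that many times wider in the Western hemisphere. None of these names come from the thread.

```python
import math


def band_area_fraction(half_width, lon_span):
    """Fraction of the unit sphere with |latitude| < half_width over lon_span radians.

    The area element is cos(lat) dlat dlon, so the band area is
    lon_span * 2 * sin(half_width); the whole sphere has area 4 * pi.
    """
    return lon_span * 2.0 * math.sin(half_width) / (4.0 * math.pi)


def p_west_given_band(eps, west_factor=1.0):
    """P(West | band) for a band of half-width west_factor * eps in the
    Western hemisphere and eps in the Eastern hemisphere."""
    west = band_area_fraction(west_factor * eps, math.pi)
    east = band_area_fraction(eps, math.pi)
    return west / (west + east)


if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        uniform = p_west_given_band(eps, west_factor=1.0)
        lopsided = p_west_given_band(eps, west_factor=2.0)
        print(f"eps={eps}: uniform band {uniform:.6f}, lopsided band {lopsided:.6f}")
```

With `west_factor=1.0` the ratio is $1/2$ for every `eps`. With `west_factor=2.0` it equals $\sin(2\varepsilon)/(\sin(2\varepsilon)+\sin\varepsilon)$, which tends to $2/3$ as $\varepsilon\to 0$, although both families of bands shrink to the same equator. Which family counts as the natural one is extra geometric input, not something the probability measure alone supplies.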