1

If I know something with 51% certainty of being true, how many other independent facts on the same subject (all with 51% certainty of being true) do I need to know to have 99% certainty that my knowledge of the subject is correct?

This is assuming that any one 51% probability fact being proven false would disprove all my knowledge on the subject.

This came up in a discussion on science, where I asserted that no single piece of knowledge has to be 99% certain for us to be 99% certain overall, because when we aggregate independent but related facts, the probability of being correct is greater than the probability of any one fact being correct.

4 Answers

3

The key part is where you said that if ONE thing is wrong, it is all wrong.

So you cannot multiply the probabilities of being wrong: that gives you the probability of being wrong on ALL things, not the probability of being wrong on at least one.

Instead, you need to multiply the probabilities of being correct (0.51 each), but then we find that with every extra fact we become less and less sure overall.

The probability of being correct overall when you are 51% sure of even two things is about 26% ($0.51^2$), for three things it is about 13.27% ($0.51^3$), and so on.
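A minimal sketch of this arithmetic in Python, assuming the facts are independent and that all of them must be true; the function name `prob_all_true` is only an illustrative choice:

```python
def prob_all_true(p: float, n: int) -> float:
    """Probability that n independent facts, each true with probability p, are ALL true."""
    return p ** n


# Each extra 51% fact that must hold lowers the overall certainty.
for n in range(1, 6):
    print(f"{n} fact(s): {prob_all_true(0.51, n):.4f}")
```

Under these assumptions the value only shrinks as `n` grows, so no number of additional 51% facts brings it back up toward 99%.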

Under this assumption, you can never be more sure overall than you are of the least certain individual component.

0

revised answer

This is a rather complex issue, not just a simple probability matter. The a priori probability also comes in: if 51% is significantly higher than what chance alone would give, repeated results of 51% will increase the probability of the premise being true rather than make it less likely, as in tests for ESP. Another parallel is that repeated tests showing positive for a disease increase the probability that the positive result is correct.

However, if you say that one assertion being wrong disproves the premise, Roy is correct.

  • 0
    Hmm, I actually thought it would be a very low number! I was trying to prove (to a creationist) that if you know lots of things but know that they only have a slightly better than even probability of being right, your method is statistically more likely to be the truth than if you know fewer things but are more certain. Thanks! 2012-09-19
  • 0
    I have revised my answer, as I missed the part of your question that said that if one fact is proved wrong, the premise is disproved. 2012-09-19
0

Since you are discussing the nature of evidence, and the certainty you can expect from aggregating a certain type of evidence, a simpler way to look at this might be the following.

You are trying to build up an aggregate level of certainty based on a number of independent sources, that is to say that you are looking for the expected level of certainty you will get from the sum of all of these sources.

Each piece of evidence is essentially a random variable, and the expected value of the average of these random variables is the level of certainty you can expect from this source.

If the evidence is composed entirely of independent and identically distributed random variables $X^k$, you can say that the expected certainty from continuous inputs of this evidence is:

$\frac{X_1^k + X_2^k + \cdots + X_n^k}{n} \rightarrow EX^k \quad \text{as} \quad n \to \infty$

Meaning you get $EX^k$, or 51%, if that's the expected value of one piece of this kind of evidence.
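As a rough illustration of this limit, here is a small simulation sketch. It assumes each piece of evidence can be modelled as a 0/1 draw that equals 1 with probability 0.51; that modelling choice, the helper name `sample_average`, and the fixed seed are assumptions made for the example, not part of the argument above.

```python
import random


def sample_average(p: float, n: int, seed: int = 0) -> float:
    """Average of n independent 0/1 draws, each equal to 1 with probability p."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    hits = sum(1 for _ in range(n) if rng.random() < p)
    return hits / n


# The average settles near 0.51 as n grows; it does not climb toward 0.99.
for n in (100, 10_000, 200_000):
    print(n, sample_average(0.51, n))
```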

0

The hardest part about a problem like this is figuring out what you actually mean.

The setup it sounds like you are trying to describe is this:

  • You have a collection of statements $F_i$ -- the "facts" you know
  • You have a statement $H$ -- the "truth" you are trying to learn about
  • $P(F_i) = 0.51$ -- the probability of any particular fact being true is 51%
  • All of the $F_i$ are independent
  • If any $F_i$ is false, then $H$ is false too.

From this, you can deduce that if you have $n$ "facts", then $P(H) \leq 0.51^n$. An intuitive interpretation is that each $F_i$ you add to your repository of knowledge is evidence against $H$: you are collecting many, independent ways in which $H$ could fail, and each way has a 49% chance of actually happening.
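One way to spell out the step behind this bound, using only the assumptions listed above: since $H$ fails whenever any $F_i$ fails, $H$ can only be true when every $F_i$ is true, and independence lets the joint probability factor into a product:

$$P(H) \leq P(F_1 \cap F_2 \cap \cdots \cap F_n) = \prod_{i=1}^{n} P(F_i) = 0.51^n$$

For example, with $n = 2$ this gives $P(H) \leq 0.51^2 = 0.2601$.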


It's probable that this isn't really what you meant. Maybe you meant something like $P(F_i | H) = 0.51$ -- that is, "if $H$ is true, then $F_i$ will be true with 51% probability".

In some sense, each $F_i$ you collect is evidence for $H$, and each $F_i$ that turns out to be false is evidence against $H$, but this really doesn't make sense in isolation. One way for it to make sense is when you're comparing $H$ against some other hypothesis $H'$, in which case you can make sense of the rate at which the evidence favors $H$ versus $H'$.

In particular, if you had $P(F_i | H') = 0.5$, then each $F_i$ you see that's true would tilt things slightly in favor of $H$, and each one that's false would tilt things slightly in favor of $H'$. If you get a lot more $F$'s that are true than false, then this can accumulate to a lot of evidence favoring $H$ over $H'$. Of course, if you get too many true $F$'s, you should strongly suspect tampering with the data or severe errors in your model.
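A hedged sketch of how this accumulation could be computed. It assumes that $H$ and $H'$ are the only two candidates, that the $F_i$ are independent given either hypothesis, and that some prior probability for $H$ is supplied; the function name `posterior_prob_h` and the 50% prior used in the demo are hypothetical choices for illustration.

```python
def posterior_prob_h(prior_h: float, n_true: int, n_false: int,
                     p_given_h: float = 0.51, p_given_alt: float = 0.5) -> float:
    """Posterior probability of H versus H' after seeing n_true true facts
    and n_false false facts, via Bayes' rule in odds form."""
    prior_odds = prior_h / (1.0 - prior_h)
    lr_true = p_given_h / p_given_alt                   # factor applied per true fact
    lr_false = (1.0 - p_given_h) / (1.0 - p_given_alt)  # factor applied per false fact
    posterior_odds = prior_odds * lr_true ** n_true * lr_false ** n_false
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical demo: start undecided (prior 0.5) and compare two outcomes.
print(posterior_prob_h(0.5, n_true=60, n_false=40))  # more true than false
print(posterior_prob_h(0.5, n_true=40, n_false=60))  # more false than true
```

Each true fact multiplies the odds by 1.02 and each false one by 0.98, so a single observation barely moves the result; only a sustained surplus of true facts builds up substantial evidence for $H$.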

If there were some other set of facts $G_j$ such that $P(G_j | H) = 0.9$ and $P(G_j | H') = 0.5$, then discovering $G_j$'s would rapidly accrue evidence in favor of $H$, but each one you find that turns out false would be a severe blow against $H$.
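To make "rapidly" and "severe blow" concrete, the per-observation likelihood ratios under these numbers can be worked out directly (writing $\neg G_j$ for "$G_j$ is false"):

$$\frac{P(G_j \mid H)}{P(G_j \mid H')} = \frac{0.9}{0.5} = 1.8, \qquad \frac{P(\neg G_j \mid H)}{P(\neg G_j \mid H')} = \frac{0.1}{0.5} = 0.2$$

For comparison, the corresponding ratios for the $F_i$ are $0.51/0.5 = 1.02$ and $0.49/0.5 = 0.98$. So a true $G_j$ multiplies the odds for $H$ over $H'$ by 1.8, while a false one divides them by 5.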

Whether looking for $F$'s instead of $G$'s is better overall in distinguishing between $H$ and $H'$ would depend very much on how many of each you get to look at.