3

I tried to construct an example showing that splitting data into subsets can lead to the wrong conclusion, and it seems to me that the split alone is enough to produce a fallacy.

What I did was make up sequences that each stand for an observation: the sequence 0,1 means "not A, and then B" (A=0, B=1), the sequence 1,1 means "A occurred and then B occurred", and likewise for the other combinations of truth values.

Then I arranged 40 + 40 of these sequences. Looking at the base rates in the first forty, I could conclude that nothing supported "A implies B": there were too many occurrences of "not A and then B" in the first subset. In the second subset the conclusion was the same: the base rates of the sequences implied that A doesn't cause B.

But when the whole population of sequences is examined, instead of the two samples, the base rates do support the conclusion "A causes B".

Which fallacy is this, if any? I have the actual example written down somewhere if you would like to see it, and I apologize if this question is obviously mistaken about some assumption. (It wasn't a Granger test of cause and effect. I only looked at base rates, to make an example in which, even though every observation is examined, combining a conclusion about the first subset with a conclusion about the second subset gives the opposite of the conclusion drawn from the whole population.)

Do you agree that the result is somewhat strange and a fallacy? One interpretation: A means "it's cloudy" and B means "it rains", and I'm trying to decide whether cloudiness is an accurate predictor of rain. I found that combining the conclusions from the two samples doesn't give the same conclusion as looking at the whole population.

  • 0
    Thank you mixedmath. As Simpson's paradox describes, what I'm asking about is exactly like the first example you link to, the one about Democrats and Republicans. Looking at each of the subsets and comparing base rates, we reach the same conclusion: "More Democrats than Republicans favor the decision" is true in both subsets, but for the whole population the opposite conclusion is true (in fact, looking at the whole population, it was more Republicans than Democrats who favored the decision). Thank you for recognizing the topic. 2011-08-28

1 Answer

2

I recommend looking into Simpson's Paradox. Unlike many other mathematical fallacies or common flaws in logic, I suspect that I'm still vulnerable to Simpson's Paradox. It just feels so... natural. So I have to think of it specifically when I look at data.
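
To make the reversal concrete, here is a minimal sketch with made-up counts. These are not the asker's numbers (which weren't posted), and the names `subset_1`, `subset_2`, `rate_of_b`, and `a_supports_b` are hypothetical, chosen only for illustration. Each pair (A, B) follows the question's encoding, and "A supports B" is taken to mean that B occurs more often after A than after not-A, which is one possible reading of the base-rate comparison.

```python
from fractions import Fraction

# Hypothetical counts, chosen only to illustrate the reversal.
# Each key is a pair (A, B); each value is how often that pair was observed.
subset_1 = {(1, 1): 24, (1, 0): 6, (0, 1): 9, (0, 0): 1}   # 40 observations
subset_2 = {(1, 1): 1, (1, 0): 9, (0, 1): 6, (0, 0): 24}   # 40 observations


def rate_of_b(counts, a):
    """Fraction of the observations with the given value of A in which B occurred."""
    with_b = counts[(a, 1)]
    total = counts[(a, 1)] + counts[(a, 0)]
    return Fraction(with_b, total)


def a_supports_b(counts):
    """True if B is more frequent after A than after not-A."""
    return rate_of_b(counts, 1) > rate_of_b(counts, 0)


# Pool the two subsets by adding the counts pair by pair.
pooled = {pair: subset_1[pair] + subset_2[pair] for pair in subset_1}

for name, counts in [("subset 1", subset_1), ("subset 2", subset_2), ("pooled", pooled)]:
    print(name, float(rate_of_b(counts, 1)), float(rate_of_b(counts, 0)), a_supports_b(counts))
# subset 1 0.8 0.9 False
# subset 2 0.1 0.2 False
# pooled 0.625 0.375 True
```

In this sketch the reversal comes from unequal group sizes: A is common in the subset where B is frequent anyway and rare in the subset where B is infrequent, so the pooled rates are weighted very differently from the per-subset rates.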