
The blood type distribution in the US is as follows (according to this link):

O 45%
A 40%
B 11%
AB 4%

However, the blood type is a result of the 2 alleles (in lower case) that a person gets from her parents, one from each parent:

aa -> A
oa -> A
ao -> A
oo -> O
ob -> B
bo -> B
bb -> B
ba -> AB
ab -> AB

How can I extract the distribution of the alleles {a,b,o} from the known distribution of the blood types {O,A,B,AB}?

2 Answers

2

Think of many people in a population meeting at random, where the population has the allele distribution you are looking for. Let me denote by $a$ the frequency of allele $a$ in the population, and similarly for $b$ and $o$.

Suppose two people from the population meet. The rules for allele to blood type conversion imply the following (row is the allele of one person, column is the allele of the other person): $$\begin{array}{cccc} &a&b&o\\ a&\text{A}&\text{AB}&\text{A}\\ b&\text{AB}&\text{B}&\text{B}\\ o&\text{A}&\text{B}&\text{O}\\ \end{array}$$ If the meeting probability is independent of allele and blood type, then out of $100$ meetings, $40$ should be in cells with $\text{A}$, $11$ in cells with $\text{B}$, $45$ in cells with $\text{O}$ and $4$ in cells with $\text{AB}$. In terms of frequencies, these numbers are $40/100$, $11/100$, $45/100$ and $4/100$.

Your $a$, $b$ and $o$ fractions then imply the following equations: $$\begin{aligned} \tfrac{40}{100}&=A=a^2+2oa\\ \tfrac{11}{100}&=B=b^2+2ob\\ \tfrac{45}{100}&=O=o^2\\ \tfrac{4}{100}&=AB=2ab\\ \end{aligned}$$

These are four equations in three variables, so there is no guarantee that they have a solution. Since we are dealing with real-world data, your best hope is to solve for $a$, $b$, and $o$ using three of the equations and then check whether the remaining one is approximately satisfied (you can try every choice of which equation to leave out, but the results are insensitive to that choice). If I leave the last equation out and solve, I get $a \approx 0.25$, $b \approx 0.08$ and $o \approx 0.67$ (after rounding).
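The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `allele_frequencies` is made up for this example, and the closed form relies on completing the square, $(a+o)^2 = A + O$ and $(b+o)^2 = B + O$, which follows from the first three equations. It leaves out the $AB$ equation and then compares its prediction with the observed $4/100$.

```python
import math

# Observed blood type (phenotype) frequencies from the question.
observed = {"A": 0.40, "B": 0.11, "O": 0.45, "AB": 0.04}


def allele_frequencies(A, B, O):
    """Solve A = a^2 + 2oa, B = b^2 + 2ob, O = o^2 for (a, b, o).

    Hypothetical helper for illustration; it ignores the AB equation.
    """
    o = math.sqrt(O)
    # a^2 + 2oa + o^2 = A + O, so (a + o)^2 = A + O; take the positive root.
    a = math.sqrt(A + O) - o
    # Same argument for b: (b + o)^2 = B + O.
    b = math.sqrt(B + O) - o
    return a, b, o


a, b, o = allele_frequencies(observed["A"], observed["B"], observed["O"])

# Consistency checks: the equation that was left out, and the total.
predicted_AB = 2 * a * b
total = a + b + o

print(f"a = {a:.4f}, b = {b:.4f}, o = {o:.4f}")
print(f"predicted AB = {predicted_AB:.4f} (observed {observed['AB']:.2f})")
print(f"a + b + o = {total:.4f}")
```

With the figures from the question, both the predicted $AB$ frequency and the sum $a+b+o$ come out close to, but not exactly, $0.04$ and $1$, which is what one would expect from data rounded to whole percentages.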

-2

Let's start with the easiest one. Since the only way to be type O is to have genotype OO, and the probability of having OO is the SQUARE of the frequency of O, we have that p(O) = SQRT(p(OO)) = SQRT(0.45) = 0.67.

Then we have that p(type A) = p(AA) + p(AO) = 0.4. As before, p(AA) = p(A)^2; and p(AO) = p(A(father),O(mother)) + p(O(father),A(mother)) = 2*p(A)*p(O) = 2*0.67*p(A) = 1.34*p(A). This gives the quadratic equation p(A)^2 + 1.34*p(A) - 0.4 = 0, which has two solutions: one is negative and must be ignored, and the other is p(A) = 0.25.

Finally, $p(B) = 1 - p(O) - p(A) = 0.08$. I've kept everything rounded to 2 decimals because that's all we got in the original data, but if you want to assume that those numbers are infinitely precise and calculate more digits, be my guest.
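For anyone who does want more digits, the three steps of this answer can be reproduced with a short Python sketch. The variable names are chosen for this example only, and the last two lines are an extra cross-check (not part of the steps above) against the type B and type AB frequencies, which the calculation never used.

```python
import math

# Phenotype frequencies from the question (type O and type A are the inputs).
p_type_O, p_type_A = 0.45, 0.40

# Only genotype OO gives type O, so the O allele frequency is the square root.
p_o = math.sqrt(p_type_O)

# Solve x^2 + 2*p_o*x - p_type_A = 0 with the quadratic formula.
disc = (2 * p_o) ** 2 + 4 * p_type_A
roots = [(-2 * p_o + math.sqrt(disc)) / 2, (-2 * p_o - math.sqrt(disc)) / 2]
p_a = next(r for r in roots if r > 0)  # discard the negative root

# The three allele frequencies must sum to 1.
p_b = 1 - p_o - p_a

# Cross-check against the phenotype frequencies that were not used.
pred_type_B = p_b ** 2 + 2 * p_o * p_b
pred_type_AB = 2 * p_a * p_b

print(f"p(O) = {p_o:.4f}, p(A) = {p_a:.4f}, p(B) = {p_b:.4f}")
print(f"predicted type B = {pred_type_B:.4f}, type AB = {pred_type_AB:.4f}")
```

The predicted type B and type AB frequencies land near the observed 0.11 and 0.04, so the rounded data are roughly consistent with random pairing of alleles.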