This is actually a computational biology problem, but you don't really need biology knowledge to understand it:
You are given expression values of a given protein P for 10 individuals with a normal condition and 10 individuals with a disease condition.
Normal: 5.2 6.4 7.8 3.1 2.9 1.0 2.3 0.6 4.3 3.2
Disease: 7.8 9.1 10.4 11.5 4.3 6.5 7.6 6.7 10.1 2.1
Your team built a disease predictor such that when the expression of the protein is higher than 5.5, the predictor claims that the individual has the disease. You now want to evaluate whether, among the disease predictions made by the predictor, the number of individuals who actually have the disease could have been obtained by chance. Compute the p-value according to the correct statistical test seen in class and assume that a p-value < 0.05 indicates a significant difference from what would be obtained by chance. Show your calculations.
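To make the setup concrete, here is a minimal sketch that applies the 5.5 cutoff to the two samples listed above and counts the predictions. The variable names are my own and are not part of the assignment.

```python
# Expression values of protein P, copied from the problem statement.
normal = [5.2, 6.4, 7.8, 3.1, 2.9, 1.0, 2.3, 0.6, 4.3, 3.2]
disease = [7.8, 9.1, 10.4, 11.5, 4.3, 6.5, 7.6, 6.7, 10.1, 2.1]

THRESHOLD = 5.5  # the predictor calls "disease" when expression > 5.5

# Normal individuals wrongly flagged as diseased.
false_positives = sum(x > THRESHOLD for x in normal)
# Diseased individuals correctly flagged as diseased.
true_positives = sum(x > THRESHOLD for x in disease)
# Total number of "disease" calls made by the predictor.
predicted_disease = false_positives + true_positives

print(f"predicted disease: {predicted_disease}")   # 10
print(f"  of which truly diseased: {true_positives}")  # 8
print(f"  of which actually normal: {false_positives}")  # 2
```

So the predictor makes 10 disease calls in total, 8 of which are on individuals from the disease group.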
I was given the standard answer that
$ \text{p-value} = 1 - \sum_{i=0}^{7} \frac{\binom{10}{i} \binom{20-10}{i}}{\binom{20}{10}} $
which uses the hypergeometric distribution.
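For reference, a short sketch that evaluates this formula numerically with Python's standard library. It also evaluates the more conventional hypergeometric form, with $\binom{20-10}{10-i}$ as the second factor; the two agree here only because $\binom{10}{i} = \binom{10}{10-i}$. The function names are hypothetical, chosen for illustration.

```python
from math import comb

def p_value_as_given() -> float:
    """Evaluate the formula exactly as written in the standard answer."""
    total = comb(20, 10)
    lower_tail = sum(comb(10, i) * comb(20 - 10, i) for i in range(0, 8))
    return 1 - lower_tail / total

def p_value_hypergeometric() -> float:
    """Conventional hypergeometric upper tail P(X >= 8).

    Population of 20, 10 of one kind, 10 drawn without replacement.
    """
    total = comb(20, 10)
    lower_tail = sum(comb(10, i) * comb(20 - 10, 10 - i) for i in range(0, 8))
    return 1 - lower_tail / total

p_given = p_value_as_given()
p_hyper = p_value_hypergeometric()
print(f"{p_given:.4f} {p_hyper:.4f}")
```

Both expressions come out to 2126/184756, roughly 0.0115, which is below the 0.05 cutoff stated in the problem.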
I know what a p-value is, but what hypothesis are we testing with this p-value?
To me, the formula seems to compute the probability of getting more than 7 normal people when we pick 10 people from the 20 samples. But this probability doesn't seem to have any connection with the predictor. I'm confused...