
I have the following birthday data (Just the head):

  #pop Freq
    1 2976
    2 1238
    3  738
    4  467
    5  352
    6  243

This means that 2976 people have a birthday that no one else in the data shares. (For example, there might be only one person born on 1/1/1965 and only one born on 1/2/1957; in total, 2976 people are like this.) There are 1238 people who share their birthday with exactly one other person, 738 people who share it with two other people, and so on.

I can get the proportion of the population with each of these characteristics (Freq/n). I was simply summing Freq/n over the groups to get the probability that a given birthday identifies a person in the population, but I don't think this is correct.

How do I get the overall probability that, given a birthday, I can determine who the person is in the overall population?

  • Not sure I am clear on the question. If I understand your table (not certain) then there are exactly $2976+1238+738+\cdots + 243 = 6014$ dates which come up as birthdays for at least one person in your sample. Is that correct? If so then, again from the table, there are exactly $2976$ of these which determine a person uniquely, so the answer you want is $\frac {2976}{6014}$. Or have I misunderstood? (2017-02-26)
  • 2976/n is the fraction of people whom you can uniquely identify if you have their birthdate in another dataset. 1238/n is the fraction of people for whom, given their birthdate, you have a 50 percent chance of determining their identity. 738/n is the fraction of people for whom, given their birthdate, you have a 0.33 chance of identifying them accurately. (2017-02-26)
  • Ok... but I still don't understand what your question is. Maybe what you want is the weighted average of those probabilities? Thus $\frac {2976\times 1 +1238\times \frac 12 + 738\times \frac 13+\cdots}{6014 }$? (2017-02-26)

1 Answer

Lulu was correct: what I was looking for was the weighted average. In my code, I had forgotten to sum the weighted terms and then divide by n.
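
For anyone who wants to check the arithmetic, here is a minimal sketch of that weighted average in Python. It rests on assumptions that are not confirmed above: `Freq` is read as a count of people (as in my comment), a person whose birthday is shared by `pop` people in total is identified with probability `1/pop`, and `n` is taken as the sum of only the six rows shown in the head of the table, so the result will differ on the full data. The function name and the dictionary layout are made up for illustration.

```python
def reidentification_probability(freq_by_group_size):
    """Weighted average chance of identifying a person from a birthday.

    freq_by_group_size maps `pop` (how many people share a birthday)
    to `Freq` (assumed here to be the number of PEOPLE in that group).
    """
    # n: total number of people across all groups
    total_people = sum(freq_by_group_size.values())
    # A person in a group of size k is picked correctly with probability 1/k
    expected_hits = sum(count / size for size, count in freq_by_group_size.items())
    return expected_hits / total_people


# Only the head of the table from the question, not the full data
head = {1: 2976, 2: 1238, 3: 738, 4: 467, 5: 352, 6: 243}
print(reidentification_probability(head))
```

If `Freq` instead counts dates rather than people, as the first comment reads the table, the weights would change and this sketch would not apply as written.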