I'm doing a training exercise on a 'test' policy claims database in which some of the policy records contain dummy dates (for example, a policy recorded as taken out on 31 Dec 9999).
I'm planning to run a program over the data to find the most commonly used dates, and I expect it to bring back the dummy dates. However, in case the dummy date isn't something obviously wrong (say it was 1 April 1970), I'd like to know the probability that the same full date (day, month and year) appears on $n$ records out of a set of $m$ records, when the dates fall within a fixed range of years (say 100 years).
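For illustration, here is a minimal Python sketch of both steps: counting the most frequent dates, and estimating how likely a repeat count is by chance. It rests on an assumption the question does not state, namely that genuine dates are independent and uniformly spread over the range. Real policy dates are unlikely to be uniform, so treat the result as a rough baseline only. The function names (`most_common_dates`, `prob_date_at_least`, `expected_dates_at_least`) are hypothetical.

```python
from collections import Counter
from datetime import date
from math import exp, lgamma, log, log1p


def most_common_dates(dates, top=5):
    """Return the `top` most frequent dates as (date, count) pairs."""
    return Counter(dates).most_common(top)


def prob_date_at_least(n, m, days):
    """P(one particular date appears on at least n of m records).

    ASSUMPTION: each record's date is independent and uniform over
    `days` possible dates (days >= 2), so the count is Binomial(m, 1/days).
    """
    p = 1.0 / days
    total = 0.0
    for k in range(n, m + 1):
        # log of C(m, k) * p^k * (1 - p)^(m - k), computed in log space
        # so that large m does not overflow
        log_term = (lgamma(m + 1) - lgamma(k + 1) - lgamma(m - k + 1)
                    + k * log(p) + (m - k) * log1p(-p))
        total += exp(log_term)
    return min(1.0, total)


def expected_dates_at_least(n, m, days):
    """Expected number of distinct dates that appear at least n times.

    By the union bound this is also an upper bound on the probability
    that ANY date appears at least n times (when the value is below 1).
    """
    return days * prob_date_at_least(n, m, days)


# Example: 100 years of possible dates, 10,000 records, a date seen 8+ times.
days_in_range = (date(2070, 1, 1) - date(1970, 1, 1)).days
print(prob_date_at_least(8, 10_000, days_in_range))
print(expected_dates_at_least(8, 10_000, days_in_range))
```

If the expected number of dates reaching the observed count is far below 1 under this baseline, the repeated date is unlikely to be a coincidence and is a reasonable candidate for a dummy value.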
I'd look into this myself, but I have no background in probability. I couldn't find an answer on the wiki page or among the various birthday problem questions on this Stack Exchange.