I have a large set of data and a copy of that data. The whole data set is $n$ bytes. I want to be 99.999% certain that the sets are identical. Assuming that copying errors occur randomly, how many bytes do I need to randomly select and compare against the reference to be 99.999% certain the two sets are completely identical?
This problem seems related to this one: Determining Sample Size for a Desired Margin of Error. However, I'm confused that both the margin of error and the confidence level appear in that formula, while the sample size does not depend at all on the size of the population (in that example, the total number of students).
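As a rough illustration, here is a minimal Python sketch. It rests on assumptions that go beyond the question: exactly $k$ of the $n$ bytes are corrupted, their positions are uniformly random, and bytes are sampled without replacement. It computes the probability that a sample of $m$ bytes misses every corrupted byte, $\binom{n-k}{m}/\binom{n}{m}$, and searches for the smallest $m$ that pushes this below $1 - 0.99999$. The function names are hypothetical. Note that this is the probability of *detecting* a corrupted copy, not a posterior probability that the copies are identical. A posterior would also require a prior on $k$.

```python
from fractions import Fraction


def miss_probability(n: int, k: int, m: int) -> Fraction:
    """Chance that m bytes drawn without replacement from n bytes
    contain none of the k corrupted bytes: C(n-k, m) / C(n, m)."""
    if m > n - k:
        return Fraction(0)  # the sample is too large to avoid every bad byte
    p = Fraction(1)
    # C(n-k, m) / C(n, m) simplifies to prod_{i<k} (n-m-i) / (n-i)
    for i in range(k):
        p *= Fraction(n - m - i, n - i)
    return p


def min_sample_size(n: int, k: int,
                    confidence: Fraction = Fraction(99999, 100000)) -> int:
    """Smallest m whose detection probability is at least `confidence`,
    assuming (hypothetically) exactly k randomly placed corrupted bytes."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    alpha = 1 - confidence  # allowed probability of missing all errors
    # miss_probability is non-increasing in m, so binary search works;
    # m = n - k + 1 always detects an error, so it is a valid upper bound.
    lo, hi = 0, n - k + 1
    while lo < hi:
        mid = (lo + hi) // 2
        if miss_probability(n, k, mid) <= alpha:
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    print(min_sample_size(10**6, 1))  # 999990
```

With a single corrupted byte the miss probability is $(n-m)/n$, so almost the whole file (999,990 of $10^6$ bytes) has to be checked. If instead a fixed *fraction* $f = k/n$ of the bytes is corrupted, the miss probability is roughly $(1-f)^m$ when $m \ll n$. The required $m \approx \ln(10^{-5})/\ln(1-f)$ then barely depends on $n$, which is the same effect seen in the margin-of-error formula.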