
We collect a lot of data on a daily basis via an API, and part of this data includes fields that represent sample means. Specifically, we are provided with a sample mean and the sample size; let's call them avg_pos and impressions. If you can't tell yet, the domain is advertising: avg_pos tells us the average position our ad showed in over the course of a day, and impressions tells us how many impressions we got (the size of the sample used for avg_pos).

Now, suppose I want to get the average position for a 30 day period. We only have available to us 30 avg_pos values and 30 impressions values. The sum of impressions is the number of impressions over the entire period.

Question: intuitively I think one way to estimate the 30 day average position would be to do something like:

$$ \begin{aligned} \text{total\_impressions} &= \sum_i \text{impressions}_i \\ \text{estimate} &= \sum_i \text{avg\_pos}_i \cdot \frac{\text{impressions}_i}{\text{total\_impressions}} \end{aligned} $$

Can someone explain the implications of this approach, and when it is likely to be accurate and when it is not? I assume its accuracy will depend on how normally distributed all the position values are that make up the avg_pos sample means.

1 Answer


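Your formula is exactly right and completely accurate (up to rounding error). Multiplying each day's avg_pos by that day's impressions recovers the sum of the positions for that day, so $\sum_i \text{avg\_pos}_i \cdot \text{impressions}_i$ is the sum of all positions over the month, and dividing by total_impressions gives the mean over every impression. If I get some number of observations each day, I will get exactly the same answer whether I wait until the end of the month and then average them all, or average them each day and then calculate an impression-weighted average of my daily averages at the end of the month, which is what you're doing. This is plain arithmetic (the same linearity that underlies linearity of expectation), so it does not depend on how the positions are distributed; in particular, normality is not required.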
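To check the argument above numerically, here is a minimal Python sketch. The raw per-impression positions are simulated (you would not normally have them; that is the point), and the function name `weighted_mean` and the simulated ranges are my own assumptions for illustration, not anything from your API.

```python
import random

def weighted_mean(avg_pos, impressions):
    """Impression-weighted average of the daily average positions."""
    total_impressions = sum(impressions)
    return sum(a * n for a, n in zip(avg_pos, impressions)) / total_impressions

random.seed(0)

# Hypothetical raw data: one list of ad positions per day, for 30 days.
# The distribution is deliberately not normal (uniform integers 1..10).
days = [
    [random.randint(1, 10) for _ in range(random.randint(50, 500))]
    for _ in range(30)
]

# What the API would report: a daily mean and a daily sample size.
avg_pos = [sum(d) / len(d) for d in days]
impressions = [len(d) for d in days]

# Mean over every single impression in the period.
pooled = sum(sum(d) for d in days) / sum(impressions)

# Mean reconstructed from the daily summaries only.
combined = weighted_mean(avg_pos, impressions)

# For contrast: the plain average of the daily means ignores sample sizes.
unweighted = sum(avg_pos) / len(avg_pos)

print("pooled:    ", pooled)
print("combined:  ", combined)
print("unweighted:", unweighted)
print("match:", abs(pooled - combined) < 1e-9)
```

The pooled and combined values agree up to floating-point error, whereas the unweighted average of the daily means generally differs whenever the daily impression counts are unequal.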