We collect a lot of data on a daily basis via an API, and part of this data includes fields that represent sample means. Specifically, we are provided with a sample mean and the sample size; let's call them `avg_pos` and `impressions`. If you can't tell yet, the domain is advertising: `avg_pos` tells us the average position our ad showed in over the period of a day, and `impressions` tells us how many impressions we got (the size of the sample used for `avg_pos`).
Now, suppose I want to get the average position for a 30-day period: all we have available is the 30 `avg_pos` values and the 30 `impressions` values. The sum of `impressions` is the total number of impressions over the entire period.
Question: intuitively I think one way to estimate the 30 day average position would be to do something like:
$$total\_impressions = \sum_{i=1}^{30} impressions_i$$
$$\overline{avg\_pos} = \sum_{i=1}^{30} avg\_pos_i \cdot \frac{impressions_i}{total\_impressions}$$
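As a sanity check, here is how I'd compute that weighted average (the daily values below are made up purely for illustration):

```python
# Impression-weighted average of daily sample means.
# Daily values are hypothetical, just to show the calculation.
avg_pos = [2.1, 3.4, 1.8]       # daily sample means (avg_pos)
impressions = [100, 50, 250]    # daily sample sizes (impressions)

total_impressions = sum(impressions)

# Each day's mean is weighted by its share of total impressions.
weighted_avg = sum(a * n / total_impressions
                   for a, n in zip(avg_pos, impressions))
print(weighted_avg)  # prints 2.075
```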
Can someone explain the implications of this approach, and when it is likely to be accurate and when it is not? I assume its accuracy will depend on how normally distributed the underlying position values are that make up the `avg_pos` sample means.