In a Monte Carlo simulation, my goal is to compute an estimate of the mean of a distribution via sampling. The traditional, straightforward approach generates samples (via simulation), computes their sample mean and variance, and uses the sample variance divided by $n$ as the error estimate on the mean. This all works and produces the desired estimate with error bounds on that estimate.
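For concreteness, here's roughly what I mean by the plain approach (a minimal Python sketch; `simulate` is just a stand-in for one run of my simulation):

```
import numpy as np

def plain_mc_estimate(simulate, n, rng=None):
    """simulate(rng) returns one sampled value (a stand-in for my simulation)."""
    rng = rng or np.random.default_rng()
    samples = np.array([simulate(rng) for _ in range(n)])
    mean = samples.mean()
    var_of_mean = samples.var(ddof=1) / n   # sample variance divided by n
    return mean, np.sqrt(var_of_mean)       # estimate and its standard error
```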
But I know a little more about my problem. Abstracting away the details, my simulation is actually picking samples from multiple independent distributions, each with a different probability of being picked. Think of my setup as two (or more) urns, each filled with its own distribution of values. Their means and variances may be the same or completely different... I don't know, and that's why I'm sampling. But I also don't care about the mean of each urn individually; at the end I only care about some weighted combination of their means.
I can achieve the sampling easily by picking each sample randomly from one urn or the other with the appropriate probability, then computing the mean and variance of the pooled samples. This all certainly works.
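In code, this naive mixture sampling looks something like this (again just a sketch; `draw_from_urn` is a stand-in for drawing from one urn, and I'm assuming the weights sum to 1 so they can be used directly as picking probabilities):

```
import numpy as np

def mixture_estimate(draw_from_urn, alphas, n, rng=None):
    """draw_from_urn[i](rng) returns one value from urn i; alphas assumed to sum to 1."""
    rng = rng or np.random.default_rng()
    probs = np.asarray(alphas, dtype=float)
    urns = rng.choice(len(probs), size=n, p=probs)              # pick an urn with its weight...
    samples = np.array([draw_from_urn[i](rng) for i in urns])   # ...then draw one value from it
    return samples.mean(), np.sqrt(samples.var(ddof=1) / n)
```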
But this mixture approach also throws away information that could improve my estimates... I have this known stratification in my problem, and purely random sampling ignores it. What if the urns have differing variances, and after pulling 100 samples from each I can see that one urn's variance is much higher than the others'? Then I should start taking more samples from the high-variance urn (and compensate by reducing the weight of each of those samples). But that variance estimate is itself based on samples, so I shouldn't trust it blindly! In the worst case, I could take two samples that happen to be identical, so my variance estimate for that urn is 0... but that doesn't mean I should trust that estimate and ignore the urn from now on!
So my question is: given multiple distributions $D_i$, each with unknown mean and variance, and a known set of weights $\alpha_i$, how should I sample values from the $D_i$ to efficiently form an unbiased estimate of $\sum_i \alpha_i \bar{D}_i$, where $\bar{D}_i$ denotes the mean of $D_i$?
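To be explicit about the estimator, I believe what I want at the end is the usual stratified combination of the per-distribution sample means, with the error estimate built from the per-distribution sample variances (a sketch, assuming at least two samples per $D_i$):

```
import numpy as np

def stratified_estimate(samples_per_dist, alphas):
    """Estimate sum_i alpha_i * mean(D_i) from per-distribution samples."""
    est, var = 0.0, 0.0
    for samples, alpha in zip(samples_per_dist, alphas):
        x = np.asarray(samples, dtype=float)
        est += alpha * x.mean()
        var += alpha**2 * x.var(ddof=1) / len(x)   # variance of the weighted stratum mean
    return est, np.sqrt(var)                       # estimate and its standard error
```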
In my problem, I'll typically sample 10,000 to 50,000 times, and I may have anywhere between 2 and 10,000 sub-distributions. This may affect the strategy, since I sometimes have so many distributions that I can't even afford one sample for each of them... some have to be skipped!
My current plan will work, but it feels ad hoc. I'll first take, say, 1000 pilot samples from the distributions, with the number of samples for distribution $D_i$ proportional to $\alpha_i$. I'll then compute the estimated variance of each $D_i$ and continue sampling, this time with samples apportioned proportionally to the measured variance of each $D_i$. I think this will work pretty well, but it's not necessarily the most efficient. And what happens if some $\alpha_i$ is so small that the distribution gets few (or no) initial samples? Should I start combining disparate low-$\alpha$ distributions into one so that they have a better chance of being sampled?
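To make the plan concrete, here's a sketch of the two-stage scheme (reusing `stratified_estimate` from above; the stage-two allocation rule is exactly the part I suspect is suboptimal):

```
import numpy as np

def two_stage_plan(draw_from_urn, alphas, n_pilot, n_total, rng=None):
    """draw_from_urn[i](rng) returns one sample from D_i (a stand-in for my simulation)."""
    rng = rng or np.random.default_rng()
    alphas = np.asarray(alphas, dtype=float)
    k = len(alphas)

    # Stage 1: pilot samples apportioned proportionally to alpha_i.  I force at
    # least 2 per distribution so a variance can be computed at all, which is
    # already a problem when there are more distributions than pilot samples.
    pilot_n = np.maximum(2, np.round(n_pilot * alphas / alphas.sum()).astype(int))
    samples = [[draw_from_urn[i](rng) for _ in range(pilot_n[i])] for i in range(k)]

    # Stage 2: apportion the remaining budget proportionally to the measured
    # variance of each D_i (the rule I'm unsure about).  Rounding means the
    # total may be off by a few samples; fine for a sketch.
    variances = np.array([np.var(s, ddof=1) for s in samples])
    weights = variances if variances.sum() > 0 else np.ones(k)
    remaining = max(0, n_total - int(pilot_n.sum()))
    extra_n = np.round(remaining * weights / weights.sum()).astype(int)
    for i in range(k):
        samples[i].extend(draw_from_urn[i](rng) for _ in range(extra_n[i]))

    # Combine with the stratified estimator sketched earlier.
    return stratified_estimate(samples, alphas)
```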
My entire goal is to minimize the number of samples needed, but the strategy to apportion them is far from obvious. Any thoughts or suggestions are welcome!