I would like to understand more intuitively where the power of Monte Carlo integration comes from for high-dimensional domains of integration.
Other questions on this site have referenced the proof that the error of a Monte Carlo (MC) integration scales as $N^{-1/2}$, where $N$ is the number of function evaluations, with an exponent that does not depend on the dimension $d$ of the domain of integration. By contrast, the error of a quadrature rule on a uniform grid scales as some power of $N^{-1/d}$ (for a sufficiently smooth integrand, as $N^{-k/d}$ for a rule of order $k$). Consequently, for sufficiently large $d$, one can reach a desired level of accuracy with fewer function evaluations using a Monte Carlo method than using a uniform numerical quadrature method. The essence of the proof is an invocation of the central limit theorem. I understand the proof on a formal level.
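To make the comparison concrete, here is a minimal numerical sketch of the two scalings at a fixed budget of function evaluations. It is only an illustration under my own assumptions: NumPy, a product-of-sines test integrand on $[0,1]^d$ whose exact integral is 1, a tensor-product midpoint rule standing in for "uniform quadrature", and the hypothetical helper names `midpoint_grid_estimate` and `mc_rms_error`.

```python
import numpy as np

def f(x):
    # Assumed test integrand on [0, 1]^d: a product of 1D factors that each
    # integrate to 1, so the exact d-dimensional integral is 1 for every d.
    return np.prod(0.5 * np.pi * np.sin(np.pi * x), axis=-1)

def midpoint_grid_estimate(d, m):
    # Tensor-product midpoint rule with m nodes per axis (N = m**d evaluations).
    nodes = (np.arange(m) + 0.5) / m
    grid = np.stack(np.meshgrid(*([nodes] * d), indexing="ij"), axis=-1)
    return f(grid.reshape(-1, d)).mean()

def mc_rms_error(d, n, trials, rng):
    # Root-mean-square error of plain Monte Carlo with n uniform samples,
    # measured over `trials` independent repetitions.
    estimates = np.array([f(rng.random((n, d))).mean() for _ in range(trials)])
    return np.sqrt(np.mean((estimates - 1.0) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, m = 8, 2          # only 2 grid points per axis are affordable here
    n = m ** d           # same budget for both methods: N = 256
    print("grid error      :", abs(midpoint_grid_estimate(d, m) - 1.0))
    print("MC rms error, N :", mc_rms_error(d, n, 200, rng))
    print("MC rms error, 4N:", mc_rms_error(d, 4 * n, 200, rng))
```

For this particular integrand the grid estimate is exactly $(\pi^2/8)^4$, an error of about 1.32, while $\sigma/\sqrt{N}$ predicts an MC error of roughly 0.13 at the same $N = 256$. Quadrupling $N$ should roughly halve the MC error. The constant $\sigma$ does depend on the integrand (and here grows with $d$); only the exponent $-1/2$ is dimension-independent.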
However, I have no intuition for why this proof works. It still seems like "cheating" to me that by selecting the evaluation points randomly or quasi-randomly, you can achieve a more accurate integration than you would by evaluating the function at uniformly spaced points (provided that the dimensionality of the integral is large enough).
I have tried to construct an extremely non-rigorous analogy to conducting a statistical survey. One could either poll people at regular distance spacings in a neighborhood, or instead poll the same number of people in completely randomized locations. If there were no correlation between a person's response and the location where that person lives, then I would be able to conclude that my sampling error would be the same using either method. If there were such a correlation after all, then I might bias my results. Is this sort of reasoning at all on the right track to building intuition for the proof?
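The survey analogy can be simulated directly. The sketch below is a toy model of my own, not part of any proof: a hypothetical street of 10,000 houses, a yes/no answer per house, and a hypothetical helper `survey_estimates` that compares polling every 10th house with polling the same number of houses at random. In one case the answers are independent of location; in the other they repeat with the same period as the polling step.

```python
import numpy as np

def survey_estimates(responses, step, rng):
    # Poll every `step`-th house, then poll the same number of houses
    # chosen uniformly at random without replacement.
    regular = responses[::step]
    chosen = rng.choice(responses.size, size=regular.size, replace=False)
    return regular.mean(), responses[chosen].mean()

rng = np.random.default_rng(0)
houses = np.arange(10_000)

# Assumed case 1: answers are independent of location (30% say yes).
independent = (rng.random(houses.size) < 0.3).astype(float)
# Assumed case 2: answers depend on location with period 10 (3 of every 10 say yes).
periodic = ((houses % 10) < 3).astype(float)

for name, responses in [("independent", independent), ("periodic", periodic)]:
    regular, random_poll = survey_estimates(responses, 10, rng)
    print(f"{name}: true={responses.mean():.3f} "
          f"regular={regular:.3f} random={random_poll:.3f}")
```

In the periodic case the regular poll only ever visits houses that answer yes, so it reports 1.0 instead of 0.3 however many houses are polled, whereas the random poll stays close to the true fraction. In the independent case the two polls should perform comparably.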
To phrase my question another way, it seems that when performing a high-dimensional integral you gain accuracy by including a variety of length scales in the spacings between the points where you evaluate the function. Why is that? And is the role of the randomness essentially to ensure that you use a wide range of length scales?