Let $\mathbf{x}\in\Bbb{R}^n$ be a multivariate normal vector, i.e., $\mathbf{x}\sim\mathcal{N}(\bar{\mathbf{x}},\Sigma)$, where the mean vector $\bar{\mathbf{x}}$ and the covariance matrix $\Sigma\in\Bbb{S}_{++}^n$ are given. Note that $\Bbb{S}_{++}^n$ denotes the set of symmetric positive definite $n\times n$ real matrices.
Also, let $h\colon\Bbb{R}^n\to\Bbb{R}$ be a real-valued function of $\mathbf{x}\sim\mathcal{N}(\bar{\mathbf{x}},\Sigma)$. Since $\mathbf{x}$ is a random vector, we can find the mean value of $h$ as $$ \bar{h} = \int_{\Bbb{R}^n}\! h(\mathbf{x})f(\mathbf{x}) \,\mathrm{d}\mathbf{x}, $$ where $f$ denotes the probability density function of $\mathbf{x}$.
In the simple case where $h$ is the identity function, i.e., $h(\mathbf{x})=\mathbf{x}$, the mean value of $h$ is just the mean value of $\mathbf{x}$; that is, $$ \bar{h} = \int_{\Bbb{R}^n}\! \mathbf{x}f(\mathbf{x}) \,\mathrm{d}\mathbf{x} = \bar{\mathbf{x}}. $$
Now, let's assume that $h$ is an arbitrary function and we can generate the following samples from $\mathcal{N}(\bar{\mathbf{x}},\Sigma)$: $\mathbf{x}_i\in\Bbb{R}^n$, $i=1,\ldots,N$.
I have the following questions:
A. Does it hold true that, as $N$ tends to infinity, the mean value of $h$ can be estimated by the following quantity? If so, is this a consequence of the central limit theorem? $$ \tilde{h} = \frac{1}{N}\sum_{i=1}^{N}h(\mathbf{x}_i) $$
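As a quick numerical sanity check of this estimator, here is a minimal NumPy sketch; the choice of $h$, mean, and covariance below is mine, purely for illustration (I take $h(\mathbf{x})=\lVert\mathbf{x}\rVert^2$, whose true mean is $\operatorname{tr}(\Sigma)$):

```python
import numpy as np

# Hypothetical distribution and test function, chosen only for illustration.
rng = np.random.default_rng(0)
n = 3
x_bar = np.zeros(n)
Sigma = np.eye(n)

def h(x):
    return float(np.sum(x ** 2))  # here E[h(x)] = trace(Sigma) = n

for N in (100, 10_000, 200_000):
    samples = rng.multivariate_normal(x_bar, Sigma, size=N)
    h_tilde = np.mean([h(x) for x in samples])
    print(N, h_tilde)  # should approach E[h(x)] = n = 3 as N grows
```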
B. How many samples, $N$, should we draw from the distribution to obtain a good estimate of $\bar{h}$ (say, within some given error $\epsilon$)? How is this related to the dimensionality of the input space?
For instance, in the case of $h(\mathbf{x})=\mathbf{x}$, how many samples do we need to have a good estimation of the mean $\bar{\mathbf{x}}$?
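For the identity case one can track the error of the sample mean directly. A small sketch (mean and covariance are my own hypothetical choices): the error typically decays like $\sqrt{\operatorname{tr}(\Sigma)/N}$, since $\Bbb{E}\lVert\tilde{\mathbf{x}}_N-\bar{\mathbf{x}}\rVert^2=\operatorname{tr}(\Sigma)/N$, which is one way dimensionality shows up.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
x_bar = np.arange(n, dtype=float)  # hypothetical mean
Sigma = np.eye(n)                  # hypothetical covariance

# Error of the sample mean for increasing N; it should shrink like
# sqrt(trace(Sigma) / N), so the dimension enters through trace(Sigma).
errors = {}
for N in (100, 10_000, 1_000_000):
    samples = rng.multivariate_normal(x_bar, Sigma, size=N)
    errors[N] = float(np.linalg.norm(samples.mean(axis=0) - x_bar))
print(errors)
```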
In general, in the case of an arbitrary function $h$, can we have any error bounds for choosing the sample size $N$? Is there a method for finding such bounds based on the explicit form of $h$?
EDIT: Based on the excellent answer of @Batman below, I tried the following (work in progress):
First attempt (Failed)
McDiarmid's inequality (a.k.a. the bounded-difference inequality). For completeness' sake, I copy the following theorem from this monograph by Raginsky and Sason (Sect. 2.2.3, pp. 18-19):
Let $\mathcal{X}$ be a set, and let $h\colon\mathcal{X}^n\to\Bbb{R}$ be a function that satisfies the bounded difference assumption:
$$ \sup_{x_1,\ldots,x_n,x_i^\prime} \lvert h(x_1,\ldots,x_{i-1},x_i,x_{i+1},\ldots,x_n) -h(x_1,\ldots,x_{i-1},x_i',x_{i+1},\ldots,x_n) \rvert\leq d_i $$ for every $1\leq i\leq n$, where $d_i$ are arbitrary non-negative real constants. This is equivalent to saying that, for every given $i$, the variation of the function $h$ with respect to its $i$-th coordinate is upper bounded by $d_i$.
Theorem (McDiarmid’s inequality). Let $\{X_k\}_{k=1}^{n}$ be independent (not necessarily identically distributed) random variables taking values in a measurable space $\mathcal{X}$. Consider the random variable $U = h(X_1,\ldots,X_n)$, where $h\colon\mathcal{X}^n\to\Bbb{R}$ is a measurable function satisfying the bounded difference assumption. Then, for every $r\geq0$, $$ P\left(\lvert U-\Bbb{E}U\rvert\geq r\right) \leq 2\exp \left(-\frac{2r^2}{\sum_{k=1}^{n}d_k^2}\right). $$
The function $h$ I am interested in is the so-called "hinge loss", i.e., $h(\mathbf{x})=\max(0, 1-y(\mathbf{w}^\top\mathbf{x}+b))$, where $\mathbf{w}$, $b$, and $y$ are given parameters.
It seems that McDiarmid’s inequality is not appropriate here, since the hinge loss does not satisfy the bounded difference assumption (it is unbounded on $\Bbb{R}^n$).
So, now I'm looking for another such inequality appropriate for $h(\mathbf{x})=\max(0, 1-y(\mathbf{w}^\top\mathbf{x}+b))$.
However, besides this, what I still don't understand is how the sample size $N$ (used for the estimate $\tilde{h} = \frac{1}{N}\sum_{i=1}^{N}h(\mathbf{x}_i)$) is related to the "error" $r$ and the dimensionality $n$. Can you help with this particular issue?
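If I understand the theorem correctly, the dependence on $N$ would come from applying it to the sample mean itself, not to a single evaluation of $h$. Suppose, hypothetically, that $h$ were bounded, say $0\leq h(\mathbf{x})\leq B$ for all $\mathbf{x}$. Then $\tilde{h}$, viewed as a function of the $N$ independent samples $\mathbf{x}_1,\ldots,\mathbf{x}_N$, changes by at most $d_i=B/N$ when a single sample is replaced, so McDiarmid's inequality (with $\Bbb{E}\tilde{h}=\bar{h}$) would give $$ P\left(\lvert\tilde{h}-\bar{h}\rvert\geq r\right) \leq 2\exp\left(-\frac{2r^2}{N\,(B/N)^2}\right) = 2\exp\left(-\frac{2Nr^2}{B^2}\right), $$ so that $$ N \geq \frac{B^2}{2r^2}\ln\frac{2}{\delta} $$ samples would suffice for accuracy $r$ with confidence $1-\delta$, and the dimensionality $n$ would enter only through the bound $B$. Of course, this does not directly apply to the hinge loss, which is unbounded, so this is only a template for the $N$–$r$ relation.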
Second attempt (Needs review)
Lipschitz functions of Gaussian variables
Let's first recall that a function $f\colon\Bbb{R}^n\to\Bbb{R}$ is $\mathcal{L}$-Lipschitz with respect to the Euclidean norm if $$ \lvert f(\mathbf{x})-f(\mathbf{y})\rvert\leq\mathcal{L}\lVert\mathbf{x}-\mathbf{y}\rVert, \quad \text{for all } \mathbf{x},\mathbf{y}\in\Bbb{R}^n. $$
Theorem: Let $\mathbf{x}=(x_1,\ldots,x_n)$ be a random vector of $n$ i.i.d. standard Gaussian variables, and let $f\colon\Bbb{R}^n\to\Bbb{R}$ be $\mathcal{L}$-Lipschitz with respect to the Euclidean norm $\lVert\cdot\rVert$. Then the variable $f(\mathbf{x})-\Bbb{E}[f(\mathbf{x})]$ is sub-Gaussian with parameter at most $\mathcal{L}$, and hence, for every $r\geq0$, $$ P\left(\lvert f(\mathbf{x})-\Bbb{E}[f(\mathbf{x})] \rvert \geq r\right) \leq 2\exp\left(-\frac{1}{2}\left(\frac{r}{\mathcal{L}}\right)^2\right). $$
We are interested in the function $h(\mathbf{x})=\max(0, 1-y(\mathbf{w}^\top\mathbf{x}+b))$, where $y\in\{\pm1\}$, and $\mathbf{w}$, $b$ are given parameters.
We can easily show that $h$ is $\mathcal{L}$-Lipschitz with respect to the Euclidean norm with $\mathcal{L}=\lVert\mathbf{w}\rVert$: since $t\mapsto\max(0,1-t)$ is $1$-Lipschitz and $\lvert y\rvert=1$, $$ \lvert h(\mathbf{x}_1)-h(\mathbf{x}_2)\rvert \leq \lvert y\,\mathbf{w}^\top(\mathbf{x}_1-\mathbf{x}_2)\rvert \leq \lVert\mathbf{w}\rVert\lVert\mathbf{x}_1-\mathbf{x}_2\rVert. $$ For $\mathbf{x}\sim\mathcal{N}(\mathbf{0},I_n)$, the theorem then gives $$ P\left(\lvert h(\mathbf{x})-\Bbb{E}[h(\mathbf{x})] \rvert \geq r\right) \leq 2\exp\left(-\frac{1}{2}\left(\frac{r}{\lVert\mathbf{w}\rVert}\right)^2\right). $$ (For the general case $\mathbf{x}\sim\mathcal{N}(\bar{\mathbf{x}},\Sigma)$, writing $\mathbf{x}=\bar{\mathbf{x}}+\Sigma^{1/2}\mathbf{z}$ with $\mathbf{z}\sim\mathcal{N}(\mathbf{0},I_n)$ turns $h$ into an $\lVert\Sigma^{1/2}\mathbf{w}\rVert$-Lipschitz function of $\mathbf{z}$, so, if I am not mistaken, the same bound holds with $\lVert\mathbf{w}\rVert$ replaced by $\lVert\Sigma^{1/2}\mathbf{w}\rVert$.)
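If this tail bound is right, a quick simulation should stay below it. Here is a sketch with hypothetical parameters $\mathbf{w}$, $b$, $y$ of my own choosing, and standard Gaussian $\mathbf{x}$ as in the theorem (the true mean $\Bbb{E}[h(\mathbf{x})]$ is replaced by an empirical proxy):

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 4, 200_000

# Hypothetical parameters, chosen only for this check.
w = rng.standard_normal(n)
b, y = 0.5, 1.0
L = np.linalg.norm(w)  # Lipschitz constant of the hinge loss w.r.t. x

# Standard Gaussian input, as in the theorem.
x = rng.standard_normal((trials, n))
h_vals = np.maximum(0.0, 1.0 - y * (x @ w + b))  # h(x) for every sample
h_mean = h_vals.mean()                           # proxy for E[h(x)]

r = 2.0
empirical_tail = np.mean(np.abs(h_vals - h_mean) >= r)
bound = 2.0 * np.exp(-0.5 * (r / L) ** 2)
print(empirical_tail, bound)  # empirical tail should not exceed the bound
```

If I am applying the sub-Gaussian machinery correctly, the consequence for the sample size is this: the average of $N$ i.i.d. sub-Gaussian variables with parameter $\lVert\mathbf{w}\rVert$ is sub-Gaussian with parameter $\lVert\mathbf{w}\rVert/\sqrt{N}$, so $P(\lvert\tilde{h}-\bar{h}\rvert\geq r)\leq 2\exp(-Nr^2/(2\lVert\mathbf{w}\rVert^2))$, i.e., $N\geq\frac{2\lVert\mathbf{w}\rVert^2}{r^2}\ln\frac{2}{\delta}$ suffices, with the dimension entering only through $\lVert\mathbf{w}\rVert$.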