I want to find an approximation of the gradient $\nabla V(J)$ of the function $V(J) = P(X\in T(J))$, where $X$ is a multidimensional random vector with a smooth probability density and $T$ is a (convex) set that depends on some parameters $J=(J_1,\dots,J_n)$. $V(J)$ is assumed to be many times continuously differentiable.
I have $N$ independent samples $x_n$ drawn from $X$, and $V(J)$ is approximated by the relative frequency $\hat{V}(J)=\frac{ | \{n \mid x_n\in T(J) \}| }{N}.$ This approximation is piecewise constant in $J$.
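To make the setup concrete, here is a minimal sketch of the estimator in Python; the standard-normal $X$ and the axis-aligned box standing in for $T(J)$ are purely illustrative assumptions, not my actual problem:

```python
import numpy as np

rng = np.random.default_rng(0)

d, N = 2, 10_000
x = rng.normal(size=(N, d))  # N independent samples of X (standard normal, for illustration)

def in_T(x, J):
    """Indicator of T(J); a box [0, J_1] x ... x [0, J_d] stands in for the real set."""
    return np.all((x >= 0.0) & (x <= J), axis=1)

def V_hat(J):
    """Relative-frequency estimate of P(X in T(J))."""
    return in_T(x, np.asarray(J, dtype=float)).mean()
```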
Now I want to find an operator $\hat{D}$ that approximates the gradient.
For example, if $J$ is one-dimensional, the gradient could be approximated by the central difference $\hat{D}\hat{V}(J) = \frac{\hat{V}(J+h)-\hat{V}(J-h)}{2h}$. The problem with the central difference is that $\hat{V}$ is piecewise constant, so if $h$ is chosen too small, the difference is identically $0$ for most $J$.
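A sketch of this failure mode in the toy setup above (`V_hat` and `x` come from the illustrative code earlier, so they are assumptions rather than my real problem): for small $h$, typically no sample changes its membership in $T(J)$ between $J-h$ and $J+h$, so the quotient is exactly $0$.

```python
def D_hat(J, h, i=0):
    """Central-difference estimate of the i-th partial derivative of V_hat at J."""
    e = np.zeros_like(J)
    e[i] = h
    return (V_hat(J + e) - V_hat(J - e)) / (2.0 * h)

J = np.array([1.0, 1.0])
print(D_hat(J, h=1e-1))  # a reasonable (noisy) estimate of the partial derivative
print(D_hat(J, h=1e-6))  # almost surely exactly 0.0: no sample falls in the thin slab
```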
The number of evaluations of $\hat{V}$ should be kept small, as it is naturally quite expensive to compute. The approximation should be consistent in the sense that it becomes more accurate as $N$ increases.
Is there a standard way to solve this problem? Is there a limit on how accurate the approximation can get? Is there a theory for this kind of problem, and where can I read more about it?