I'm interested in maximizing a function $f(\mathbf \theta)$, where $\theta \in \mathbb R^p$.
The problem is that I don't know the analytic form of the function, or of its derivatives. The only thing that I can do is to evaluate the function point-wise, by plugging in a value $\theta_*$ and get a NOISY estimate $\hat{f}(\theta_*)$ at that point. If I want I can decrease the variability of this estimate, but I have to pay increasing computational costs.
Here is what I have tried so far:
- Stochastic steepest descent with finite differences: it can work but it requires a lot of tuning (ex. gain sequence, scaling factor) and it is often very unstable. 
- Simulated annealing: it works and it is reliable, but it requires lots of function evaluations so I found it quite slow. 
So I'm asking for suggestions/idea about possible alternative optimization method that can work under this conditions. I'm keeping the problem as general as possible in order to encourage suggestions from research areas different mine.
