Consider a strictly convex function $f: [0, 1]^n \rightarrow \mathbb{R}$. My question is: why do people (especially machine learning practitioners) use gradient descent to find the global minimum of such a function, instead of using n-dimensional ternary search or golden-section search?
Here is a list of gradient descent's disadvantages, as I see them:
- Gradient descent requires choosing the step size $\alpha$ experimentally.
- It also requires computing the partial derivatives of $f$, which is not always possible, for example when $f$ is a "black box" function.
- Ternary search is guaranteed to converge within $\Theta(\lg^n \epsilon^{-1})$ iterations, where $\epsilon$ is the required absolute precision, whereas for gradient descent the number of iterations has to be chosen experimentally. (I sketch both approaches below.)
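To make the comparison concrete, here is a rough Python sketch of the two approaches as I understand them (the helper names, the toy quadratic, and the parameter values are mine, purely for illustration): gradient descent with a hand-picked step size $\alpha$ and iteration count, and what I mean by n-dimensional ternary search, i.e. nested one-dimensional searches.

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, num_iters=200):
    """Plain gradient descent: both alpha and num_iters have to be picked by hand."""
    x = np.array(x0, dtype=float)
    for _ in range(num_iters):
        x = np.clip(x - alpha * grad_f(x), 0.0, 1.0)  # stay inside [0, 1]^n
    return x

def ternary_search_1d(g, lo=0.0, hi=1.0, eps=1e-6):
    """Minimize a unimodal 1-D function g on [lo, hi] to absolute precision eps."""
    while hi - lo > eps:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0

def nested_ternary_search(f, n, eps=1e-6):
    """n-dimensional ternary search: search over the first coordinate, and for every
    trial value recursively minimize over the remaining n-1 coordinates.
    Uses only function values (no derivatives), at the cost of Theta(lg^n(1/eps)) evaluations."""
    if n == 1:
        return [ternary_search_1d(lambda t: f([t]), eps=eps)]
    def best_value_given_first(t):
        rest = nested_ternary_search(lambda xs: f([t] + xs), n - 1, eps)
        return f([t] + rest)
    t_star = ternary_search_1d(best_value_given_first, eps=eps)
    return [t_star] + nested_ternary_search(lambda xs: f([t_star] + xs), n - 1, eps)

# Toy strictly convex example: f(x, y) = (x - 0.3)^2 + 2*(y - 0.7)^2
f = lambda v: (v[0] - 0.3) ** 2 + 2.0 * (v[1] - 0.7) ** 2
grad_f = lambda v: np.array([2.0 * (v[0] - 0.3), 4.0 * (v[1] - 0.7)])

print(gradient_descent(grad_f, x0=[0.0, 0.0], alpha=0.1, num_iters=200))
print(nested_ternary_search(f, n=2, eps=1e-4))
```

Both sketches find the minimum of the toy function; what I don't understand is why the first approach is the one preferred in practice.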
Or am I misunderstanding something?
Thanks in advance.