I have trouble understanding a simple example of the No Free Lunch (NFL) theorem in James Spall's *Introduction to Stochastic Search and Optimization*:
My understanding is that a cost function is a mapping from a finite set $\{\theta_i\}$ to a finite set $\{L_j\}$. A search algorithm, when applied to a cost function, searches for a $\theta_i$ at which the cost function attains its least value.
If there are $N_\theta$ possible values in $\{\theta_i\}$ and $N_L$ possible values in $\{L_j\}$, then by direct enumeration there are $N_L^{N_\theta}$ possible mappings from $\{\theta_i\}$ to $\{L_j\}$.
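This counting is easy to check by brute force. Here is a minimal Python sketch (the names `thetas` and `losses` are mine, not from the book):

```python
# Enumerate every mapping from a 3-element Theta to a 2-element loss set
# and confirm the count is N_L ** N_theta = 2**3 = 8.
from itertools import product

thetas = ["theta_1", "theta_2", "theta_3"]  # N_theta = 3
losses = ["L_1", "L_2"]                     # N_L = 2

# Each mapping assigns one loss value to each theta_i.
mappings = [dict(zip(thetas, combo))
            for combo in product(losses, repeat=len(thetas))]

assert len(mappings) == len(losses) ** len(thetas)  # 8 mappings
```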
The NFL theorems indicate that when averaging over all $N_L^{N_\theta}$ possible mappings from $\{\theta_i\}$ to $\{L_j\}$, all algorithms perform the same (i.e., none can work better than a blind random search).
Example 1.7—NFL implications in a small problem. Suppose that $\Theta = \{\theta_1, \theta_2, \theta_3\}$ and that there are two possible outcomes for the noise-free loss measurements, $\{L_1, L_2\}$. Hence, $N_L^{N_\theta} = 2^3 = 8$. Table 1.1 summarizes the eight possible mappings.
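Table 1.1 is not reproduced above, so here is one enumeration of the eight mappings (columns are the problems, rows give the loss each $\theta_i$ receives) that is consistent with all of the claims quoted below; the particular column ordering is my reconstruction and may not match Spall's:

| Problem | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| $\theta_1$ | $L_1$ | $L_1$ | $L_1$ | $L_2$ | $L_2$ | $L_2$ | $L_1$ | $L_2$ |
| $\theta_2$ | $L_1$ | $L_1$ | $L_2$ | $L_1$ | $L_1$ | $L_2$ | $L_2$ | $L_2$ |
| $\theta_3$ | $L_1$ | $L_2$ | $L_1$ | $L_1$ | $L_2$ | $L_1$ | $L_2$ | $L_2$ |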
Note that all rows in Table 1.1 have the same number of $L_1$ and $L_2$ values. Hence, the mean loss value across all eight possible problems is the same regardless of which $\theta_i$ is chosen. Because the algorithm chooses the $\theta_i$, all algorithms produce the same mean loss value when averaging across all possible problems. As a specific instance of the implication of the NFL theorems, suppose that $L_1 < L_2$. Then, an algorithm that puts priority on picking $\theta_1$ will work well on problems 1, 2, 3, and 7, but poorly on problems 4, 5, 6, and 8. The average performance is the same as an algorithm that puts priority on picking $\theta_2$ or $\theta_3$.
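The averaging claim can be checked directly. A minimal sketch, assuming the concrete values $L_1 = 1 < L_2 = 2$ (any two values with $L_1 < L_2$ would do):

```python
from itertools import product
from statistics import mean

L1, L2 = 1.0, 2.0        # assume L_1 < L_2, as in the example
thetas = range(3)        # indices 0, 1, 2 stand for theta_1, theta_2, theta_3

# problems[k][i] is the loss of theta_i under the k-th mapping.
problems = list(product([L1, L2], repeat=len(thetas)))

# A "fixed-choice" algorithm always returns the same theta_i.
for i in thetas:
    avg = mean(problem[i] for problem in problems)
    print(f"always pick theta_{i + 1}: mean loss = {avg}")  # 1.5 every time
```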
My questions are:
Does the example assume that an algorithm will always output the same element of $\Theta$, regardless of which of the eight loss functions it is applied to? (This is the only way I can make sense of the example. Otherwise, how should I understand it?)
If yes, isn't that contrary to how algorithms behave in reality? For example, when minimizing different cost functions, gradient descent outputs different solutions rather than one fixed solution.
If no, I can construct an algorithm that outputs $\theta_1$ for cost functions 1, 2, 3, 7, and 8; $\theta_2$ for cost functions 4 and 5; and $\theta_3$ for cost function 6. This algorithm outputs the best solution for each cost function, so its average performance over all cost functions is obviously better than that of many other algorithms. This violates the NFL theorem. What is going wrong?
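To make this construction concrete, here is how I would write it down, as a rule that sees the entire loss function and returns a minimizing $\theta_i$ (the name `oracle` is mine):

```python
from itertools import product
from statistics import mean

L1, L2 = 1.0, 2.0
problems = list(product([L1, L2], repeat=3))  # the eight cost functions

def oracle(problem):
    # Given the whole loss function, return the index of a minimizing theta.
    return min(range(3), key=lambda i: problem[i])

avg_oracle = mean(p[oracle(p)] for p in problems)  # (7*1 + 2)/8 = 1.125
avg_fixed  = mean(p[0] for p in problems)          # always theta_1: 1.5
print(avg_oracle, avg_fixed)
```

Note that, as written, `oracle` must know which of the eight cost functions it is facing before it chooses; whether such a rule even counts as a "search algorithm" is related to my last question below.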
What exactly is a search algorithm? Is it a mapping from some set to another set?
Thanks and regards!