The function $f_0$ is the function that is to be minimized (the objective function). All of the other $f_i$ are constraint functions. That's what makes $f_0$ special.
The point $x$ will be a minimizer if moving from $x$ toward any other point $y$ in the convex feasible region causes $f_0$ to increase or stay the same. In other words, the directional derivative of $f_0$ at $x$ in the direction of $y$ must be nonnegative. Mathematically, this is $\nabla f_0(x)^T \frac{y - x}{\|y-x\|} \geq 0$, which can be more simply expressed as $\nabla f_0(x)^T (y - x) \geq 0$ (since $\|y-x\| > 0$).
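As a quick numerical sanity check (using a made-up objective and feasible set, not ones from the question): take $f_0(y) = (y_1-2)^2 + y_2^2$ over the unit disk $S = \{y : \|y\| \leq 1\}$, whose minimizer is $x = (1,0)$. Sampling feasible points confirms $\nabla f_0(x)^T (y-x) \geq 0$ for every $y \in S$:

```python
import numpy as np

# Hypothetical example: minimize f0(y) = (y1 - 2)^2 + y2^2 over the
# unit disk S = {y : ||y|| <= 1}. The minimizer is x = (1, 0).
def grad_f0(y):
    return np.array([2.0 * (y[0] - 2.0), 2.0 * y[1]])

x = np.array([1.0, 0.0])
g = grad_f0(x)                       # equals (-2, 0)

# Sample many feasible points y and evaluate grad f0(x)^T (y - x).
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(10_000, 2))
feasible = pts[np.linalg.norm(pts, axis=1) <= 1.0]
vals = (feasible - x) @ g            # = 2 * (1 - y1) >= 0 on the disk

print(vals.min() >= -1e-12)          # True: the optimality condition holds
```

Note that here the condition holds with equality only as $y_1 \to 1$, i.e. as $y$ approaches the boundary point $x$ itself.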
Also, given the usual geometric interpretation of the dot product, the requirement $\nabla f_0(x)^T (y - x) \geq 0$ is equivalent to requiring that the cosine of the angle between $\nabla f_0(x)$ and $y-x$ be nonnegative. Since the angle between two vectors lies in $[0, \pi]$, this means the angle between $\nabla f_0(x)$ and $y-x$ must be at most $\pi/2$, and hence the angle between $-\nabla f_0(x)$ and $y-x$ must be at least $\pi/2$. Hence the requirement $\nabla f_0(x)^T (y - x) \geq 0$ is equivalent to the existence of a supporting hyperplane for $S$ at $x$ such that $-\nabla f_0(x)$ is perpendicular to the hyperplane and points in the opposite direction from $S$.
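To see the angle equivalence numerically (with an arbitrary made-up gradient vector, chosen only for illustration): the dot product with a direction is nonnegative exactly when the angle between them is at most $\pi/2$:

```python
import numpy as np

# Hypothetical gradient vector, purely for illustration.
g = np.array([-2.0, 0.0])

# Random candidate directions y - x.
rng = np.random.default_rng(2)
dirs = rng.normal(size=(1_000, 2))

# Cosine of the angle between g and each direction, then the angle itself.
cos = dirs @ g / (np.linalg.norm(dirs, axis=1) * np.linalg.norm(g))
angle = np.arccos(np.clip(cos, -1.0, 1.0))

# Nonnegative dot product <=> angle in [0, pi/2]:
print(np.array_equal(cos >= 0, angle <= np.pi / 2))  # True
```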
And hopefully that last paragraph explains the reason for the "$-$" sign in the picture, too.
Added, in response to the new (enumerated) questions:
1. This is what the second paragraph in my original answer is addressing.
2. Yes. Just multiply the original equation by $-1$.
3. I think the third paragraph in my original answer addresses this.
4. No. First, $\nabla f_{0}(x)^T$ is not the tangent vector to the hyperplane; it's the transpose of the vector $\nabla f_{0}(x)$. (For example, $\begin{bmatrix} 1 \\ 2 \end{bmatrix}^T = \begin{bmatrix} 1 & 2 \end{bmatrix}$.) Second, if you take two orthogonal vectors $z$ and $w$ and calculate the dot product $z^T w$ you get $z^T w = 0$, not $-1$. (Maybe you're thinking of the fact that in 2D the slopes of perpendicular lines multiply to give $-1$? The operation $z^T w$ is not multiplying slopes; it's calculating the dot product.)
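The distinction can be checked directly (the vectors here are arbitrary, chosen only because they are perpendicular):

```python
import numpy as np

# The dot product of orthogonal vectors is 0, while the slopes of
# perpendicular 2D lines multiply to -1 -- two different operations.
z = np.array([1.0, 2.0])
w = np.array([-2.0, 1.0])     # orthogonal to z

print(z @ w)                  # 0.0  (the dot product z^T w)

slope_z = z[1] / z[0]         # slope of the line through z:  2.0
slope_w = w[1] / w[0]         # slope of the line through w: -0.5
print(slope_z * slope_w)      # -1.0 (the perpendicular-slope rule)
```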
Added, in response to the comments below:
- The vector $y$ is not an arbitrary vector; it's a point in the set $S$. So $y-x$ gives the vector from the point $x$ to the point $y$.
- We're minimizing $f_0$ because 1) that's the only way the picture makes sense, and 2) it says so on slide 4-6 in the link you give.
- "Minimizing" here means that we're looking for the smallest value of the function $f_0(y)$ over all points $y$ in the set $S$. For example, if we were trying to find the smallest value of $f_0(y_1, y_2) = y_1^2 + y_2^2$ over the region $S$, where $S$ is the set $\{y_1,y_2\}$ satisfying $y_1^2 + y_2^2 \leq 4$, the minimum would occur at $(y_1,y_2) = (0,0)$. We're not trying to minimize the distance between $x$ and $y$. We're trying to minimize the function $f_0(y)$.
- The text of Bazarra et al. is talking about something different. (And maybe this, actually, is the primary source of your confusion?) They're discussing the problem of minimizing the distance between a point and a convex set. The slide from Boyd and Vandenberghe is discussing the problem of minimizing the value of a function $f_0$ over a convex set. So, if your primary goal here is to understand the argument in the Bazarra text better, don't look at the picture you posted; it's addressing a different problem. And if your primary goal here is to understand the picture you posted, don't refer to p. 50 in the Bazarra text; they're talking about something different. The picture on p. 129 of the Bazarra text (a different chapter!) is the one that corresponds to the situation in the picture you posted.