I don't quite get how the logistic loss works for binary classification:
$$\log(1+\exp(-y\cdot \mathbf{w}^T\mathbf{x})), \quad y\in\{-1,+1\}$$
It seems to me that minimizing this function over $\mathbf{w}$ simply amounts to making $\mathbf{w}^T\mathbf{x}$ as large as possible, i.e. driving each $w_i$ to infinity (negative or positive, depending on $x_i$).
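To make my point concrete, here is a small sketch on a single made-up training point (the numbers and the `logistic_loss` helper are just for illustration): as I scale $\mathbf{w}$ up in the direction of $\mathbf{x}$, the loss on that point keeps shrinking toward zero.

```python
import math

# One made-up training point with label y = +1.
x = [1.0, 2.0]
y = 1

def logistic_loss(w, x, y):
    # log(1 + exp(-y * w^T x)), written with log1p for numerical stability
    z = y * sum(wi * xi for wi, xi in zip(w, x))
    return math.log1p(math.exp(-z))

# Scale w up along x: the loss on this single point only decreases,
# which is why it looks like w should blow up to infinity.
for c in [1, 10, 100]:
    w = [c * xi for xi in x]   # w aligned with x, growing norm
    print(c, logistic_loss(w, x, y))
```

On this one point the loss really does go to $0$ as the norm of $\mathbf{w}$ grows, which is exactly the behaviour that confuses me.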
What do I misunderstand?