I have difficulty understanding how, when minimizing the expected squared prediction error
$$\operatorname{EPE}(\beta)=\int (y-x^T \beta)^2 \Pr(dx, dy),$$
one arrives at the condition
$$\operatorname{E}[yx]-\operatorname{E}[xx^{T}\beta]=0.$$
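To make sure I am reading the condition correctly, I ran a small numerical check (simulated data; all names and values are my own, not from the book). The least-squares fit should make the sample analogue of $\operatorname{E}[yx]-\operatorname{E}[xx^T]\beta$ vanish, and it does:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: n samples of a p-dimensional x (rows of X are draws of x).
n, p = 10_000, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

# Minimizer of the empirical EPE, i.e. the least-squares fit.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Sample analogue of E[yx] - E[x x^T] beta: essentially zero at beta_hat.
moment = X.T @ y / n - (X.T @ X / n) @ beta_hat
print(moment)  # entries on the order of 1e-16
```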
In A Solution Manual and Notes for the Text: The Elements of Statistical Learning (page 2), I found formula (2):
$$\frac{\partial \operatorname{EPE}}{\partial \beta}=\int 2(y-x^T\beta)(-1) x \Pr(dx, dy).$$
I understand that the chain rule is used here to compute the derivative, but why is the last factor $x$ rather than $x^T$? Since $\beta$ is a column vector, the Jacobian of a scalar function with respect to it should be a row vector:
$$\frac{\partial x^T\beta}{\partial \beta}=x^T.$$
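To rule out an arithmetic mistake on my side, I also checked the partial derivatives with finite differences (again a toy sketch with my own data); the values from the manual's formula match, so my question is only about the shape:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5_000, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)
beta = rng.normal(size=p)  # arbitrary evaluation point

def epe(b):
    # Empirical version of EPE(beta): average squared residual.
    return np.mean((y - X @ b) ** 2)

# The manual's formula with the expectation replaced by a sample mean:
# 2 * E[(x^T beta - y) x], one number per coordinate of beta.
grad = 2 * np.mean((X @ beta - y)[:, None] * X, axis=0)

# Central finite differences for each partial derivative.
eps = 1e-6
I = np.eye(p)
fd = np.array([(epe(beta + eps * I[i]) - epe(beta - eps * I[i])) / (2 * eps)
               for i in range(p)])

print(np.allclose(grad, fd, atol=1e-4))  # True: the values agree
```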
What is missing in my analysis?
According to the Jacobian convention, shouldn't $\frac{\partial x^T\beta}{\partial \beta}$ be a row vector instead of a column vector? By the same convention I would write
$$\frac{\partial \operatorname{EPE}}{\partial \beta}= \left(\frac{\partial \operatorname{EPE}}{\partial \beta_{1}}, \frac{\partial \operatorname{EPE}}{\partial \beta_{2}}, \dots, \frac{\partial \operatorname{EPE}}{\partial \beta_{N}}\right).$$
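For concreteness, the individual partials of $x^T\beta$ are the same numbers under either convention; only the shape they are stacked into differs (toy vectors of my own choosing):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
beta = np.array([0.5, -1.0, 2.0])

def f(b):
    # f(beta) = x^T beta, a scalar function of beta.
    return x @ b

# Central finite differences give one number per coordinate of beta.
eps = 1e-6
I = np.eye(3)
partials = np.array([(f(beta + eps * I[i]) - f(beta - eps * I[i])) / (2 * eps)
                     for i in range(3)])
print(partials)  # [1. 2. 3.] -- exactly the entries of x

# Numerator layout stacks these partials into the 1x3 row vector x^T;
# denominator layout stacks the same numbers into the 3x1 column x.
```

So my confusion seems to be purely about the layout: row (Jacobian) versus column. Which convention is the manual using, and why is it the right one here?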