I want to compute the gradient of the following function with respect to $\beta$:
$$L(\beta) = \sum_{i=1}^n (y_i - \phi(x_i)^T \cdot \beta)^2$$
where $\beta$, $y_i$ and $x_i$ are vectors. The map $\phi$ simply appends additional coefficients to $x_i$, so that $\beta$ and $\phi(x_i)$ are both in $\mathbb{R}^d$.
Here is my approach so far:
\begin{align*}
\frac{\partial}{\partial \beta} L(\beta) &= \sum_{i=1}^n ( \frac{\partial}{\partial \beta} y_i - \frac{\partial}{\partial \beta}( \phi(x_i)^T \cdot \beta))^2\\
&= \sum_{i=1}^n ( 0 - \frac{\partial}{\partial \beta}( \phi(x_i)^T \cdot \beta))^2\\
&= - \sum_{i=1}^n ( \partial \phi(x_i)^T \cdot \beta + \phi(x_i)^T \cdot \partial \beta)^2\\
&= - \sum_{i=1}^n ( 0 \cdot \beta + \phi(x_i)^T \cdot \textbf{I})^2\\
&= - \sum_{i=1}^n ( \phi(x_i)^T \cdot \textbf{I})^2
\end{align*}
But what do I do with the power of two? Have I made any mistakes? I ask because $\phi(x_i)^T \cdot \textbf{I}$ seems to be in $\mathbb{R}^{1 \times d}$.
$$= - 2 \sum_{i=1}^n \phi(x_i)^T$$
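One way to test a candidate gradient is a finite-difference check. The sketch below is only that, under several assumptions that are not part of the question: each $y_i$ is treated as a scalar (so that the residual $y_i - \phi(x_i)^T \beta$ can be squared), the rows of a random matrix `Phi` stand in for the vectors $\phi(x_i)^T$, and all function names are hypothetical. It compares a central-difference gradient of $L$ with two candidates: the expression $-2 \sum_i \phi(x_i)$ from the last line above, and the chain-rule form $-2 \sum_i (y_i - \phi(x_i)^T \beta)\, \phi(x_i)$, in which the derivative is applied to the square as an outer function instead of being moved inside it.

```python
import numpy as np

def loss(beta, Phi, y):
    # L(beta) = sum_i (y_i - phi(x_i)^T beta)^2, with phi(x_i)^T as row i of Phi
    r = y - Phi @ beta
    return float(r @ r)

def grad_chain_rule(beta, Phi, y):
    # Candidate from the chain rule: -2 * sum_i (y_i - phi_i^T beta) * phi_i
    return -2.0 * Phi.T @ (y - Phi @ beta)

def grad_candidate_from_post(beta, Phi, y):
    # Candidate from the last line of the post: -2 * sum_i phi_i
    return -2.0 * Phi.sum(axis=0)

def grad_finite_diff(beta, Phi, y, eps=1e-6):
    # Central differences, one coordinate of beta at a time
    g = np.zeros_like(beta)
    for j in range(beta.size):
        e = np.zeros_like(beta)
        e[j] = eps
        g[j] = (loss(beta + e, Phi, y) - loss(beta - e, Phi, y)) / (2 * eps)
    return g

# Random stand-in data (assumption: y_i scalar, n = 20 samples, d = 4 features)
rng = np.random.default_rng(0)
n, d = 20, 4
Phi = rng.normal(size=(n, d))
y = rng.normal(size=n)
beta = rng.normal(size=d)

g_fd = grad_finite_diff(beta, Phi, y)
matches_chain_rule = np.allclose(g_fd, grad_chain_rule(beta, Phi, y), atol=1e-5)
matches_post = np.allclose(g_fd, grad_candidate_from_post(beta, Phi, y), atol=1e-5)
print("chain-rule candidate matches finite differences:", matches_chain_rule)
print("candidate from the post matches finite differences:", matches_post)
```

A candidate that does not depend on $y_i$ or $\beta$ at all cannot agree with the finite-difference gradient for generic data, which suggests the residual factor is what goes missing when the derivative is pulled inside the square.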
