It's stated that the gradient of:
$$\frac{1}{2}x^TAx - b^Tx +c$$
is
$$\frac{1}{2}A^Tx + \frac{1}{2}Ax - b$$
How do you grind out this equation? Or specifically, how do you get from $x^TAx$ to $A^Tx + Ax$?
 
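You only need two facts. The first is $$\dfrac{\partial (x^Ty)}{\partial x} = y$$ The second is the chain rule for a function $f(x,y)$ in which $y$ itself depends on $x$: $$\dfrac{d(f(x,y))}{d x} = \dfrac{\partial (f(x,y))}{\partial x} + \dfrac{d( y^T(x))}{d x} \dfrac{\partial (f(x,y))}{\partial y}$$ Here $\dfrac{d(y^T(x))}{dx}$ is the matrix whose $(i,j)$ entry is $\partial y_j/\partial x_i$. Hence, for the linear term, $$\dfrac{d(b^Tx)}{d x} = \dfrac{d (x^Tb)}{d x} = b$$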
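For the quadratic term, set $y = Ax$, so that $x^TAx = x^Ty$ and $y^T = x^TA^T$. Using $\dfrac{\partial (x^Ty)}{\partial x} = y$, $\dfrac{\partial (x^Ty)}{\partial y} = x$ and $\dfrac{d (x^TA^T)}{d x} = A^T$, the chain rule gives
$$\dfrac{d (x^TAx)}{d x} = \dfrac{\partial (x^Ty)}{\partial x} + \dfrac{d( y(x)^T)}{d x} \dfrac{\partial (x^Ty)}{\partial y} = y + \dfrac{d (x^TA^T)}{d x} x = y + A^Tx = (A+A^T)x$$
Multiplying by $\frac{1}{2}$ and subtracting the gradient of $b^Tx$ gives the stated result, $\frac{1}{2}A^Tx + \frac{1}{2}Ax - b$. The constant $c$ contributes nothing.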
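If you want to convince yourself numerically, here is a minimal sketch that compares the closed-form gradient with central finite differences. It uses only the Python standard library. The function names (`quad`, `grad_closed_form`, `grad_numeric`), the random test data and the step size `eps` are my own choices for illustration, not part of the derivation above.

```python
import random

def quad(x, A, b, c):
    """Evaluate f(x) = 1/2 x^T A x - b^T x + c for list-based x, A, b."""
    n = len(x)
    xAx = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return 0.5 * xAx - sum(b[i] * x[i] for i in range(n)) + c

def grad_closed_form(x, A, b):
    """Closed-form gradient: 1/2 (A + A^T) x - b."""
    n = len(x)
    return [
        0.5 * sum((A[k][j] + A[j][k]) * x[j] for j in range(n)) - b[k]
        for k in range(n)
    ]

def grad_numeric(x, A, b, c, eps=1e-5):
    """Central finite differences, perturbing one coordinate at a time."""
    g = []
    for k in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += eps
        x_minus[k] -= eps
        g.append((quad(x_plus, A, b, c) - quad(x_minus, A, b, c)) / (2 * eps))
    return g

random.seed(0)
n = 4
# A is filled with independent random entries, so it is not symmetric.
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
b = [random.uniform(-1, 1) for _ in range(n)]
c = 1.5
x = [random.uniform(-1, 1) for _ in range(n)]

analytic = grad_closed_form(x, A, b)
numeric = grad_numeric(x, A, b, c)
max_abs_diff = max(abs(p - q) for p, q in zip(analytic, numeric))
print(max_abs_diff)
```

Because $f$ is quadratic, central differences have no truncation error, so the printed difference should be at the level of floating-point rounding. Using a non-symmetric $A$ matters: with a symmetric $A$ you could not tell $(A+A^T)x$ apart from $2Ax$.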
There is another way to calculate the most complex term, $\frac{\partial}{\partial x_k} \mathbf{x}^T A \mathbf{x}$, where $x_k$ is the $k$-th component of $\mathbf{x}$. It requires nothing but partial derivatives with respect to a scalar variable instead of a vector.
This answer is for those who are not very familiar with partial derivatives and the chain rule for vectors, for example, me. It looks long, but only because I write down all the details. :)
Firstly, expanding the quadratic form yields:
$$ \begin{align} f = \frac{\partial}{\partial x_k} \mathbf{x}^T A \mathbf{x} &= \frac{\partial}{\partial x_k} \sum_{i=1}^N \sum_{j=1}^N a_{ij} x_i x_j = \sum_{i=1}^N \sum_{j=1}^N a_{ij}\frac{\partial}{\partial x_k}(x_i x_j) \end{align} $$
By the product rule,
$$ \frac{\partial}{\partial x_k}(x_i x_j) = \begin{cases} 2x_k, && \text{if } i = j = k \\ x_j, && \text{if } i = k,\ j \neq k \\ x_i, && \text{if } j = k,\ i \neq k \\ 0, && \text{otherwise} \end{cases} $$
so the equation is nothing but
$$ f = \sum_{j=1}^N a_{kj} x_j + \sum_{i=1}^N a_{ik} x_i $$
(The diagonal term $a_{kk}x_k$ appears once in each sum, which accounts for the $2x_k$ case.)
Almost done! Now we only need some simplification. Recall the very simple rule that
$$ \sum_{i=1}^N x_i y_i = \begin{bmatrix} x_1 \\ \vdots \\ x_N \end{bmatrix}^T \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix} = \mathbf{x}^T \mathbf{y} $$
Thus
$$ \begin{align} f &= \text{(k-th row of A) } \mathbf{x} + \text{(k-th column of A)}^T \mathbf{x} \end{align} $$
Now it is time to assemble the gradient from the partial derivatives!
$$ \begin{align} \nabla_\mathbf{x} \mathbf{x}^T A \mathbf{x} & = \begin{bmatrix} \frac{\partial \mathbf{x}^T A \mathbf{x}}{\partial x_1} \\ \vdots \\ \frac{\partial \mathbf{x}^T A \mathbf{x}}{\partial x_k} \\ \vdots \\ \frac{\partial \mathbf{x}^T A \mathbf{x}}{\partial x_N} \\ \end{bmatrix} = \begin{bmatrix} \vdots \\ \text{(k-th row of A) } \mathbf{x} + \text{(k-th column of A)}^T \mathbf{x} \\ \vdots \end{bmatrix} \\ &= \left( \begin{bmatrix} \vdots \\ \text{(k-th row of A) } \\ \vdots \end{bmatrix} + \begin{bmatrix} \vdots \\ \text{(k-th column of A) }^T \\ \vdots \end{bmatrix} \right) \mathbf{x} \\ &= (A + A^T)\mathbf{x} \end{align} $$
So we are done!! The answer is:
$$ \nabla_\mathbf{x} \mathbf{x}^T A \mathbf{x} = (A + A^T)\mathbf{x} $$
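To see the component-wise argument on concrete numbers, here is a small worked example. The $2 \times 2$ matrix is an arbitrary non-symmetric choice of mine, picked only for illustration. Take
$$ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \qquad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} $$
Expanding the quadratic form gives
$$ \mathbf{x}^T A \mathbf{x} = x_1^2 + 2x_1x_2 + 3x_2x_1 + 4x_2^2 = x_1^2 + 5x_1x_2 + 4x_2^2 $$
and differentiating with respect to each component gives
$$ \frac{\partial}{\partial x_1}\mathbf{x}^T A \mathbf{x} = 2x_1 + 5x_2, \qquad \frac{\partial}{\partial x_2}\mathbf{x}^T A \mathbf{x} = 5x_1 + 8x_2 $$
This agrees with the matrix formula:
$$ (A + A^T)\mathbf{x} = \begin{bmatrix} 2 & 5 \\ 5 & 8 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 + 5x_2 \\ 5x_1 + 8x_2 \end{bmatrix} $$
Note that $2A\mathbf{x}$ would give $2x_1 + 4x_2$ in the first component, which is wrong here because $A$ is not symmetric.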