Notice first that you might have guessed this formula: if $X$ and $\hat{\beta}$ are $1 \times 1$, it reduces to the single-variable rule $\frac{d}{dx} x^2 = 2x$ from calculus.
Let's derive a multivariable product rule that will help us here. Suppose $f:\mathbb{R}^n \to \mathbb{R}$ and suppose $f(x) = \langle g(x), h(x) \rangle$ for some functions $g:\mathbb{R}^n \to \mathbb{R}^m$ and $h:\mathbb{R}^n \to \mathbb{R}^m$. Then if $\Delta x \in \mathbb{R}^n$ is small (and $g$ and $h$ are differentiable at $x$), we have $\begin{align*} f(x + \Delta x) &\approx \langle g(x) + g'(x) \Delta x, h(x) + h'(x) \Delta x \rangle \\ &= \langle g(x),h(x) \rangle + \langle g(x), h'(x) \Delta x \rangle + \langle g'(x) \Delta x, h(x) \rangle + \langle g'(x) \Delta x, h'(x) \Delta x \rangle \\ &\approx \langle g(x),h(x) \rangle + \langle g(x), h'(x) \Delta x \rangle +\langle g'(x) \Delta x, h(x) \rangle \\ &= \langle g(x),h(x) \rangle + \langle h'(x)^T g(x), \Delta x \rangle + \langle g'(x)^T h(x), \Delta x \rangle \\ &= f(x) + \langle h'(x)^T g(x) + g'(x)^T h(x), \Delta x \rangle. \end{align*}$ Comparing this result with $f(x + \Delta x) \approx f(x) + \langle \nabla f(x), \Delta x \rangle$ we discover that $\nabla f(x) = h'(x)^T g(x) + g'(x)^T h(x).$ This is our product rule. (I'm using the convention that the gradient is a column vector, which is not completely standard.)
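As a sanity check on this product rule, here is a quick numerical experiment (a sketch using NumPy; the linear maps $g(x) = Bx$, $h(x) = Cx$ and the dimensions are arbitrary choices, not from the problem):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
B = rng.standard_normal((m, n))  # Jacobian of g(x) = Bx is g'(x) = B
C = rng.standard_normal((m, n))  # Jacobian of h(x) = Cx is h'(x) = C
x = rng.standard_normal(n)

# f(x) = <g(x), h(x)>
f = lambda x: (B @ x) @ (C @ x)

# Product rule: grad f = h'(x)^T g(x) + g'(x)^T h(x)
grad_formula = C.T @ (B @ x) + B.T @ (C @ x)

# Central finite differences along each coordinate direction
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])

assert np.allclose(grad_formula, grad_fd, atol=1e-6)
```

The finite-difference gradient agrees with the formula to within the expected numerical error.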
Now let $g(x) = Ax$ for some matrix $A$. So $g'(x) = A$. What's the gradient of the function $\begin{align*} f(x) &= \langle g(x),g(x) \rangle \\ &= \langle Ax, Ax \rangle \\ &= x^T A^T A x \quad \text{?} \end{align*}$
By our product rule the answer is $\begin{align*} \nabla f(x) &= 2g'(x)^T g(x) \\ &= 2 A^T A x. \end{align*}$
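The same kind of finite-difference check confirms $\nabla f(x) = 2A^TAx$ (again a sketch; the particular $A$ and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
x = rng.standard_normal(3)

# f(x) = <Ax, Ax> = x^T A^T A x
f = lambda x: (A @ x) @ (A @ x)

grad = 2 * A.T @ A @ x  # claimed gradient

eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(3)])

assert np.allclose(grad, grad_fd, atol=1e-5)
```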
This is the result that you wanted to derive.