Notice first that you might guess this formula: if $X$ and $\hat{\beta}$ are $1 \times 1$, it reduces to the familiar derivative of $x^2$ from single-variable calculus.
Let's derive a multivariable product rule that will help us here. Suppose
$f:\mathbb{R}^n \to \mathbb{R}$ and suppose
$$f(x) = \langle g(x), h(x) \rangle$$ for some functions $g:\mathbb{R}^n \to \mathbb{R}^m$ and $h:\mathbb{R}^n \to \mathbb{R}^m$. Then if $\Delta x \in \mathbb{R}^n$ is small (and $g$ and $h$ are differentiable at $x$), we have
$$\begin{align*}
f(x + \Delta x) &\approx \langle g(x) + g'(x) \Delta x, h(x) + h'(x) \Delta x \rangle \\
&= \langle g(x),h(x) \rangle + \langle g(x), h'(x) \Delta x \rangle
+ \langle g'(x) \Delta x, h(x) \rangle + \langle g'(x) \Delta x, h'(x) \Delta x \rangle \\
&\approx \langle g(x),h(x) \rangle + \langle g(x), h'(x) \Delta x \rangle
+\langle g'(x) \Delta x, h(x) \rangle \\
&= \langle g(x),h(x) \rangle + \langle h'(x)^T g(x), \Delta x \rangle
+ \langle g'(x)^T h(x), \Delta x \rangle \\
&= f(x) + \langle h'(x)^T g(x) + g'(x)^T h(x), \Delta x \rangle.
\end{align*}$$
Comparing this result with
$$f(x + \Delta x) \approx f(x) + \langle \nabla f(x), \Delta x \rangle$$
we discover that
$$\nabla f(x) = h'(x)^T g(x) + g'(x)^T h(x).$$
This is our product rule. (I'm using the convention that the gradient is a column vector, which is not completely standard.)
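As a sanity check, we can verify this product rule numerically on a simple linear case. Below is a sketch (the matrices $B$, $C$ and the point $x$ are arbitrary test data, not anything from the derivation): take $g(x) = Bx$ and $h(x) = Cx$, so $g'(x) = B$ and $h'(x) = C$, and compare the product-rule gradient against a finite-difference approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
B = rng.standard_normal((m, n))   # g(x) = Bx, so g'(x) = B
C = rng.standard_normal((m, n))   # h(x) = Cx, so h'(x) = C
x = rng.standard_normal(n)

# f(x) = <g(x), h(x)>
f = lambda v: (B @ v) @ (C @ v)

# Product-rule gradient: h'(x)^T g(x) + g'(x)^T h(x)
grad = C.T @ (B @ x) + B.T @ (C @ x)

# Central finite differences for comparison
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
               for e in np.eye(n)])
print(np.allclose(grad, fd, atol=1e-5))  # True
```

Central differences are exact for quadratics up to floating-point roundoff, so the agreement here should be very tight.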
Now let $g(x) = Ax$ for some matrix $A$. Since $g$ is linear, $g'(x) = A$.
What's the gradient of the function
$$\begin{align*}
f(x) &= \langle g(x),g(x) \rangle \\
&= \langle Ax, Ax \rangle \\
&= x^T A^T A x \quad \text{?}
\end{align*}$$
By our product rule with $h = g$ (so the two terms coincide), the answer is
$$\begin{align*}
\nabla f(x) &= 2g'(x)^T g(x) \\
&= 2 A^T A x.
\end{align*}$$
This is the result that you wanted to derive.
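The same finite-difference check confirms the final formula. This is a sketch with a random test matrix $A$ and point $x$ (neither comes from the original problem):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

# f(x) = <Ax, Ax> = x^T A^T A x
f = lambda v: (A @ v) @ (A @ v)

# The gradient derived above: 2 A^T A x
grad = 2 * A.T @ A @ x

# Compare against central finite differences
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
               for e in np.eye(n)])
print(np.allclose(grad, fd, atol=1e-5))  # True
```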