This is incredibly easy to prove if you have the following result:
  If a function $f$ is differentiable at $a$ then there exists a
  function $\varphi$, defined on an interval $(-\epsilon,\epsilon)$
  and continuous at $0$, such that $\varphi(0)=0$ and
  
  $$ f(a+h) = f(a) + f'(a)h + \varphi(h)h, $$
  
  for all $h \in (-\epsilon,\epsilon)$.
  
  Conversely, if there is such a $\varphi$ (continuous at $0$ with
  $\varphi(0)=0$) and constants $b$, $\alpha$ with
  
  $$ f(a+h) = b + \alpha h + \varphi(h)h, $$
  
  for all $h \in (-\epsilon,\epsilon)$, then $f$ is differentiable at
  $a$ with $f(a) = b$ and $f'(a) = \alpha$.
The chain rule follows by direct computation: write $(g \circ f)(a+h) = g(f(a+h))$, use that $f$ is differentiable at $a$ to write $f(a+h)$ as $f(a) + f'(a)h + \varphi_f(h)h$, then set $k = f'(a)h + \varphi_f(h)h$ and use that $g$ is differentiable at $f(a)$.
There's a little bit of bookkeeping needed to make sure that there do exist appropriate intervals around $0$ for the auxiliary continuous functions, but it's not too bad.
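
For concreteness, here is a sketch of how that computation might be written out. The names $\varphi_g$ (the auxiliary function for $g$ at the point $f(a)$) and $\psi$ are my own notation, and $h$ is assumed small enough that $k$ lies in the interval where $\varphi_g$ is defined. With $k = \bigl(f'(a) + \varphi_f(h)\bigr)h$:

$$
\begin{aligned}
(g \circ f)(a+h) &= g\bigl(f(a) + k\bigr) \\
&= g(f(a)) + g'(f(a))\,k + \varphi_g(k)\,k \\
&= g(f(a)) + g'(f(a))f'(a)\,h + \psi(h)\,h,
\end{aligned}
$$

where

$$ \psi(h) = g'(f(a))\,\varphi_f(h) + \varphi_g(k)\bigl(f'(a) + \varphi_f(h)\bigr). $$

Since $k \to 0$ as $h \to 0$ and both $\varphi_f$ and $\varphi_g$ vanish at $0$ and are continuous there, $\psi(0) = 0$ and $\psi$ is continuous at $0$. The second half of the result then gives $(g \circ f)'(a) = g'(f(a))f'(a)$.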
The best part about this proof is that it immediately generalizes to functions from $\mathbb R^m$ to $\mathbb R^n$.