Here is another approach in 4 steps (just one of them is hard):
1) Verify that $\mathcal{L}_X(Y+Z)=\mathcal{L}_X(Y)+\mathcal{L}_X(Z)$ for any fields $Y,Z$;
2) Verify that $\mathcal{L}_X(fY)=f\mathcal{L}_X(Y)+X(f)Y$ for any function $f$;
3) Show that $\mathcal{L}_X\left(\frac{\partial}{\partial x_i}\right)=\left[X,\frac{\partial}{\partial x_i}\right]$;
4) Conclude $\mathcal{L}_X(Y)=[X,Y]$.
1) Immediate from the $\mathbb{R}$-linearity of $d\phi_{-t}$ and of $\frac{d}{dt}$. $_\blacksquare$
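2) Notice that $(d\phi_{-t})_{\phi_t(p)}(f\,Y)=f(\phi_t(p))\,(d\phi_{-t})_{\phi_t(p)}(Y)$, since the differential is linear at each point. Then apply the Leibniz rule to the derivative in $t$, using $\left.\frac{d}{dt}\right|_{t=0}f(\phi_t(p))=X(f)(p)$. $_\blacksquare$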
3) This is the delicate part. Using coordinates, write $X=\sum_ia_i\frac{\partial}{\partial x_i}$ for functions $a_i$ and $\phi(t,x)=(\phi_1(t,x),...,\phi_n(t,x))$ where $x=(x_1,...,x_n)$. Because $\phi(0,x)=x$ for every $x$, we have $\frac{\partial \phi_k}{\partial x_j}(0,x)=\delta_{jk}$ and $\frac{\partial^2 \phi_k}{\partial x_\ell\partial x_j}(0,x)=0$. So: $\begin{align*} \mathcal{L}_X\left(\frac{\partial}{\partial x_i}\right)_p&=\left.\frac{d}{dt}\right|_{t=0}(d\phi_{-t})_{\phi_t(p)}\left(\left.\frac{\partial}{\partial x_i}\right|_{\phi_t(p)}\right)\\ &=\left.\frac{d}{dt}\right|_{t=0}\sum_k\frac{\partial \phi_k}{\partial x_i}(-t,\phi_t(p))\left.\frac{\partial}{\partial x_k}\right|_p\\ &=\sum_k\left(\left.\frac{d}{dt}\right|_{t=0}\frac{\partial \phi_k}{\partial x_i}(-t,\phi_t(p))\right)\left.\frac{\partial}{\partial x_k}\right|_p \end{align*}$
We apply the chain rule to compute the derivative inside the sum. Since $\frac{\partial^2\phi_k}{\partial x_j\partial x_i}(0,p)=0$, the terms coming from the dependence of $\phi_t(p)$ on $t$ vanish, and we only need the derivative of $\frac{\partial \phi_k}{\partial x_i}$ with respect to the time coordinate. With that in mind, we see that $\left.\frac{d}{dt}\right|_{t=0}\frac{\partial \phi_k}{\partial x_i}(-t,\phi_t(p))=\left(\left.\frac{d}{dt}\right|_{t=0}\frac{\partial \phi_k}{\partial x_i}(t,p)\right)\left(\left.\frac{d}{dt}\right|_{t=0}(-t)\right)=-\left.\frac{d}{dt}\right|_{t=0}\frac{\partial \phi_k}{\partial x_i}(t,p)$. Now:
$\begin{align*} \left.\frac{d}{dt}\right|_{t=0}\frac{\partial \phi_k}{\partial x_i}(t,p)&=\left.\frac{d}{dt}\right|_{t=0}\left.\frac{\partial}{\partial x_i}\right|_p\phi_k(t,\cdot)\\ &=\left.\frac{\partial}{\partial x_i}\right|_p\underbrace{\left.\frac{d}{dt}\right|_{t=0}\phi_k(t,\cdot)}_{=a_k}\\ &=\frac{\partial a_k}{\partial x_i}(p) \end{align*}$ Here the second equality swaps the mixed partial derivatives of the smooth map $\phi_k$, and $\left.\frac{d}{dt}\right|_{t=0}\phi_k(t,x)=a_k(x)$ because $\phi$ is the flow of $X$. Therefore $\mathcal{L}_X\left(\frac{\partial}{\partial x_i}\right)=\sum_k-\frac{\partial a_k}{\partial x_i}\frac{\partial}{\partial x_k}=\sum_k\left[a_k\frac{\partial}{\partial x_k},\frac{\partial}{\partial x_i}\right]=\left[\sum_ka_k\frac{\partial}{\partial x_k},\frac{\partial}{\partial x_i}\right]=\left[X,\frac{\partial}{\partial x_i}\right]$. $_\blacksquare$
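As a sanity check of 3) on a concrete case (an example chosen here for illustration, not part of the argument), take the rotation field $X=-y\frac{\partial}{\partial x}+x\frac{\partial}{\partial y}$ on $\mathbb{R}^2$, so that $a_1=-y$, $a_2=x$, with flow $\phi(t,(x,y))=(x\cos t-y\sin t,\;x\sin t+y\cos t)$. Each $\phi_{-t}$ is linear, so its differential is the same rotation matrix at every point, and: $\begin{align*} (d\phi_{-t})\left(\frac{\partial}{\partial x}\right)&=\cos(t)\frac{\partial}{\partial x}-\sin(t)\frac{\partial}{\partial y}\\ \mathcal{L}_X\left(\frac{\partial}{\partial x}\right)&=\left.\frac{d}{dt}\right|_{t=0}\left(\cos(t)\frac{\partial}{\partial x}-\sin(t)\frac{\partial}{\partial y}\right)=-\frac{\partial}{\partial y}\\ \left[X,\frac{\partial}{\partial x}\right]&=-\frac{\partial a_1}{\partial x}\frac{\partial}{\partial x}-\frac{\partial a_2}{\partial x}\frac{\partial}{\partial y}=-\frac{\partial}{\partial y} \end{align*}$ The two sides agree, as 3) predicts.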
4) For $Y=\sum_kb_k\frac{\partial}{\partial x_k}$ use 1), 2), 3) and the fact that $\left[X,b_k\frac{\partial}{\partial x_k}\right]=b_k\left[X,\frac{\partial}{\partial x_k}\right]+X(b_k)\frac{\partial}{\partial x_k}$. $_\blacksquare$
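The conclusion $\mathcal{L}_X(Y)=[X,Y]$ can also be checked numerically. The following is only a rough sketch under stated assumptions, not part of the proof: it takes the rotation field $X=-y\frac{\partial}{\partial x}+x\frac{\partial}{\partial y}$ on $\mathbb{R}^2$ (whose flow is known in closed form), a test field $Y$ picked arbitrarily for illustration, and approximates the $t$-derivative by a central difference. All function names are hypothetical.

```python
import math

def flow(t, p):
    """Flow of X = -y d/dx + x d/dy: rotation of the plane by angle t."""
    x, y = p
    c, s = math.cos(t), math.sin(t)
    return (c * x - s * y, s * x + c * y)

def X(p):
    """Components (a_1, a_2) of the rotation field."""
    x, y = p
    return (-y, x)

def Y(p):
    """Arbitrary test field (an assumption made only for this example)."""
    x, y = p
    return (x * y, x + y * y)

def pulled_back(t, p):
    """(d phi_{-t}) applied to Y at phi_t(p).

    phi_{-t} is linear here, so its differential is the same rotation,
    and we can reuse flow(-t, .) on the vector components.
    """
    return flow(-t, Y(flow(t, p)))

def lie_derivative(p, h=1e-5):
    """Central-difference approximation of d/dt at t=0 of pulled_back(t, p)."""
    plus, minus = pulled_back(h, p), pulled_back(-h, p)
    return tuple((a - b) / (2 * h) for a, b in zip(plus, minus))

def bracket(p):
    """[X, Y] = DY.X - DX.Y, with the Jacobians written out by hand."""
    x, y = p
    Xx, Xy = X(p)
    Yx, Yy = Y(p)
    # DY = [[y, x], [1, 2y]] and DX = [[0, -1], [1, 0]]
    dY_X = (y * Xx + x * Xy, Xx + 2 * y * Xy)
    dX_Y = (-Yy, Yx)
    return (dY_X[0] - dX_Y[0], dY_X[1] - dX_Y[1])

if __name__ == "__main__":
    p = (0.7, -0.3)
    print(lie_derivative(p), bracket(p))
```

The two printed pairs should agree up to the finite-difference error; a different $X$ would need its own flow, which is rarely available in closed form.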