According to Lee, the pushforward was invented to give a coordinate-independent definition of the total derivative of a smooth map between two smooth manifolds. To each smooth map $F:M \to N$ and each point $p \in M$ we associate a linear map $F_*:T_pM\to T_{F(p)}N$ defined by $(F_*X)(f) = X(f \circ F)$, where $X$ is any derivation in $T_pM$ and $f:N \to \mathbb{R}$ is any smooth function. Given a smooth chart $\phi:U \to \mathbb{R}^m$ on a neighborhood $U$ of $p$, we have a basis for $T_pM$ given by $\frac{\partial}{\partial x^i}|_{p} = (\phi^{-1})_*\frac{\partial}{\partial x^i}|_{\phi(p)}$. Similarly, given a smooth chart $\psi$ near $F(p)$, we have a basis for $T_{F(p)}N$ given by $\frac{\partial}{\partial y^j}|_{F(p)} = (\psi^{-1})_*\frac{\partial}{\partial y^j}|_{\psi(F(p))}$. A calculation in Lee shows that the matrix representation of $F_*$ with respect to these bases is the total derivative (the Jacobian matrix) of the coordinate representation $\hat{F} = \psi \circ F \circ \phi ^{-1}$ evaluated at $\phi(p)$.
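For reference, here is my own sketch of that calculation (a paraphrase, so any slip is mine rather than Lee's). It assumes $f$ is smooth near $F(p)$, writes $\hat{F}^j$ for the $j$-th component of $\hat{F}$, and uses only the definitions above, the ordinary chain rule in $\mathbb{R}^m$, and the identity $\hat{F}(\phi(p)) = \psi(F(p))$:

$$
\begin{aligned}
\left(F_*\frac{\partial}{\partial x^i}\Big|_{p}\right)(f)
&= \frac{\partial}{\partial x^i}\Big|_{p}(f \circ F)
 = \frac{\partial}{\partial x^i}\Big|_{\phi(p)}\left(f \circ F \circ \phi^{-1}\right) \\
&= \frac{\partial}{\partial x^i}\Big|_{\phi(p)}\left((f \circ \psi^{-1}) \circ \hat{F}\right) \\
&= \sum_j \frac{\partial (f \circ \psi^{-1})}{\partial y^j}\big(\psi(F(p))\big)\,\frac{\partial \hat{F}^j}{\partial x^i}\big(\phi(p)\big) \\
&= \sum_j \frac{\partial \hat{F}^j}{\partial x^i}\big(\phi(p)\big)\,\left(\frac{\partial}{\partial y^j}\Big|_{F(p)}\right)(f).
\end{aligned}
$$

So the coefficients of $F_*\frac{\partial}{\partial x^i}|_{p}$ in the basis $\frac{\partial}{\partial y^j}|_{F(p)}$ form the $i$-th column of the Jacobian of $\hat{F}$ at $\phi(p)$.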
My question is: is there some intuitive reason why we would expect this to be true? This all seems very abstract to me. I can't tell whether it is supposed to be obvious that this definition is a coordinate-independent way of encoding the total derivative of $F$ and I am just missing something, or whether it is simply difficult to understand. How should I think about the pushforward?