I can show you intuitively where the result comes from. In the case where $f:U \subset \mathbb{R} \to \mathbb{R}$, we know that the derivative of $f$ at some $p \in U$ satisfies,
$$\lim_{h \to 0} \ \frac{f(p+h)- f(p) - f'(p)h}{h} = 0$$
i.e. $d_pf(h):=f'(p)h$ is a linear map which takes vectors in $\mathbb{R}$ to vectors in $\mathbb{R}$. The notion of a vector in $\mathbb{R}$ isn't geometrically appealing, since vectors are just points: the vector $h-0$, defined as the vector extending from $0$ to $h$, is just $h$. Hence, when transitioning to higher dimensions, we say that $f: U \subset\mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $p \in U$ if there exists a linear map $d_pf: \mathbb{R}^n \to \mathbb{R}^m$ such that:
$$\lim_{h \to 0} \ \frac{\|f(p+h) - f(p) - d_pf(h)\|}{\|h\|} = 0$$
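If you want to see this limit numerically, here is a minimal sketch. The map `f`, the point `p` and the direction are hypothetical choices of mine for illustration, and `jacobian` is computed by hand for that particular `f`; none of them come from the question.

```python
import math

def f(u, v):
    # Hypothetical example map f: R^2 -> R^3 (an assumption, chosen for illustration)
    return (u * v, math.sin(u), v * v)

def jacobian(u, v):
    # Hand-computed matrix of partial derivatives of the f above at (u, v)
    return [[v, u],
            [math.cos(u), 0.0],
            [0.0, 2.0 * v]]

def apply(J, h):
    # Matrix-vector product J h for a 3x2 matrix J and h in R^2
    return [row[0] * h[0] + row[1] * h[1] for row in J]

def norm(x):
    return math.sqrt(sum(c * c for c in x))

def remainder_ratio(p, h):
    # ||f(p+h) - f(p) - d_pf(h)|| / ||h||, the quotient in the definition
    fp = f(*p)
    fph = f(p[0] + h[0], p[1] + h[1])
    lin = apply(jacobian(*p), h)
    err = [a - b - c for a, b, c in zip(fph, fp, lin)]
    return norm(err) / norm(h)

p = (0.5, -1.0)
direction = (0.6, 0.8)  # unit vector, so ||h|| equals the scale s below
scales = (1e-1, 1e-2, 1e-3)
ratios = [remainder_ratio(p, (s * direction[0], s * direction[1])) for s in scales]

for s, r in zip(scales, ratios):
    print(f"||h|| = {s:g}  ratio = {r:.3e}")
```

For a smooth map like this one, the printed ratio should shrink along with $\|h\|$, which is what the limit condition asks for. Of course this only probes one direction, so it is a sanity check and not a proof.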
i.e. since $v=f(p+h) - f(p)$ is the vector from $f(p)$ to $f(p+h)$, the above definition says that $d_pf(h)$ (if it exists) approximates $v$ with an error that goes to zero faster than $\|h\|$ as $h \to 0$. Let us now transition back to the derivative in the one-variable case. Observe that we can take $\phi:(-\epsilon, \epsilon) \to U \subset \mathbb{R}$ to be a smooth map with $\phi(0) = p$ and $\phi'(0) = h$; then by the chain rule:
$$d_pf(h) = \frac{d}{dt}\Bigr|_{t=0} (f \circ \phi)(t) = f'(p) \cdot \phi'(0) = f'(p) h$$
i.e. $d_pf$ sends the vector $h$ to $(f \circ \phi)'(0)$. To stumble upon the derivative of $f: U \subset \mathbb{R}^n \to \mathbb{R}^m$, let us suppose that $d_pf$ is obtained in the same fashion. Hence, we define $d_pf(w) := (f \circ \phi)'(0)$, where $\phi:(-\epsilon, \epsilon) \to U \subset \mathbb{R}^n$ is a smooth curve with $\phi(0) = p$ and $\phi'(0) = w$. For simplicity, let us take $n = 2, m = 3$. I will show you that $d_pf(w)$ is independent of the choice of $\phi$ and that $d_pf$ is actually a linear map.
Let $(u,v)$ be the coordinates in $\mathbb{R}^2$ and $(x,y,z)$ the coordinates in $\mathbb{R}^3$. Finally, let $e_1 = (1,0), e_2 = (0,1)$ denote the standard basis of $\mathbb{R}^2$ and $f_1 = (1,0,0), f_2 = (0,1,0)$ and $f_3 = (0,0,1)$ the standard basis of $\mathbb{R}^3$. Then given a curve $\alpha:(-\epsilon, \epsilon) \to U$ with $\alpha(0) = p$ and $\alpha'(0) = w$, we can write $\alpha(t) = (u(t), v(t))$, so that $w = \alpha'(0) = u'(0)e_1 + v'(0)e_2$. Since $f(u,v) = (x(u,v),y(u,v), z(u,v))$, we have:
$$(f \circ \alpha)(t) = (x(u(t),v(t)), y(u(t),v(t)), z(u(t), v(t)))$$
Thus, using the chain rule and taking derivatives at $t= 0$ we get:
$$ (f \circ \alpha)'(0) = \left(\frac{\partial x}{\partial u} \frac{du}{dt} +\frac{\partial x}{\partial v} \frac{dv}{dt}\right) f_1+ \left(\frac{\partial y}{\partial u} \frac{du}{dt} +\frac{\partial y}{\partial v} \frac{dv}{dt}\right) f_2+\left(\frac{\partial z}{\partial u} \frac{du}{dt} +\frac{\partial z}{\partial v} \frac{dv}{dt}\right) f_3$$
$$\hspace{-2.2in} =\begin{pmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\ \\ \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \\ \\ \dfrac{\partial z}{\partial u} & \dfrac{\partial z}{\partial v} \end{pmatrix} \begin{pmatrix} \dfrac{du}{dt} \\ \\ \dfrac{dv}{dt} \end{pmatrix} = d_pf (w)$$
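To make the independence from the curve concrete, here is a small numerical sketch. Everything in it is an assumption for illustration: the map `f`, the point `p`, the vector `w`, and the two curves `straight` and `curved`, which are built to share the same position and velocity at $t = 0$. The derivative $(f \circ \alpha)'(0)$ is approximated by a central difference, so the comparison only holds up to a small numerical error.

```python
import math

def f(u, v):
    # Hypothetical example map f: R^2 -> R^3 (an assumption, chosen for illustration)
    return (u * v, math.sin(u), v * v)

def jacobian_times(p, w):
    # Hand-computed Jacobian of the f above at p, applied to w
    u, v = p
    J = [[v, u],
         [math.cos(u), 0.0],
         [0.0, 2.0 * v]]
    return [row[0] * w[0] + row[1] * w[1] for row in J]

p = (0.5, -1.0)
w = (2.0, 3.0)

def straight(t):
    # Straight line through p with velocity w
    return (p[0] + t * w[0], p[1] + t * w[1])

def curved(t):
    # A different curve with the same position p and velocity w at t = 0
    return (p[0] + math.sin(w[0] * t), p[1] + w[1] * t + t * t)

def velocity_at_zero(curve, eps=1e-5):
    # Central difference approximation of (f o curve)'(0)
    plus = f(*curve(eps))
    minus = f(*curve(-eps))
    return [(a - b) / (2 * eps) for a, b in zip(plus, minus)]

exact = jacobian_times(p, w)
along_straight = velocity_at_zero(straight)
along_curved = velocity_at_zero(curved)

print("J w          :", exact)
print("straight line:", along_straight)
print("curved path  :", along_curved)
```

Both curves should give (approximately) the same vector as the matrix product, even though they differ away from $t = 0$.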
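In the matrix expression for $(f \circ \alpha)'(0)$, the partial derivatives are evaluated at $p$ and $\frac{du}{dt}, \frac{dv}{dt}$ at $t = 0$. The matrix depends only on $f$ and $p$, and the column vector is just $w$, so $d_pf(w)$ does not depend on the choice of $\alpha$ and is linear in $w$. It is now up to you to show that $d_pf$ satisfies the limit definition. My last remark is that a linear map satisfying the limit condition is unique whenever it exists, so $d_pf$ is the differential of $f$ at $p$. I hope this helped.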