First think about why it's true in the $n=1$ case. A linear function is $f(x)=ax$. The derivative at any point is $f'(x)=a$. You can think of $a$ as a $1\times 1$ matrix satisfying $\lim_{h\to 0}\frac{f(x+h)-f(x)-a\cdot h}{h}=0$.
The case for higher dimensions is analogous. Suppose $f:\mathbb{R}^n\to\mathbb{R}^n$ is linear and represented by the matrix $A$, so $f(x)=Ax$. Fix $x_0\in\mathbb{R}^n$. Then to find the differential at $x_0$, we need to find a matrix $D$ satisfying
$$\lim_{h \rightarrow 0} \frac{\|A(x_0+h)-Ax_0 - Dh\|}{\|h\|} = 0$$
where $h$ is also a vector in $\mathbb{R}^n$. But choosing $D=A$, this simplifies to
$$\lim_{h \rightarrow 0} \frac{\|A(x_0+h)-Ax_0 - Ah\|}{\|h\|} = \lim_{h \rightarrow 0} \frac{\|Ax_0+Ah-Ax_0 - Ah\|}{\|h\|} = \lim_{h \rightarrow 0} \frac{\|0\|}{\|h\|} = 0$$
So $A$ is the differential at $x_0$. Since $x_0$ was arbitrary, $A$ must be the differential at every point.
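This cancellation is easy to check numerically. The sketch below (plain Python; the matrix, base point, and direction of $h$ are arbitrary illustrative choices) evaluates the difference quotient $\|A(x_0+h)-Ax_0-Ah\|/\|h\|$ for shrinking $h$ and finds it is zero up to floating-point rounding:

```python
# Difference quotient ||A(x0+h) - A x0 - A h|| / ||h|| for a linear map.
# A, x0, and the direction of h are arbitrary illustrative choices.
import math

A = [[2.0, -1.0],
     [0.5, 3.0]]

def matvec(M, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [M[0][0]*v[0] + M[0][1]*v[1],
            M[1][0]*v[0] + M[1][1]*v[1]]

def norm(v):
    return math.sqrt(sum(c*c for c in v))

x0 = [1.0, -2.0]

quotients = []
for k in range(1, 8):
    h = [10.0**-k, 2 * 10.0**-k]   # h -> 0 along a fixed direction
    x0_plus_h = [x0[i] + h[i] for i in range(2)]
    # Numerator A(x0+h) - A x0 - A h vanishes by linearity.
    num = [matvec(A, x0_plus_h)[i] - matvec(A, x0)[i] - matvec(A, h)[i]
           for i in range(2)]
    quotients.append(norm(num) / norm(h))

print(quotients)  # each entry is ~0 (zero up to rounding error)
```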
Another way to see this is to realize that the differential is just the matrix of partial derivatives. For simplicity I'll use the $2\times 2$ case. If the matrix of $f$ is
$$A=\begin{bmatrix}a&b\\c&d\end{bmatrix}$$
then $f$ can be written
$$f\binom{x}{y} = \binom{f_1(x,y)}{f_2(x,y)} = \binom{ax+by}{cx+dy}$$
Computing the matrix of partial derivatives at any point gives
$$D=\begin{bmatrix}\frac{\partial f_1}{\partial x}&\frac{\partial f_1}{\partial y}\\\frac{\partial f_2}{\partial x}&\frac{\partial f_2}{\partial y}\end{bmatrix} = \begin{bmatrix}a&b\\c&d\end{bmatrix} = A$$
which again shows that $A$ is the differential at every point.
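The partials can also be checked by finite differences: for a linear map, one-sided difference quotients recover the entries of $A$ at any base point, up to rounding. A minimal sketch in plain Python, with illustrative values for $a,b,c,d$ and an arbitrary base point:

```python
# Finite-difference Jacobian of f(x, y) = (a x + b y, c x + d y).
# The entries a, b, c, d and the base point are illustrative choices.
a, b, c, d = 2.0, -1.0, 0.5, 3.0

def f(x, y):
    return (a*x + b*y, c*x + d*y)

def jacobian(x, y, eps=1e-6):
    """Approximate the matrix of partials of f at (x, y)."""
    f0 = f(x, y)
    fx = f(x + eps, y)   # perturb x
    fy = f(x, y + eps)   # perturb y
    return [[(fx[0] - f0[0]) / eps, (fy[0] - f0[0]) / eps],
            [(fx[1] - f0[1]) / eps, (fy[1] - f0[1]) / eps]]

J = jacobian(-3.0, 7.0)   # any base point gives the same answer
print(J)   # approximately [[a, b], [c, d]]
```

Because $f$ is linear, the difference quotients are exact up to floating-point error, independent of the base point.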
If $g:U\to\mathbb{R}^n$ is the inclusion, then locally the map is just
$$g\begin{pmatrix}x_1\\\vdots\\x_n\end{pmatrix}=\begin{pmatrix}g_1(x_1,...,x_n)\\\vdots\\g_n(x_1,...,x_n)\end{pmatrix}=\begin{pmatrix}x_1\\\vdots\\x_n\end{pmatrix}$$
This is a linear function represented by the matrix $I_n$, so by the argument above the differential is also $I_n$. Alternatively, computing the partials at any point gives
$$D=\begin{pmatrix}
\frac{\partial g_1}{\partial x_1} & \cdots & \frac{\partial g_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial g_n}{\partial x_1} & \cdots & \frac{\partial g_n}{\partial x_n} \\
\end{pmatrix} = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix} = I_n$$
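The same finite-difference check works for the inclusion: since each component is $g_i(x_1,...,x_n)=x_i$, the numerical Jacobian at any point is the identity. A sketch with $n=3$ and an arbitrary base point:

```python
# Finite-difference Jacobian of the inclusion/identity map on R^3.
# The base point p and step eps are illustrative choices.
n = 3
eps = 1e-6
p = [0.3, -1.2, 4.0]

def g(v):
    return list(v)   # the inclusion just returns its input

D = []
for i in range(n):
    row = []
    for j in range(n):
        pj = list(p)
        pj[j] += eps   # perturb the j-th coordinate
        row.append((g(pj)[i] - g(p)[i]) / eps)
    D.append(row)

print(D)  # approximately the 3x3 identity matrix
```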
Part of the confusion arises from the fact that in the one-dimensional case $f:\mathbb{R}\to\mathbb{R}$, the derivative (or differential) plays two roles. At each point $x$ it is the linear transformation $d$ satisfying $\lim_{h\to 0}\frac{f(x+h)-f(x)-d\cdot h}{h}=0$; since a $1\times 1$ matrix is just a real number, the assignment $x\mapsto f'(x)$ is itself a function $\mathbb{R}\to\mathbb{R}$. The analogous statement in $\mathbb{R}^n$ is that the differential is an $n\times n$ matrix at each point, so the assignment $x\mapsto Df(x)$ is a function $f':\mathbb{R}^n\to\mathbb{R}^{n^2}$. For a linear map $f$, $f'$ is a constant function whose value is the matrix representing $f$.