The basic problem here is a confusion of terminology around "derivative".
The basic idea of differentiation in one dimension is finding the tangent to a graph at a point. A tangent can be viewed as a linear approximation of the graph, in the following (not necessarily obvious) sense:
$$f(x + h) - f(x) \approx hf^\prime(x),$$
where the error is small compared with $h$ as $h \to 0$. The function you are really approximating linearly is not $f$ itself but the difference $f(x + h) - f(x)$, viewed as a function of $h$. Since this difference is zero when $h$ is zero, it is much more plausible that it can be approximated in this way (rather than $f$, which can be whatever it likes).
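To see this sense of "approximation" numerically, here is a minimal sketch that assumes the example $f = \sin$ (so $f^\prime = \cos$); the choice of function, the point $x = 1$ and the step sizes are illustrative assumptions. It measures how far $f(x+h) - f(x)$ is from $hf^\prime(x)$ and divides that error by $h$.

```python
import math

def f(x):
    # Illustrative choice of function (an assumption, not from the text).
    return math.sin(x)

def f_prime(x):
    # The derivative of sin, as a number at each x.
    return math.cos(x)

def linearization_error(x, h):
    """Return |f(x+h) - f(x) - h*f'(x)|, the error of the linear approximation."""
    return abs(f(x + h) - f(x) - h * f_prime(x))

x = 1.0
for h in [1e-1, 1e-2, 1e-3]:
    err = linearization_error(x, h)
    # error/h shrinks as h shrinks: the error is small *compared with h*.
    print(f"h={h:g}  error={err:.3e}  error/h={err / h:.3e}")
```

The ratio error/$h$ tends to zero, which is exactly what distinguishes the tangent line from any other line through $(x, f(x))$.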
In higher dimensions, the derivative of $f$ at $x$ is defined to be a linear map $A$ such that
$$f(x + h) - f(x) \approx A(h),$$
again with an error that is small compared with $|h|$ as $h \to 0$. The 1D case fits this definition, since $h\mapsto ah$ is linear for any constant $a$. The catch is that every linear map in 1D is of the form $h\mapsto ah$, so in the one-dimensional case there is a direct correspondence between scalar constants and linear maps. As a result, our definitions blur the question of whether the derivative of $f$ at $x$ is a number or a function, and I think it is that confusion you are running into.
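As a sketch of the higher-dimensional definition, take the hypothetical example map $f(x, y) = (xy,\ x + y^2)$ from $\mathbb{R}^2$ to $\mathbb{R}^2$ (an assumption chosen for illustration). Its derivative at a point $p$ is returned below as an actual function $h \mapsto Jh$, where $J$ is the Jacobian matrix; the helper `relative_error` (a name introduced here) computes $|f(p+h) - f(p) - A(h)| / |h|$.

```python
import math

def f(p):
    """Hypothetical example map R^2 -> R^2: (x, y) -> (x*y, x + y**2)."""
    x, y = p
    return (x * y, x + y ** 2)

def derivative_at(p):
    """Derivative of f at p, returned as a linear map h -> J h,
    where J = [[y, x], [1, 2*y]] is the Jacobian matrix of f at p."""
    x, y = p
    def A(h):
        h1, h2 = h
        return (y * h1 + x * h2, h1 + 2 * y * h2)
    return A

def relative_error(p, h):
    """Return |f(p + h) - f(p) - A(h)| / |h| for A = derivative_at(p)."""
    A = derivative_at(p)
    fp = f(p)
    fph = f((p[0] + h[0], p[1] + h[1]))
    Ah = A(h)
    residual = (fph[0] - fp[0] - Ah[0], fph[1] - fp[1] - Ah[1])
    return math.hypot(*residual) / math.hypot(*h)

p = (1.0, 2.0)
for t in [1e-1, 1e-2, 1e-3]:
    print(f"t={t:g}  error/|h|={relative_error(p, (t, -t)):.3e}")
```

For this particular $f$ the residual is exactly $(h_1h_2,\ h_2^2)$, so along $h = (t, -t)$ the ratio equals $t$ and goes to zero, as the definition requires.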
In this context, don't think of $f^\prime$ as the derivative of $f$; think of it as a function that takes an argument $x$ and gives you the number $a$ such that the derivative of $f$ at $x$ is "multiply by $a$". That derivative is a linear map, which is injective exactly when $a\neq0$. The key is to understand that $f^\prime$ isn't itself a linear map; it produces a linear map for each $x$.
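The distinction can be made concrete in code. In this minimal sketch, the example $f(x) = x^3$ and the names `f_prime`, `derivative_at` and `is_injective` are hypothetical choices for illustration: `f_prime(x)` returns a number, while `derivative_at(x)` returns a function, namely the linear map "multiply by $f^\prime(x)$".

```python
def f_prime(x):
    """For the example f(x) = x**3 (an assumption), return the NUMBER f'(x) = 3x^2."""
    return 3 * x ** 2

def derivative_at(x):
    """Return the derivative of f at x as a LINEAR MAP: h -> f'(x) * h."""
    a = f_prime(x)
    return lambda h: a * h

def is_injective(linear_map):
    """A 1D linear map h -> a*h is injective exactly when a != 0;
    the constant a can be recovered as the image of 1."""
    return linear_map(1.0) != 0

print(f_prime(2.0))             # a number
print(derivative_at(2.0)(0.5))  # the linear map at x = 2 applied to h = 0.5
print(is_injective(derivative_at(2.0)), is_injective(derivative_at(0.0)))
```

Note that $x^3$ is itself injective, yet its derivative at $0$ is the zero map, which is not; the property belongs to each linear map `derivative_at(x)`, not to `f_prime`.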