Since you wanted a conceptual explanation:
the reason is that the action of "scaling" on the circle, and the action of "scaling" on the real line, are two different things.
On the circle, the mapping $\theta \mapsto a\theta$ "wraps around", and for $a$ a nonzero integer, the image of the circle under this map wraps around $|a|$ times.
On the real line, the mapping $x\mapsto ax$ is one-to-one.
And this makes a huge difference.
When you do the rescaling $\cos \theta \to \cos 2\theta$ on the circle, you are not just compressing the function by making the characteristic length-scale smaller, you are also cramming two copies of the rescaled function into the same circle. In fact, this works for any periodic function and any nonzero integer $a$: the mapping $g(\theta) \to g(a\theta)$ scales spatially and also makes $|a|$ copies of the function.
When you do the rescaling $f(x) \to f(ax)$ on the real line, you are only compressing the function spatially. There still is only one copy of the function. This difference in the number of copies is, morally speaking, why there is a factor of $|a|$ difference in the two formulae.
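To make the two formulae concrete, here they are side by side. This is a sketch under assumed normalizations, which may differ from yours by constants: $\hat g(n) = \frac{1}{2\pi}\int_0^{2\pi} g(\theta)e^{-in\theta}\,d\theta$ on the circle and $\hat f(\xi) = \int_{-\infty}^{\infty} f(x)e^{-i\xi x}\,dx$ on the line. The names $g_a$ and $f_a$ are just shorthand introduced here for the rescaled functions $g_a(\theta) = g(a\theta)$ and $f_a(x) = f(ax)$.

$$\hat{g_a}(n) = \begin{cases} \hat g(n/a) & \text{if } a \text{ divides } n, \\ 0 & \text{otherwise,} \end{cases} \qquad a \in \mathbb{Z}\setminus\{0\},$$

$$\hat{f_a}(\xi) = \frac{1}{|a|}\,\hat f\!\left(\frac{\xi}{a}\right), \qquad a \in \mathbb{R}\setminus\{0\}.$$

On the line, the substitution $y = ax$ produces the Jacobian factor $1/|a|$. On the circle the same substitution also produces $1/|a|$, but the integral now runs over $|a|$ full periods, and those $|a|$ copies cancel the factor exactly.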
Another way to think about it: if you view the Fourier series as a special case of the Fourier transform, a "better" approach (not necessarily better for other applications) is, instead of extending the function on $[0,2\pi]$ periodically, to extend it by $0$ outside of $[0,2\pi]$. Then you see immediately that evaluating the Fourier transform at integer values gives you precisely the Fourier coefficients for the series on the circle (up to the normalization constant). So define the function $g(x) = \cos(x)$ if $x\in [0,2\pi]$ and $0$ elsewhere. The rescaling of $g(x)$ is the function $g(ax) = \cos(ax)$ if $x\in [0,2\pi/a]$ and $0$ elsewhere (taking $a > 0$). This is very different from the function $\cos(ax)$ on $[0,2\pi]$.
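If you want to see the difference numerically, here is a minimal sketch using only the Python standard library. The helper names (`integrate`, `circle_coeff`, `line_transform`) are made up for this illustration, and the normalizations are assumptions: a $\frac{1}{2\pi}$ prefactor for the circle coefficient and no prefactor for the transform of the zero-extended function. It compares $\cos\theta$ against $\cos a\theta$ with $a = 2$ in both settings.

```python
import cmath
import math


def integrate(func, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(func(lo + (k + 0.5) * h) for k in range(steps)) * h


def circle_coeff(func, n):
    """n-th Fourier coefficient on the circle (assumed 1/(2*pi) normalization)."""
    total = integrate(lambda t: func(t) * cmath.exp(-1j * n * t), 0.0, 2 * math.pi)
    return total / (2 * math.pi)


def line_transform(func, xi, lo, hi):
    """Fourier transform at xi of a function that is zero outside [lo, hi]."""
    return integrate(lambda x: func(x) * cmath.exp(-1j * xi * x), lo, hi)


a = 2

# Circle: cos(theta) at frequency 1 versus cos(a*theta) at frequency a.
c_orig = circle_coeff(math.cos, 1)
c_scaled = circle_coeff(lambda t: math.cos(a * t), a)

# Line: zero-extended g on [0, 2*pi] versus g(a*x), supported on [0, 2*pi/a].
F_orig = line_transform(math.cos, 1, 0.0, 2 * math.pi)
F_scaled = line_transform(lambda x: math.cos(a * x), a, 0.0, 2 * math.pi / a)

print("circle:", abs(c_orig), abs(c_scaled))
print("line:  ", abs(F_orig), abs(F_scaled))
```

The two circle values should agree (both close to $1/2$), while the second line value should be the first divided by $a$ (close to $\pi$ and $\pi/2$ respectively). That missing copy is the factor of $|a|$.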
This is all to say that what you thought of as rescaling on the circle is not just rescaling, but scaling and copying.