As already pointed out, Itō's formula is a type of chain rule. Let $W_t$ be a Brownian motion (with no drift, starting at 0). Naively, you would guess that the ordinary chain rule from analysis holds, $ df(W_t)=f'(W_t)dW_t. $ However, integrating and taking expectations on both sides would then yield $ E[f(W_t)]-f(0)=0\quad\forall t\geq 0. $ (I used that $E[\int_{0}^{t}g(s)dW_s]=0$ for any adapted, suitably integrable process $g(s)$. Intuitively, this holds since in the left-endpoint-rule approximations of the integral, each term has zero expectation: the increment $W(s)-W(r)$ is independent of the past and $E[W(s)-W(r)]=0$.) This cannot be true in general. For example, if $f(x)=x^2$ as in your question, it would give $E[W_t^2]=0$ and hence $W_t=0$ almost surely for every $t\geq 0$, which clearly does not hold.
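If you want to see the contradiction numerically, here is a minimal sketch (the function name `mean_square_of_brownian` and its parameters are my own choices, not anything standard). It assumes only that $W_t$ is normal with mean $0$ and variance $t$, samples it directly, and estimates $E[W_t^2]$; the naive chain rule would predict $0$ for every $t$.

```python
import math
import random

def mean_square_of_brownian(t, n_paths=100_000, seed=0):
    """Monte Carlo estimate of E[W_t^2], sampling W_t ~ Normal(0, t) directly.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(t)  # standard deviation of W_t
    total = 0.0
    for _ in range(n_paths):
        w_t = rng.gauss(0.0, sigma)
        total += w_t * w_t
    return total / n_paths

if __name__ == "__main__":
    for t in (0.25, 1.0, 4.0):
        estimate = mean_square_of_brownian(t)
        print(f"t = {t}: estimated E[W_t^2] = {estimate:.3f} (naive chain rule predicts 0)")
```

The estimates should come out close to $t$ rather than $0$, up to Monte Carlo error.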
The above motivates an extra term in the chain rule. The correct form of this extra term, $\frac{1}{2}f''(W_t)dt$, at least makes sense in the above example: since the Wiener process spreads out, the expected value $E[f(W_t)]$ should be increasing for small $t$ when the second derivative $f''(W_0)$ is positive, and decreasing when it is negative.
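As a quick sanity check with $f(x)=x^2$ (so $f'(x)=2x$ and $f''(x)=2$), the corrected rule reads

$$ d(W_t^2)=2W_t\,dW_t+dt. $$

Integrating from $0$ to $t$ and taking expectations, the stochastic integral has zero expectation as before, so

$$ E[W_t^2]=E\Big[\int_0^t 2W_s\,dW_s\Big]+t=t, $$

which agrees with the variance of $W_t$ and is indeed increasing in $t$, consistent with $f''=2>0$.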
To arrive at the exact form of Itō's formula (still not rigorously, though), one can use a formal Taylor expansion, $ f(W_t)-f(W_0)=f'(W_0)dW_t+\frac{1}{2}f''(W_0)(dW_t)^2+\mathcal{O}(|dW_t|^3), $ together with the intuitive insight that "$(dW_t)^2=dt$" (the variance of $W_t-W_0$ is $t$ by definition of the Wiener process) and "$|dW_t|^r=dt^{r/2}$" for $r>2$, i.e. higher powers are of smaller order than $dt$ and hence negligible as $dt\to 0$ (again, use that $dW_t$ has a normal distribution).
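The heuristics "$(dW_t)^2=dt$" and "$|dW_t|^r=dt^{r/2}$" can be illustrated with a rough simulation. The sketch below (the helper `sum_abs_increments_power` is a made-up name, and the equal-step discretisation is my assumption) splits $[0,t]$ into `n_steps` pieces, draws independent normal increments with variance $dt$, and sums $|dW|^r$ over the path. For $r=2$ the sum should be close to $t$; for $r=3$ it should shrink as the grid is refined.

```python
import math
import random

def sum_abs_increments_power(t, n_steps, r, seed=0):
    """Sum of |W(t_{k+1}) - W(t_k)|^r over an equal-step grid on [0, t].

    Hypothetical helper for illustration: increments are drawn as
    independent Normal(0, dt) variables with dt = t / n_steps.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(t / n_steps)  # standard deviation of one increment
    return sum(abs(rng.gauss(0.0, sigma)) ** r for _ in range(n_steps))

if __name__ == "__main__":
    t = 1.0
    for n_steps in (100, 10_000, 100_000):
        second = sum_abs_increments_power(t, n_steps, r=2)
        third = sum_abs_increments_power(t, n_steps, r=3)
        print(f"n_steps = {n_steps}: sum |dW|^2 = {second:.4f}, sum |dW|^3 = {third:.4f}")
```

This is only a single-path illustration, not a proof; the point is that the second-order term survives in the limit while the third-order one does not.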
Maybe one can paraphrase the last paragraph as follows:
Itō's formula is the chain rule that you get when higher order terms in the Taylor expansion of $f(X_t)$ become important due to excessive variations of $X_t$.