The proof of this result is given in most elementary numerical analysis books. Here is a sketch.
You first show that there exists a unique fixed point of $x = g(x) = Ax + b$. A point $x$ satisfies $x = g(x)$ if and only if $(I-A)x = b$, which gives existence and uniqueness provided $I-A$ is invertible. The eigenvalues $\mu$ of $I-A$ satisfy $ \det((I-A) - \mu I) = 0 \quad \Longleftrightarrow \quad \det(A - (1-\mu) I) = 0, $ so every $\mu$ is of the form $1 - \lambda_i$ for some eigenvalue $\lambda_i$ of $A$. Hence $\rho(A) = \max_{i=1}^n |\lambda_i| < 1$ implies $|1-\mu| < 1$; in particular $\mu \neq 0$, so $I-A$ is invertible, and we're done.
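If you want a quick numerical sanity check of this step, here is a minimal NumPy sketch (the $2 \times 2$ matrix $A$ and vector $b$ are made up purely for illustration, not part of the argument): it confirms that the eigenvalues of $I - A$ are exactly $1 - \lambda_i$, hence nonzero when $\rho(A) < 1$, and solves for the fixed point.

```python
import numpy as np

# Illustrative example (assumption): any A with spectral radius < 1 works.
A = np.array([[0.5, 0.2],
              [0.1, 0.3]])
b = np.array([1.0, 2.0])

lam = np.linalg.eigvals(A)                 # eigenvalues lambda_i of A
mu = np.linalg.eigvals(np.eye(2) - A)      # eigenvalues mu of I - A

print("rho(A) =", max(abs(lam)))           # < 1 for this example
print("1 - lambda_i:", np.sort_complex(1 - lam))
print("mu_i:        ", np.sort_complex(mu))  # same values, so mu = 1 - lambda_i

# Since |1 - mu| = |lambda_i| < 1, no mu can be 0, so I - A is invertible
x = np.linalg.solve(np.eye(2) - A, b)      # the unique fixed point
print("fixed point x =", x, " check g(x) =", A @ x + b)
```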
Now that we have existence and uniqueness, $x_{k+1} = Ax_k + b$ gives $x_{k+1} - x = A(x_k - x)$. Suppose at this point that $A$ is diagonalizable, so that its eigenvectors $v_1, \dots, v_n$ form a basis and the initial error can be written as $ x_0 - x = \sum_{i=1}^n \gamma_i v_i, $ which means $ x_k - x = A^k(x_0 - x) = \sum_{i=1}^n \gamma_i \lambda_i^k v_i $ by induction on $k$. Therefore, since $|\lambda_i^k| = |\lambda_i|^k \to 0$ as $k \to \infty$ (because $|\lambda_i| < 1$), the "error term" $x_k - x$ goes to $0$, thus your algorithm converges.
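To see this decay concretely, here is a short illustrative sketch in the same spirit (again, $A$ and $b$ are hypothetical data I chose so that $\rho(A) < 1$): it runs the iteration $x_{k+1} = Ax_k + b$ and compares the error $\|x_k - x\|$ against $\rho(A)^k$.

```python
import numpy as np

# Illustrative data (assumption): a 2x2 matrix with spectral radius < 1.
A = np.array([[0.5, 0.2],
              [0.1, 0.3]])
b = np.array([1.0, 2.0])

x_star = np.linalg.solve(np.eye(2) - A, b)   # exact fixed point
rho = max(abs(np.linalg.eigvals(A)))         # spectral radius of A

x = np.zeros(2)                              # x_0
for k in range(1, 21):
    x = A @ x + b                            # x_{k+1} = A x_k + b
    err = np.linalg.norm(x - x_star)
    print(f"k={k:2d}  ||x_k - x|| = {err:.3e}  rho^k = {rho**k:.3e}")

# The error shrinks roughly like rho(A)^k, as the diagonalizable argument predicts.
```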
I must say, though, that I have no proof when $A$ is not diagonalizable.
Hope that helps,