Two classic books where you can find all the details about this stuff:
- Gantmacher, Matrix theory, Chelsea.
- Lancaster and Tismenetsky, The theory of matrices, Academic Press.
Actually, for "hand" calculations, this works through the Jordan canonical form: you find the Jordan canonical form $J$ of your matrix, together with the change-of-basis matrix $S$, so that
$$
A = SJS^{-1} \ .
$$
Then you prove that, for any polynomial $p(t)$, you have
$$
p(A) = S p(J) S^{-1} \ .
$$
Hence,
$$
f(A) = S f(J) S^{-1}
$$
and you only need to know how to compute $p(J)$ for Jordan matrices.
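Before going on, the identity $f(A) = S f(J) S^{-1}$ can be checked numerically. Here is a minimal sketch using SymPy (assumed available), with $f = \exp$ and a matrix of my own choosing; note that SymPy's `jordan_form` returns an upper-triangular $J$ (1s above the diagonal), the transpose of the convention used later in this answer, but the identity holds either way.

```python
# Sanity check of f(A) = S f(J) S^{-1} with f = exp, using SymPy.
import sympy as sp

A = sp.Matrix([[1, 1], [-1, 3]])   # single eigenvalue 2, one 2x2 Jordan block
S, J = A.jordan_form()             # A = S J S^{-1}
fA = S * J.exp() * S.inv()         # compute f(J), then conjugate back
assert sp.simplify(fA - A.exp()) == sp.zeros(2)
```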
Which you do as follows: first, if you have a block-diagonal matrix
$$
J =
\begin{pmatrix}
J_1 & 0 & \dots & 0 \\
0 & J_2 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & J_r
\end{pmatrix}
$$
you can easily prove that
$$
p(J) =
\begin{pmatrix}
p(J_1) & 0 & \dots & 0 \\
0 & p(J_2) & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & p(J_r)
\end{pmatrix}
$$
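This blockwise action of a polynomial can be checked numerically on a toy example of my own (blocks and polynomial chosen arbitrarily):

```python
# Check that a polynomial acts blockwise on a block-diagonal matrix.
import numpy as np

J1 = np.array([[2.0, 0.0], [1.0, 2.0]])   # a 2x2 block
J2 = np.array([[5.0]])                    # a 1x1 block
J = np.block([[J1, np.zeros((2, 1))],
              [np.zeros((1, 2)), J2]])

def p(M):
    """p(t) = t^2 + 3t + 1, evaluated on a matrix."""
    return M @ M + 3 * M + np.eye(M.shape[0])

pJ = p(J)
assert np.allclose(pJ[:2, :2], p(J1))     # top-left block is p(J1)
assert np.allclose(pJ[2:, 2:], p(J2))     # bottom-right block is p(J2)
assert np.allclose(pJ[:2, 2:], 0) and np.allclose(pJ[2:, :2], 0)
```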
So, once again, on the one hand,
$$
f(J) =
\begin{pmatrix}
f(J_1) & 0 & \dots & 0 \\
0 & f(J_2) & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & f(J_r)
\end{pmatrix}
$$
and, on the other hand, you just need to know $p(J)$ when $J$ is a Jordan block. If
$$
J =
\begin{pmatrix}
\lambda & 0 & 0 & \dots & 0 & 0 \\
1 & \lambda & 0 & \dots & 0 & 0 \\
0 & 1 & \lambda & \dots & 0 & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\
0 & 0 & \dots & 1 & \lambda & 0 \\
0 & 0 & \dots & 0 & 1 & \lambda
\end{pmatrix}
$$
is an $r\times r$ Jordan block, then
$$
p(J) =
\begin{pmatrix}
p(\lambda ) & 0 & 0 & \dots & 0 & 0 \\
p'(\lambda)/ 1! & p(\lambda) & 0 & \dots & 0 & 0 \\
p''(\lambda)/ 2! & p'(\lambda)/ 1! & p(\lambda) & \dots & 0 & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\
p^{(r-2)}(\lambda)/(r-2)! &p^{(r-3)}(\lambda)/(r-3)! & \dots & p'(\lambda)/ 1! & p(\lambda) & 0 \\
p^{(r-1)}(\lambda)/(r-1)! &p^{(r-2)}(\lambda)/(r-2)! & \dots & p''(\lambda)/2! & p'(\lambda)/ 1! & p(\lambda)
\end{pmatrix}
$$
Hence, once again, everything is written in terms of $f$:
$$
f(J) =
\begin{pmatrix}
f(\lambda ) & 0 & 0 & \dots & 0 & 0 \\
f'(\lambda)/ 1! & f(\lambda) & 0 & \dots & 0 & 0 \\
f''(\lambda)/ 2! & f'(\lambda)/ 1! & f(\lambda) & \dots & 0 & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\
f^{(r-2)}(\lambda)/(r-2)! &f^{(r-3)}(\lambda)/(r-3)! & \dots & f'(\lambda)/ 1! & f(\lambda) & 0 \\
f^{(r-1)}(\lambda)/(r-1)! &f^{(r-2)}(\lambda)/(r-2)! & \dots & f''(\lambda)/2! & f'(\lambda)/ 1! & f(\lambda)
\end{pmatrix}
$$
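The formula for $f(J)$ on a single block can be verified numerically with $f = \exp$ (whose derivatives are all $e^\lambda$), using the same lower-triangular convention (1s below the diagonal) as above; SciPy is assumed available for the matrix exponential.

```python
# Check the Jordan-block formula with f = exp on a 3x3 lower-triangular block.
import math
import numpy as np
from scipy.linalg import expm

lam = 0.5
J = np.array([[lam, 0.0, 0.0],
              [1.0, lam, 0.0],
              [0.0, 1.0, lam]])

fJ = expm(J)
e = math.exp(lam)                      # exp is its own derivative
assert np.allclose(np.diag(fJ), e)     # f(lambda) on the diagonal
assert np.allclose(np.diag(fJ, -1), e) # f'(lambda)/1! on the first subdiagonal
assert np.allclose(fJ[2, 0], e / 2)    # f''(lambda)/2! in the corner
```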
And, in this version of the story, you actually don't need to know the polynomial $p(t)$ for your function $f(t)$ and matrix $A$; but it's not difficult to find it anyway: it's called the Lagrange-Sylvester polynomial, and it is a sort of mixture between the classical Lagrange interpolation polynomial and a Taylor expansion.
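For a tiny case, such an interpolating polynomial can be worked out by hand (my own example): if $A$ has the single eigenvalue $2$ with one $2\times 2$ Jordan block, then $p(t)$ must match $f(2)$ and $f'(2)$, so for $f = \exp$ one may take $p(t) = e^2 + e^2(t-2)$, and evaluating $p$ at $A$ reproduces $\exp(A)$ exactly:

```python
# Hand-built Hermite interpolant for f = exp on a matrix with one
# eigenvalue 2 and a single 2x2 Jordan block.
import sympy as sp

A = sp.Matrix([[1, 1], [-1, 3]])   # characteristic polynomial (t - 2)^2

def p(M):
    """p(t) = e^2 + e^2 (t - 2), evaluated on a matrix."""
    I = sp.eye(2)
    return sp.exp(2) * I + sp.exp(2) * (M - 2 * I)

assert sp.simplify(p(A) - A.exp()) == sp.zeros(2)
```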
EDIT
Nevertheless, it seems that I forgot to answer the most important question: "Why does this whole thing actually work?"
That is, why does defining
$$
f(A) = p(A)
$$
for some polynomial $p(t)$ that agrees with $f(t)$ on the spectrum of $A$ make sense? That is, for what reason can we actually call $p(A)$ (the computable thing) the "value" of $f(t)$ at the matrix $A$?
Because of the following:
Theorem. (Gantmacher, chapter V, $\S 4$, theorem 2). If the function $f(t)$ can be expanded in a power series in the circle $\vert t - \lambda \vert < r$,
$$
f(t) = \sum_{n=0}^\infty a_n(t-\lambda)^n \ ,
$$
then this expansion remains valid when the scalar argument $t$ is replaced by a matrix $A$ whose eigenvalues lie within the circle of convergence.
That is, under the hypotheses of the theorem, you have
$$
f(A) = \sum_{n=0}^\infty a_n(A-\lambda I)^n \ ,
$$
where the $f(A)$ on the left-hand side means $p(A)$, the value of the Lagrange-Sylvester polynomial at $A$.
So, why not define $f(A)$ as this last power series (i.e., the Taylor series of $f(t)$)? Well, because then you would have to talk at length about convergence issues for matrix series... and you would still end up at the same point: relying on the Jordan canonical form for actual computations. So the Lagrange-Sylvester device lets you get rid of convergence issues, provided you're willing to accept $f(A) = p(A)$ as a sound definition.
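To see the power-series definition in action anyway, here is a sketch (my own example, with $\lambda = 0$): since $\exp$ is entire, the eigenvalues of any matrix lie inside the circle of convergence, and the truncated Taylor series applied to $A$ approaches the matrix exponential computed by SciPy's `expm`.

```python
# Truncated Taylor series of exp, evaluated on a matrix, vs. scipy's expm.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # generator of plane rotations

def taylor_exp(A, N):
    """Sum of the first N terms of sum_n A^n / n!."""
    term = np.eye(A.shape[0])
    total = np.zeros_like(A)
    for n in range(N):
        total = total + term
        term = term @ A / (n + 1)
    return total

assert np.allclose(taylor_exp(A, 30), expm(A))
```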