One way to see this is to put together the following two pieces:
- Unitary matrices are (individually) diagonalizable.
- A commuting set $S$ of diagnolizable matrices is simultaneously diagonalizable. IOW there is a single matrix $P$ such that all the matrices $PAP^{-1}$, with $A$ ranging over the set $S$, are diagonal.
Edit (item #1 clear, sketching an argument for item #2):
The question is equivalent to finding an orthogonal basis consisting of eigenvectors of all the matrices in the set $S$. The proof goes by induction on $n$ = the size of the matrices. If you are only interested in $U(2)$, then you only need to worry about the case $n=2$, but the argument for general $U(n)$ works the same way, so we might as well.
If $n=1$, there is nothing to prove, because all the matrices are scalars. Thus the base case is clear.
Then to the general case. A unitary matrix with a single eigenvalue $\lambda$ is necessarily $\lambda I_n$, so if all the matrices in the set $S$ have only a single eigenvalue, then we are done. A more interesting case occurs, when at least one of the matrices in the set has two or more eigenvalues. Say that matrix $A\in S$ has distinct eigenvalues $\lambda_1,\lambda_2,\ldots,\lambda_k$ (some of these may be repeated). In this case the ambient space $V=\mathbf{C}^n$ splits into an orthogonal sum of eigenspaces: $ V=V_1\oplus V_2\oplus \cdots \oplus V_k, $ where $V_i$ is the eigenspace of the eigenvalue $\lambda_i$. Note that because we assumed $k>1$, the dimensions of the spaces $V_i$ are all $.
The key observation is that all the matrices in the set $S$ 'respect' this direct sum decomposition in the following set. Let's pick one of the subspaces, say $V_i$. If $B$ is any matrix from the set $S$, and $v$ is a vector from the subspace $V_i$, then $ A(Bv)=(AB)v=(BA)v=B(Av)=B(\lambda_i v)=\lambda_i(Bv), $ because $A$ and $B$ commute, and $v$ is an eigenvector of $A$. This calculation shows that $Bv$ is then another eigenvector of $A$ belonging to the eigenvalue $\lambda_i$. IOW $Bv_i\in V_i$. Thus the restriction of the linear mappings of the set $S$ to the subspace $V_i$ form a commuting family of unitary matrices of size $\dim V_i$. By induction hypothesis, the space $V_i$ has an orthogonal basis consisting of eigenvectors of all the matrices of the set $S$. The same argument works for all the subspaces $V_i$, so putting their bases together we get a basis of the desired form.