There may be a quick way to see this without appealing to any general theorems, but here is a useful perspective on the problem that is rather general. All of what I am about to do appears in textbooks and papers, but I do not have any references on hand at the moment, so I am reconstructing the proofs on the fly.
I will write $t(C)$ for the usual trace (sum of diagonal entries) of a matrix $C$ and $\langle x, y\rangle = \sum_{j=1}^n x_j y_j^*$ for the usual inner product in $\mathbb{C}^n$. It helps to know that $t$ is unitarily invariant, so that for any orthonormal basis $v_1, \dots, v_n$ of $\mathbb{C}^n$ and any $C \in M_n$ one has that $t(C) = \sum_{j=1}^n \langle C v_j, v_j\rangle$.
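Here is a quick numerical sanity check of that basis-independence, purely as an illustration (the dimension, seed, and use of numpy are arbitrary choices on my part):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# A random complex matrix C and a random orthonormal basis v_1, ..., v_n
# (the columns of the unitary factor Q from a QR factorization).
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

# t(C) as the sum of diagonal entries ...
t_diag = np.trace(C)
# ... and as sum_j <C v_j, v_j> over the orthonormal basis {v_j}.
t_basis = sum(np.vdot(Q[:, j], C @ Q[:, j]) for j in range(n))

assert np.isclose(t_diag, t_basis)   # the two computations agree up to rounding
```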
First an observation that I think is often attributed to Weyl (it is sometimes called Ky Fan's maximum principle): if $A \in M_n$ is a self-adjoint matrix with eigenvalues $\lambda_1 \geq \dots \geq \lambda_n$, then for any $1 \leq r < n$
$$\sum_{j=1}^r \lambda_j = \sup \{t(AP): P \in M_n,\ P = P^* = P^2,\ \operatorname{rank} P = r\}.$$
To see this, fix $P \in M_n$ with $P = P^* = P^2$ and rank $r$, and let $e_1, \dots, e_n$ be an orthonormal basis of $\mathbb{C}^n$ satisfying $Ae_j = \lambda_j e_j$ for all $1 \leq j \leq n$. Two small facts are used below: $\langle P e_j, e_j\rangle = \|P e_j\|^2 \leq 1$ (because $P = P^* P$), and $t(P) = \operatorname{rank} P = r$. Then
$$\begin{aligned}
t(AP) & = \sum_{j=1}^n \langle AP e_j, e_j\rangle = \sum_{j=1}^n \langle P e_j, A e_j\rangle = \sum_{j=1}^n \lambda_j \langle P e_j, e_j\rangle \\
& = \sum_{j=1}^r \lambda_j \|P e_j\|^2 + \sum_{j > r} \lambda_j \|P e_j\|^2 \\
& \leq \sum_{j=1}^r \lambda_j \|P e_j\|^2 + \sum_{j > r} \lambda_{r+1} \|P e_j\|^2 \\
& = \sum_{j=1}^r \lambda_j \|P e_j\|^2 + \lambda_{r+1} \sum_{j > r} \langle P e_j, e_j\rangle \\
& = \sum_{j=1}^r \lambda_j \|P e_j\|^2 + \lambda_{r+1} \left(t(P) - \sum_{j=1}^r \langle P e_j, e_j\rangle \right) \\
& = \sum_{j=1}^r \lambda_j \|P e_j\|^2 + \sum_{j=1}^r \lambda_{r+1} (1 - \langle P e_j, e_j\rangle) \\
& \leq \sum_{j=1}^r \lambda_j \|P e_j\|^2 + \sum_{j=1}^r \lambda_j (1 - \langle P e_j, e_j\rangle) \\
& = \sum_{j=1}^r \lambda_j.
\end{aligned}$$
This shows the supremum is at most $\sum_{j=1}^r \lambda_j$, and taking $P$ to be the orthogonal projection onto the span of $\{e_1, \dots, e_r\}$ shows that this value is attained, proving the observation.
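Before moving on, here is a minimal numerical check of the observation (not a proof, of course; the sizes and number of random trials are arbitrary): the projection onto the top $r$ eigenvectors attains $\sum_{j=1}^r \lambda_j$, and no random rank-$r$ projection exceeds it.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 6, 2

# Random self-adjoint A; sort eigenvalues in decreasing order as in the text.
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (B + B.conj().T) / 2
lam, V = np.linalg.eigh(A)            # eigh returns increasing order
lam, V = lam[::-1], V[:, ::-1]
top_r_sum = lam[:r].sum()

# The projection E onto the span of the top-r eigenvectors attains the supremum.
E = V[:, :r] @ V[:, :r].conj().T
assert np.isclose(np.trace(A @ E).real, top_r_sum)

# t(AP) for random rank-r orthogonal projections never exceeds the top-r sum.
for _ in range(1000):
    M = rng.standard_normal((n, r)) + 1j * rng.standard_normal((n, r))
    Q, _ = np.linalg.qr(M)            # orthonormal basis of a random r-dim subspace
    P = Q @ Q.conj().T
    assert np.trace(A @ P).real <= top_r_sum + 1e-10
```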
We now make a second observation: if $P$ is a rank $r$ projection for which equality holds in the above, then $P$ must commute with $A$.
In proving this it helps to let $E$ denote the orthogonal projection onto the span of $\{e_1, \dots, e_r\}$ (note that $E$ commutes with $A$).
Looking at the first $\leq$ above, we see that equality forces
$$\sum_{j > r} (\lambda_{r+1} - \lambda_j) \|P e_j\|^2 = 0,$$
which, since the numbers $\lambda_{r+1} - \lambda_j$ and $\|P e_j\|^2$ are nonnegative, forces
$$(\lambda_{r+1} - \lambda_j) \|P e_j\|^2 = 0, \qquad j > r.$$
In other words, $P e_j = 0$ for every $j > r$ with $\lambda_j < \lambda_{r+1}$. (Note that this condition says nothing about indices $j > r$ with $\lambda_j = \lambda_{r+1}$, and there is always at least one such index, namely $j = r+1$ itself, so this condition alone cannot force $P = E$.)
Looking at the second $\leq$ above, we see that equality also forces
$$\sum_{j=1}^r (\lambda_j - \lambda_{r+1}) (1 - \langle P e_j, e_j\rangle) = 0.$$
Since $\lambda_j - \lambda_{r+1} \geq 0$ and $1 - \langle P e_j, e_j\rangle = \|(I - P) e_j\|^2 \geq 0$, we conclude that
$$(\lambda_j - \lambda_{r+1}) \|(I - P) e_j\|^2 = 0, \qquad 1 \leq j \leq r,$$
that is, $(I - P) e_j = 0$ for every $1 \leq j \leq r$ with $\lambda_j > \lambda_{r+1}$.

Now comes the nice case: suppose $\lambda_r > \lambda_{r+1}$, so that there is no $1 \leq j \leq r$ with $\lambda_j = \lambda_{r+1}$. Then $\|(I - P) e_j\|^2 = 0$ for all $1 \leq j \leq r$, and since $I - P$ is self-adjoint its range is therefore contained in the orthocomplement of the span of $\{e_1, \dots, e_r\}$, which is precisely the range of $I - E$. A comparison of dimensions (both ranges have dimension $n - r$) then shows that $I - P = I - E$, hence $P = E$, and in particular $P$ commutes with $A$.
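To see the nice case concretely in the simplest possible setting (this is just an illustration, with $A$ and the parametrization chosen ad hoc): take $A = \operatorname{diag}(2, 1)$ and $r = 1$, so $\lambda_1 = 2 > \lambda_2 = 1$. Every rank-one orthogonal projection of $\mathbb{R}^2$ has the form $P(\theta) = v v^T$ with $v = (\cos\theta, \sin\theta)$, and $t(AP(\theta)) = 2\cos^2\theta + \sin^2\theta = 1 + \cos^2\theta$ is maximized exactly when $v = \pm e_1$, i.e. when $P(\theta) = E$:

```python
import numpy as np

A = np.diag([2.0, 1.0])                     # lambda_1 = 2 > lambda_2 = 1; take r = 1
thetas = np.linspace(0.0, np.pi, 100001)

# Every rank-one orthogonal projection of R^2 is P(theta) = v v^T
# with v = (cos theta, sin theta), and t(A P(theta)) = 1 + cos^2(theta).
traces = 2.0 * np.cos(thetas) ** 2 + 1.0 * np.sin(thetas) ** 2

# The maximum value lambda_1 = 2 is attained only at theta = 0 and theta = pi,
# i.e. only when P(theta) = E, the projection onto the first eigenvector.
best = thetas[np.isclose(traces, 2.0, rtol=0.0, atol=1e-9)]
print(best)   # grid points clustered at 0 and pi only
```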
If we are not in the nice case, i.e. if $\lambda_r = \lambda_{r+1}$, write $\lambda$ for the common value $\lambda_{r+1}$ for short, let $J$ be the largest integer $j$ satisfying $\lambda_j = \lambda$ (so $J \geq r+1$), and let $K$ be the smallest integer $k$ satisfying $\lambda_k = \lambda$ (so $K \leq r$). Then $\lambda_j = \lambda$ holds precisely for $K \leq j \leq J$, and the two conditions derived above give $P e_j = 0$ for all $j > J$ (for such $j$ we have $\lambda_j < \lambda_{r+1}$) and $(I - P) e_j = 0$, i.e. $P e_j = e_j$, for all $j < K$ (for such $j$ we have $j \leq r$ and $\lambda_j > \lambda_{r+1}$). Hence the span of the set of $e_j$ for which $j < K$ or $j > J$ is invariant under $P$. Since $P$ is an orthogonal projection, it follows that the orthocomplement of this span, namely the span of $\{e_j: K \leq j \leq J\}$, is also invariant under $P$. Thus with respect to the basis $\{e_1, \dots, e_n\}$ of $\mathbb{C}^n$, the projection $P$ takes the block diagonal form
$$P = \begin{pmatrix} I & 0 & 0 \\ 0 & P' & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
where the upper left $I$ is a (possibly empty) $(K-1) \times (K-1)$ identity block, the lower right $0$ is a (possibly empty) $(n-J) \times (n - J)$ zero block, and $P'$ is a $(J - K + 1) \times (J-K + 1)$ orthogonal projection. In the same basis the matrix $A$ takes the block form
$$A = \begin{pmatrix} D & 0 & 0 \\ 0 & \lambda I & 0 \\ 0 & 0 & D' \end{pmatrix}$$
where $D$ and $D'$ are diagonal matrices of the appropriate sizes and the middle $I$ denotes the $(J-K+1) \times (J - K + 1)$ identity matrix. Since $P'$, whatever it is, commutes with $\lambda I$, it follows that $PA = AP$ in this case also.
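Here is a numerical illustration of this degenerate case (the specific eigenvalues and angle are arbitrary choices): take $A = \operatorname{diag}(3, 2, 2, 1)$ and $r = 2$, so $\lambda = \lambda_2 = \lambda_3 = 2$, $K = 2$, $J = 3$. Any projection of the block form above, with a rank-one projection $P'$ inside the middle $2 \times 2$ block, attains the supremum $\lambda_1 + \lambda_2 = 5$ and commutes with $A$, even though it need not equal $E$:

```python
import numpy as np

A = np.diag([3.0, 2.0, 2.0, 1.0])    # r = 2; lambda = lambda_2 = lambda_3 = 2; K = 2, J = 3

# A rank-one projection P' inside the middle (2 x 2) eigenvalue-2 block.
theta = 0.7                          # arbitrary angle; P' need not be diag(1, 0)
v = np.array([np.cos(theta), np.sin(theta)])
P_mid = np.outer(v, v)

# Assemble P = I (1x1) + P' (2x2) + 0 (1x1) as in the block picture above.
P = np.zeros((4, 4))
P[0, 0] = 1.0
P[1:3, 1:3] = P_mid

E = np.diag([1.0, 1.0, 0.0, 0.0])    # projection onto the span of e_1, e_2

assert np.isclose(np.trace(A @ P), 5.0)   # attains lambda_1 + lambda_2 = 5
assert np.allclose(A @ P, P @ A)          # P commutes with A ...
assert not np.allclose(P, E)              # ... even though P != E
```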
Note that whenever $\lambda_r > \lambda_{r+1}$ (which is exactly the condition under which "the span of the eigenvectors corresponding to the largest $r$ eigenvalues" is uniquely determined), we are in the nice case above, and $P$ is actually forced to be $E$, the projection onto the span of the eigenvectors corresponding to the largest $r$ eigenvalues. This happens in particular if $A$ has no repeated eigenvalues.
So, in your question, fixing $1 \leq r < n$, the hypothesis is that the orthogonal projection $P$ onto the span of $e_1, \dots, e_r$ attains the supremum on the right hand side of
$$\sum_{j=1}^r \lambda_j = \sup \{t(A P): P \in M_n,\ P = P^* = P^2,\ \operatorname{rank} P = r\},$$
so by the above discussion it must commute with $A$. This gives you the decomposition of $A$ into an $r \times r$ block $A_1$ (the compression of $A$ to the range of $P$) and an $(n - r) \times (n - r)$ block $A_2$ (the compression of $A$ to the range of $I - P$). The multiset of eigenvalues of $A$ is the union of the multisets of eigenvalues of $A_1$ and $A_2$, and $t(A_1) = t(AP) = \sum_{j=1}^r \lambda_j$. Now the eigenvalues of $A_1$ are $r$ of the eigenvalues of $A$, and if $\mu_1 \geq \dots \geq \mu_r$ are any $r$ of the eigenvalues of $A$ then $\mu_i \leq \lambda_i$ for each $i$, so $\sum_i \mu_i \leq \sum_{j=1}^r \lambda_j$ with equality only if $\mu_i = \lambda_i$ for every $i$. Hence the eigenvalues of $A_1$ must be $\lambda_1, \dots, \lambda_r$ and those of $A_2$ must be the rest; any other choice would make $t(A_1)$ too small.
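Continuing the numerical illustration from above (same ad hoc $A$ and maximizing $P$), one can check that the block $A_1$ of $A$ compressed to the range of $P$ has eigenvalues $\lambda_1, \lambda_2$ and that the block $A_2$ on the range of $I - P$ has the rest:

```python
import numpy as np

A = np.diag([3.0, 2.0, 2.0, 1.0])    # as in the previous snippet, with r = 2

# The same maximizing projection P as before.
theta = 0.7
v = np.array([np.cos(theta), np.sin(theta)])
P = np.zeros((4, 4)); P[0, 0] = 1.0; P[1:3, 1:3] = np.outer(v, v)

# Orthonormal bases of ran(P) and ran(I - P) via an eigendecomposition of P.
w, U = np.linalg.eigh(P)
V1, V0 = U[:, w > 0.5], U[:, w < 0.5]   # eigenvalue-1 and eigenvalue-0 eigenvectors

A1 = V1.conj().T @ A @ V1               # r x r block of A on ran(P)
A2 = V0.conj().T @ A @ V0               # (n-r) x (n-r) block on ran(I - P)

print(np.linalg.eigvalsh(A1))           # [2. 3.]  -> the two largest eigenvalues
print(np.linalg.eigvalsh(A2))           # [1. 2.]  -> the rest
```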