I'll have a go at this:
To see that the number of pairs of equal letters (colliding letters) in the ciphertext is actually given by
$$D_{\rm c} = \sum_{i=1}^{p}\sum_{\alpha={\rm A}}^{\rm Z}\frac{M_{\alpha}^{(i)}(M_{\alpha}^{(i)}-1)}{2} + \sum_{i=1}^{p}\sum_{j=i+1}^{p}\sum_{\alpha={\rm A}}^{\rm Z}M_{\alpha}^{(i)}M_{\alpha}^{(j)},$$
note that:
If the letter $\alpha$ occurs $M_{\alpha}^{(i)}$ times in the $i^{th}$ row,
then the number of times $\alpha$ collides with itself within the $i^{th}$ row is simply $$\binom{M_{\alpha}^{(i)}}{2}=\frac{M_{\alpha}^{(i)}(M_{\alpha}^{(i)}-1)}{2},$$
and the number of collisions between an occurrence of $\alpha$ in the $i^{th}$ row and an occurrence of $\alpha$ in the $j^{th}$ row is simply $$M_{\alpha}^{(i)}M_{\alpha}^{(j)}.$$
Summing these over all letters, all rows, and all pairs of distinct rows gives the total.
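The counting argument can be checked numerically. Here is a minimal Python sketch, assuming that the $i^{th}$ row consists of the ciphertext positions congruent to $i$ modulo $p$ (the identity holds for any split of the text into $p$ groups, so this choice is only for illustration). The function names and the sample string are my own and purely hypothetical.

```python
from collections import Counter
from itertools import combinations


def collisions_brute_force(text):
    """Count unordered pairs of positions that hold the same letter."""
    return sum(1 for a, b in combinations(text, 2) if a == b)


def collisions_by_formula(text, p):
    """Evaluate D_c from the per-group letter counts M_alpha^(i)."""
    # Assumption: group i holds positions i, i+p, i+2p, ...
    groups = [Counter(text[i::p]) for i in range(p)]
    # First term: pairs of equal letters inside one group.
    within = sum(m * (m - 1) // 2 for g in groups for m in g.values())
    # Second term: pairs of equal letters taken from two different groups.
    between = sum(
        groups[i][a] * groups[j][a]
        for i in range(p)
        for j in range(i + 1, p)
        for a in groups[i]
    )
    return within + between


sample = "LXFOPVEFRNHRLXFOPVEFRNHRQWERTY"  # arbitrary test string
for p in range(1, 7):
    print(p, collisions_by_formula(sample, p), collisions_brute_force(sample))
```

For every $p$ the two counts agree, since each pair of equal letters lies either inside one group or across two different groups.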
Now, the probability that two independent random variables with the same probability distribution collide (take the same value) is $\sum_{x \in \mathcal{X}} P_x^2$, where $P_x$ is the probability of the value $x$ and the sum ranges over $\mathcal{X}=\{A,B,\ldots,Z\}$.
When the row length is correct, i.e., matches the keyword period, the collision probability is that of English text, which is experimentally determined to be approximately 0.065, based on the frequencies of the letters in English (with $P_x$ largest when $x=E$, etc.).
When the row length is not equal to the keyword period, the collision probability is that of a "random language", which is modelled as having a uniform distribution over 26 letters, so each $P_x=1/26$, and the sum gives $$\sum_{x \in \mathcal{X}} P_x^2=26\times(1/26^2)=1/26\approx 0.038.$$
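Both constants can be reproduced in a few lines. A small sketch, assuming a commonly quoted table of English letter frequencies (in percent); the table is not taken from the question, and other corpora give slightly different values.

```python
# Assumption: a commonly quoted English letter-frequency table, in percent.
ENGLISH_PERCENT = {
    "A": 8.167, "B": 1.492, "C": 2.782, "D": 4.253, "E": 12.702,
    "F": 2.228, "G": 2.015, "H": 6.094, "I": 6.966, "J": 0.153,
    "K": 0.772, "L": 4.025, "M": 2.406, "N": 6.749, "O": 7.507,
    "P": 1.929, "Q": 0.095, "R": 5.987, "S": 6.327, "T": 9.056,
    "U": 2.758, "V": 0.978, "W": 2.360, "X": 0.150, "Y": 1.974,
    "Z": 0.074,
}


def collision_probability(probs):
    """Return sum_x P_x^2 for an iterable of probabilities."""
    return sum(q * q for q in probs)


english = collision_probability(v / 100.0 for v in ENGLISH_PERCENT.values())
uniform = collision_probability([1.0 / 26] * 26)

print(f"English: {english:.4f}")
print(f"Uniform: {uniform:.4f}")
```

With this table the English value comes out close to 0.065, and the uniform value is exactly $1/26$.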
Now, I go back to the equation for $2D_{\rm c}$:
$$2 D_{\rm c} = \sum_{i=1}^{p}\sum_{\alpha={\rm A}}^{\rm Z} M_{\alpha}^{(i)}(M_{\alpha}^{(i)}-1) + 2 \sum_{i=1}^{p}\sum_{j=i+1}^{p}\sum_{\alpha={\rm A}}^{\rm Z}M_{\alpha}^{(i)}M_{\alpha}^{(j)}\quad (1)$$
We're looking at the case where the row length is $p$.
Then letter collisions within columns obey the statistics for English while letter collisions between columns obey the random statistics, since the alphabet has (in general) been shifted by a different letter in each column.
This means that the analysis assumes the keyword has distinct letters, which in general may not be the case; hopefully it doesn't have too many occurrences of the same letter! If it's randomly chosen and $p$ is not too large, this need not be a problem.
The second assumption we need to make is that the ciphertext fragment is long enough that the random variables have converged to their expectations. For simplicity assume $p \mid N$, so that $N=Mp$; otherwise the result is an approximation.
The second term on the right hand side of (1) counts $M^2p(p-1)$ pairs of possible collisions across distinct columns, where $M$ is the number of entries in each column: there are $M^2$ possible collisions for each pair of columns chosen, $\binom{p}{2}$ pairs of columns, and the term carries a factor of 2. These collisions occur at rate $0.038$ by our probabilistic assumption, so the second term in (1) achieves its mean $0.038\times M^2 p(p-1)$, as in the last term in the claimed equation at the bottom of the question.
The first term on the right hand side of (1) counts collisions within the same column, which obey English statistics. There are $M(M-1)$ slots for collisions to occur in a given column on distinct rows, and there are $p$ columns, thus we get $0.065\times M(M-1) p$. But a letter collides with itself at rate 1, so we have $pM$.
Adding these gives
$$0.065 \times M(M-1) p + M p$$
which I think is the correct form of the remaining part of the claimed expression, instead of
$$0.065 \times M^2 p - M p$$ as claimed in the question.
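As a sanity check on the within-column bookkeeping, the sketch below computes two expectations exactly for one column of $M$ independent letters: the expected number of ordered pairs of distinct positions holding the same letter, $\sum_\alpha M_\alpha(M_\alpha-1)$, and the same count with every position also paired with itself, $\sum_\alpha M_\alpha^2$. It uses a hypothetical 3-letter alphabet so that all strings can be enumerated; the distribution, the value of $M$ and the function name are assumptions made for illustration only.

```python
from itertools import product


def exact_expectations(probs, m):
    """Exact E[sum M_a(M_a-1)] and E[sum M_a^2] for m i.i.d. letters."""
    k = len(probs)
    e_distinct = 0.0   # ordered pairs of distinct positions
    e_with_self = 0.0  # same, plus each position paired with itself
    for word in product(range(k), repeat=m):
        # Probability of this particular word under independence.
        weight = 1.0
        for letter in word:
            weight *= probs[letter]
        counts = [word.count(a) for a in range(k)]
        e_distinct += weight * sum(c * (c - 1) for c in counts)
        e_with_self += weight * sum(c * c for c in counts)
    return e_distinct, e_with_self


probs = [0.5, 0.3, 0.2]  # toy distribution, not English
m = 4
kappa = sum(q * q for q in probs)  # collision probability, here 0.38
e_distinct, e_with_self = exact_expectations(probs, m)

print(e_distinct, kappa * m * (m - 1))
print(e_with_self, kappa * m * (m - 1) + m)
```

Under the independence assumption the first expectation equals $\kappa M(M-1)$ and the second exceeds it by exactly $M$, where $\kappa=\sum_x P_x^2$. With $\kappa\approx 0.065$ and $p$ columns, these become $0.065\times M(M-1)p$ and $0.065\times M(M-1)p+Mp$ respectively.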