I was reviewing discrete-time stochastic processes, and I have a question about where the non-negativity assumption is used in the proof of the following standard theorem:
Let $(M_n)_{n=0,1,2\ldots}$ be a non-negative submartingale. Define $M^*_n = \max_{m\le n}M_m$ and let $\lambda > 0.$ Then $$P(M_n^*\ge\lambda) \le\frac{1}{\lambda } E(M_n 1(M_n^*\ge \lambda)).$$
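As a sanity check, here is a minimal Python sketch that verifies the inequality exactly for one concrete non-negative submartingale. The example is an assumption chosen for illustration and is not part of the theorem: $M_m = |S_m|$, where $S$ is a simple symmetric random walk started at $0$. Here $|S_m|$ is a submartingale because $x\mapsto|x|$ is convex. The sketch enumerates all $2^n$ equally likely paths and uses exact fractions, so there is no Monte Carlo error. The function name `check_maximal_inequality` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def check_maximal_inequality(n: int, lam: int):
    """Exactly compute P(M*_n >= lam) and E(M_n 1(M*_n >= lam)) / lam
    for the illustrative example M_m = |S_m|, S a simple symmetric walk from 0."""
    prob_event = Fraction(0)       # P(M*_n >= lam)
    expectation = Fraction(0)      # E(M_n 1(M*_n >= lam))
    weight = Fraction(1, 2 ** n)   # each +/-1 path of length n has probability 2^-n
    for steps in product((-1, 1), repeat=n):
        s, path = 0, [0]           # path holds M_0, M_1, ..., M_n
        for x in steps:
            s += x
            path.append(abs(s))
        if max(path) >= lam:       # the event {M*_n >= lam}
            prob_event += weight
            expectation += weight * path[-1]
    return prob_event, expectation / lam


lhs, rhs = check_maximal_inequality(10, 3)
print(lhs <= rhs)
```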
The proof, which I've seen in a few places, goes as follows:
Proof: Let $\tau = \min\{m\ge 0\mid M_m\ge\lambda\}$ be the first passage time to $\lambda$ (with $\tau=\infty$ if the set is empty). Observe that $\{M^*_n\ge\lambda\} = \{\tau \le n\}$, since the running maximum is at least $\lambda$ if and only if $M$ has reached $\lambda$ at some time $m\le n$. Then we can write $$ P(M_n^*\ge\lambda) = P(\tau\le n) = \sum_{k=0}^n P(\tau=k).$$

Since $M_n$ is a submartingale, we have $$ E(M_n\mid\mathcal F_k) \ge M_k$$ for all $0\le k\le n.$ Because $\tau$ is a stopping time, $\{\tau=k\}\in\mathcal F_k$, so the properties of conditional expectation give $$E(M_n1(\tau = k)) = E\big(E(M_n\mid\mathcal F_k)\,1(\tau=k)\big) \ge E(M_k1(\tau=k)).$$ Finally, since $M_k\ge\lambda$ on $\{\tau=k\}$, we get $$ E(M_n1(\tau = k))\ge E(M_k1(\tau=k)) \ge \lambda P(\tau=k).$$

Plugging this inequality into the first equation gives $$P(M_n^*\ge\lambda) = \sum_{k=0}^n P(\tau=k) \le \sum_{k=0}^n\frac{1}{\lambda}E(M_n1(\tau=k)) = \frac{1}{\lambda } E(M_n 1(M_n^*\ge \lambda)),$$ where in the last step we used the fact that the sets $\{\tau=k\}$ are disjoint for different $k$ and that $$\bigcup_{k=0}^n\{\tau=k\} = \{\tau\le n\} = \{M^*_n \ge \lambda\}.$$
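The two ingredients of the proof can also be checked numerically: the per-$k$ bound $E(M_n1(\tau=k))\ge\lambda P(\tau=k)$, and the decomposition of $\{M_n^*\ge\lambda\}$ into the disjoint events $\{\tau=k\}$, $k=0,\dots,n$. The sketch below assumes the illustrative example $M_m=|S_m|$ for a simple symmetric random walk $S$ started at $0$ (not part of the original question), enumerates all paths exactly, and uses the hypothetical helper name `first_passage_terms`.

```python
from fractions import Fraction
from itertools import product


def first_passage_terms(n: int, lam: int):
    """For M_m = |S_m| (S a simple symmetric random walk from 0), compute exactly
    P(tau = k) and E(M_n 1(tau = k)) for k = 0..n, plus P(M*_n >= lam)."""
    p = {k: Fraction(0) for k in range(n + 1)}   # P(tau = k)
    e = {k: Fraction(0) for k in range(n + 1)}   # E(M_n 1(tau = k))
    prob_max = Fraction(0)                       # P(M*_n >= lam)
    weight = Fraction(1, 2 ** n)                 # each path has probability 2^-n
    for steps in product((-1, 1), repeat=n):
        s, path = 0, [0]
        for x in steps:
            s += x
            path.append(abs(s))
        # tau = first time m <= n with M_m >= lam; None means tau > n
        tau = next((m for m, v in enumerate(path) if v >= lam), None)
        if tau is not None:
            p[tau] += weight
            e[tau] += weight * path[-1]
        if max(path) >= lam:
            prob_max += weight
    return p, e, prob_max


n, lam = 8, 2
p, e, prob_max = first_passage_terms(n, lam)
# {tau = k}, k = 0..n, are disjoint and their union is {M*_n >= lam}
assert sum(p.values()) == prob_max
# per-k step of the proof: E(M_n 1(tau = k)) >= lam * P(tau = k)
assert all(e[k] >= lam * p[k] for k in range(n + 1))
```

Exact enumeration keeps the check free of sampling noise, at the cost of only being practical for small $n$.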
I cannot see where the assumption that $M_n$ is non-negative is used. I figure I must be missing something (probably something obvious) since all the statements of the theorem I've seen make the assumption.