I'm reading a paper in which the following argument is made (in the proof of Theorem 7). I will try to provide just the essentials needed to ask my question, in particular omitting the computability and computable-analysis parts. If, after edit number $n$, I have not been specific enough and the argument as stated is verifiably false, I will refine it in edit number $n+1$ so that it is closer to what the authors actually wrote.
Let $(X, \mu, \Sigma)$ be a probability space. Let $T \colon X \to X$ be measurable, measure-preserving (i.e. $\mu \circ T^{-1} = \mu$) and ergodic (i.e. for every measurable $A$, $\mu(A \,\Delta\, T^{-1} A)=0 \Longrightarrow \mu(A) =0$ or $\mu(A) = 1$). If $B$ is measurable, then by von Neumann's mean ergodic theorem
$a_{n} :=\frac{1}{n+1}\sum_{i=0}^{n} \chi_{T^{-i}B} \to \mu(B),$
where convergence is with respect to the $L^{2}(X)$ norm. If $U\subset X$ is measurable, then they say that—and this is where my confusion lies—by the Cauchy-Schwarz inequality
$\label{conv:1}\langle \chi_{T^{-n}U}, a_{n} \rangle \to \mu(U)\cdot\mu(B).\tag{1}$
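Just to unwind the notation (this rewriting is mine, not the paper's): since $\chi_{T^{-i}B}(x) = \chi_{B}(T^{i}x)$, the function $a_{n}$ is the empirical frequency with which the first $n+1$ orbit points land in $B$, and the inner product in (1) is an integral of $a_{n}$ over a pulled-back copy of $U$:

$a_{n}(x) = \frac{1}{n+1}\,\#\{\, 0 \leq i \leq n : T^{i}x \in B \,\}, \qquad \langle \chi_{T^{-n}U}, a_{n} \rangle = \int_{T^{-n}U} a_{n}\, d\mu.$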
I'm confused because a direct application of the Cauchy-Schwarz inequality yields only an upper bound on these inner products, rather than the claimed convergence:
$$\begin{align*} \langle \chi_{T^{-n}U}, a_{n} \rangle &\leq \lVert \chi_{T^{-n}U}\rVert_{L^{2}(X)} \cdot \lVert a_{n} \rVert_{L^{2}(X)} \\ &= \mu(T^{-n}U)^{1/2} \cdot \lVert a_{n} \rVert_{L^{2}(X)} \\ &= \mu(U)^{1/2} \cdot \lVert a_{n} \rVert_{L^{2}(X)} \quad \text{(since } T \text{ is measure-preserving)} \\ &\to \mu(U)^{1/2}\, \mu(B), \end{align*}$$
using in the last step that $\lVert a_{n} \rVert_{L^{2}(X)} \to \lVert \mu(B) \rVert_{L^{2}(X)} = \mu(B)$, since $a_{n} \to \mu(B)$ in $L^{2}(X)$ and the norm is continuous.
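My best guess at the intended step is that Cauchy-Schwarz is meant to be applied to the difference $a_{n} - \mu(B)$ (with $\mu(B)$ read as a constant function) rather than to $a_{n}$ itself; this reconstruction is mine, not something the authors state, which is precisely why I am asking. Since $\langle \chi_{T^{-n}U}, \mu(B) \rangle = \mu(B)\,\mu(T^{-n}U) = \mu(B)\,\mu(U)$ by measure preservation, that reading would give

$$\begin{align*} \bigl|\langle \chi_{T^{-n}U}, a_{n} \rangle - \mu(U)\,\mu(B)\bigr| &= \bigl|\langle \chi_{T^{-n}U},\, a_{n} - \mu(B) \rangle\bigr| \\ &\leq \lVert \chi_{T^{-n}U} \rVert_{L^{2}(X)} \cdot \lVert a_{n} - \mu(B) \rVert_{L^{2}(X)} \\ &= \mu(U)^{1/2} \cdot \lVert a_{n} - \mu(B) \rVert_{L^{2}(X)} \;\to\; 0 \end{align*}$$

by the mean ergodic theorem, which is exactly (1). But I am not sure this is what "by the Cauchy-Schwarz inequality" is supposed to mean here.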
In the given setting, does the convergence in (1) hold? Why?