First, Witsenhausen's counterexample is intended to be the simplest example of a particular phenomenon: distributed control without communication is hard. You have a simple input, and all the variables are observable. The question is: even in this ridiculously simple example, is linear control as good as we can do? Answer: no, nonlinear control is better even here. This is a bit of a surprise, because in very simple systems linear control is frequently optimal. As Witsenhausen observes in the Introduction to the relevant paper,
"Considering in particular unconstrained control of linear systems with Gaussian noise and quadratic criteria, it is well known that the search for an optimum can safely be confined to the class of affine (linear plus constant) functions. This is the case for both discrete and continuous time systems"
[...]
"A counterexample is presented for which it is established that an optimal design exists and that no affine design is optimal. There does not appear to exist any counterexample involving fewer variables than the one presented here."
In the counterexample, the first input is $0 + x_0$, the zero signal with added Gaussian noise, $x_0 \sim N(0, \sigma_0^2)$. The job of the control system is to reproduce the $0$ as accurately as possible. The input noise, $x_0$, has no memory and is not a continuous process, so memory or internal state in the first controller, $C_1$, is of no use. The second controller, $C_2$, sees only a noisy copy of the output of the first controller, and its job is to exactly cancel the true output of the first controller. Its noise is another Gaussian, $N(0,1)$, so memory is again useless.
So with this setup, we turn on time. The noise in the input starts flailing, the noise on the second observation starts flailing, and we measure our objective function: a weighted sum of the power of the first control signal and the power of the final output. When we compute the expectation over the input noise and the second observation noise, we get an average cost. There are nonlinear controllers that achieve a lower average cost than the best linear controller. Consequently, we get the surprise that even this very simple system is not optimized by linear control.
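To make the comparison concrete, here is a minimal Monte Carlo sketch of the setup described above. Several things in it are assumptions of mine rather than anything stated in this answer: the cost is taken in the usual textbook form $k^2 u_1^2 + x_2^2$ with $x_1 = x_0 + u_1$ and $x_2 = x_1 - u_2$, the parameters $k = 0.2$ and $\sigma_0 = 5$ are a commonly used benchmark choice, and the nonlinear design is the two-point scheme ($x_1 = \sigma_0\,\mathrm{sgn}(x_0)$, $u_2 = \sigma_0 \tanh(\sigma_0 y)$) usually attributed to Witsenhausen's paper. The function name `average_cost` is purely illustrative.

```python
import numpy as np


def average_cost(gamma1, gamma2, k=0.2, sigma=5.0, n=200_000, seed=0):
    """Monte Carlo estimate of E[k^2 * u1^2 + x2^2] (assumed cost form)."""
    rng = np.random.default_rng(seed)
    x0 = rng.normal(0.0, sigma, n)  # noisy input seen by the first controller C1
    v = rng.normal(0.0, 1.0, n)     # N(0, 1) noise on what the second controller C2 sees
    u1 = gamma1(x0)                 # first control signal
    x1 = x0 + u1                    # output of the first stage
    u2 = gamma2(x1 + v)             # C2 acts on a noisy copy of x1 only
    x2 = x1 - u2                    # what is left after C2 tries to cancel x1
    return float(np.mean(k**2 * u1**2 + x2**2))


k, sigma = 0.2, 5.0  # assumed benchmark parameters

# Best affine design: x1 = lam * x0 and u2 = mu * y, with mu the linear
# least-squares gain. Its cost has a closed form, so scan lam on a grid.
lams = np.linspace(0.0, 1.5, 1501)
s = lams**2 * sigma**2                      # variance of x1 under the affine law
affine_costs = k**2 * (1.0 - lams)**2 * sigma**2 + s / (1.0 + s)
lam = float(lams[np.argmin(affine_costs)])
mu = lam**2 * sigma**2 / (1.0 + lam**2 * sigma**2)

cost_affine = average_cost(
    lambda x0: (lam - 1.0) * x0,            # u1 chosen so that x1 = lam * x0
    lambda y: mu * y,
    k, sigma,
)

# Nonlinear two-point design: push x0 to +/- sigma, then decode the sign.
cost_nonlinear = average_cost(
    lambda x0: sigma * np.sign(x0) - x0,    # u1 chosen so that x1 = sigma * sgn(x0)
    lambda y: sigma * np.tanh(sigma * y),
    k, sigma,
)

print(f"best affine cost (simulated):   {cost_affine:.3f}")
print(f"two-point nonlinear (simulated): {cost_nonlinear:.3f}")
```

With these assumed parameters the affine estimate should come out at roughly 0.96 and the two-point design at roughly 0.40, which is the gap the counterexample is about. The two-point design is not claimed to be optimal here; it is only one nonlinear controller that beats every affine one.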