Task: Suppose we model a variable $y = Wx + \mu$ as a linear transformation of $x$ plus Gaussian noise $\mu\sim\mathcal N(0,\sigma I)$. The aim is to minimize the estimation error of $x$ given $y$ with respect to $W$, i.e. we want to minimize the conditional entropy $H(x|y,W)$ as a function of $W$. Suppose that, during learning, we know $x$ for every observed state $y$: what is the optimal supervised update of the model parameters $W$?
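To make the setup concrete, here is a minimal sketch (my own illustration, not part of the original task statement) under the assumption that the supervised objective reduces to the Gaussian negative log-likelihood, i.e. the squared reconstruction error $\|y - Wx\|^2$, so that a gradient step on that error gives the update $\Delta W \propto (y - Wx)x^\top$. The dimensions, learning rate `eta`, and variable names are arbitrary choices for the example.

```python
import numpy as np

# Sketch: supervised gradient update of W for y = Wx + noise,
# assuming the objective is 0.5 * ||y - W x||^2 (Gaussian NLL up to constants).

rng = np.random.default_rng(0)

d_x, d_y = 4, 3                        # dimensions of x and y (arbitrary)
W_true = rng.normal(size=(d_y, d_x))   # ground-truth transformation
sigma = 0.1                            # noise scale

W = np.zeros((d_y, d_x))               # parameters to be learned
eta = 0.05                             # learning rate (hypothetical value)

for _ in range(2000):
    x = rng.normal(size=d_x)                       # known latent state (supervised setting)
    y = W_true @ x + sigma * rng.normal(size=d_y)  # observed y = Wx + mu
    err = y - W @ x                                # prediction residual
    W += eta * np.outer(err, x)                    # gradient step on 0.5 * ||y - W x||^2

print(np.allclose(W, W_true, atol=0.05))           # W should approach W_true
```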
The problem is: I can't see how to get, in a rigorous way, from the estimation problem I want to solve to parameter updates that depend on the variable I want to estimate. It is probably a standard procedure, but the problem is hard to pin down with Google-friendly keywords.
Thanks in advance! blue2script