The oblique projection matrix onto $\text{range}(X)$ orthogonal to $\text{range}(Y)$ is given by $P = X (Y^\top X)^\dagger Y^\top$.
Prove that this definition is correct, i.e. that $Xw^\star = Py$, where $w^\star = \text{argmin}_{w: (Xw - y)^\top Y = 0} \|Xw - y\|_2$.
Edit: originally, the formula above erroneously had a min instead of an argmin.
I recognize that this has the feel of very standard material, but for some reason I can't find it. Simply naming a reference would be fine. I don't know whether this makes a big difference, but I need the case where the matrix $Y^\top X$ need not be invertible (hence the pseudo-inverse).
OK, I accept that calling the matrix $P$ an oblique projection requires $\text{range}(X)$ and $\text{range}(Y)^\perp$ to be complementary subspaces. Even then, $Y^\top X$ may fail to be invertible (consider $X = Y$, where $X$ has linearly dependent columns).
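To make the parenthetical example concrete, here is a small worked instance. The specific matrices are chosen purely for illustration and do not come from the application.

$$
X = Y = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
Y^\top X = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad
(Y^\top X)^\dagger = \frac{1}{4}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
$$

$$
P = X (Y^\top X)^\dagger Y^\top
= \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}
\cdot \frac{1}{4}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
\cdot \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
$$

Here $Y^\top X$ is singular, yet $\text{range}(X) = \text{span}(e_1)$ and $\text{range}(Y)^\perp = \text{span}(e_2)$ are complementary, and the formula with the pseudo-inverse still returns the projection onto $\text{range}(X)$ (orthogonal in this case, since $X = Y$).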
Also, if we forget the geometry, it is still a legitimate question to ask, in terms of pure algebra, if $Xw^\star = Py$ for the given definition of $w^\star$, for any matrices $X$, $Y$.
I am interested in this case because the question comes from an application where the matrices involved are estimated from samples, and the task is to show that, even when the inverse does not exist, the formula $X (Y^\top X)^\dagger Y^\top$ still solves the optimization problem stated above.
I have realized that the optimization given above may not be the obvious way to ask the question - the solution $X w^\star$ is already defined by just the fact that $(Xw^\star - y)^\top Y$ has to vanish, even without any optimization.
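A short sketch of why the constraint alone pins down $Xw^\star$, assuming the complementarity condition $\text{range}(X) \cap \text{range}(Y)^\perp = \{0\}$ and that at least one feasible $w$ exists. The names $w_1, w_2$ are introduced only for this sketch and stand for any two feasible points.

$$
(Xw - y)^\top Y = 0 \iff Y^\top (Xw - y) = 0 \iff Xw - y \in \text{range}(Y)^\perp
$$

$$
Xw_1 - Xw_2 = (Xw_1 - y) - (Xw_2 - y) \in \text{range}(X) \cap \text{range}(Y)^\perp = \{0\}
$$

So under this assumption every feasible $w$ yields the same vector $Xw$, and the minimization has nothing left to choose. Without complementarity the feasible set may be empty or may contain several distinct values of $Xw$, and only then does the objective matter.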
Now, I'm trying to prove the following things suggested by Marc van Leeuwen below.
- $Pw = w$ for $w \in \text{range}(X)$. This is equivalent to saying that $PXb = Xb$ for all $b$, i.e. $X(Y^\top X)^\dagger (Y^\top X) = X$. But where do I go from here? I can't just cancel the terms, because $(Y^\top X)^\dagger (Y^\top X)$ isn't necessarily the identity.
- $Pw=0$ for $w \perp \text{range}(Y)$. This is obvious since $Y^\top w = 0$.
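The two properties can be written out as follows. This is only a sketch: the shorthand $A = Y^\top X$ is introduced here for convenience, and the conclusion for the first property assumes $\text{range}(X) \cap \text{range}(Y)^\perp = \{0\}$. It relies on the standard fact that $A^\dagger A$ is the orthogonal projector onto $\text{null}(A)^\perp$, so $(I - A^\dagger A)b \in \text{null}(A)$ for every $b$.

$$
PXb = X A^\dagger A\, b = Xb - X (I - A^\dagger A)\, b
$$

$$
Y^\top X (I - A^\dagger A)\, b = 0
\;\Longrightarrow\;
X (I - A^\dagger A)\, b \in \text{range}(X) \cap \text{range}(Y)^\perp
$$

$$
Y^\top w = 0 \;\Longrightarrow\; Pw = X A^\dagger (Y^\top w) = 0
$$

Under the stated assumption the intersection is trivial, so the correction term $X (I - A^\dagger A) b$ vanishes and $PXb = Xb$, even though $A^\dagger A$ itself need not be the identity.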
Also, I don't see how these two properties should imply the result.
