
In the paper

Bach, F. R., & Jordan, M. I. (2002). Kernel Independent Component Analysis. Journal of Machine Learning Research, 3(1), 1-48. doi:10.1162/153244303768966085

I stumbled upon the following claim involving a correlation measure the authors define, the $\mathcal F$-correlation of two univariate random variables $x_1,x_2$ relative to a vector space $\mathcal F$ of functions from $\mathbb R$ to $\mathbb R$: $ \rho_{\mathcal F}(x_1,x_2)= \sup_{f_1,f_2\in\mathcal F} \text{corr}\left(f_1(x_1),f_2(x_2)\right)= \sup_{f_1,f_2\in\mathcal F} \frac{ \text{cov}\left(f_1(x_1),f_2(x_2)\right) }{ \text{var}\left(f_1(x_1)\right)^{1/2} \text{var}\left(f_2(x_2)\right)^{1/2} }. $
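To make the definition concrete, here is a minimal numerical sketch (my own, not from the paper) that approximates $\rho_{\mathcal F}$ by maximizing the empirical correlation over a small finite family of functions; the paper's actual $\mathcal F$ is an infinite-dimensional function space (an RKHS), so this is only an illustration of the idea:

```python
import numpy as np

rng = np.random.default_rng(0)

def f_correlation(x1, x2, family):
    """Empirical F-correlation over a finite family: the largest
    absolute sample correlation corr(f1(x1), f2(x2))."""
    best = 0.0
    for f1 in family:
        for f2 in family:
            y1, y2 = f1(x1), f2(x2)
            if y1.std() == 0.0 or y2.std() == 0.0:
                continue  # correlation is undefined for constant transforms
            best = max(best, abs(np.corrcoef(y1, y2)[0, 1]))
    return best

# A tiny stand-in for F; the paper's F is infinite-dimensional.
family = [lambda t: t, lambda t: t**2, np.sin, np.cos]

x1 = rng.standard_normal(10_000)
print(f_correlation(x1, rng.standard_normal(10_000), family))  # ~0 (independent)
print(f_correlation(x1, x1**2, family))                        # ~1 (dependent)
```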

The authors state that if $x_1,x_2$ are independent, then $\rho_\mathcal{F}(x_1,x_2)=0$, and they further claim that the converse ($\rho_{\mathcal F}=0~\implies$ $x_1,x_2$ are independent) also holds when $\mathcal F$ is large enough.

My question: As an example, they say that it is well known that if $\mathcal F$ contains the Fourier basis (i.e., the functions $f_\omega(x) = \exp(i\omega x)$ with $\omega \in \mathbb R$), then $\rho_{\mathcal F}=0~\implies x_1\bot\!\!\!\bot x_2$. My problem is that I do not see why this is obviously true, and I have also failed to prove it. Unfortunately, there is no reference or proof for the claim in the paper. When I tried to prove it myself, I could not find a good starting point: my first thought was that the proof could be done via properties of the characteristic function, but I did not get far with that.

I am explicitly interested in the claim for the Fourier basis and not so much in the more general claim of Bach and Jordan. If anyone could show me how to prove it (or point me to a reference), I would be grateful.

2 Answers


You said it yourself: $\rho_\mathcal F(X,Y)=0$ means that $\text{corr}(f_1(X),f_2(Y))=0$, that is, $\mathrm E(f_1(X)f_2(Y))=\mathrm E(f_1(X))\,\mathrm E(f_2(Y))$, for every $f_1,f_2\in\mathcal F$. Hence, if $\mathcal F$ contains every complex exponential function, applying this to $f_1(t)=\mathrm e^{\mathrm ixt}$ and $f_2(t)=\mathrm e^{\mathrm iyt}$ yields $\mathrm E(\mathrm e^{\mathrm i(xX+yY)})=\mathrm E(\mathrm e^{\mathrm ixX})\mathrm E(\mathrm e^{\mathrm iyY})$ for every $(x,y)$ in $\mathbb R^2$. This means the Fourier transform of the distribution $\mathrm P_{(X,Y)}$ of $(X,Y)$ coincides with the Fourier transform of the product distribution $\mathrm P_X\otimes\mathrm P_Y$. Since the Fourier transform characterizes a probability distribution uniquely, $\mathrm P_{(X,Y)}=\mathrm P_X\otimes\mathrm P_Y$, which means exactly that $X$ and $Y$ are independent.
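If it helps, here is a small numerical check of the factorization the proof rests on (a sketch of mine, not part of the proof): it estimates $\sup_{x,y}\,|\mathrm E(\mathrm e^{\mathrm i(xX+yY)})-\mathrm E(\mathrm e^{\mathrm ixX})\mathrm E(\mathrm e^{\mathrm iyY})|$ over a finite grid, with all expectations replaced by sample means:

```python
import numpy as np

rng = np.random.default_rng(1)

def cf_gap(X, Y, grid):
    """Largest |E e^{i(xX+yY)} - E e^{ixX} E e^{iyY}| over the grid,
    with expectations estimated by sample means."""
    gap = 0.0
    for x in grid:
        for y in grid:
            joint = np.mean(np.exp(1j * (x * X + y * Y)))
            prod = np.mean(np.exp(1j * x * X)) * np.mean(np.exp(1j * y * Y))
            gap = max(gap, abs(joint - prod))
    return gap

grid = np.linspace(-3.0, 3.0, 13)
X = rng.standard_normal(100_000)
print(cf_gap(X, rng.standard_normal(100_000), grid))  # ~0: the transforms factor
print(cf_gap(X, X**2, grid))                          # clearly > 0: they do not
```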


Here's the background. We know that the Pearson correlation coefficient, the quantity we are taking the supremum of, is defined iff the transformed random variables (continuing with the authors' use of lowercase) $y_i=f_i(x_i)$ both have a well-defined, finite and nonzero variance, and that when it is defined, it lies in $[-1,1]$ and is zero for independent $y_i$, since the numerator is $ E[(y_1-E[y_1])(y_2-E[y_2])]= E[y_1y_2]-E[y_1]E[y_2]. $

We also know that the converse is not in general true, because of two limitations of the measure: it only detects linear dependence (a classic counterexample being $x_2=x_1^2$ for symmetric $x_1$, which is fully dependent yet uncorrelated), and its sign distinguishes positive from negative association (this second limitation is fully circumvented by the definition above, because $f_1\in\mathcal F\iff-f_1\in\mathcal F$). But this is where the transformation space $\mathcal F$ comes in handy. For example, the dependence of $x_2=x_1^2$ will always be detected if $\mathcal F$ contains both $x\mapsto x$ and $x\mapsto x^2$: take $f_1:x\mapsto x^2$ and $f_2:x\mapsto x$, so that $f_1(x_1)=f_2(x_2)$.

Now, the expectations of the Fourier basis above, $E[\exp(i\omega x)]$, determine the distribution of $x$ completely (the Fourier transform of a probability measure characterizes the measure), so... do you see where this is going? See @Didier's post. A quick numerical illustration of the classic example follows below.
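Here is that illustration (a sketch of my own, not from the paper): plain Pearson correlation misses the dependence $x_2=x_1^2$, while a suitable pair of transforms from $\mathcal F$ exposes it:

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.standard_normal(100_000)
x2 = x1**2  # fully dependent on x1, yet linearly uncorrelated (x1 is symmetric)

print(np.corrcoef(x1, x2)[0, 1])     # ~0: plain Pearson correlation misses it
print(np.corrcoef(x1**2, x2)[0, 1])  # 1.0: f1(t)=t^2, f2(t)=t detect it
```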
