I'm reading the 2012 version of “The Matrix Cookbook”. On page 43, Section 8.2.4 “Mean of Quartic Forms”,
there is a formula that really confuses me:
$E[x^TAxx^TBx]=Tr[A\Sigma(B+B^T)\Sigma]+m^T(A+A^T)\Sigma(B+B^T)m+(Tr(A\Sigma)+m^TAm)(Tr(B\Sigma)+m^TBm)$
I have not checked the formulae for $E[xx^Txx^T]$, $E[xx^TAxx^T]$, or $E[x^Txx^Tx]$ in the Cookbook, since they are special cases of this one. Note that $x\sim N(m,\Sigma)$.
My question is: how can this formula be proved?
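Before attempting a proof, here is a quick Monte Carlo sanity check I ran in NumPy (my own addition, not from the Cookbook; the dimension, seed, and sample size are arbitrary choices). It suggests the formula is at least numerically correct for general, non-symmetric $A$ and $B$:

```python
import numpy as np

# Monte Carlo check of: E[x'Ax x'Bx] = Tr[A S (B+B') S]
#   + m'(A+A') S (B+B') m + (Tr(AS) + m'Am)(Tr(BS) + m'Bm),  x ~ N(m, S)
rng = np.random.default_rng(0)
d = 3
A = rng.standard_normal((d, d))      # general (not necessarily symmetric) A, B
B = rng.standard_normal((d, d))
m = rng.standard_normal(d)
L = rng.standard_normal((d, d))
S = L @ L.T + np.eye(d)              # a generic SPD covariance Sigma

x = rng.multivariate_normal(m, S, size=1_000_000)  # rows are samples of x
qa = np.einsum('ni,ij,nj->n', x, A, x)             # x^T A x per sample
qb = np.einsum('ni,ij,nj->n', x, B, x)             # x^T B x per sample
lhs = np.mean(qa * qb)                             # Monte Carlo E[x'Ax x'Bx]
scale = np.mean(np.abs(qa * qb))                   # magnitude reference for tolerance

rhs = (np.trace(A @ S @ (B + B.T) @ S)
       + m @ (A + A.T) @ S @ (B + B.T) @ m
       + (np.trace(A @ S) + m @ A @ m) * (np.trace(B @ S) + m @ B @ m))
print(lhs, rhs)   # the two numbers agree to Monte Carlo accuracy
```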
============================
The following is my attempt:
First consider the simpler situation in which $A$ and $B$ are symmetric. I use the same technique that appears on page 11 of Seber and Lee's book "Linear Regression Analysis"
to reorganize the two quadratic forms; then
$E[x^TAxx^TBx]=E\Big\{[(x-m)^TA(x-m)+2m^TA(x-m)+m^TAm]\cdot[(x-m)^TB(x-m)+2m^TB(x-m)+m^TBm]\Big\}$
$=E\Big\{(x-m)^TA(x-m)(x-m)^TB(x-m)\Big\}+E\Big\{(x-m)^TA(x-m)m^TBm\Big\}+E\Big\{m^TAm(x-m)^TB(x-m)\Big\}+E\Big\{m^TAmm^TBm\Big\}+2E\Big\{m^TA(x-m)m^TBm\Big\}+2E\Big\{m^TAmm^TB(x-m)\Big\}+2E\Big\{(x-m)^TA(x-m)m^TB(x-m)\Big\}+2E\Big\{m^TA(x-m)(x-m)^TB(x-m)\Big\}+4E\Big\{m^TA(x-m)m^TB(x-m)\Big\}$
in which the fifth and sixth terms are zero, and the first four terms can be rearranged so that they equal $(Tr(A\Sigma)+m^TAm)(Tr(B\Sigma)+m^TBm)$, using the fact that $E\Big\{(x-m)^TQ(x-m)\Big\}=Tr(Q\Sigma)$.
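The fact $E\Big\{(x-m)^TQ(x-m)\Big\}=Tr(Q\Sigma)$ itself is easy to confirm numerically. A small NumPy check (my own addition; seed and dimension are arbitrary):

```python
import numpy as np

# Monte Carlo check of E[(x-m)' Q (x-m)] = Tr(Q S) for x ~ N(m, S), any square Q.
rng = np.random.default_rng(1)
d = 4
Q = rng.standard_normal((d, d))
L = rng.standard_normal((d, d))
S = L @ L.T + np.eye(d)                                      # SPD covariance Sigma
z = rng.multivariate_normal(np.zeros(d), S, size=1_000_000)  # z plays the role of x - m
q = np.einsum('ni,ij,nj->n', z, Q, z)                        # (x-m)^T Q (x-m) per sample
est = np.mean(q)
print(est, np.trace(Q @ S))   # agree to Monte Carlo accuracy
```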
The last term can be reduced once we notice that $(\mathrm{scalar})^T=\mathrm{scalar}$:
$E\Big\{m^TA(x-m)\cdot m^TB(x-m)\Big\}=E\Big\{m^TA(x-m)\cdot (x-m)^TBm\Big\}=m^TAE\Big\{(x-m)\cdot (x-m)^T\Big\}Bm=m^TA\Sigma Bm$
Consequently (and this is my detailed question: how can it be shown?), it would suffice to show that the seventh and eighth terms together equal the first term on the RHS of the initial formula, that is, $2E\Big\{(x-m)^TA(x-m)m^TB(x-m)\Big\}+2E\Big\{m^TA(x-m)(x-m)^TB(x-m)\Big\}=Tr[A\Sigma(B+B^T)\Sigma]=2Tr(A\Sigma B\Sigma)$
I searched this website and found a useful link: Expected value using the Kronecker Product.
The RHS of my equation above, by the formula $Tr(ABCD)=vec(B^T)^T(A^T\otimes C)vec(D)$, can be rewritten as:
$Tr(A\Sigma B\Sigma)=vec(\Sigma)^T(A\otimes B)vec(\Sigma)$
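Both vec/Kronecker identities can be verified to machine precision. A NumPy sketch of my own (note that $vec$ means column-stacking, i.e. `flatten('F')`, and that in the second identity I take $A$ symmetric, matching the symmetry assumption above):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3
vec = lambda M: M.flatten('F')        # column-stacking vec
A0, B, C, D = (rng.standard_normal((d, d)) for _ in range(4))
A = A0 + A0.T                         # symmetric A, as assumed in the post
L = rng.standard_normal((d, d))
S = L @ L.T                           # symmetric Sigma

# Identity 1: Tr(ABCD) = vec(B^T)^T (A^T kron C) vec(D), for general matrices
t1 = np.trace(A0 @ B @ C @ D)
t2 = vec(B.T) @ np.kron(A0.T, C) @ vec(D)

# Identity 2: Tr(A S B S) = vec(S)^T (A kron B) vec(S), using A = A^T and S = S^T
t3 = np.trace(A @ S @ B @ S)
t4 = vec(S) @ np.kron(A, B) @ vec(S)
print(t1, t2, t3, t4)   # t1 == t2 and t3 == t4 up to rounding
```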
The LHS, by the link, by the fact that $x\otimes y=vec(yx^T)$ for column vectors $x$ and $y$, and by $\mathrm{scalar}=Tr(\mathrm{scalar})$, is equivalent to:
$E\Big\{(x-m)^TA(x-m)\cdot m^TB(x-m)\Big\}+E\Big\{m^TA(x-m)\cdot (x-m)^TB(x-m)\Big\}=E\Big\{\Big((x-m)^TA(x-m)\Big)\otimes\Big(m^TB(x-m)\Big)\Big\}+E\Big\{\Big(m^TA(x-m)\Big)\otimes\Big((x-m)^TB(x-m)\Big)\Big\}=E\Big\{\Big((x-m)^T\otimes m^T\Big)\Big(A\otimes B\Big)\Big((x-m)\otimes(x-m)\Big)\Big\}+E\Big\{\Big(m^T\otimes (x-m)^T\Big)\Big(A\otimes B\Big)\Big((x-m)\otimes(x-m)\Big)\Big\}=E\Big\{vec\Big((x-m)m^T+m(x-m)^T\Big)^T\Big(A\otimes B\Big)vec\Big((x-m)(x-m)^T\Big)\Big\}$
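The rearrangement above (before taking expectations) is a purely algebraic, pointwise identity, valid for any fixed vectors and any $A$, $B$, so it can be checked exactly on random inputs. A NumPy check of my own, writing $u$ for $x-m$:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 3
vec = lambda M: M.flatten('F')        # column-stacking vec
A = rng.standard_normal((d, d))
B = rng.standard_normal((d, d))
u = rng.standard_normal(d)            # u plays the role of x - m
m = rng.standard_normal(d)

# u'Au * m'Bu + m'Au * u'Bu  ==  vec(um' + mu')^T (A kron B) vec(uu')
lhs = (u @ A @ u) * (m @ B @ u) + (m @ A @ u) * (u @ B @ u)
rhs = vec(np.outer(u, m) + np.outer(m, u)) @ np.kron(A, B) @ vec(np.outer(u, u))
print(lhs, rhs)   # equal up to rounding
```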
Under the normality assumption, $vec(E[(x-m)(x-m)^T])=vec(\Sigma)$. Therefore it seems very close to the final relation (as it ought to be):
$vec(\Sigma)^T(A\otimes B)vec(\Sigma)=E\Big\{vec\Big((x-m)m^T+m(x-m)^T\Big)^T\Big(A\otimes B\Big)vec\Big((x-m)(x-m)^T\Big)\Big\}$
Can anyone help me prove the above equation? Or are there mistakes in my derivation above?
=============================
Here is my second trial:
Inspired by Seber and Lee's book again, on the same page 11:
can anyone help me expand $E[x^TAxx^TBx]$ in coordinates to obtain the same result? I expect the computation to be lengthy, though.
A natural question, based on my first attempt, is how to expand coordinate-wise
$2E\Big\{(x-m)^TA(x-m)m^TB(x-m)\Big\}+2E\Big\{m^TA(x-m)(x-m)^TB(x-m)\Big\}=Tr[A\Sigma(B+B^T)\Sigma]=2Tr(A\Sigma B\Sigma)$
from one side and simplify to the other. It is certainly easier than expanding $E[x^TAxx^TBx]$ in full, but how does one expand $Tr(A\Sigma B\Sigma)$? It involves four matrices rather than vectors.
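For what it's worth, the coordinate form of such a trace is $Tr(A\Sigma B\Sigma)=\sum_{i,j,k,l}A_{ij}\Sigma_{jk}B_{kl}\Sigma_{li}$; a brute-force NumPy check of this (my own addition):

```python
import numpy as np

rng = np.random.default_rng(4)
d = 3
A = rng.standard_normal((d, d))
B = rng.standard_normal((d, d))
L = rng.standard_normal((d, d))
S = L @ L.T                           # symmetric Sigma

# Tr(A S B S) = sum_{i,j,k,l} A_ij S_jk B_kl S_li  -- the coordinate expansion
coord = sum(A[i, j] * S[j, k] * B[k, l] * S[l, i]
            for i in range(d) for j in range(d)
            for k in range(d) for l in range(d))
print(coord, np.trace(A @ S @ B @ S))   # equal up to rounding
```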
==============================
My last question: are there any books that not only collect matrix formulae but also provide proofs and worked solutions? I would appreciate your recommendations!
Thank you in advance!