I have a scalar-valued function $y$ and a vector-valued function $\textbf{x}$:
\begin{equation} y = \underset{(1 \times D_h)}{\textbf{w}^T}\underset{(D_h \times 1)}{\textbf{x}} \end{equation} \begin{equation} \underset{(D_h \times 1)}{\textbf{x}} = \underset{(D_h \times D_z)}{\textbf{B}}\underset{(D_z \times 1)}{\textbf{z}} \end{equation}
My problem is to compute $\nabla_{\textbf{B}}y$. These are the steps I understand so far:
\begin{equation} \nabla_{\textbf{B}}y = \frac{\partial{y}}{\partial\textbf{x}}\frac{\partial\textbf{x}}{\partial\textbf{B}} = \textbf{w}^T\frac{\partial\textbf{x}}{\partial\textbf{B}} \end{equation}
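$\frac{\partial\textbf{x}}{\partial\textbf{B}}$ seems to be a 3-dimensional array (one $D_h \times D_z$ slice for each component of $\textbf{x}$), so I am not sure how to carry out this product.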
I guessed at the final step of the derivation and got:
\begin{equation} \nabla_{\textbf{B}}y = \underset{(D_z \times 1)}{\textbf{z}}\underset{(1\times D_h)}{\textbf{w}^T} \end{equation}
I checked this against some two-point (finite-difference) approximations of the gradient, and the result appears to match.
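A minimal sketch of that kind of check, assuming central differences and random test data. The sizes, the seed, and the helper names `y_of` and `central_diff_grad` are illustrative only, not my actual script. Note that the numerical estimate naturally has the shape of $\textbf{B}$ ($D_h \times D_z$), while $\textbf{z}\textbf{w}^T$ is $D_z \times D_h$, so the comparison below is made entry by entry with the indices swapped.

```python
import random

def y_of(B, w, z):
    # y = w^T (B z); B is stored as a list of D_h rows, each of length D_z
    return sum(w_i * sum(b * z_j for b, z_j in zip(row, z))
               for w_i, row in zip(w, B))

def central_diff_grad(B, w, z, eps=1e-6):
    # Two-point (central) estimate of dy/dB[i][j], returned in the shape of B
    D_h, D_z = len(B), len(z)
    grad = [[0.0] * D_z for _ in range(D_h)]
    for i in range(D_h):
        for j in range(D_z):
            old = B[i][j]
            B[i][j] = old + eps
            y_plus = y_of(B, w, z)
            B[i][j] = old - eps
            y_minus = y_of(B, w, z)
            B[i][j] = old  # restore the perturbed entry
            grad[i][j] = (y_plus - y_minus) / (2 * eps)
    return grad

random.seed(0)  # arbitrary seed, for reproducibility only
D_h, D_z = 4, 3  # arbitrary small sizes
w = [random.gauss(0, 1) for _ in range(D_h)]
z = [random.gauss(0, 1) for _ in range(D_z)]
B = [[random.gauss(0, 1) for _ in range(D_z)] for _ in range(D_h)]

numeric = central_diff_grad(B, w, z)

# Candidate z w^T has shape D_z x D_h; entry [j][i] is z_j * w_i
candidate = [[z_j * w_i for w_i in w] for z_j in z]

# Compare candidate[j][i] with numeric[i][j] (indices swapped because of the shapes)
max_err = max(abs(candidate[j][i] - numeric[i][j])
              for i in range(D_h) for j in range(D_z))
print(max_err)
```

Since $y$ is linear in each entry of $\textbf{B}$, the central difference should agree with the exact derivative up to floating-point rounding, so `max_err` is expected to be tiny.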
My question is: why does this last step work, and what identity does it rely on? It seems very strange to me that $\textbf{z}$ ends up on the left. I am probably skipping some steps.