When you write $\frac{\partial}{\partial A}B$ where $A$ and $B$ are matrices, what is usually meant is the collection of all the partial derivatives
$\frac{\partial}{\partial A_{ij}}B_{kl}$
which is a rank-4 tensor. It is common to contract over one or more pairs of those indices, but it is not necessary.
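To make the "rank-4 tensor" concrete, here is a minimal NumPy sketch that estimates every $\frac{\partial B_{kl}}{\partial A_{ij}}$ by central differences and stores it in an array indexed `[i, j, k, l]`. The helper name `jacobian_tensor`, the step `eps` and the random test matrix are illustrative choices of mine, not anything standard. The example uses $B = A^T$, for which the exact answer is $\delta_{il}\delta_{jk}$.

```python
import numpy as np

def jacobian_tensor(f, A, eps=1e-6):
    """Hypothetical helper: central-difference estimate of
    T[i, j, k, l] = d f(A)[k, l] / d A[i, j]."""
    out_shape = f(A).shape
    T = np.zeros(A.shape + out_shape)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps  # perturb only the (i, j) entry of A
            T[i, j] = (f(A + E) - f(A - E)) / (2 * eps)
    return T

# Example: B = A^T, so dB_kl / dA_ij = delta_il * delta_jk
A = np.random.default_rng(0).standard_normal((3, 2))
T = jacobian_tensor(lambda X: X.T, A)
print(T.shape)  # (3, 2, 2, 3): two indices for A, two for B
```

Finite differences are only an approximation in general, but for a linear map like the transpose the estimate agrees with the exact tensor up to rounding.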
In index notation (with summation over repeated indices implied), $(W^TW)_{kl}=W_{mk}W_{ml}$ and therefore
$ \begin{align} \left[\frac{\partial}{\partial W}(W^TW)\right]_{ijkl} & = \frac{\partial}{\partial W_{ij}}(W_{mk}W_{ml}) \\ & = \frac{\partial W_{mk}}{\partial W_{ij}} W_{ml} + W_{mk} \frac{\partial W_{ml}}{\partial W_{ij}} \\ & = \delta_{im} \delta_{jk} W_{ml} + \delta_{im}\delta_{jl}W_{mk} \\ & = \delta_{jk} W_{il} + \delta_{jl} W_{ik} \end{align} $
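As a sanity check on the derivation, the sketch below builds the closed form $\delta_{jk} W_{il} + \delta_{jl} W_{ik}$ with `np.einsum` and compares it against a central-difference estimate of $\partial (W^TW)_{kl} / \partial W_{ij}$. The sizes `p`, `n`, the seed and `eps` are arbitrary assumptions; `W` is deliberately non-square so that the roles of the indices are not confused.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 4, 3                      # illustrative sizes: W is p x n, so W^T W is n x n
W = rng.standard_normal((p, n))
I = np.eye(n)                    # delta over the column index range of W

# Closed form from the derivation: T[i, j, k, l] = delta_jk W_il + delta_jl W_ik
T_closed = np.einsum('jk,il->ijkl', I, W) + np.einsum('jl,ik->ijkl', I, W)

# Central-difference estimate of d(W^T W)_kl / dW_ij for comparison
eps = 1e-6
T_fd = np.zeros((p, n, n, n))
for i in range(p):
    for j in range(n):
        E = np.zeros_like(W)
        E[i, j] = eps
        T_fd[i, j] = ((W + E).T @ (W + E) - (W - E).T @ (W - E)) / (2 * eps)

# The maximum discrepancy should be at rounding-error level
print(np.max(np.abs(T_closed - T_fd)))
```

Because $W^TW$ is quadratic in $W$, the central difference is exact apart from floating-point rounding, so any real mismatch would point to an index error in the closed form.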
If you now choose to contract over a pair of indices, you get a rank-2 tensor (a matrix). For example, if you contract over $j$ and $k$ you end up with
$ \begin{align} \delta_{jj} W_{il} + \delta_{jl} W_{ij} & = n W_{il} + W_{il} \\ & = (n+1) W_{il} \end{align} $
where $n=\delta_{jj}$ is the number of values the index $j$ runs over, i.e. the number of columns of $W$ (equivalently, the dimension of the square matrix $W^TW$).
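The contraction can also be checked numerically. In this sketch (sizes and seed are arbitrary), `np.einsum('ijjl->il', T)` sums the diagonal over the repeated index $j$, which is exactly "set $k = j$ and sum over $j$". Using a $4 \times 3$ matrix shows that the factor is $n + 1$ with $n$ the number of columns of $W$, not the number of rows.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 4, 3                      # illustrative sizes: W is p x n
W = rng.standard_normal((p, n))
I = np.eye(n)

# Rank-4 derivative tensor: T[i, j, k, l] = delta_jk W_il + delta_jl W_ik
T = np.einsum('jk,il->ijkl', I, W) + np.einsum('jl,ik->ijkl', I, W)

# Contract over j and k: C[i, l] = sum_j T[i, j, j, l]
C = np.einsum('ijjl->il', T)
print(np.allclose(C, (n + 1) * W))  # True
```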
If you need to read up on index notation, you might want to take a look at this set of example questions and answers, which I found very helpful when I was first learning it.
For the second part of your question, apply the multivariable chain rule as usual, summing over the intermediate indices.
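The second part of the question is not reproduced here, so as a purely hypothetical illustration take a scalar $f(M)$ of $M = W^TW$ with $G_{kl} = \partial f / \partial M_{kl}$. The chain rule gives $\frac{\partial f}{\partial W_{ij}} = G_{kl}\,(\delta_{jk} W_{il} + \delta_{jl} W_{ik}) = (WG^T)_{ij} + (WG)_{ij}$, i.e. $\frac{\partial f}{\partial W} = W(G + G^T)$. The sketch below assumes $f(M) = \sum_{kl} A_{kl} M_{kl}$ for an arbitrary fixed matrix `A` (my choice, so that $G = A$) and checks the index form, the matrix form and a finite-difference estimate against each other.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 4, 3                      # illustrative sizes: W is p x n
W = rng.standard_normal((p, n))
A = rng.standard_normal((n, n))  # hypothetical fixed weights, not from the question

def f(W):
    # Hypothetical scalar function of M = W^T W: f = sum_kl A_kl M_kl
    return np.sum(A * (W.T @ W))

G = A                            # df/dM_kl for this particular f
I = np.eye(n)
T = np.einsum('jk,il->ijkl', I, W) + np.einsum('jl,ik->ijkl', I, W)

# Chain rule in index form: df/dW_ij = sum_kl G_kl * dM_kl/dW_ij
grad_chain = np.einsum('kl,ijkl->ij', G, T)

# Same result in matrix form: W (G + G^T)
grad_matrix = W @ (G + G.T)

# Central-difference check directly on f
eps = 1e-6
grad_fd = np.zeros_like(W)
for i in range(p):
    for j in range(n):
        E = np.zeros_like(W)
        E[i, j] = eps
        grad_fd[i, j] = (f(W + E) - f(W - E)) / (2 * eps)

print(np.allclose(grad_chain, grad_matrix), np.allclose(grad_chain, grad_fd))  # True True
```

For a different $f$ only the line computing `G` changes; the contraction with the rank-4 tensor stays the same.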