I am having trouble understanding the derivation of some seemingly simple matrix derivatives and am wondering if there is an intuitive (perhaps geometric) explanation. I am reasonably well-versed in multivariate calculus and linear algebra, but am not comfortable with tensor math.
The function I am interested in is $f(t)=\mathbf{B}^T(\mathbf{X}+t\mathbf{Y})^{-1}\mathbf{A}$, where $t$ is a scalar and $\mathbf{A},\mathbf{B},\mathbf{X},\mathbf{Y}$ are matrices of conformable dimensions, with $\mathbf{X}+t\mathbf{Y}$ assumed invertible.
On page 24 of the PDF of the matrix calculus appendix in Jon Dattorro's book (page 600 of the book), I found the formula for the first derivative of $f(t)$:
$$\frac{df}{dt}=-\mathbf{B}^T(\mathbf{X}+t\mathbf{Y})^{-1}\mathbf{Y}(\mathbf{X}+t\mathbf{Y})^{-1}\mathbf{A}$$
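As a sanity check (not a derivation), the formula can be compared against a central finite difference. This is only a rough sketch: the matrix size, the random seed, the diagonal shift that keeps $\mathbf{X}+t\mathbf{Y}$ comfortably invertible, and the names `f` and `df_formula` are my own assumptions, not anything from Dattorro.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Hypothetical test data: random square matrices. X gets a diagonal shift
# so that X + t*Y stays well-conditioned near the evaluation point.
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
Y = rng.standard_normal((n, n))
X = rng.standard_normal((n, n)) + 10.0 * np.eye(n)

def f(t):
    """f(t) = B^T (X + tY)^{-1} A."""
    return B.T @ np.linalg.inv(X + t * Y) @ A

def df_formula(t):
    """Claimed derivative: -B^T (X + tY)^{-1} Y (X + tY)^{-1} A."""
    M_inv = np.linalg.inv(X + t * Y)
    return -B.T @ M_inv @ Y @ M_inv @ A

t0, h = 0.1, 1e-6
# Central difference approximation of df/dt at t0.
df_numeric = (f(t0 + h) - f(t0 - h)) / (2 * h)

# Largest entrywise disagreement between the two.
max_err = np.max(np.abs(df_numeric - df_formula(t0)))
print("max abs difference:", max_err)
```

If the formula is right, the printed difference should be tiny compared with the entries of the derivative itself.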
This sort of makes sense to me from what I know about the calculus of functions of a single variable: if $a,b,x,y$ are scalars and $g(t)=a(x+ty)^{-1}b=ab(x+ty)^{-1}$, then $\frac{dg}{dt}=-ab(x+ty)^{-2}y=-a(x+ty)^{-1}y(x+ty)^{-1}b$ (by the chain rule and the power rule). So the scalar and matrix formulas clearly have the same form.
What I don't understand is why the matrix equation for $\frac{df}{dt}$ looks the way it does. Is it due to the non-commutativity of matrix multiplication? If so, how exactly does that come into this problem? I found the chain rule for matrix-valued functions in the same PDF on page 8 (eq. 1749), but I am not sure how to apply it here. Maybe there is something about the calculus of single-variable functions that I don't understand.
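To probe the non-commutativity question numerically, here is a small experiment comparing two candidate orderings for the derivative of the inverse alone: the "sandwiched" form $-\mathbf{M}^{-1}\mathbf{Y}\mathbf{M}^{-1}$ and the "commuted" form $-\mathbf{M}^{-2}\mathbf{Y}$ that a naive power rule would suggest, where $\mathbf{M}=\mathbf{X}+t\mathbf{Y}$. The random matrices, the seed, the diagonal shift, and all variable names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

# Hypothetical test data; the diagonal shift keeps X + t*Y invertible.
X = rng.standard_normal((n, n)) + 10.0 * np.eye(n)
Y = rng.standard_normal((n, n))

def inv_at(t):
    """(X + tY)^{-1}."""
    return np.linalg.inv(X + t * Y)

t0, h = 0.1, 1e-6
M_inv = inv_at(t0)

# Central difference approximation of d/dt (X + tY)^{-1} at t0.
numeric = (inv_at(t0 + h) - inv_at(t0 - h)) / (2 * h)

sandwiched = -M_inv @ Y @ M_inv   # ordering used in the matrix formula
commuted = -M_inv @ M_inv @ Y     # naive scalar-style ordering

err_sandwiched = np.max(np.abs(numeric - sandwiched))
err_commuted = np.max(np.abs(numeric - commuted))
print("sandwiched error:", err_sandwiched)
print("commuted error:  ", err_commuted)
```

For generic matrices I would expect only the sandwiched ordering to agree with the finite difference, which at least suggests the placement of $\mathbf{Y}$ matters, though it does not explain why.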
I guess I am asking if there is a way to derive the equation for $\frac{df}{dt}$ "from first principles" without using tensors.
