I never really understood basic Gaussian elimination & solving systems of equations once I learned some actual linear algebra. I thought this was because I was missing some fundamental aspect of the subject that some book would eventually illuminate for me, or that things would just click, but they haven't. I also can't stand being told something along the lines of the Kaplansky quote, "we think basis-free, we write basis-free, but when the chips are down we close the office door and compute with matrices like fury", as a rationale for the apparent disconnect between the theory & application of linear algebra, given how I view things as described below.
Let's say I have this square system:
$ax + by = e$
$cx + dy = f$
I think there are four ways we can geometrically understand this picture, & I have questions about all of them (note that nothing will be said about bigger or non-square systems in this post).
01: VECTORS & LINEAR MAPS
If I want to understand this exclusively in terms of vectors & linear maps I can write this system as a linear combination:
$x(a,c) + y(b,d) = (e,f)$
$xT(\hat{e_{1}}) + yT(\hat{e_{2}}) = (e,f)$
$T(x\hat{e_{1}} + y\hat{e_{2}}) = (e,f)$
$T(\vec{v}) = \vec{z}$
Now we can see that solving this system of linear equations is equivalent to determining which vector in the domain of $T$ is mapped to the vector $(e,f)$. Furthermore, using the fact that a linear map on a finite-dimensional vector space is completely determined by its action on a basis, if we arrange things such that $T$ acts on the standard basis then we can use linearity to determine the scalar multiples $x$ & $y$.
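A minimal sketch of this reading, assuming made-up numbers (the system $2x + 3y = 8$, $x + 4y = 9$ and the helper names `T` and `solve_2x2` are hypothetical, not part of the question): the images of the standard basis are the columns $(a,c)$ and $(b,d)$, and solving the system means finding the $\vec{v}$ that $T$ sends to $(e,f)$.

```python
from fractions import Fraction

# Hypothetical coefficients for ax + by = e, cx + dy = f.
a, b, c, d = map(Fraction, (2, 3, 1, 4))
e, f = Fraction(8), Fraction(9)

def T(v):
    """Apply the linear map: x*T(e1) + y*T(e2), with T(e1) = (a, c), T(e2) = (b, d)."""
    x, y = v
    return (x * a + y * b, x * c + y * d)

def solve_2x2(a, b, c, d, e, f):
    """Find v with T(v) = (e, f) via the closed-form inverse (needs ad - bc != 0)."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("T is not invertible, so there is no unique solution")
    return ((d * e - b * f) / det, (a * f - c * e) / det)

v = solve_2x2(a, b, c, d, e, f)
print(T((1, 0)), T((0, 1)))  # images of the standard basis: the matrix columns
print(v, T(v))               # T(v) reproduces (e, f)
```

The closed-form inverse is only one way to compute $\vec{v}$; it is used here because it keeps the sketch short, not because it is the method the linear-map picture forces on you.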
I think that's the general gist of what's going on (this is all correct so far, right?), & from a distance this is very geometric & conceptually intuitive. In the best case scenario (unique solution to the system) this is the image I think most people have.
The thing I don't like about this perspective is how divorced it is from all computations that I know of; it basically has nothing to do with Gaussian or Gauss-Jordan elimination as far as I can tell.
My first question is whether you can use this interpretation, i.e. linear maps, in a computational sense. It seems to me you have to revert to another interpretation I'll outline below, & I'm wondering whether the concepts really are as divorced as they appear or whether I'm missing something; maybe I just don't see how all of this is actually related to basic linear algebra. It also seems strange to me to whip out new vectors that, while they admittedly contain something from both equations, geometrically have no obvious connection with the lines.
02: NORMAL VECTORS
This interpretation uses the fact that the vector $(a,b)$ is the normal vector to $ax + by = e$ (i.e. $(a,b)\cdot(x - x_0,y - y_0) = 0$ where $ax_0 + by_0 = e$) & is basically a geometric interpretation of (every step of) both Gaussian & Gauss-Jordan elimination, giving some soul & feeling to the algebraic computations. Here you're using the second most obvious vector associated with each line (the normal, with the most obvious vector being the one parallel to the line). Thus when you have
$ax + by = e$
$cx + dy = f$
& you add a scalar multiple of one to the other you get
$(a + \lambda c)x + (b + \lambda d)y = e + \lambda f$,
you can interpret this as nothing other than adding normal vectors to end up with a new 'normal vector' $(a + \lambda c,b + \lambda d)$ (what it is 'normal' to I don't know, but I think it is just a convenient vector we use as a means to eliminate coefficients, as done next) & end up with:
$(a + \lambda c,b + \lambda d)\cdot(x - x_0,y - y_0) = 0$ s.t.
$(a + \lambda c)(x - x_0) + (b + \lambda d)(y - y_0) = 0$
$a(x - x_0) + \lambda c(x - x_0) + b(y - y_0) + \lambda d(y - y_0) = 0$
$ax + \lambda cx + by + \lambda dy - ax_0 - \lambda cx_0 - by_0 - \lambda dy_0 = 0$
$(a + \lambda c)x + (b + \lambda d)y = ax_0 + \lambda cx_0 + by_0 + \lambda dy_0 $
Thus as long as $(a,b)$ & $(c,d)$ are linearly independent, you can't choose $\lambda$ such that the above becomes $(0,0)\cdot(x - x_0,y - y_0) = 0$. Now the standard route is to choose $\lambda$ such that you eliminate one of the variables & solve for the other. Choosing, say, $\lambda = - \frac{a}{c}$ (which assumes $c \neq 0$) gives
$(a + \lambda c,b + \lambda d)\cdot(x - x_0,y - y_0) = 0$
$(a - \frac{a}{c} c,b - \frac{a}{c} d)\cdot(x - x_0,y - y_0) = 0$
$(0,b - \frac{ad}{c})\cdot(x - x_0,y - y_0) = 0$
$(b - \frac{ad}{c})(y - y_0) = 0$
$bc(y - y_0) - ad(y - y_0) = 0$
$bcy - bcy_0 - ady + ady_0 = 0$
$(ad - bc)y_0 = (ad - bc)y$
$y_0 = y$
which can also be done using:
$(a + \lambda c)x + (b + \lambda d)y = ax_0 + \lambda cx_0 + by_0 + \lambda dy_0 $
since you get
$(a - \frac{a}{c} c)x + (b - \frac{a}{c}d)y = ax_0 - \frac{a}{c}(cx_0) + by_0 - \frac{a}{c} dy_0 $
$(b - \frac{a}{c}d)y = (b - \frac{a}{c} d)y_0 $
$y = y_0 $
Finding $x = x_0$ works similarly; however, we want to understand this geometrically.
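The elimination steps above can be sketched numerically, assuming the made-up system $2x + 3y = 8$, $x + 4y = 9$ (so $c \neq 0$ and $d \neq 0$) and a hypothetical helper `combine` that forms row one plus $\lambda$ times row two. It prints the combined 'normal vector' for each choice of $\lambda$ so you can see it turn vertical or horizontal.

```python
from fractions import Fraction

# Hypothetical system: 2x + 3y = 8 and x + 4y = 9.
a, b, e = Fraction(2), Fraction(3), Fraction(8)
c, d, f = Fraction(1), Fraction(4), Fraction(9)

def combine(lam):
    """Row 1 + lam * Row 2: returns the new normal vector and new right-hand side."""
    return (a + lam * c, b + lam * d), e + lam * f

# lambda = -a/c removes the x-coefficient, so the combined normal is vertical.
normal_y, rhs_y = combine(-a / c)
y0 = rhs_y / normal_y[1]

# lambda = -b/d removes the y-coefficient, so the combined normal is horizontal.
normal_x, rhs_x = combine(-b / d)
x0 = rhs_x / normal_x[0]

print(normal_y, normal_x)  # one component of each is zero
print(x0, y0)              # the point satisfying both original equations
```

In this example the vertical normal is $(0, b - \frac{ad}{c})$, matching the derivation above, and each combined equation is still satisfied by $(x_0, y_0)$.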
My second question is whether it is right to interpret the above as follows. We take $(x_0,y_0)$ as the hypothetical point of intersection of the two lines. If no $\lambda$ can be chosen such that the dot product contains the zero vector (i.e. if we can be sure the normal vectors are linearly independent), we know this point exists & is unique. From then on we are doing nothing other than choosing $\lambda$ such that, say when we're solving for $y = y_0$, the vector $(a + \lambda c,b + \lambda d)$ points in the $y$-axis direction, i.e. it's a vertical vector in the Cartesian plane, of the form $(0,y_0)$, i.e. pointing to the $y$ component of the intersection of the two lines? Similarly for finding the $x_0$ term: we just use vector addition to eliminate a coefficient, then find $(x_0,0)$, & through finding both $(x_0,0)$ & $(0,y_0)$ we simultaneously find $(x_0,y_0)$. Unless I'm deluded, I'm pretty sure all of the above is a geometric way to understand every step of those furious computations with matrices, so I don't see how this can be wrong...
My third question is how any of this discussion relates to linear maps. It seems to me that interpreting a system of linear equations in terms of normal vectors is far superior to interpreting it in terms of linear maps, at least in the square $n \times n$ case. Am I missing something?
03: DETERMINANTS & LINEAR MAPS
Let $\Psi$ be an alternating bilinear form such that $\Psi(e_1,e_2) = 1$. For an operator $T$, the number $\lambda$ such that $\Psi(T(e_1),T(e_2)) = \lambda\Psi(e_1,e_2)$ is known as the determinant, i.e. $\Psi(T(e_1),T(e_2)) = \det(T)\Psi(e_1,e_2)$. Again this way of looking at things is very intuitive from a distance: the determinant of an operator is nothing but the number such that the area between $T(e_1)$ & $T(e_2)$, i.e. $\Psi(T(e_1),T(e_2))$, is just a multiple of the area between $e_1$ & $e_2$, i.e. $\Psi(e_1,e_2)$ (disregarding signs). In fact we have no problem in more generally writing $\Psi(T(u),T(v)) = \det(T)\Psi(u,v)$ for arbitrary vectors $u$ & $v$.
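As a quick numerical check of this scaling property, here is a minimal sketch assuming $\Psi$ is the usual signed-area form on the plane and a made-up operator with $T(e_1) = (2,1)$, $T(e_2) = (3,4)$; the names `psi` and `T` and the test vectors are hypothetical.

```python
from fractions import Fraction

def psi(u, v):
    """Alternating bilinear form on the plane with psi(e1, e2) = 1 (signed area)."""
    return u[0] * v[1] - u[1] * v[0]

# Hypothetical operator with T(e1) = (a, c) and T(e2) = (b, d).
a, b, c, d = map(Fraction, (2, 3, 1, 4))

def T(v):
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])

e1, e2 = (1, 0), (0, 1)
det_T = psi(T(e1), T(e2)) / psi(e1, e2)

# Arbitrary test vectors, chosen only for illustration.
u, v = (Fraction(4), Fraction(-1)), (Fraction(2), Fraction(7))
print(det_T)
print(psi(T(u), T(v)) == det_T * psi(u, v))
```

One pair of test vectors does not prove the identity, of course; the point is only to make the factor $\det(T)$ concrete.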
Note that $\Psi$ has nothing to do with normal vectors here, it's exploiting the properties of the first way of looking at this system (in terms of matrices we're dealing with the determinant as a linear function of the columns basically). The reason I bring this topic up here is to find out about how to relate these concepts to the geometry of the situation. Again we are introducing seemingly arbitrary vectors $T(e_1)$ & $T(e_2)$ that don't relate to the geometry of the lines (though of course the vectors contain algebraic information).
With that said, my fourth question comes from solutions determined via Cramer's rule. If you use this notation, $\Psi(T(e_1),T(e_2)) = \det(T)\Psi(e_1,e_2)$, you see that $\Psi(\vec{z},T(e_2)) = \Psi(xT(e_1) + yT(e_2),T(e_2)) = x\Psi(T(e_1),T(e_2))$, since $\Psi(T(e_2),T(e_2)) = 0$, which implies $x = \frac{\Psi(\vec{z},T(e_2))}{\Psi(T(e_1),T(e_2))}$. This term simply must have some fascinating interpretation... I would love to know what it means to say that the $x$ component of the point of intersection of two lines is the ratio of
- the area between the vector $\vec{z} = (e,f)$, whose components are the right-hand sides of the two equations (I can't see a nice way to talk about or interpret this), & the vector $T(e_2)$ (whatever this vector is supposed to be interpreted as)
- to the area contained within $T(e_1)$ & $T(e_2)$.
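The ratio of areas can be computed directly. This is a small sketch assuming the signed-area form for $\Psi$ and the made-up system $2x + 3y = 8$, $x + 4y = 9$; the names `psi`, `Te1`, `Te2` and `z` are hypothetical.

```python
from fractions import Fraction

def psi(u, v):
    """Signed area of the parallelogram spanned by u and v."""
    return u[0] * v[1] - u[1] * v[0]

Te1 = (Fraction(2), Fraction(1))  # T(e1) = (a, c)
Te2 = (Fraction(3), Fraction(4))  # T(e2) = (b, d)
z = (Fraction(8), Fraction(9))    # right-hand sides (e, f)

# Cramer's rule as ratios of signed areas.
x = psi(z, Te2) / psi(Te1, Te2)
y = psi(Te1, z) / psi(Te1, Te2)

# Check that z really is x*T(e1) + y*T(e2).
recombined = (x * Te1[0] + y * Te2[0], x * Te1[1] + y * Te2[1])
print(x, y, recombined == z)
```

Replacing $T(e_1)$ by $\vec{z}$ in the numerator scales the parallelogram spanned by $T(e_1)$ & $T(e_2)$ by exactly $x$, because the $yT(e_2)$ part of $\vec{z}$ contributes no area against $T(e_2)$.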
My fifth question is almost the same as the above, except it modifies the interpretation of the last phrase, "to the area contained within $T(e_1)$ & $T(e_2)$". If we exploit the fact that, for matrices, $\det(T) = \det(T^t)$, we can interpret the determinant in a whole new manner intimately related to the geometry of the lines: we can now interpret the determinant as the (signed) area between the normal vectors to the lines (which immediately gives meaning to the situations of either a zero or non-zero determinant). To restate the question, I would love to know what it means to say that the $x$ component of the point of intersection of two lines is the ratio of
- the area between the vector $\vec{z} = (e,f)$, whose components are the right-hand sides of the two equations, & the vector $T(e_2)$ (whatever this vector is supposed to be interpreted as)
- to the area contained within the normal vectors to the two lines.
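To see the two readings of the denominator side by side, here is a sketch assuming the signed-area form and two made-up coefficient sets, one for intersecting lines and one for parallel lines; the helper names `column_area` and `normal_area` are hypothetical.

```python
from fractions import Fraction

def psi(u, v):
    """Signed area of the parallelogram spanned by u and v in the plane."""
    return u[0] * v[1] - u[1] * v[0]

def column_area(a, b, c, d):
    """Area between T(e1) = (a, c) and T(e2) = (b, d), i.e. det(T)."""
    return psi((a, c), (b, d))

def normal_area(a, b, c, d):
    """Area between the normals (a, b) and (c, d), i.e. det(T^t)."""
    return psi((a, b), (c, d))

# Hypothetical intersecting lines: 2x + 3y = 8 and x + 4y = 9.
crossing = tuple(map(Fraction, (2, 3, 1, 4)))
# Hypothetical parallel lines: x + 2y = 1 and 3x + 6y = 7 (dependent normals).
parallel = tuple(map(Fraction, (1, 2, 3, 6)))

for coeffs in (crossing, parallel):
    print(column_area(*coeffs), normal_area(*coeffs))
```

Both areas agree in each case, and both vanish exactly when the normals are parallel, which is the situation where the lines have no unique intersection.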
My sixth question is whether I'm right to make all these distinctions. I don't know whether I should go so far as to distinguish two separate interpretations of the denominator in the Cramer's rule solution & ask for two different interpretations, but it really seems like you have to be able to think about this in two different ways: one extremely geometric on every level (normal vectors), the other geometric only at the start. I am just not sure. I suspect there is no intuitive geometric interpretation in terms of linear maps: if you think in terms of linear maps you have to use these almost arbitrary vectors $T(e_1)$ & $T(e_2)$, divorced from the geometry of the lines, whereas when you do it in terms of normal vectors you get something nice.
04: LINEAR FUNCTIONALS & LINEAR MAPS
My seventh & final question is about the relationship of linear functionals to solving systems of linear equations. Given the system:
$ax + by = e$
$cx + dy = f$
i.e. $xT(\hat{e_{1}}) + yT(\hat{e_{2}}) = (e,f)$
we ask how linear functionals interact with this setup. By introducing $\psi_1(xe_1 + ye_2) = e$ & $\psi_2(xe_1 + ye_2) = f$ we see
$\psi_1(xe_1 + ye_2) = x\psi_1(e_1) + y\psi_1(e_2) = ax + by = e$
$\psi_2(xe_1 + ye_2) = x\psi_2(e_1) + y\psi_2(e_2) = cx + dy = f$
I really don't know how to interpret this or fit it into the general scheme of things. It seems to be saying that a linear functional maps the solution vector to a line, & that the action of a linear functional on a basis results in the coefficients of the normal vectors (i.e. in some way you're mapping the solution of the system to the normal vectors), but I don't know what you're supposed to do with this. I would appreciate any help on how to interpret this in light of everything I've asked.
I know it's a long post, but the questions are, in my mind, all tied together, so I sincerely appreciate any help.
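One way to experiment with this setup is the sketch below, which assumes the made-up system $2x + 3y = 8$, $x + 4y = 9$ and hypothetical function names `psi1`, `psi2` and `T`. It treats each equation as a functional and assembles the map $T$ from the two of them, so that $T(\vec{v}) = (\psi_1(\vec{v}), \psi_2(\vec{v}))$.

```python
from fractions import Fraction

# Hypothetical coefficients for ax + by = e, cx + dy = f.
a, b, c, d = map(Fraction, (2, 3, 1, 4))

def psi1(v):
    """First functional: dot product with the normal (a, b)."""
    return a * v[0] + b * v[1]

def psi2(v):
    """Second functional: dot product with the normal (c, d)."""
    return c * v[0] + d * v[1]

def T(v):
    """The linear map whose two output components are the two functionals."""
    return (psi1(v), psi2(v))

e1, e2 = (1, 0), (0, 1)
v = (Fraction(1), Fraction(2))   # the solution of this particular system

print((psi1(e1), psi1(e2)))      # recovers the normal (a, b)
print((psi2(e1), psi2(e2)))      # recovers the normal (c, d)
print(T(e1), T(e2))              # recovers the columns (a, c) and (b, d)
print(T(v))                      # the right-hand sides (e, f)
```

Read this way, the normal vectors are the rows of the matrix of $T$ and the vectors $T(e_1)$, $T(e_2)$ are its columns, so both pictures come from the same array of numbers.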
