Why does reduction of order work for linear ODEs?
Specifically, if we know that one solution is $y_1(t)$, why is the second solution of the form $y_2(t) = v(t)\, y_1(t)$, where $v(t)$ is a function to be solved for? Why does this assumption always work?
2 Answers
This is a differential analog of the Factor Theorem. Let's recall the latter first. If $\rm\:x = r\:$ is a root of the polynomial $\rm\:p(x)\:$ over a ring $\rm\:R,\:$ then by the Division Algorithm we have
$\rm p(x) = (x-r)\:q(x) + c,\ \ for\ \ c\in R,\ \ \ so\ \ \ p(r)=0\ \iff\ \ c = 0$
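For a quick concrete illustration of the Factor Theorem (a small added example, not part of the original argument): take $\rm\:p(x) = x^2 - 5x + 6\:$ over $\rm\:R = \mathbb Z\:$ and the root $\rm\:r = 2.\:$ The Division Algorithm gives
$\rm p(x)\ =\ (x-2)(x-3)\ +\ 0$
so the remainder is $\rm\:c = 0,\:$ exactly as predicted, since $\rm\:p(2) = 0.$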
Now consider a linear differential equation presented in operator form using $\rm\:D = \frac{d}{dx}\:$
$\rm\begin{eqnarray} &&\rm\ a_n f^{(n)} +\,\cdots\,+ a_1 f' + a_0\ f\ =\ 0\quad\ where\ \ a_i\ may\ depend\ on\ x\\ &\to\ &\rm (a_n D^n +\, \cdots\, + a_1 D + a_0)(f) = 0\end{eqnarray}$
Let $\rm\:L = L(D)\:$ denote the above polynomial in $\rm\:D.\:$ Products of such polynomials generally do not commute because $\rm\:D\:$ does not commute with $\rm\:x;\:$ indeed $\rm\: Dx = xD + 1\:$ as operators, since $\rm\: (D\cdot x)f = D(xf) = x(Df) + f = (x\cdot D+1)f.\:$ However, there are certain specialized types of division algorithms available for these noncommutative polynomials, as Oystein Ore worked out in detail. In particular, if you expand $\rm\:L(fg)\:$ in general, as in Robert's answer, you will obtain
$\begin{eqnarray} \rm (L\cdot f)g &=&\rm L(fg) = (\hat L\cdot D) g + (Lf) g\\ \rm i.e.\quad L\cdot f &=&\rm \hat L\cdot D + Lf\ \ \ for\ \ \hat L\ \ of\ smaller\ degree\ (order)\ in\ D\end{eqnarray} $
In effect we right-divided $\rm\:L\cdot f\:$ by $\rm\:D\:$ with $\rm\:Lf = $ remainder. Thus when $\rm\:Lf = 0,\:$ i.e. when $\rm\:f\:$ is a solution of $\rm\:L,\:$ we deduce $\rm\:L(fg) = 0 \iff \hat L(Dg) = 0,\:$ yielding the reduction of order.
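To make this concrete in the second-order case (a small worked example added for illustration, in the same notation): take $\rm\:L = D^2 + p\,D + q.\:$ Expanding by the product rule,
$\rm L(fg)\ =\ f\,g'' + (2f' + p\,f)\,g' + (f'' + p\,f' + q\,f)\,g\ =\ (f\,D + 2f' + p\,f)(Dg) + (Lf)\,g$
so here $\rm\:\hat L = f\,D + 2f' + p\,f,\:$ which has order $1$ in $\rm\:D.\:$ When $\rm\:Lf = 0,\:$ setting $\rm\:w = Dg\:$ leaves the first-order equation $\rm\:f\,w' + (2f' + p\,f)\,w = 0,\:$ which is precisely the textbook reduction-of-order equation.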
Reduction of order is sometimes called D'Alembert's method. You can find modern algorithmic work on this topic by searching for that name, as well as for Abramov and Petkovsek, two of many researchers who have generalized Ore's work into effective algorithms employed in computer algebra systems.
Any function $y_2(t)$ can be written as $v(t)\, y_1(t)$, at least on an interval where $y_1(t) \ne 0$: just take $v(t) = y_2(t)/y_1(t)$. The real question is, how does this substitution help? The answer comes from linearity and the Leibniz rule for differentiation:
$(v y_1)^{(n)} = v\, y_1^{(n)} + \text{terms in } v', v'', \ldots, v^{(n)}$
If you have a linear differential equation $L(y) = 0$ of order $m$, the first terms give you $v\, L(y_1)$, which vanishes because $L(y_1) = 0$, and you are left with a linear differential equation involving $v', \ldots, v^{(m)}$ from the other terms. This is still a linear differential equation in $v$ of order $m$, but since there is no term in $v$ itself (without any differentiation), it can be rewritten as a linear differential equation in $v'$ of order $m-1$.
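To see this in action, here is a quick symbolic check (a sketch using Python/SymPy; the concrete equation $y'' - 2y' + y = 0$ and the names below are chosen just for illustration, not taken from the answers above). The known solution is $y_1 = e^t$; substituting $y = v(t)\, e^t$ should leave no undifferentiated $v$ term:

```python
import sympy as sp

t = sp.symbols('t')
v = sp.Function('v')(t)

# Example equation (chosen for illustration): y'' - 2y' + y = 0, known solution y1 = exp(t).
y1 = sp.exp(t)
y = v * y1                          # the reduction-of-order ansatz y = v(t) * y1(t)

# Substitute into L(y) = y'' - 2y' + y and simplify.
residual = sp.diff(y, t, 2) - 2*sp.diff(y, t) + y
print(sp.simplify(residual / y1))   # -> Derivative(v(t), (t, 2))
```

Only $v''$ survives: the term in $v$ dropped out because $L(y_1) = 0$, so setting $w = v'$ gives the first-order equation $w' = 0$, hence $v = c_1 + c_2 t$ and the second solution $y_2 = t\, e^t$.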