Short answer: Yes, you can always take an invertible matrix to REF using swaps at the beginning, followed only by adds and multiplies. This is called the PLU decomposition or the Bruhat decomposition of GL(n,F).
Let's write the elementary row operations in a slightly different form:
- Switch: One may switch row i and row j by multiplying the matrix on the left by P(i,j), the matrix which is the identity matrix with its *i*th and *j*th rows switched.
- Multiply: One may multiply row i by a non-zero number x by multiplying the matrix on the left by D(i,x), the matrix which is the identity matrix with its *i*th row multiplied by x.
- Add: One may add row j to row i (changing row i) by multiplying the matrix on the left by E(i,j), the matrix which is the identity matrix with its *i*th row containing an extra 1 in the j column. We generally require i > j.
Applying a sequence of elementary row operations is thus equivalent to multiplying on the left by a sequence of these P(i,j), D(i,x), and E(i,j) matrices. This is our first big idea: elementary row operations are themselves matrices!
Note that each of these operations can be undone:
- Apply P(i,j) again immediately to undo P(i,j)
- Apply D(i,1/x) immediately to undo D(i,x)
- Apply D(j,-1), then E(i,j), then D(j,-1) to undo E(i,j)
Our second big idea is that if we allow a bunch of elementary row operations in a row, we get very nice collection of row operations. Let's see how they relate to the group G = GL(n,F) of all invertible matrices.
- The set of all products of D(i,x) is the subgroup D of all diagonal matrices. In terms of row ops, we can multiply each row by its own non-zero number. This subgroup is called a maximally split maximal torus of G.
- The set of all products of D(i,x) and E(i,j) under the requirement that if one uses D(i,x) one must also use D(i,1/x) is the subgroup U of all lower triangular matrices with ones on the diagonal. This subgroup is called a maximal unipotent subgroup of G. If is a finite field of characteristic p, then U is a Sylow p-subgroup of G.
- When the field F has size p, a prime, then the subgroup generated by just the adds is already U, but in characteristic 0, the adds by themselves don't even generate a group, and except for fields of size p, they are not normalized by D.
- What about unrestricted products of adds and multiplies? The subgroups D and U intersect in only the identity, and D normalizes U, so the subgroup generated by both D and U is a semi-direct product B consisting of exactly the lower triangular matrices with non-zero diagonal entries. This subgroup is called a Borel subgroup of G.
- The set of all products of P(i,j) is the subgroup P of all permutations matrices, matrices with exactly one 1 per row and column, and the others 0. This subgroup is naturally isomorphic to the Weyl group of G.
Now we can examine the two sets of operations you wanted to allow:
- If we only allow permutations at the beginning we get the union of cosets BP.
- If we allow permutations whenever we want, we get the subgroup generated by B and P, which is in fact the entire group G of invertible matrices.
In particular, given a starting matrix M, you can change M into strictly more forms using B and P willy-nilly (you can take the matrix into RREF!). However, REF can always be achieved just using BP.
Here is the proof using the PLU decomposition of linear algebra:
An invertible matrix is in REF if and only if it is upper triangular (and since we are allowed to use D(i,x), we can require the diagonal entries to be 1s if we want). Now use the PLU decomposition to write our matrix M as M = P*L*U for P in P, L in U, and U upper triangular. Then U=REF(M) is a row echelon form of M, and U = L−1 P−1 M is gotten from M first by applying the row swaps indicated by the permutation matrix P−1, and then by applying the adds and multiplies indicated by the lower triangular matrix L−1.
The proof using Bruhat decomposition is pretty similar, but uses even more notation, so it seems a little silly to keep going after PLU.
You should see PLU in any linear algebra course. Bruhat decompositions for a second group theory class are covered in Alperin–Bell, Groups and Representations.