I can't say I have much experience coming up with shortcuts for producing similarity transformations, but I can speculate about how they might look. You can think of building them up by doing a row operation on a matrix followed by the inverse column operation. For example:
> subtract the first row from the second, then add the second column to the first
(to see that this is right, look at the corresponding elementary matrices)
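As a sketch of that check, take a generic matrix whose entries $p, q, r, s$ are my own placeholder names (not the matrix from the question). The row operation is left multiplication by $E$, and the column operation is right multiplication by $E^{-1}$:

$$E = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}, \qquad E^{-1} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix},$$

$$E \begin{pmatrix} p & q \\ r & s \end{pmatrix} E^{-1} = \begin{pmatrix} p & q \\ r - p & s - q \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} p + q & q \\ r + s - p - q & s - q \end{pmatrix}.$$

So the bottom-left entry vanishes exactly when $p + q = r + s$, i.e. when the two rows have the same sum, and the upper-right entry is left untouched.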
I could easily imagine that a person with a lot of practice in such calculations would notice the relationship between the four entries, and the possibility of combining three of them to cancel the fourth. Or they might be armed with a general-purpose algorithm for canceling the bottom-left entry of any $2 \times 2$ matrix using simultaneous elementary operations as I described above, and it just happened to have a very simple form in this case.
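One way such a general-purpose recipe could look (this is my own sketch, not something taken from the question; the entries $p, q, r, s$ and the parameter $t$ are placeholder names): subtract $t$ times the first row from the second, then add $t$ times the second column to the first.

$$\begin{pmatrix} 1 & 0 \\ -t & 1 \end{pmatrix} \begin{pmatrix} p & q \\ r & s \end{pmatrix} \begin{pmatrix} 1 & 0 \\ t & 1 \end{pmatrix} = \begin{pmatrix} p + tq & q \\ r + t(s - p) - t^2 q & s - tq \end{pmatrix}$$

The bottom-left entry is zero when $t$ solves $q t^2 + (p - s) t - r = 0$, which is the same as saying that $(1, t)^T$ is an eigenvector of the matrix. Over the reals such a $t$ need not exist. When the two rows have equal sums, $t = 1$ is a root, which gives the simple operation quoted earlier.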
Armed with this idea, the first thing I thought to try was actually the operation I described above in the block quote, which is also enough to reduce the matrix to upper-triangular form. The difference is that I end up with a $b$ in the upper-right corner, rather than a $-b$.
As a meta-mathematical point, I think one should separate the idea of "a proof I understand" from the idea of "a proof I could have come up with". One could go further and separate "a proof I could have come up with because I already knew some things about $A$ and $B$" from "a proof I could have come up with if I saw $A$ without any context", which may or may not be relevant here.