Please forgive a really simple question: am I calculating the relative error reduction correctly?
Example: Model 1 gets 80/100 examples wrong, model 2 gets only 20/100 examples wrong, so the 80 errors are reduced by 60, which is an error reduction of 75%.
So the relative error reduction formula seems to be
(m1_error - m2_error) / m1_error * 100.0
What if it's vice versa? Model 1 gets 20/100 wrong, but model 2 gets 80/100 wrong. By what amount does model 2 "reduce" the error? The formula gives -300%, so it increases the error by 300%.
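To make both cases concrete, here is the formula as a small Python function (the function name is my own choice, not standard terminology):

```python
def relative_error_reduction(m1_errors, m2_errors):
    """Percentage by which model 2 reduces model 1's error count.

    Positive means model 2 makes fewer errors; negative means it
    makes more.
    """
    return (m1_errors - m2_errors) / m1_errors * 100.0

print(relative_error_reduction(80, 20))  # 75.0
print(relative_error_reduction(20, 80))  # -300.0
```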
Does that make sense?