3

When using logistic regression on a Casio or Texas Instruments calculator, the output is of the form $f(x) = \frac{c}{1+ae^{-bx}}$. The problem I have (when teaching a class where both types of calculators are used) is that for a given data set, the two calculators will sometimes give different answers, i.e. different values for $a$, $b$ and $c$.

I know the algorithms for polynomial and exponential regression, but I am not sure how logistic regression is actually implemented. The Wikipedia article on logistic regression appears to describe only the situation where $c=1$, or where $c$ is given beforehand, and this is much simpler!

For an example where the calculators give different answers, take the $x$-values 10, 11, 12, 13, 14, with corresponding $y$-values 140, 153, 162, 169, 173. Here Casio gives $a=1.8432$, $b=0.0842987$, $c=407.35$, while TI gives $a=30.7$, $b = 0.465$, $c=181$. The latter answer looks like a better fit to the data points, but the first is not completely nonsensical.

My two questions are:

1) Can someone describe an algorithm for determining the "best possible" values of $a$, $b$, and $c$ from a given set of data points $\{(x_i, y_i)\}_{i=1}^{n}$?

2) Does anyone know why there is a discrepancy between Casio and TI?

  • 0
    TI calculators internally use the Levenberg-Marquardt algorithm for logistic regression. See [this](ftp://ftp.ti.com/pub/graph-ti/calc-apps/info/logistic.txt) for instance. (2012-11-08)

2 Answers

1

Since the model is highly nonlinear in all of its parameters, the sum of squared errors has to be minimized iteratively. The results therefore depend on the convergence criteria and, more importantly, on the starting values selected for the parameters (I suspect that this is the problem here).
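To make the iterative minimization concrete, here is a minimal pure-Python sketch of a Levenberg-Marquardt style loop applied to the data from the question. It is not the code used by either calculator: the starting point `(10, 0.3, 200)`, the damping schedule (divide or multiply by 10) and the fixed iteration count are all illustrative assumptions, as are the helper names.

```python
import math

xs = [10, 11, 12, 13, 14]
ys = [140, 153, 162, 169, 173]

def model(x, a, b, c):
    return c / (1.0 + a * math.exp(-b * x))

def sse(p):
    """Sum of squared errors for the parameter list p = [a, b, c]."""
    return sum((y - model(x, *p)) ** 2 for x, y in zip(xs, ys))

def normal_equations(p):
    """Return (J^T J, J^T r), with r = y - model and J the model Jacobian."""
    a, b, c = p
    jtj = [[0.0] * 3 for _ in range(3)]
    jtr = [0.0] * 3
    for x, y in zip(xs, ys):
        e = math.exp(-b * x)
        d = 1.0 + a * e
        # Partial derivatives of the model with respect to a, b, c.
        grad = [-c * e / d**2, c * a * x * e / d**2, 1.0 / d]
        r = y - c / d
        for i in range(3):
            jtr[i] += grad[i] * r
            for j in range(3):
                jtj[i][j] += grad[i] * grad[j]
    return jtj, jtr

def solve3(m, v):
    """Solve a 3x3 system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, 3):
            f = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= f * aug[col][k]
    sol = [0.0] * 3
    for i in reversed(range(3)):
        s = aug[i][3] - sum(aug[i][k] * sol[k] for k in range(i + 1, 3))
        sol[i] = s / aug[i][i]
    return sol

def levenberg_marquardt(p0, iterations=200):
    p, lam = list(p0), 1e-3
    best = sse(p)
    for _ in range(iterations):
        jtj, jtr = normal_equations(p)
        # Damp the diagonal: (J^T J + lam * diag(J^T J)) step = J^T r.
        damped = [row[:] for row in jtj]
        for i in range(3):
            damped[i][i] += lam * jtj[i][i]
        try:
            step = solve3(damped, jtr)
            trial = [pi + si for pi, si in zip(p, step)]
            trial_sse = sse(trial)
        except (OverflowError, ZeroDivisionError):
            trial_sse = float("inf")  # treat a failed evaluation as a rejected step
        if trial_sse < best:
            p, best, lam = trial, trial_sse, lam / 10  # accept, trust the model more
        else:
            lam *= 10                                  # reject, damp more strongly
    return p, best

(a_fit, b_fit, c_fit), fit_sse = levenberg_marquardt([10.0, 0.3, 200.0])
print(a_fit, b_fit, c_fit, fit_sse)
```

Changing the starting point passed to `levenberg_marquardt` is a simple way to experiment with how sensitive the result is to the initial guess.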

Using your data and a rigorous nonlinear regression, the "best" fit I obtained is $y=\frac{180.966}{1+30.6677 e^{-0.46527 x}}$, which is indeed very close to what the TI calculator gave. According to the model, the predicted values are $140.0$, $152.9$, $162.3$, $168.7$ and $173.1$, extremely close to those given by JJacquelin in his answer.

If you compute the predicted values using the numbers from the Casio, you get values rising from about $227$ at $x=10$ to about $260$ at $x=14$, far above the data!
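The comparison can be reproduced with a few lines of Python. This sketch simply evaluates $f(x) = \frac{c}{1+ae^{-bx}}$ at the data points for the two parameter sets quoted in the question and reports the sum of squared residuals for each; the dictionary and function names are illustrative only.

```python
import math

xs = [10, 11, 12, 13, 14]
ys = [140, 153, 162, 169, 173]

def logistic(x, a, b, c):
    return c / (1.0 + a * math.exp(-b * x))

# Parameter sets (a, b, c) as quoted in the question.
fits = {
    "Casio": (1.8432, 0.0842987, 407.35),
    "TI": (30.7, 0.465, 181.0),
}

predictions = {}
sse = {}
for name, (a, b, c) in fits.items():
    predictions[name] = [logistic(x, a, b, c) for x in xs]
    sse[name] = sum((y - p) ** 2 for y, p in zip(ys, predictions[name]))
    print(name, [round(p, 1) for p in predictions[name]], "SSE =", round(sse[name], 3))
```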

By the way, I hope that you appreciate the non-iterative procedure given by JJacquelin in his answer.

If you are interested in the generation of good starting values, let me know and I shall elaborate.
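One common way to generate starting values (an assumption on my part, not necessarily the method the author of this answer had in mind) uses the fact that for a fixed $c > \max y_i$ the model becomes linear: $\ln\left(\frac{c}{y} - 1\right) = \ln a - bx$. The sketch below scans a hypothetical grid of candidate $c$ values, fits the straight line by ordinary least squares for each, and keeps the candidate with the smallest sum of squared errors in the original $y$ scale.

```python
import math

xs = [10, 11, 12, 13, 14]
ys = [140, 153, 162, 169, 173]

def linearized_fit(c):
    """For fixed c > max(y), fit ln(c/y - 1) = ln(a) - b*x by least squares."""
    zs = [math.log(c / y - 1.0) for y in ys]
    n = len(xs)
    xbar, zbar = sum(xs) / n, sum(zs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (z - zbar) for x, z in zip(xs, zs)) / sxx
    a = math.exp(zbar - slope * xbar)  # intercept of the line is ln(a)
    return a, -slope                   # slope of the line is -b

def sse(a, b, c):
    return sum((y - c / (1.0 + a * math.exp(-b * x))) ** 2 for x, y in zip(xs, ys))

# Illustrative grid: candidate plateaus from just above max(y), in steps of 0.5.
best = None
for k in range(1, 201):
    c = max(ys) + 0.5 * k
    a, b = linearized_fit(c)
    err = sse(a, b, c)
    if best is None or err < best[0]:
        best = (err, a, b, c)

sse0, a0, b0, c0 = best
print("starting values: a =", a0, "b =", b0, "c =", c0, "SSE =", sse0)
```

The resulting triple is meant only as an initial guess to hand to an iterative least-squares routine, since the log transform weights the errors differently from the original criterion.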

0

What does "the best possible" values of the parameters mean? There are many "best" values, depending on the optimisation criterion: mean squared error, absolute error, relative error, or others.

I don't know exactly which algorithms the Casio and TI calculators use.

For comparison, I give the procedure below (the symbols are changed for consistency with the paper this procedure comes from: pages 16-17 of https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales ).

[Two images in the original answer, not reproduced here.]

The above discrepancy is negligible in practice. A more sophisticated nonlinear regression algorithm involving iterative computation would certainly give an even "better" result, provided that the fitting criterion is clearly defined.

Note: I forgot to mention that the data must be sorted in increasing order of $x_k$ (as they are in the numerical example).