The model is nonlinear, but one of the parameters, $T_{s}$, enters linearly, which means we can 'remove' it from the search (this is the idea behind separable least squares).
Start with a crisp set of definitions: a set of $m$ measurements $\left\{ t_{k}, T_{k} \right\}_{k=1}^{m}.$ The trial function, as pointed out by @Yves Daust, is $ T(t) = T_{s} \left( 1 - e^{-\alpha t}\right). $ The $2$-norm minimum solution is $ \left( T_{s}, \alpha \right)_{LS} = \operatorname*{arg\,min}_{\left( T_{s}, \alpha \right) \in \mathbb{R}_{+}^{2}} \; r^{2} \left( T_{s}, \alpha \right), \qquad r^{2} \left( T_{s}, \alpha \right) = \sum_{k=1}^{m} \left( T_{k} - T(t_{k}) \right)^{2}. $
The minimization criterion $ \frac{\partial} {\partial T_{s}} r^{2} = 0 $ is linear in $T_{s}$ and leads to the closed form $ T_{s}^{*} = \frac{\sum T_{k} \left( 1 - e^{-\alpha t_{k}} \right)} {\sum \left( 1 - e^{-\alpha t_{k}} \right)^{2}}. $
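As a quick sketch of this closed form (Python with NumPy; the names `Ts_star`, `t`, and `T` are my own, not from the original answer):

```python
import numpy as np

def Ts_star(alpha, t, T):
    """Closed-form linear parameter for a fixed alpha.

    Comes from setting d(r^2)/d(T_s) = 0 for the trial
    function T(t) = T_s * (1 - exp(-alpha * t)).
    """
    g = 1.0 - np.exp(-alpha * t)        # basis values 1 - e^{-alpha t_k}
    return np.dot(T, g) / np.dot(g, g)  # sum T_k g_k / sum g_k^2
```

For noiseless data generated by the model, evaluating this at the true $\alpha$ returns the true $T_{s}$ exactly.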
Substituting this back, the total error can be written in terms of the remaining parameter $\alpha$ alone: $ r^{2}\left( T_{s}^{*}, \alpha \right) = r_{*}^{2} ( \alpha ) = \sum_{k=1}^{m} \left( T_{k} - T_{s}^{*}(\alpha) \left( 1 - e^{-\alpha t_{k}} \right) \right)^{2}. $
This function is an absolute joy to minimize: it is unimodal, decreasing monotonically to a lone minimum and then increasing monotonically, so any one-dimensional bracketing method will find it reliably.
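Because $r_{*}^{2}$ is unimodal, a derivative-free golden-section search over $\alpha$ suffices; $T_{s}$ then follows from its closed form. A minimal sketch (Python/NumPy; the search bracket $[0.01, 10]$ and all function names are assumptions for illustration, not part of the original answer):

```python
import numpy as np

def fit_T_alpha(t, T, lo=0.01, hi=10.0, tol=1e-8):
    """Separable least-squares fit of T(t) = T_s * (1 - exp(-alpha t)).

    alpha is found by golden-section search on r_*^2(alpha), valid
    because r_*^2 is unimodal; T_s then comes from its closed form.
    """
    def Ts_star(alpha):
        g = 1.0 - np.exp(-alpha * t)
        return np.dot(T, g) / np.dot(g, g)

    def r2(alpha):
        g = 1.0 - np.exp(-alpha * t)
        return np.sum((T - Ts_star(alpha) * g) ** 2)

    # Golden-section search: shrinks the bracket [a, b] by the
    # inverse golden ratio each iteration, keeping the minimum inside.
    invphi = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        if r2(c) < r2(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    alpha = 0.5 * (a + b)
    return Ts_star(alpha), alpha
```

This sketch re-evaluates $r_{*}^{2}$ twice per iteration for clarity; a production golden-section search would reuse one of the two interior evaluations per step.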