170

In my AI textbook there is this paragraph, without any explanation.

The sigmoid function is defined as follows

$\sigma (x) = \frac{1}{1+e^{-x}}.$

This function is easy to differentiate because

$\frac{d\sigma (x)}{dx} = \sigma (x)\cdot (1-\sigma(x)).$

It has been a long time since I've taken differential equations, so could anyone tell me how they got from the first equation to the second?

  • 0
    One of the reasons they use the sigmoid is that it is easy to differentiate and facilitates backpropagation. Not so for other candidates like sign(x), arctangent(x), sinh(x), etc. (2018-12-21)
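To illustrate the backpropagation remark, here is a minimal Python sketch. The function names `sigmoid_forward` and `sigmoid_backward` are hypothetical, chosen only for this example. It assumes the forward output is cached, so the backward pass needs no further call to `exp`.

```python
import math

def sigmoid_forward(x):
    """Return the activation; the caller keeps it as a cache for the backward pass."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_backward(upstream_grad, cached_output):
    """Chain rule: dL/dx = dL/dy * y * (1 - y), where y is the cached forward output."""
    return upstream_grad * cached_output * (1.0 - cached_output)

y = sigmoid_forward(0.0)            # sigma(0) = 0.5
grad_x = sigmoid_backward(1.0, y)   # 1.0 * 0.5 * (1 - 0.5) = 0.25
print(y, grad_x)
```

The backward step uses only one subtraction and two multiplications on a value already computed, which is one way to read "easy to differentiate" in this context.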

6 Answers

292

Let's denote the sigmoid function as $\sigma(x) = \dfrac{1}{1 + e^{-x}}$.

The derivative of the sigmoid is $\dfrac{d}{dx}\sigma(x) = \sigma(x)(1 - \sigma(x))$.

Here's a detailed derivation:

$ \begin{align} \dfrac{d}{dx} \sigma(x) &= \dfrac{d}{dx} \left[ \dfrac{1}{1 + e^{-x}} \right] \\ &= \dfrac{d}{dx} \left( 1 + e^{-x} \right)^{-1} \\ &= -\left(1 + e^{-x}\right)^{-2}\left(-e^{-x}\right) \\ &= \dfrac{e^{-x}}{\left(1 + e^{-x}\right)^2} \\ &= \dfrac{1}{1 + e^{-x}} \cdot \dfrac{e^{-x}}{1 + e^{-x}} \\ &= \dfrac{1}{1 + e^{-x}} \cdot \dfrac{(1 + e^{-x}) - 1}{1 + e^{-x}} \\ &= \dfrac{1}{1 + e^{-x}} \cdot \left( \dfrac{1 + e^{-x}}{1 + e^{-x}} - \dfrac{1}{1 + e^{-x}} \right) \\ &= \dfrac{1}{1 + e^{-x}} \cdot \left( 1 - \dfrac{1}{1 + e^{-x}} \right) \\ &= \sigma(x) \cdot (1 - \sigma(x)) \end{align} $
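If you want to sanity-check the result numerically, the following Python sketch compares the closed form $\sigma(x)(1-\sigma(x))$ against a central finite difference. The helper names and the step size `h=1e-5` are assumptions made for this illustration, not part of the derivation.

```python
import math

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """Closed form from the derivation: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def central_difference(f, x, h=1e-5):
    """Numerical derivative (f(x+h) - f(x-h)) / (2h); h is an arbitrary small step."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Compare the two at a handful of sample points.
for x in (-3.0, -1.0, 0.0, 0.5, 2.0):
    exact = sigmoid_prime(x)
    approx = central_difference(sigmoid, x)
    print(f"x={x:5.1f}  closed form={exact:.8f}  finite difference={approx:.8f}")
```

The two columns should agree to many decimal places; any visible difference comes from the finite step size and floating-point rounding.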

  • 0
    @ChoudhuryA.M. The reason why $\frac{d}{dx}\left[(1+e^{-x})^{-1}\right] = -(1+e^{-x})^{-2}(-e^{-x})$ has to do with the chain rule for derivatives. See https://www.khanacademy.org/math/differential-calculus/dc-chain (2018-07-18)
106

Consider $ f(x)=\dfrac{1}{\sigma(x)} = 1+e^{-x} . $ Then, on the one hand, the chain rule gives $ f'(x) = \frac{d}{dx} \biggl( \frac{1}{\sigma(x)} \biggr) = -\frac{\sigma'(x)}{\sigma(x)^2} , $ and on the other hand, $ f'(x) = \frac{d}{dx} \bigl( 1+e^{-x} \bigr) = -e^{-x} = 1-f(x) = 1 - \frac{1}{\sigma(x)} = \frac{\sigma(x)-1}{\sigma(x)} . $ Equate the two expressions and multiply both sides by $-\sigma(x)^2$, and voilà: $ \sigma'(x) = \sigma(x)\bigl(1-\sigma(x)\bigr) . $

(Cf. also this answer to a very recent question.)

17

Note that from your given equation,

$(1+e^{-x})\sigma=1$

$\Rightarrow -e^{-x}\sigma+(1+e^{-x})\frac{d\sigma}{dx}=0$ (differentiating using product rule)

$\Rightarrow \frac{d\sigma}{dx}=\sigma\cdot\frac{e^{-x}}{1+e^{-x}}=\sigma\cdot\frac{(1+e^{-x})-1}{1+e^{-x}}=\sigma\cdot\left[1-\frac{1}{1+e^{-x}}\right]=\sigma\cdot(1-\sigma)$

7

Since $\sigma(x)$ is a composite function, we first use the chain rule to work down to the $x$ term, then rewrite the result in terms of $\sigma(x)$: $ \begin{align} \frac{d}{dx}\sigma(x) &= \left(\frac{1}{1+e^{-x}}\right)' \\ &= -\frac{1}{(1+e^{-x})^{2}} \cdot \left(-e^{-x}\right) \\ &= \frac{e^{-x}}{(1+e^{-x})^{2}}, \\ \because \sigma(x) &= \frac{1}{1+e^{-x}}, \\ e^{-x} &= \frac{1 - \sigma(x)}{\sigma(x)}, \\ 1+e^{-x} &= \frac{1}{\sigma(x)}; \\ \therefore \frac{d}{dx}\sigma(x) &= \frac{\frac{1 - \sigma(x)}{\sigma(x)}}{\left(\frac{1}{\sigma(x)}\right)^{2}} \\ &= (1 - \sigma(x)) \cdot \sigma(x) \end{align}$
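The two substitutions used above, $e^{-x} = (1-\sigma)/\sigma$ and $1+e^{-x} = 1/\sigma$, can be spot-checked numerically. This is only a sketch; the names `logistic` and `derivative_via_substitution` are hypothetical and the sample points are arbitrary.

```python
import math

def logistic(x):
    """Sigmoid sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def derivative_via_substitution(x):
    """Evaluate e^{-x} / (1 + e^{-x})^2 using only sigma(x)."""
    s = logistic(x)
    exp_neg_x = (1.0 - s) / s      # e^{-x} written in terms of sigma
    one_plus_exp = 1.0 / s         # 1 + e^{-x} written in terms of sigma
    return exp_neg_x / one_plus_exp ** 2

for x in (-2.0, 0.0, 1.5):
    s = logistic(x)
    # Each substitution should match its direct definition up to rounding.
    assert math.isclose((1.0 - s) / s, math.exp(-x), rel_tol=1e-9)
    assert math.isclose(1.0 / s, 1.0 + math.exp(-x), rel_tol=1e-9)
    # And the substituted quotient should collapse to sigma * (1 - sigma).
    assert math.isclose(derivative_via_substitution(x), s * (1.0 - s), rel_tol=1e-9)
```

If the assertions pass silently, the substituted expression and $\sigma(x)(1-\sigma(x))$ agree at those points.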

5

Let's say we want to find the derivative of $y=\sigma(x)=(1+\exp(-x))^{-1}$. So we have:

$ \begin{align} \frac{dy}{dx} & = (-1)(1 + \exp(-x))^{-2} \frac{d}{dx}(1 + \exp(-x)) \\ \\ & = (-1)(1 + \exp(-x))^{-2}(0 + \frac{d}{dx}\exp(-x)) \\ \\ & = (-1)(1 + \exp(-x))^{-2}(\exp(-x)) \frac{d}{dx}(-x) \\ \\ & = (-1)(1 + \exp(-x))^{-2}(\exp(-x))(-1) \\ \\ & = \frac{\exp(-x)} {(1 + \exp(-x))^2} \\ \\ & = \frac{1 + \exp(-x) -1} {(1 + \exp(-x))^2} \\ \\ & = \frac{1 + \exp(-x)} {(1 + \exp(-x))^2} - \frac{1} {(1 + \exp(-x))^2} \\ \\ & = \sigma(x) - (\sigma(x))^2 \\ \\ & = \sigma(x) \cdot (1 - \sigma(x)) \end{align} $