129

In my AI textbook there is this paragraph, without any explanation.

The sigmoid function is defined as follows

$$\sigma (x) = \frac{1}{1+e^{-x}}.$$

This function is easy to differentiate because

$$\frac{d\sigma (x)}{dx} = \sigma (x)\cdot (1-\sigma(x)).$$

It has been a long time since I've taken differential equations, so could anyone tell me how they got from the first equation to the second?

  • 2
    What AI textbook is that? 2017-09-29
  • 3
    @frog1944: It seems to be *Artificial Intelligence Illuminated* by Ben Coppin, page 302 ([Google Books link](https://books.google.com/books?id=LcOLqodW28EC&pg=PA302&lpg=PA302)). 2017-11-06
  • 1
    @HansLundmark thank you very much! 2017-11-06
  • 1
    Any book on neural networks will deal with the sigmoid function. It is useful because its simple derivative keeps backpropagation cheap, which saves a lot of computation when training a network. Other functions are possible in principle, such as arctan, rational functions, and more. 2018-01-18
  • 0
    One of the reasons they use the sigmoid is that it is easy to differentiate and facilitates backpropagation. Not so for other candidates like sign(x), arctangent(x), sinh(x), etc. 2018-12-21

6 Answers

5

Let's say we want to find the derivative of $y=\sigma(x)=(1+\exp(-x))^{-1}$. So we have:

$$ \begin{align} \frac{dy}{dx} & = (-1)(1 + \exp(-x))^{-2} \frac{d}{dx}(1 + \exp(-x)) \\ \\ & = (-1)(1 + \exp(-x))^{-2}(0 + \frac{d}{dx}\exp(-x)) \\ \\ & = (-1)(1 + \exp(-x))^{-2}(\exp(-x)) \frac{d}{dx}(-x) \\ \\ & = (-1)(1 + \exp(-x))^{-2}(\exp(-x))(-1) \\ \\ & = \frac{\exp(-x)} {(1 + \exp(-x))^2} \\ \\ & = \frac{1 + \exp(-x) -1} {(1 + \exp(-x))^2} \\ \\ & = \frac{1 + \exp(-x)} {(1 + \exp(-x))^2} - \frac{1} {(1 + \exp(-x))^2} \\ \\ & = \sigma(x) - (\sigma(x))^2 \\ \\ & = \sigma(x) \cdot (1 - \sigma(x)) \end{align} $$
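As a quick sanity check on the result above, here is a minimal Python sketch that compares the closed form $\sigma(x)(1-\sigma(x))$ with a central finite-difference estimate of the derivative. The function names (`sigmoid`, `sigmoid_prime`, `central_difference`), the step size `h`, and the sample points are illustrative choices, not anything from the textbook.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x: float) -> float:
    """Closed form from the derivation: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def central_difference(f, x: float, h: float = 1e-5) -> float:
    """Symmetric finite-difference estimate of f'(x); h is an assumed step size."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Compare the two at a few arbitrary sample points.
for x in (-3.0, -1.0, 0.0, 0.5, 2.0):
    exact = sigmoid_prime(x)
    approx = central_difference(sigmoid, x)
    print(f"x={x:+.1f}  closed form={exact:.8f}  finite difference={approx:.8f}")
```

The two columns should agree to many decimal places, which is what one would expect if the algebra is right. At $x=0$ the closed form gives $\tfrac12\cdot\tfrac12=\tfrac14$.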

228

Let's denote the sigmoid function as $\sigma(x) = \dfrac{1}{1 + e^{-x}}$.

The derivative of the sigmoid is $\dfrac{d}{dx}\sigma(x) = \sigma(x)(1 - \sigma(x))$.

Here's a detailed derivation:

$$ \begin{align} \dfrac{d}{dx} \sigma(x) &= \dfrac{d}{dx} \left[ \dfrac{1}{1 + e^{-x}} \right] \\ &= \dfrac{d}{dx} \left( 1 + e^{-x} \right)^{-1} \\ &= -(1 + e^{-x})^{-2}(-e^{-x}) \\ &= \dfrac{e^{-x}}{\left(1 + e^{-x}\right)^2} \\ &= \dfrac{1}{1 + e^{-x}} \cdot \dfrac{e^{-x}}{1 + e^{-x}} \\ &= \dfrac{1}{1 + e^{-x}} \cdot \dfrac{(1 + e^{-x}) - 1}{1 + e^{-x}} \\ &= \dfrac{1}{1 + e^{-x}} \cdot \left( \dfrac{1 + e^{-x}}{1 + e^{-x}} - \dfrac{1}{1 + e^{-x}} \right) \\ &= \dfrac{1}{1 + e^{-x}} \cdot \left( 1 - \dfrac{1}{1 + e^{-x}} \right) \\ &= \sigma(x) \cdot (1 - \sigma(x)) \end{align} $$
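The third line is the one that tends to cause confusion, so here it is spelled out with the chain rule. The helper name $u$ is introduced only for this illustration: let $u(x) = 1 + e^{-x}$, so that $\sigma(x) = u^{-1}$.

$$ \begin{align} \dfrac{d}{dx} u^{-1} &= -u^{-2} \cdot \dfrac{du}{dx}, \\ \dfrac{du}{dx} &= \dfrac{d}{dx}(1) + \dfrac{d}{dx} e^{-x} = 0 + e^{-x} \cdot \dfrac{d}{dx}(-x) = -e^{-x}, \\ \dfrac{d}{dx} u^{-1} &= -(1 + e^{-x})^{-2}(-e^{-x}). \end{align} $$

The extra factor $-e^{-x}$ is therefore just the derivative of the inner function $1 + e^{-x}$.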

  • 3
    Sir, is d(e^x)=e^x? 2017-03-06
  • 3
    @RavinderPayal: Yes, d/dx(e^x) = e^x; if you want a proof of that, see: https://www.khanacademy.org/math/ap-calculus-ab/advanced-differentiation-ab/proofs-for-derivatives-of-ex-and-lnx-ab/v/proof-d-dx-e-x-e-x 2017-03-06
  • 1
    Where does the (1 + e^-x) - 1 suddenly come from in third row up from bottom? 2017-07-11
  • 0
    To my untrained eyes, this explanation runs rather smoothly except for the step from the second-to-last line to the last line. What happens there? 2017-08-07
  • 3
    @Jarad: e^-x == 1 + e^-x - 1; we are just adding 1 and subtracting 1 from the same term, which changes nothing. 2017-08-17
  • 0
    @VladimirIgnatov: Sorry, I'm not sure which line you are referring to. 2017-08-17
  • 11
    Thank you. Your explanation is much better than the answer. 2017-09-21
  • 0
    It has been a long time since I've taken differential equations, so this is a little complex for me to understand. I understand everything in lines 1, 2, and 4-9, but not line 3. Where does the extra (-e^-x) come from? An explanation would be very helpful. 2018-07-15
  • 0
    @ChoudhuryA.M. The reason why d/dx[(1+e^-x)^-1] = -(1+e^-x)^-2(-e^-x) has to do with the chain rule for derivatives. See https://www.khanacademy.org/math/differential-calculus/dc-chain 2018-07-18
92

Consider $$ f(x)=\dfrac{1}{\sigma(x)} = 1+e^{-x} . $$ Then, on the one hand, the chain rule gives $$ f'(x) = \frac{d}{dx} \biggl( \frac{1}{\sigma(x)} \biggr) = -\frac{\sigma'(x)}{\sigma(x)^2} , $$ and on the other hand, $$ f'(x) = \frac{d}{dx} \bigl( 1+e^{-x} \bigr) = -e^{-x} = 1-f(x) = 1 - \frac{1}{\sigma(x)} = \frac{\sigma(x)-1}{\sigma(x)} . $$ Equate the two expressions, and voilà!
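Spelling out that last step, in case it helps: setting the two expressions for $f'(x)$ equal and multiplying both sides by $-\sigma(x)^2$ gives

$$ -\frac{\sigma'(x)}{\sigma(x)^2} = \frac{\sigma(x)-1}{\sigma(x)} \quad\Longrightarrow\quad \sigma'(x) = -\sigma(x)^2 \cdot \frac{\sigma(x)-1}{\sigma(x)} = \sigma(x)\bigl(1-\sigma(x)\bigr). $$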

(Cf. also this answer to a very recent question.)

  • 0
    How do you differentiate 1 + e^-x to get -e^-x? (Update: I think it's because the derivative of e^x is e^x.) https://en.wikipedia.org/wiki/Derivative#Rules_for_basic_functions 2017-08-04
  • 1
    @AdamGrant: Yes, since then the chain rule gives $\frac{d}{dx} e^{kx}=k e^{kx}$ for any constant $k$. (In this case, we have $k=-1$.) 2017-08-05
16

Note that from your given equation,

$(1+e^{-x})\sigma=1$

$\Rightarrow -e^{-x}\sigma+(1+e^{-x})\frac{d\sigma}{dx}=0$ (differentiating using the product rule)

$\Rightarrow \frac{d\sigma}{dx}=\sigma\cdot\frac{e^{-x}}{(1+e^{-x})}=\sigma\cdot\frac{(1+e^{-x})-1}{(1+e^{-x})}=\sigma\cdot\left[1-\frac{1}{(1+e^{-x})}\right]=\sigma\cdot(1-\sigma)$

6

Since $\sigma(x)$ is a composite function, we first use the chain rule to work down to the $x$ term, and then rewrite the result in terms of $\sigma(x)$: $$ \begin{align} \frac{d}{dx}\sigma(x) &= \left(\frac{1}{1+e^{-x}}\right)' \\ &= -\frac{1}{(1+e^{-x})^{2}} \cdot \left(-e^{-x}\right) \\ &= \frac{e^{-x}}{(1+e^{-x})^{2}}, \\ \because \sigma(x) &= \frac{1}{1+e^{-x}}, \\ e^{-x} &= \frac{1 - \sigma(x)}{\sigma(x)}, \\ 1+e^{-x} &= \frac{1}{\sigma(x)}; \\ \therefore \frac{d}{dx}\sigma(x) &= \frac{\frac{1 - \sigma(x)}{\sigma(x)}}{\left(\frac{1}{\sigma(x)}\right)^{2}} \\ &= (1 - \sigma(x)) \cdot \sigma(x) \end{align}$$
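The substitutions used above can also be checked numerically. The following is a rough Python sketch under my own naming (`derivative_direct`, `derivative_from_output`, and the sample points are all illustrative assumptions): it evaluates the derivative once from $x$ directly and once from the already computed output $s = \sigma(x)$ alone. Being able to reuse a stored output this way is presumably part of why the form $\sigma(1-\sigma)$ is considered convenient for backpropagation.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


def derivative_direct(x: float) -> float:
    """e^{-x} / (1 + e^{-x})^2, the form before rewriting in terms of sigma."""
    e = math.exp(-x)
    return e / (1.0 + e) ** 2


def derivative_from_output(s: float) -> float:
    """The same derivative, computed only from the stored output s = sigma(x)."""
    return s * (1.0 - s)


# Arbitrary sample points; the first two columns should match row by row,
# and the last two illustrate the identity e^{-x} = (1 - sigma) / sigma.
for x in (-4.0, -0.5, 0.0, 1.5, 6.0):
    s = sigmoid(x)
    print(x, derivative_direct(x), derivative_from_output(s),
          math.exp(-x), (1.0 - s) / s)
```

Note that `derivative_from_output` never calls `math.exp`; it needs only the value that a forward pass has already produced.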