
Short version:

I would like to calculate the expected value if you apply the sigmoid function $\frac{1}{1+e^{-x}}$ to a normal distribution with expected value $\mu$ and standard deviation $\sigma$.

If I'm correct, this corresponds to the following integral:

$$\int_{-\infty}^\infty \frac{1}{1+e^{-x}} \frac{1}{\sigma\sqrt{2\pi}}\ e^{ -\frac{(x-\mu)^2}{2\sigma^2} } dx$$

However, I can't solve this integral. I've tried manually, with Maple and with Wolfram|Alpha, but didn't get anywhere.
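Even without a closed form, the integral is easy to evaluate numerically, which gives a reference value for checking any approximation. A minimal sketch, assuming NumPy's probabilists' Gauss-Hermite rule `numpy.polynomial.hermite_e.hermegauss` (weight function $e^{-t^2/2}$) after the substitution $x = \mu + \sigma t$. The function names and the node count are illustrative choices, not part of the question.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), written via tanh to avoid overflow in exp."""
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def expected_sigmoid_quadrature(mu, sigma, n_nodes=100):
    """Approximate E[sigmoid(X)] for X ~ N(mu, sigma**2) by Gauss-Hermite quadrature.

    hermegauss gives nodes t_i and weights w_i for the weight exp(-t**2 / 2);
    the weights sum to sqrt(2*pi), so dividing by that turns the rule into an
    expectation over a standard normal T, with X = mu + sigma * T.
    """
    t, w = hermegauss(n_nodes)
    return np.sum(w * sigmoid(mu + sigma * t)) / np.sqrt(2.0 * np.pi)


print(expected_sigmoid_quadrature(0.0, 1.0))  # ~0.5, by symmetry of the sigmoid
print(expected_sigmoid_quadrature(1.0, 2.0))
```

For small $\sigma$ the result tends to the plain sigmoid of the mean, and swapping the sign of $\mu$ gives one minus the original value; both properties make convenient sanity checks.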

Some background info (why I want to do this):

Sigmoid functions are used in artificial neural networks as activation functions, mapping a value in $(-\infty,\infty)$ to $(0,1)$. Often this value is used directly in further calculations, but sometimes (e.g. in RBMs) it is first stochastically rounded to 0 or 1, with the probability of a 1 being that value. The stochasticity helps learning, but is sometimes not desired when you finally use the network. Just using the normal non-stochastic methods on a network that you trained stochastically doesn't work, though: it changes the expected result, because (in short):

$$\operatorname{E}[S(X)] \neq S(\operatorname{E}[X])$$

for most $X$. However, if you approximate $X$ as normally distributed and could somehow calculate this expected value, you could eliminate most of the bias. That's what I'm trying to do.
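To see the bias concretely, here is a small Monte Carlo sketch under an assumed setup (not from the question): pre-activations $X \sim N(\mu, \sigma^2)$ go through the sigmoid and are stochastically rounded to 0 or 1 as described above. The mean binary output estimates $\operatorname{E}[S(X)]$, which is visibly different from $S(\operatorname{E}[X]) = S(\mu)$. The values of `mu` and `sigma` and the function names are arbitrary illustrations.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid, written via tanh for numerical stability."""
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def simulate_stochastic_unit(mu, sigma, n_samples=1_000_000, seed=0):
    """Mean output of a stochastic binary unit whose pre-activation is N(mu, sigma**2).

    Each sample draws x ~ N(mu, sigma**2), then outputs 1 with probability
    sigmoid(x) and 0 otherwise (the stochastic rounding used e.g. in RBMs).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, size=n_samples)
    binary = rng.random(n_samples) < sigmoid(x)
    return binary.mean()


mu, sigma = 1.0, 2.0
print("S(E[X])     =", sigmoid(mu))                         # about 0.731
print("mean output =", simulate_stochastic_unit(mu, sigma))  # noticeably smaller
```

Replacing the stochastic unit by $S(\mu)$ at test time would therefore overestimate its average output here, which is exactly the bias the question wants to remove.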

4 Answers

Answer (score 0)

Here's a Python function implementing the approximations. It requires NumPy and SciPy. I believe it's correct, but please do point out any errors. I found the probit approximation to be best, though I didn't properly compare over the space of means/variances.

```python
import numpy as np
from scipy.stats import norm


def expected_sigm_of_norm(mean, std, method='probit'):
    """
    Approximate the expected value of the sigmoid of a normal distribution.

    Thanks go to this guy:
    http://math.stackexchange.com/questions/207861/expected-value-of-applying-the-sigmoid-function-to-a-normal-distribution

    :param mean: Mean of the normal distribution
    :param std: Standard deviation of the normal distribution
    :param method: 'maclauren-2', 'maclauren-3' or 'probit'
    :return: An approximation to Expectation(sigm(N(mu, sigma**2)))
    """
    if method == 'maclauren-2':
        # Maclaurin series in std, truncated after the std**2 term
        eu = np.exp(-mean)
        approx_exp = 1/(eu+1) + 0.5*(eu-1)*eu/((eu+1)**3) * std**2
        return np.minimum(np.maximum(approx_exp, 0), 1)
    elif method == 'maclauren-3':
        # Same series, truncated after the std**4 term,
        # whose coefficient is eu*(eu**3 - 11*eu**2 + 11*eu - 1) / (8*(eu+1)**5)
        eu = np.exp(-mean)
        approx_exp = 1/(eu+1) + \
            0.5*(eu-1)*eu/((eu+1)**3) * std**2 + \
            eu*(eu**3 - 11*eu**2 + 11*eu - 1)/(8*(eu+1)**5) * std**4
        return np.minimum(np.maximum(approx_exp, 0), 1)
    elif method == 'probit':
        # sigm(a) ~= Phi(lambda*a) with lambda ~= 0.588, i.e. 1/lambda**2 ~= 2.892
        return norm.cdf(mean/np.sqrt(2.892 + std**2))
    else:
        raise ValueError('Method "%s" not known' % method)
```
Answer (score 9)

Apart from the Maclaurin approximation, the usual way to compute that integral in statistics is to approximate the sigmoid with a probit function, i.e. a rescaled standard normal CDF $\Phi$. More specifically, $\mathrm{sigm}(a) \approx \Phi(\lambda a)$ with $\lambda^2=\pi/8$ (chosen so that both functions have the same slope at the origin). The Gaussian integral of $\Phi$ then has a closed form: $$\int \mathrm{sigm}(x) \, N(x \mid \mu,\sigma^2) \, dx \approx \int \Phi(\lambda x) \, N(x \mid \mu,\sigma^2) \, dx = \Phi\left(\frac{\mu}{\sqrt{\lambda^{-2} + \sigma^2}}\right).$$
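Only the step $\mathrm{sigm}(a) \approx \Phi(\lambda a)$ is an approximation; the last equality is exact. A short numerical check, assuming SciPy's `scipy.integrate.quad`, `scipy.stats.norm` and `scipy.special.expit`: the first function evaluates the closed form, the second integrates $\Phi(\lambda x)\,N(x \mid \mu, \sigma^2)$ directly, and the third integrates the true sigmoid so the approximation error becomes visible. Function names and the test point are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import expit
from scipy.stats import norm

LAMBDA = np.sqrt(np.pi / 8)  # lambda**2 = pi/8, as in the answer


def probit_closed_form(mu, sigma, lam=LAMBDA):
    """Phi(mu / sqrt(lam**-2 + sigma**2)), the exact value of E[Phi(lam * X)], X ~ N(mu, sigma**2)."""
    return norm.cdf(mu / np.sqrt(lam ** -2 + sigma ** 2))


def probit_numeric(mu, sigma, lam=LAMBDA):
    """E[Phi(lam * X)] by direct numerical integration, for comparison."""
    integrand = lambda x: norm.cdf(lam * x) * norm.pdf(x, loc=mu, scale=sigma)
    value, _ = quad(integrand, -np.inf, np.inf)
    return value


def sigmoid_numeric(mu, sigma):
    """E[sigm(X)] itself (the quantity being approximated), by numerical integration."""
    integrand = lambda x: expit(x) * norm.pdf(x, loc=mu, scale=sigma)
    value, _ = quad(integrand, -np.inf, np.inf)
    return value


mu, sigma = 0.7, 1.5
print(probit_closed_form(mu, sigma), probit_numeric(mu, sigma))  # agree to quadrature accuracy
print(sigmoid_numeric(mu, sigma))  # differs by the probit approximation error
```

Since $|\operatorname{E}[\mathrm{sigm}(X) - \Phi(\lambda X)]| \le \max_a |\mathrm{sigm}(a) - \Phi(\lambda a)|$, the error of the final formula can never exceed the worst-case error of the sigmoid approximation itself.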

  • Unless I'm mistaken somewhere, $\lambda = \pi / 8$ [isn't a particularly good value of $\lambda$ to use](http://nbviewer.ipython.org/gist/dougalsutherland/8509978). The best value, in terms of max-norm convergence, seems to be about 0.588 (about $\pi / 5.35$). (2014-01-19)
  • I should also note for posterity that, since the equality wasn't obvious to me, [here's a nice proof](http://stats.stackexchange.com/a/61098/9964). (2014-01-19)
  • "x" should be "mu" at the end there, correct? (2015-04-20)
  • Could you post the result of the more general form $\int \text{sigm}(ax+b) N(x \vert \mu, \sigma^2) dx$? (2018-07-08)
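The first comment's figure of about 0.588 can be checked by brute force: evaluate $\max_a |\mathrm{sigm}(a) - \Phi(\lambda a)|$ on a grid of $a$ values for a range of $\lambda$ and keep the minimizer. The grid limits and resolutions below are arbitrary choices. Since $1/0.588^2 \approx 2.892$, this is presumably also where the constant in the first answer's `probit` branch comes from.

```python
import numpy as np
from scipy.special import expit
from scipy.stats import norm


def max_abs_error(lam, a):
    """Largest |sigm(a) - Phi(lam * a)| over the grid a."""
    return np.max(np.abs(expit(a) - norm.cdf(lam * a)))


a = np.linspace(-10.0, 10.0, 20001)   # both curves are ~0 or ~1 outside this range
lams = np.linspace(0.50, 0.70, 401)   # candidate lambdas, step 5e-4
errors = np.array([max_abs_error(lam, a) for lam in lams])
best = lams[np.argmin(errors)]

print("best lambda:", best, "max error:", errors.min())
print("sqrt(pi/8): ", np.sqrt(np.pi / 8), "max error:", max_abs_error(np.sqrt(np.pi / 8), a))
```

The slope-matching choice $\lambda = \sqrt{\pi/8} \approx 0.627$ is exact near the origin but worse in the shoulders of the curve, so the two criteria simply optimize different things.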
Answer (score 5)

I doubt that there's a closed-form solution. However, here's a series in powers of $\sigma$:

$$ \left( {{\rm e}^{-{\mu}}}+1 \right) ^{-1}+{\frac { \left( { {\rm e}^{-{\mu}}}-1 \right) {{\rm e}^{-{\mu}}}}{2\, \left( {{\rm e} ^{-{\mu}}}+1 \right) ^{3}}}{{\sigma}}^{2}+{\frac { \left( { {\rm e}^{-3\,{\mu}}}-11\,{{\rm e}^{-2\,{\mu}}}+11\,{{\rm e}^{-{ \mu}}}-1 \right) {{\rm e}^{-{\mu}}}}{8\, \left( {{\rm e}^{-{\mu} }}+1 \right) ^{5}}}{{\sigma}}^{4}+{\frac {{{\rm e}^{-{\mu} }} \left( {{\rm e}^{-5\,{\mu}}}-57\,{{\rm e}^{-4\,{\mu}}}+302\,{ {\rm e}^{-3\,{\mu}}}-302\,{{\rm e}^{-2\,{\mu}}}+57\,{{\rm e}^{-{ \mu}}}-1 \right) }{48\, \left( {{\rm e}^{-{\mu}}}+1 \right) ^{7}}}{{ \sigma}}^{6}+{\frac {{{\rm e}^{-{\mu}}} \left( {{\rm e}^{-7\,{\mu}}}-247\,{{\rm e}^{-6\,{\mu}}}+4293\,{ {\rm e}^{-5\,{\mu}}}-15619\,{{\rm e}^{-4\,{\mu}}}+15619\,{ {\rm e}^{-3\,{\mu}}}-4293\,{{\rm e}^{-2\,{\mu}}}+247\,{{\rm e}^{ -{\mu}}}-1 \right) }{384\, \left( {{\rm e}^{-{\mu}}}+1 \right) ^{9}}} {{\sigma}}^{8}+O \left( {{\sigma}}^{10} \right) $$

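EDIT: To obtain this, first make the change of variables $x = \mu + \sigma t$. The integral becomes $$ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty \dfrac{e^{-t^2/2}}{1 + e^{-\mu - \sigma t}}\ dt $$ Now take the Maclaurin series in $\sigma$ $$\frac{1}{1+e^{-\mu - \sigma t}} = \frac{1}{1+e^{-\mu}} + \frac{e^{-\mu} \sigma t}{(1+e^{-\mu})^2} + \frac{e^{-\mu} ( e^{-\mu} - 1) \sigma^2 t^2}{2(1+e^{-\mu})^3} + \ldots$$ and integrate term by term. Odd powers of $t$ integrate to zero, and $\frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty t^{2k} e^{-t^2/2}\, dt = (2k-1)!!$. Since the $\sigma^{2k}$ coefficient of the Maclaurin series carries a factor $1/(2k)!$ and $(2k)!/(2k-1)!! = 2^k k!$, this gives the factors $2, 8, 48, 384$ in the denominators above.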
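The series can be evaluated directly. A sketch that hard-codes the four numerator polynomials from the series above (in $u = e^{-\mu}$, highest power first) and adds terms up to a chosen order; the function name and `n_terms` parameter are illustrative. One caveat: the Gaussian moments $(2k-1)!!$ grow factorially while the sigmoid's Taylor coefficients shrink only geometrically, so the series is asymptotic rather than convergent and is best used for small $\sigma$, not summed to ever more terms.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import expit
from scipy.stats import norm

# Numerator polynomials P_k(u) from the series, u = exp(-mu), highest power first.
# The sigma**(2k) term is u * P_k(u) / (2**k * k! * (1 + u)**(2k + 1)) * sigma**(2k).
NUMERATORS = [
    [1, -1],
    [1, -11, 11, -1],
    [1, -57, 302, -302, 57, -1],
    [1, -247, 4293, -15619, 15619, -4293, 247, -1],
]


def sigmoid_normal_series(mu, sigma, n_terms=4):
    """Series for E[1 / (1 + exp(-X))], X ~ N(mu, sigma**2), through sigma**(2 * n_terms)."""
    u = np.exp(-mu)
    total = 1.0 / (1.0 + u)   # sigma**0 term: the sigmoid at the mean
    factor = 1.0              # running value of 2**k * k!
    for k, coeffs in enumerate(NUMERATORS[:n_terms], start=1):
        factor *= 2 * k
        total += u * np.polyval(coeffs, u) / (factor * (1.0 + u) ** (2 * k + 1)) * sigma ** (2 * k)
    return total


# Compare with numerical integration over mu +/- 12 sigma (the mass outside is negligible).
mu, sigma = 0.5, 0.3
reference, _ = quad(lambda x: expit(x) * norm.pdf(x, loc=mu, scale=sigma),
                    mu - 12 * sigma, mu + 12 * sigma)
print(sigmoid_normal_series(mu, sigma), reference)
```

At $\mu = 0$ every numerator vanishes ($u = 1$), so the series returns exactly $1/2$, as symmetry requires.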

  • Hi Robert, could you comment on the derivation of that series? Where does it come from? (2012-12-18)
Answer (score 5)

Since I do not have enough reputation to comment, I'll add a new answer instead. @korkinof's answer is almost correct. The final integral evaluates to the following: \begin{equation} \int \mathrm{sigm}(x)\, \mathcal{N}(x; \mu, \sigma^2)\, dx \approx \int \Phi(\lambda x)\, \mathcal{N}(x; \mu, \sigma^2)\, dx = \Phi\left(\frac{\lambda \mu}{\sqrt{1 + \lambda^2 \sigma^2}}\right). \end{equation} I verified my answer through simulation.
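The simulation itself wasn't posted; a check along those lines might look like the sketch below, where the seed, sample size and test point are arbitrary. It draws $X \sim \mathcal{N}(\mu, \sigma^2)$, averages $\Phi(\lambda X)$, and compares the average with the closed form. It also confirms that $\Phi\left(\frac{\lambda\mu}{\sqrt{1+\lambda^2\sigma^2}}\right)$ equals $\Phi\left(\frac{\mu}{\sqrt{\lambda^{-2}+\sigma^2}}\right)$, which follows by dividing numerator and denominator by $\lambda$.

```python
import numpy as np
from scipy.stats import norm

lam = np.sqrt(np.pi / 8)
mu, sigma = -0.4, 2.5

rng = np.random.default_rng(42)
x = rng.normal(mu, sigma, size=2_000_000)
monte_carlo = norm.cdf(lam * x).mean()  # Monte Carlo estimate of E[Phi(lam * X)]

form_a = norm.cdf(lam * mu / np.sqrt(1 + lam**2 * sigma**2))  # form used in this answer
form_b = norm.cdf(mu / np.sqrt(lam**-2 + sigma**2))           # algebraically equivalent form

print(monte_carlo, form_a, form_b)
```

This only tests the exact Gaussian identity; the error from replacing the sigmoid by $\Phi(\lambda x)$ has to be checked separately, e.g. against numerical integration of the sigmoid itself.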

  • This is an old post with an accepted answer, so I don't think this post was necessary. (2016-11-01)
  • @Smoke did explain his / her reasons for posting the answer, and they seem compelling. For example, the answer fixes a bug in korkinof's. (2017-11-21)
  • Could you post the result of the more general form $\int \text{sigm}(ax+b) N(x \vert \mu, \sigma^2) dx$? (2018-07-08)