
I'm doing a homework assignment about neural networks, and it suggests that it is somehow possible to merge two summations, one of which contains the other. I thought that this was not possible.

Also, I tried to find an appropriate tag, but there was none :(.

Here's the question:

Suppose you had a neural network with linear activation functions. That is, for each unit the output is some constant c times the weighted sum of the inputs.

Assume that the network has one hidden layer. For a given assignment to the weights w, write down equations for the value of the units in the output layer as a function of w and the input layer x, without any explicit mention of the output of the hidden layer. Show that there is a network with no hidden units that computes the same function.

To explain this for those who don't know neural networks: a neural network has a set of inputs and a set of hidden layers. Those layers each have a number of nodes. The first hidden layer is connected to the second, the second to the third, and finally the last layer is connected to the output layer.

The input layer has a certain number of nodes. Every input is connected to each node in the first hidden layer so that each node in the hidden layer takes all the inputs as an input.

Each input to a node on a hidden layer has a weight determining how much influence an input has on a node in the hidden layer.

The node in the hidden layer then applies an activation function (in my case a linear one) to the summation of the weighted inputs.

This linear activation function would look as follows:

$ c \sum\limits_{i=1}^n w_i x_i $

Where $c$ is the constant, $n$ is the number of nodes in the previous layer, and $w_i$ is the weight of the input $x_i$.
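
In case it helps to see the same thing concretely, here is a minimal Python/NumPy sketch of one such linear unit. The values of $c$, the weights, and the inputs are made up, just for illustration:

```python
import numpy as np

# Made-up values, just to illustrate one linear unit:
# output = c * sum_i(w_i * x_i)
c = 2.0                          # the constant of the activation function
w = np.array([0.5, -1.0, 0.25])  # one weight per input
x = np.array([1.0, 2.0, 4.0])    # the inputs

output = c * np.sum(w * x)       # c * (w_1*x_1 + ... + w_n*x_n)
print(output)                    # 2.0 * (0.5 - 2.0 + 1.0) = -1.0
```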

This is what the neural network would look like if there were no hidden layers, just the input and the output layer. Here $x_i$ denotes an input from the input layer.

Now if we would take a neural network with one hidden layer it would look like this:

$ c \sum\limits_{j=1}^m w_{2j} \left(c \sum\limits_{i=1}^n w_{1ji} x_i\right) $

As you can see, this applies the linear activation function of hidden node $j$ (the inner sum) and uses its output as the $j$-th input of the output layer.
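
Here is the same nested computation as a rough NumPy sketch (one output node, with made-up sizes and values), so the two summations show up as two separate steps:

```python
import numpy as np

# Made-up sizes and values: n = 3 inputs, m = 2 hidden nodes, 1 output node.
c = 2.0
x = np.array([1.0, 2.0, 4.0])

W1 = np.array([[0.5, -1.0, 0.25],   # weights from the inputs to hidden node 1
               [1.0,  0.0, -0.5]])  # weights from the inputs to hidden node 2
w2 = np.array([0.5, 2.0])           # weights from the hidden nodes to the output

hidden = c * (W1 @ x)                # inner sum: output of each hidden node
output = c * np.sum(w2 * hidden)     # outer sum: the output node
print(hidden, output)
```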

I hope this is clear so far.

Now for my question. It is suggested that these summations are somehow merge-able into one summation, because it is claimed that there is a neural network with NO hidden units that achieves the same thing.

I have virtually no experience manipulating summations (I know what they do, but not how they interact with other summations).

This is what I got using my basic understanding:

$ c^2 \sum w_{2} (\sum w_{1} x) $

The only constant in the equation is of course $c$, so that is the only thing I can move through the summations, as far as I know.

So basically how do I rewrite equation two into the form of equation one?

Thanks in advance, Rope.


1 Answer


My knowledge of neural networks is minimal, so take this with a grain of salt. Also, I may have done it in more generality than the problem requires.

I’m assuming $n$ inputs and $m$ nodes in the hidden layer. I’m assuming that each node in the hidden layer has its own linear activation function, with its own constant and weights; the function for node $k$ is

$y_k=c_{1k}\sum_{i=1}^nw_{1ki}x_i\;,\tag{1}$

with constant $c_{1k}$ and weight $w_{1ki}$ for input $x_i$. Similarly, I’m assuming that each node in the output layer has its own linear activation function, with its own constant and weights; the function for node $j$ is

$z_j=c_{2j}\sum_{k=1}^mw_{2jk}y_k\;,\tag{2}$

with constant $c_{2j}$ and weight $w_{2jk}$ for input $y_k$. Substituting $(1)$ into $(2)$, we get

$\begin{align*} z_j=c_{2j}\sum_{k=1}^mw_{2jk}y_k&=c_{2j}\sum_{k=1}^mc_{1k}w_{2jk}\sum_{i=1}^nw_{1ki}x_i\\ &=c_{2j}\sum_{i=1}^n\left(\sum_{k=1}^mc_{1k}w_{2jk}w_{1ki}\right)x_i\;; \end{align*}$

this expresses the output $z_j$ directly as a linear activation function of the inputs $x_i$, with constant $c_{2j}$ and weight

$\sum_{k=1}^mc_{1k}w_{2jk}w_{1ki}$

for input $x_i$.
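
If you want to convince yourself numerically, here is a small NumPy check (with made-up random constants and weights, and made-up sizes $n$, $m$, $r$) that the single-layer weights above really do reproduce the two-layer output:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 3, 4, 2                      # inputs, hidden nodes, output nodes (made up)

c1 = rng.normal(size=m)                # constant c_{1k} of each hidden node
W1 = rng.normal(size=(m, n))           # weight w_{1ki} from input i to hidden node k
c2 = rng.normal(size=r)                # constant c_{2j} of each output node
W2 = rng.normal(size=(r, m))           # weight w_{2jk} from hidden node k to output j
x = rng.normal(size=n)

# Two-layer computation: equations (1) and (2)
y = c1 * (W1 @ x)                      # y_k = c_{1k} * sum_i w_{1ki} x_i
z_two_layer = c2 * (W2 @ y)            # z_j = c_{2j} * sum_k w_{2jk} y_k

# Collapsed single-layer weights: sum_k c_{1k} w_{2jk} w_{1ki}
W_combined = (W2 * c1) @ W1            # broadcasting scales column k of W2 by c_{1k}
z_one_layer = c2 * (W_combined @ x)

print(np.allclose(z_two_layer, z_one_layer))  # True
```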

This can all be done more easily with matrices. If there are $r$ output nodes, let

$W_1=\pmatrix{c_{11}w_{111}&\dots&c_{11}w_{11n}\\\vdots&\ddots&\vdots\\c_{1m}w_{1m1}&\dots&c_{1m}w_{1mn}}\quad\text{and}\quad W_2=\pmatrix{c_{21}w_{211}&\dots&c_{21}w_{21m}\\\vdots&\ddots&\vdots\\c_{2r}w_{2r1}&\dots&c_{2r}w_{2rm}}\;,$

and let

$X=\pmatrix{x_1\\\vdots\\x_n},\quad Y=\pmatrix{y_1\\\vdots\\y_m},\quad\text{and}\quad Z=\pmatrix{z_1\\\vdots\\z_r}\;.$

The matrices $W_1$ and $W_2$ incorporate all of the information contained in the activation functions. Then $Y=W_1X$ and $Z=W_2Y$, so $Z=W_2(W_1X)=(W_2W_1)X$; i.e., the product matrix $W_2W_1$ similarly incorporates the constants and weights expressing the output $Z$ in terms of the input $X$.
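
The matrix version can be checked numerically in the same way; again this is only a sketch with made-up sizes and random values:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, r = 3, 4, 2                       # made-up sizes

c1 = rng.normal(size=m)                 # constants of the hidden nodes
c2 = rng.normal(size=r)                 # constants of the output nodes
A1 = rng.normal(size=(m, n))            # raw weights w_{1ki}
A2 = rng.normal(size=(r, m))            # raw weights w_{2jk}

W1 = c1[:, None] * A1                   # row k scaled by c_{1k}: entries c_{1k} w_{1ki}
W2 = c2[:, None] * A2                   # row j scaled by c_{2j}: entries c_{2j} w_{2jk}
X = rng.normal(size=n)

Y = W1 @ X
Z = W2 @ Y
print(np.allclose(Z, (W2 @ W1) @ X))    # True: the product matrix computes the same function
```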
