Chapter 1: Basic functioning of a simple feed-forward neural network
I hope you liked my previous post: Artificial neural networks and the magic behind - Introductory Chapter
Before studying other architectures of artificial neural networks, such as Hopfield networks, recurrent networks and bidirectional associative memory networks, it is always helpful to have a complete understanding of how a simple feed-forward network works. In this chapter I will go through a short demonstration of a small neural network trained with back-propagation-based supervised learning.
There are two main ways of making a neural network learn a problem: supervised and unsupervised. In supervised learning, the network is taught to adapt to a solution on the basis of a training dataset. Suppose our task is to recognize faces; this will be the problem statement used throughout this chapter. We have to classify an image into one of two classes, “face” or “non-face”, so the output layer needs two neurons. To keep things simple, let's arrange the outputs so that if the image is a “face”, the network outputs a larger value on the “face” neuron than on the other one.
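The decision rule just described can be sketched in a few lines of Python. The function name `classify` and the assumption that index 0 holds the “face” neuron are illustrative choices of mine, not part of the original network.

```python
def classify(outputs):
    """Decide the class from the two output neurons.

    Assumes index 0 is the "face" neuron and index 1 the "non-face" neuron.
    """
    face_score, non_face_score = outputs
    return "face" if face_score > non_face_score else "non-face"


print(classify([0.9, 0.1]))  # face (the desired output used in this chapter)
print(classify([0.3, 0.8]))  # non-face
```

Only the relative size of the two outputs matters here, so the network does not need to hit 0.9 and 0.1 exactly to classify correctly.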
Let’s say we have the network shown in the figure.
Our network consists of three layers: input, hidden and output. A hidden layer is any layer between the input and output layers; it transforms the inputs into intermediate values that the next layer works with. The number of hidden layers and the number of neurons in them depends on the application being developed. Our sample network has 4 neurons in the input layer, 3 in the hidden layer and 2 in the output layer. Though not drawn in the figure, each neuron in the input layer is connected to every neuron in the hidden layer, and the hidden and output layers are connected in the same way. This is a small example of a feed-forward neural network: as the name suggests, all the data flows in the forward direction.
Now the process will be explained with a training example. Suppose that for the input vector [0.1, 0.2, 0.3, 0.4] I need the output vector [0.9, 0.1]. To achieve that, the only thing we will modify here is the weights; there are a few more parameters, such as thresholds and biases, which will be dealt with later. A weight exists for every connection, so the weight matrix between the input and hidden layers is of size 4×3 and the one between the hidden and output layers is of size 3×2. For simplicity, let's say that all these weights have a value of 0.5 at the start of training.
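At the start of training, weight matrix 1 (input to hidden, 4×3) and weight matrix 2 (hidden to output, 3×2) are:
\[
\text{weight matrix 1} = \begin{pmatrix} 0.5 & 0.5 & 0.5 \\ 0.5 & 0.5 & 0.5 \\ 0.5 & 0.5 & 0.5 \\ 0.5 & 0.5 & 0.5 \end{pmatrix},
\qquad
\text{weight matrix 2} = \begin{pmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \\ 0.5 & 0.5 \end{pmatrix}
\]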
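Take row 1 of matrix 1 and select column 1 of that row: that value is the weight between neuron i1 and neuron h1. The other entries follow the same pattern, and weight matrix 2 is read the same way (row j, column k holds the weight between hj and ok).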
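To make the row/column convention concrete, here is a minimal Python sketch that stores both weight matrices as nested lists. The names `W1` and `W2` are my own; the 0.5 initial values are the simplification from the text.

```python
n_input, n_hidden, n_output = 4, 3, 2

# W1[i][j] is the weight from input neuron i+1 to hidden neuron j+1 (4 x 3).
# A list comprehension gives each row its own list, so rows can change independently.
W1 = [[0.5] * n_hidden for _ in range(n_input)]
# W2[j][k] is the weight from hidden neuron j+1 to output neuron k+1 (3 x 2).
W2 = [[0.5] * n_output for _ in range(n_hidden)]

print(len(W1), len(W1[0]))  # 4 3
print(len(W2), len(W2[0]))  # 3 2
print(W1[0][0])             # weight i1 -> h1: 0.5
```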
Now consider the connection between the input and hidden layers. We multiply our input vector by weight matrix 1 to get a summing vector (remember the nucleus from the previous post) of [0.5, 0.5, 0.5], one value for each of the 3 neurons of the hidden layer; each value is 0.5 × (0.1 + 0.2 + 0.3 + 0.4) = 0.5. Now comes the role of the activation function. For this tutorial I have taken it to be the sigmoid function (the different types of activation functions will be covered later), which is defined as follows:
f(x) = 1/(1+exp(-x)), which gives an output between 0 and 1. Passing the summing vector through the activation function gives [0.622, 0.622, 0.622]. This vector is the output of the hidden layer and becomes the input for the output layer. It is multiplied by weight matrix 2 (each output neuron receives 3 × 0.5 × 0.622 = 0.933) and passed through the same activation function to get the output vector, here [0.717, 0.717]. But our desired output vector was [0.9, 0.1].
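The forward pass can be reproduced with a short pure-Python sketch. The helper names `sigmoid` and `layer_forward` are hypothetical, chosen for illustration; the weights and input are the ones from the example.

```python
import math


def sigmoid(x):
    """Logistic activation f(x) = 1 / (1 + exp(-x)), output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights):
    """Row vector times weight matrix (weights[i][j]: neuron i -> neuron j),
    then the sigmoid applied to every summed value."""
    n_out = len(weights[0])
    sums = [sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
            for j in range(n_out)]
    return [sigmoid(s) for s in sums]


W1 = [[0.5] * 3 for _ in range(4)]  # input -> hidden, 4 x 3
W2 = [[0.5] * 2 for _ in range(3)]  # hidden -> output, 3 x 2

x = [0.1, 0.2, 0.3, 0.4]
hidden = layer_forward(x, W1)       # summing vector [0.5, 0.5, 0.5] -> sigmoid
output = layer_forward(hidden, W2)
print([round(h, 3) for h in hidden])  # [0.622, 0.622, 0.622]
print([round(o, 3) for o in output])  # [0.718, 0.718]
```

Rounded to three places the output is 0.718; the 0.717 in the text is the same value (about 0.7178) truncated rather than rounded.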
Here the back-propagation algorithm starts: it modifies the weights to get as close as possible to the desired output in a finite number of attempts. Let's understand it by targeting neuron h1, i.e., modifying the weights from h1 to o1 and from h1 to o2.
a) Calculate the difference between the desired output and the actual output: [0.9-0.717, 0.1-0.717] = [0.183, -0.617]
b) Now build a new vector in which each component of the difference vector is multiplied by the corresponding component of the actual output vector and by its complement (1 minus that component). For the first component this is 0.183 * 0.717 * (1-0.717), and the new vector is [0.037, -0.125]. The factor o * (1-o) is the derivative of the sigmoid expressed through its output o.
c) From our previous results we know that h1 had an output of 0.622. We multiply this by a learning_rate parameter (discussed in detail later, as it plays a very important role in training), say 0.2 here, to get 0.1244.
d) This constant is now multiplied by the new vector from (b), giving [0.0046, -0.0155] as the weight changes for the connections leaving h1. Thus the new weights from h1 become [0.5+0.0046, 0.5-0.0155] = [0.5046, 0.4845].
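Steps (a) to (d) for neuron h1 can be checked with the following sketch. It recomputes the forward-pass values instead of using the rounded ones, and the variable names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


learning_rate = 0.2
target = [0.9, 0.1]

# Values from the forward pass: every weight is 0.5, so all hidden outputs
# equal sigmoid(0.5) and each output neuron sums 3 * 0.5 * h.
h = [sigmoid(0.5)] * 3
o = [sigmoid(3 * 0.5 * h[0])] * 2

# (a) desired minus actual output
diff = [t - ok for t, ok in zip(target, o)]
# (b) multiply by o * (1 - o), the sigmoid derivative written via its output
delta_out = [d * ok * (1 - ok) for d, ok in zip(diff, o)]
# (c) learning rate times the output of h1
scale = learning_rate * h[0]
# (d) weight changes for h1 -> o1 and h1 -> o2, then the new weights
dw_h1 = [scale * d for d in delta_out]
new_w_h1 = [0.5 + dw for dw in dw_h1]

print([round(v, 3) for v in delta_out])  # [0.037, -0.125]
print([round(v, 4) for v in new_w_h1])   # [0.5046, 0.4844]
```

At full precision the second weight comes out as 0.4844; the 0.4845 in the text comes from rounding the intermediate values before subtracting.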
Repeat this for all the hidden-layer neurons. Now the error has to propagate further backwards to the input layer. Consider the same neuron h1.
First we calculate $error for h1, the error term at the output of h1:
a) Multiply each element of the vector from step (b) of the previous sequence by the corresponding element of row 1 of weight matrix 2 (the weights leaving h1, still at their old value of 0.5) and sum the results: 0.5*0.037 + 0.5*(-0.125) = -0.044.
b) $error for h1 = (the value from (a)) * (output of h1) * (complement of the output of h1)
= -0.044*0.622*(1-0.622) = -0.0103
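The $error computation for h1 can be sketched the same way; again the names are hypothetical and the values are recomputed at full precision.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


h1 = sigmoid(0.5)          # output of h1, about 0.622
o = sigmoid(1.5 * h1)      # both output neurons give about 0.718
# vector (b) from the output-layer step, about [0.037, -0.125]
delta_out = [(t - o) * o * (1 - o) for t in (0.9, 0.1)]
# row 1 of weight matrix 2: the old (pre-update) weights h1 -> o1, h1 -> o2
w2_row_h1 = [0.5, 0.5]

# (a) weighted sum of the output-layer values flowing back into h1
back_sum = sum(w * d for w, d in zip(w2_row_h1, delta_out))
# (b) multiply by h1 * (1 - h1) to get $error for h1
error_h1 = back_sum * h1 * (1 - h1)

print(round(back_sum, 3))  # -0.044
print(round(error_h1, 4))  # -0.0104
```

At full precision $error is about -0.0104; the -0.0103 in the text comes from multiplying the already-rounded -0.044 and 0.622.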
Second, choose an input neuron, say i1.
a) Compute the change in the weight between i1 and h1 as learning_rate * (input for i1) * ($error of h1 calculated just above).
Thus the change = 0.2*0.1*(-0.0103) = -0.0002
The new weight between i1 and h1 = 0.5-0.0002 = 0.4998.
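Applying the same rule to every input neuron gives all four new weights into h1. This sketch uses the rounded $error of -0.0103 from the text; the variable names are hypothetical.

```python
learning_rate = 0.2
inputs = [0.1, 0.2, 0.3, 0.4]   # values of i1..i4
error_h1 = -0.0103              # $error of h1, as rounded in the text

# Change for each weight i -> h1 is learning_rate * (input i) * ($error of h1)
changes = [learning_rate * x * error_h1 for x in inputs]
new_weights = [0.5 + c for c in changes]

print(round(changes[0], 4))                # -0.0002
print([round(w, 4) for w in new_weights])  # [0.4998, 0.4996, 0.4994, 0.4992]
```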
The same is done for the other weights. Remember, $error is the quantity that carries the error backwards.
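The four rules used in the worked example can be written compactly. The notation here is my own, not the original post's: x_i is the value of input neuron i, h_j the output of hidden neuron j, o_k and t_k the actual and desired outputs of output neuron k, η the learning_rate, w^{(1)}_{ij} and w^{(2)}_{jk} the entries of weight matrices 1 and 2, δ^o_k the components of the vector from step (b), and δ^h_j the $error of hidden neuron j.

\[
\begin{aligned}
\delta^{o}_{k} &= (t_k - o_k)\, o_k\, (1 - o_k) \\
\Delta w^{(2)}_{jk} &= \eta\, h_j\, \delta^{o}_{k} \\
\delta^{h}_{j} &= h_j\, (1 - h_j) \sum_{k} w^{(2)}_{jk}\, \delta^{o}_{k} \\
\Delta w^{(1)}_{ij} &= \eta\, x_i\, \delta^{h}_{j}
\end{aligned}
\]

These are the usual back-propagation updates for sigmoid neurons with a squared-error loss. As in the example, δ^h_j uses the weights of matrix 2 from before they are updated.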
After all the weights are modified, we calculate the new actual output and repeat the process a fixed, finite number of times.
After that number of iterations, the next sample is fed in, starting from the weights trained on the first sample, and the process continues in the same way for the remaining training samples.
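Putting the pieces together, here is a minimal pure-Python sketch of the whole procedure: one update touches every weight, it is repeated a fixed number of times per sample, and the next sample starts from the trained weights. The function names, the iteration count of 1000 and the single-sample training set are assumptions for illustration; the text only says "some finite number".

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, W1, W2):
    """Return (hidden outputs, output-layer outputs) for input vector x."""
    h = [sigmoid(sum(x[i] * W1[i][j] for i in range(len(x))))
         for j in range(len(W1[0]))]
    o = [sigmoid(sum(h[j] * W2[j][k] for j in range(len(h))))
         for k in range(len(W2[0]))]
    return h, o


def sq_error(o, target):
    """Half the sum of squared differences between target and output."""
    return 0.5 * sum((t - v) ** 2 for t, v in zip(target, o))


def train_step(x, target, W1, W2, lr):
    """One back-propagation update of every weight, in place."""
    h, o = forward(x, W1, W2)
    # Output-layer vector: (t - o) * o * (1 - o)
    d_out = [(target[k] - o[k]) * o[k] * (1 - o[k]) for k in range(len(o))]
    # $error of each hidden neuron, using matrix 2 before it is updated
    d_hid = [h[j] * (1 - h[j]) * sum(W2[j][k] * d_out[k] for k in range(len(o)))
             for j in range(len(h))]
    for j in range(len(h)):
        for k in range(len(o)):
            W2[j][k] += lr * h[j] * d_out[k]
    for i in range(len(x)):
        for j in range(len(h)):
            W1[i][j] += lr * x[i] * d_hid[j]


def train(samples, W1, W2, lr=0.2, iterations=1000):
    """Run a fixed number of updates per sample, then move to the next sample,
    which starts from the weights left by the previous one."""
    for x, target in samples:
        for _ in range(iterations):
            train_step(x, target, W1, W2, lr)


W1 = [[0.5] * 3 for _ in range(4)]
W2 = [[0.5] * 2 for _ in range(3)]
samples = [([0.1, 0.2, 0.3, 0.4], [0.9, 0.1])]

x, target = samples[0]
_, before = forward(x, W1, W2)
train(samples, W1, W2)
_, after = forward(x, W1, W2)
print("error before:", round(sq_error(before, target), 4))
print("error after: ", round(sq_error(after, target), 4))
```

Because every weight starts at 0.5, the three hidden neurons receive identical updates and stay identical throughout training; that is a side effect of the simplified initialisation, not of back-propagation itself.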
A much bigger network of this kind that I trained gave some decent results, as shown below.
As can be seen, the terminal says “Trained output for image = 0.872239 0.164415” for the two output neurons, of which the first one is the “face” neuron.
I hope you understood the basic working. Stay tuned for next chapters. 🙂