The Basics of Neural Networks - Part I
7 Steps To Build a Neural Network
By AcingAI | Updated May 21st, 2021
Ever wondered how a Neural Network is made? Why is it even called that? And when it comes to Machine Learning, how the hell do machines learn?!
This post is the first part of the Neural Network Basics series, in which we will answer all of the above questions!
We'll start by defining the terms used in the field, provide illustrations, and cover the mathematics behind the different blocks that will construct our Neural Network.
Table Of Contents
- The Basic Building Block - a Neuron
- Larger Building Blocks – Walls?... Layers!
- Connecting The Dots – A Fully Connected Network
- A Key Ingredient – Non-linear Activation Function
The Basic Building Block - a Neuron
First things first: in order to build a Neural Network, we need to get a bunch of neurons. But what exactly are those neurons? The little cells in our brain that populate all that gray matter? Well, kind of.
A Biological Neuron
The most basic model of a biological neuron does 5 consecutive actions:
- The neuron first receives electrical signals from neighboring neurons
- The neuron will then increase/decrease the amount of neurotransmitters carrying these signals
- Then, the neuron sums over all of the received signals
- The neuron applies an activation function to that signal
- Finally, the neuron will release the output to other neurons
So how is it related to computer programs?
Well, researchers have developed a mathematical model of the biological neuron, which is of course a simplification of how real neurons communicate.
An Artificial Neuron
An artificial neuron works in a similar way, receiving an input vector from neighboring artificial neurons (1), multiplying those inputs by a weight vector (2), summing over the products (3), applying an activation function to that sum (4) and sending the signal to other artificial neurons (5).
The mathematical model of a neuron
The mathematical formula for a neuron will be:

y = ϕ(w · x) = ϕ(Σᵢ wᵢxᵢ)

Where x is the input vector, w the weight vector, ϕ the activation function and y the scalar output.
When the activation function is a simple threshold function, as seen below:

ϕ(z) = 1 if z ≥ 0, otherwise 0

then the simplified neuron is called a Perceptron.
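To make this concrete, here is a minimal sketch of a single Perceptron-style neuron in Python with NumPy. The input values and weights are illustrative assumptions, not values from the post:

```python
import numpy as np

def threshold(z):
    """A simple threshold activation: fire (1) if the sum is non-negative."""
    return 1.0 if z >= 0 else 0.0

def neuron(x, w, activation=threshold):
    """One artificial neuron: weigh the inputs, sum them up, then activate."""
    z = np.dot(w, x)        # steps 2 + 3: multiply by the weights and sum
    return activation(z)    # step 4: apply the activation function

# Illustrative input and weights (made-up values)
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.8, 0.2, -0.3])
print(neuron(x, w))  # -> 0.0, since 0.4 - 0.2 - 0.6 = -0.4 < 0
```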
Larger Building Blocks – Walls?... Layers!
Stacking a bunch of these Neurons together forms a layer, which is a larger building block of artificial neural networks.
Layers, a group of Neurons
A layer is the general name for any transformation done on the signal in a neural network, even when it is not built of neurons (we will learn about other types of layers in later posts, so stay tuned!).
Each neuron in the layer receives the same input vector, multiplies the input vector by its own weight vector, sums over the products, applies an activation function and sends the signal to the next layer.
So how does it look?
Layers in a neural network
Mathematically, we have an input vector which is multiplied by multiple weight vectors, summed, and passed through an activation function:

y = ϕ(Wx)

Now y has become an output vector (one scalar value for each neuron in the layer), W is now a weight matrix (one weight vector for each neuron in the layer), ϕ is now a vector function, and x remains the same input vector.
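As a sketch, the whole layer boils down to one matrix-vector product followed by an element-wise activation. np.tanh is used here only as a placeholder; activation functions are covered below:

```python
import numpy as np

def layer(x, W, activation=np.tanh):
    """A full layer: each row of W is one neuron's weight vector."""
    z = W @ x               # every neuron multiplies and sums at once
    return activation(z)    # element-wise activation -> output vector

# Illustrative layer of 4 neurons, each reading a 3-dimensional input
x = np.array([0.5, -1.0, 2.0])
W = np.random.randn(4, 3)
print(layer(x, W).shape)  # -> (4,), one output per neuron
```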
Connecting The Dots – A Fully Connected Network
Now let’s arrange a few of these layers one after another, and connect every neuron in each layer to all of the neurons in the adjacent layers.
A 3-layer neural network
How does it all work together?
The first layer of the network receives an input vector, which can represent anything from images to text or music.
The input vector enters the first layer where each neuron performs its own mathematical manipulation, and passes this data on to the next layer.
The next layer’s neurons do the same and the process is repeated until the very last layer, called the output layer (as it will calculate the network’s output).
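Here is a hedged sketch of that forward pass, chaining the hypothetical layer computation from above through a list of weight matrices:

```python
import numpy as np

def forward(x, weights, activation=np.tanh):
    """Feed the input through every layer in turn."""
    for W in weights:
        x = activation(W @ x)  # each layer's output becomes the next input
    return x                   # the last (output) layer's result

# Illustrative 3-layer network with sizes 3 -> 5 -> 5 -> 2
weights = [np.random.randn(5, 3), np.random.randn(5, 5), np.random.randn(2, 5)]
print(forward(np.array([0.5, -1.0, 2.0]), weights))
```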
In the end, does size matter?...
The size of the output layer, i.e. the number of neurons it contains, must match the size of the required output.
If, for example, we need to classify images with cats vs. images without cats, we'd like to receive only one number as an output. This number would correspond to the network’s certainty that a given image contains a cat.
In this case we would need to build a binary classification network*.
*A Binary Classification Network is a neural network whose purpose is to decide whether an input (like an image) belongs to a certain class or not.
How many layers do I need?
The depth of a neural network is defined as the number of layers it consists of. It's important to remember that this depth will change according to the problem we are aiming to solve.
A shallow network (a network with only a few layers) can only solve simpler problems, but it requires little computational power and can provide an answer in a short time.
A deeper network (a network with more layers) can solve much harder problems, but will take a longer time to output an answer.
Now that we’ve connected all those little neurons together into a large network (can you see the resemblance to the human brain yet?), is our network finally ready to learn and solve problems? Almost!
We have yet to describe the aforementioned “activation function”, which is the last piece needed before we are done constructing our network.
A Key Ingredient – Non-linear Activation Function
So what exactly is an activation function, and what should we bear in mind when adding this key ingredient to our Neural Network soup?
An activation function defines the output of a neuron, and is an essential concept in neural networks. As we explained above, the calculations a neuron or a layer performs on the input vector are all linear, so if we were to forget to add an activation function, we would have performed a chain of linear calculations.
How would that look if we had for example a network of 3 layers?
y = W₃(W₂(W₁x)) = (W₃W₂W₁)x

Which is still a linear operation. What does that mean? That all those layers were for nothing, and we could have built a network that acts exactly the same with only 1 layer.
Similarly, using a linear activation function will have the same effect.
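A quick numerical check of this collapse, as a sketch with made-up matrices:

```python
import numpy as np

# Three layers with no (or a purely linear) activation function
W1, W2, W3 = (np.random.randn(4, 4) for _ in range(3))
x = np.random.randn(4)

chained = W3 @ (W2 @ (W1 @ x))    # passing x through all three layers
collapsed = (W3 @ W2 @ W1) @ x    # one equivalent single layer
print(np.allclose(chained, collapsed))  # -> True: 3 layers acted as 1
```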
To avoid this issue, and to allow the network to learn more complex functions, we will use a non-linear activation function at the end of each layer, so that every layer contributes to the final output.
We will cover 2 types of activation functions - one to be used after the very last layer, and a second to be used after each and every layer that is not the last.
Say we are solving a classification problem, aiming to identify handwritten digits, and we have built a neural network for that purpose. We would like the network to eventually output the digit that appears in the image, meaning the output will be a single digit between 0 and 9.
However, what about a scenario where it is not clear whether the written digit is a 5 or a 6? Instead, we can have the network output its level of certainty (or the probability) that the picture contains each of the digits.
So we would basically need a vector with 10 cells (one for each digit), each containing the probability that the written digit matches the index of that cell.
For that purpose, we will use the Softmax function, defined as:
softmax(x)ᵢ = exp(xᵢ) / Σⱼ exp(xⱼ)

where x is the vector input to the activation function and xᵢ is its i-th component.
Then, simply finding the index of the cell with the maximal value will give us the digit that most probably appears in the image.
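A minimal softmax sketch (the max-subtraction is a standard numerical-stability trick, an implementation detail not mentioned above):

```python
import numpy as np

def softmax(x):
    """Turn a vector of raw scores into a probability vector."""
    e = np.exp(x - np.max(x))  # subtracting the max avoids overflow
    return e / e.sum()

# Illustrative scores for the 10 digits
scores = np.array([1.0, 3.0, 0.2, 5.0, 1.5, 4.9, 2.0, 0.1, 0.5, 1.2])
probs = softmax(scores)
print(probs.sum())       # -> 1.0 (up to floating point)
print(np.argmax(probs))  # -> 3, the most probable digit
```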
The second activation function we will discuss is the Hyperbolic Tangent, or tanh, function. This function will be used between the layers, and is defined as:

tanh(x) = (eˣ − e⁻ˣ) / (eˣ + e⁻ˣ)
The tanh function is non-linear, zero-centered, and has large gradients around zero. It is not the best choice for our network, as we will cover in a later post about activation functions, but to keep things simple we will go with it.
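Putting it all together, here is a hedged sketch of the complete (still untrained) network: tanh between the layers and Softmax after the last one. The MNIST-style sizes (a flattened 28x28 image, 10 digit classes) and the hidden-layer width are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def network(x, weights):
    """tanh after every hidden layer, softmax after the output layer."""
    for W in weights[:-1]:
        x = np.tanh(W @ x)           # non-linear activation between layers
    return softmax(weights[-1] @ x)  # probabilities over the 10 digits

# Illustrative digit classifier: 784 pixels -> 32 hidden -> 10 outputs
weights = [np.random.randn(32, 784) * 0.1, np.random.randn(10, 32) * 0.1]
image = np.random.rand(784)  # a stand-in for a flattened 28x28 image
print(np.argmax(network(image, weights)))  # the untrained network's guess
```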
Now we have finally built our first Artificial Neural Network! But it is still incapable of doing anything... That is because we haven’t taught it what to do yet (remember, Machine Learning programs learn from examples).
We have covered how to build the architecture itself, and will be covering the rest in the second part of the post. Make sure you read it!