Different Activation Functions

As mentioned in one of the previous segments, in the case of ANNs, the activation functions are non-linear. In this segment, you will learn about these non-linear activation functions. But before you explore the different activation functions for ANNs, let’s watch the next video as Professor Srinivasaraghavan revises the concept of non-linearity.

The image provided below shows the graphical representation of a linear function and one of the possible representations of a non-linear function.

The activation functions introduce non-linearity into the network, thereby enabling the network to solve highly complex problems. Tasks that call for neural networks require the ANN to recognise complex patterns and trends in the given data set. If we do not introduce non-linearity, the output will be a linear function of the input vector, no matter how many layers the network has, and such a model cannot capture the more complex patterns present in the data set.
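The point above can be verified numerically: stacking two weight matrices with no activation function in between collapses into a single linear map. This is a minimal sketch using randomly generated weights, not values from the course.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with no activation function in between (illustrative sizes)
W1 = rng.normal(size=(4, 3))   # weights of layer 1
W2 = rng.normal(size=(2, 4))   # weights of layer 2
x = rng.normal(size=(3,))      # an input vector

# Passing x through both layers...
out_stacked = W2 @ (W1 @ x)

# ...is identical to a single linear layer with weights W = W2 @ W1
W_combined = W2 @ W1
out_single = W_combined @ x

print(np.allclose(out_stacked, out_single))  # True
```

However many linear layers you stack, the result stays linear; only a non-linear activation between layers breaks this collapse.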

For example, as we can see in the image below, we sometimes have data in non-linear shapes such as circular or elliptical. If you want to classify the two circles into two groups, a linear model will not be able to do this, but a neural network with multiple neurons and non-linear activation functions can help you achieve this.

Let’s learn about the various types and properties of common activation functions and understand how to choose the correct activation function.

While choosing activation functions, you need to ensure that they are:

  1. Non-linear,
  2. Continuous, and
  3. Monotonically increasing.

The different commonly used activation functions are represented below.

The features of these activation functions are as follows:

  1. Sigmoid: When this type of function is applied, the output from the activation function is bound between 0 and 1 and is not centred around zero. A sigmoid activation function is usually used when we want to regularise the magnitude of the outputs we get from a neural network and ensure that this magnitude does not blow up.
  2. Tanh (Hyperbolic Tangent): When this type of function is applied, the output is bound between -1 and 1 and is centred around zero, unlike the sigmoid function, whose output is centred around 0.5 and is always positive.
  3. ReLU (Rectified Linear Unit): The output of this activation function is linear in nature when the input is positive, and the output is zero when the input is negative. This activation function allows the network to converge very quickly, and it is computationally efficient. However, it does not help the network learn when the input values are negative, since the output (and hence the gradient) is zero there.
  4. Leaky ReLU (Leaky Rectified Linear Unit): This activation function is similar to ReLU. However, it enables the neural network to learn even when the values are negative. When the input to the function is negative, it dampens the magnitude, i.e., the input is multiplied by an epsilon factor that is usually a number less than one. On the other hand, when the input is positive, the function is linear and gives the input value as the output. We can tune this parameter to control how much ‘learning emphasis’ should be given to the negative values.
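The four activation functions described above can be sketched in a few lines of NumPy. This is an illustrative implementation; the epsilon value of 0.01 for Leaky ReLU is a common default, not a value prescribed by the course.

```python
import numpy as np

def sigmoid(z):
    # Bound between 0 and 1, not centred around zero
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Bound between -1 and 1, centred around zero
    return np.tanh(z)

def relu(z):
    # Linear for positive inputs, zero for negative inputs
    return np.maximum(0.0, z)

def leaky_relu(z, epsilon=0.01):
    # Dampens negative inputs by a small factor instead of zeroing them
    return np.where(z > 0, z, epsilon * z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))      # approx. [0.119, 0.5, 0.881]
print(tanh(z))         # approx. [-0.964, 0.0, 0.964]
print(relu(z))         # [0., 0., 2.]
print(leaky_relu(z))   # [-0.02, 0., 2.]
```

Note how sigmoid and tanh squash large inputs into a fixed range, while ReLU and Leaky ReLU leave positive inputs unchanged, which is what makes them cheap to compute.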

In the next video, you will learn how to compute the output of a neuron, given the inputs, weights, biases and the sigmoid activation function.
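As a quick sketch of that computation: a neuron takes the weighted sum of its inputs, adds the bias, and passes the result through the activation function. The inputs, weights and bias below are hypothetical numbers chosen for illustration, not values from the video.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical inputs, weights and bias (illustrative values only)
x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.1, 0.4, 0.3])    # weights
b = 0.2                          # bias

# Weighted sum of inputs plus bias, then the sigmoid activation
z = np.dot(w, x) + b             # 0.05 - 0.4 + 0.6 + 0.2 = 0.45
output = sigmoid(z)
print(round(output, 4))          # 0.6106
```

Whatever the weighted sum turns out to be, the sigmoid maps it into the (0, 1) range, which is why the final output here lies between 0 and 1.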

Now that you have an idea of how to compute the output of a neuron using an activation function, try to answer the questions given below.

Having explored the key components in building the architecture of ANNs, let’s now understand how neural networks are trained and used to make predictions. In the next segment, you will learn about the hyperparameters and parameters of neural networks.

Report an error