Neural networks require rigorous training, but what does it mean to train a neural network? What are the parameters that the network learns during training, and what are the hyperparameters that you (as the network designer) need to specify beforehand?
Recall that models such as linear regression and logistic regression are trained on their coefficients, i.e., the task is to find the optimal values of the coefficients to minimize a cost function.
Neural networks are no different; they are trained on weights and biases.
In this segment, you will be introduced to the parameters that are learned during neural network training. You will also develop a broad understanding of how the learning algorithm works. Let’s get started by watching the upcoming video.
During training, the neural network learning algorithm fits various models to the training data and selects the best prediction model. The learning algorithm is trained with a fixed set of hyperparameters associated with the network structure. Some of the important hyperparameters that determine the network structure are given below:
- Number of layers
- Number of neurons in the input, hidden and output layers
- Learning rate (the step size taken each time we update the weights and biases of an ANN)
- Number of epochs (the number of times the entire training data set passes through the neural network)
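To make these hyperparameters concrete, here is a minimal sketch of how they might be collected before training begins. The specific values are illustrative assumptions, not values from the text:

```python
# Hypothetical hyperparameter choices made by the network designer
# before training starts (all values are illustrative assumptions).
hyperparameters = {
    "num_layers": 3,           # input, one hidden, and output layer
    "layer_sizes": [4, 5, 3],  # neurons in the input, hidden, and output layers
    "learning_rate": 0.01,     # step size for each weight/bias update
    "num_epochs": 100,         # full passes over the training data set
}
```

The weights and biases, by contrast, are not specified here: they are the parameters that the learning algorithm will adjust during training.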
The purpose of training is to obtain the optimal weights and biases, which form the parameters of the network.
Note: You will learn about hyperparameters such as learning rate and the number of epochs in the subsequent session. In this session, we will focus on the number of layers and the number of neurons in each layer.
The notations that you will come across going forward are as follows:
- W represents the weight matrix.
- b stands for bias.
- x represents the input.
- y represents the ground truth label.
- p represents the probability vector of the predicted output for a classification problem; h^L represents the predicted output for a regression problem (where L represents the number of layers).
- h also represents the output of a hidden layer, with the layer number as the superscript. The output of the second neuron in the nth hidden layer is denoted by h^n_2.
- z represents the accumulated input to a layer. The accumulated input to the third neuron of the nth hidden layer is z^n_3.
- The bias of the first neuron of the third layer is represented as b^3_1.
- The superscript represents the layer number. The weight matrix connecting the first hidden layer to the second hidden layer is denoted by W^2.
- The subscript represents the index of the individual neuron in a given layer. The weight connecting the first neuron of the first hidden layer to the third neuron of the second hidden layer is denoted by w^2_31.
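The notation above maps directly onto a feedforward computation: each layer accumulates z^l = W^l h^(l-1) + b^l and emits h^l by applying an activation function. The sketch below assumes an illustrative 3-4-2 network with a sigmoid activation; the sizes and random initialization are assumptions for demonstration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes: 3 inputs, 4 hidden neurons, 2 outputs (an assumption)
sizes = [3, 4, 2]

# W[l] corresponds to the weight matrix W^(l+1) connecting layer l to layer l+1;
# b[l] corresponds to the bias vector of layer l+1 (0-based Python lists).
W = [rng.standard_normal((sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
b = [rng.standard_normal((sizes[l + 1], 1)) for l in range(len(sizes) - 1)]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.standard_normal((3, 1))  # one input data point x

# Feedforward: z^l = W^l h^(l-1) + b^l and h^l = sigmoid(z^l), with h^0 = x
h = x
for Wl, bl in zip(W, b):
    z = Wl @ h + bl   # accumulated input z to the layer
    h = sigmoid(z)    # output h of the layer

p = h  # h^L, the network's predicted output vector
```

With a sigmoid at the final layer, each entry of p lies between 0 and 1, matching its role as a probability vector in a classification setting.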
Having understood these notations, let’s reinforce them by answering the questions given below.
You might want to look at how the inputs of the first data point x1 are represented. This will help you in answering the questions.
So far, you have come across simple neural networks and have computed their outputs, but this is not the case with real-world applications. At times, neural networks can be highly complex and large. Therefore, you will need some simplifying assumptions to make them easier to understand. You will learn about these assumptions in the next segment.