
Learning Rate

In the previous lecture, we learned that it is easier to use a closed-form solution than an iterative one (gradient descent) to optimise the cost function.

But the example we looked at was a simple 1D example. What happens if the cost function is in higher dimensions?

To summarise, though gradient descent looks complicated for a 1D function, it is easier to compute the optimal minimum using gradient descent for higher-dimensional functions. We will be using it in logistic regression and even in neural networks.
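The sketch below illustrates this idea with a minimal gradient descent loop on a multi-dimensional cost function. The toy data, the mean-squared-error cost, the learning rate and the step count are all illustrative assumptions, not values from the lecture.

```python
# Minimal sketch: gradient descent on a multi-dimensional (MSE) cost function.
# All data and hyperparameters here are illustrative assumptions.
import numpy as np

# Toy data: 100 samples, 3 features, generated from a known weight vector.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

def cost(w):
    """Mean squared error for weights w."""
    residuals = X @ w - y
    return np.mean(residuals ** 2)

def gradient(w):
    """Gradient of the MSE with respect to w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

w = np.zeros(3)   # start from the origin
eta = 0.1         # learning rate (η)
for step in range(200):
    w -= eta * gradient(w)   # move in the direction of the negative gradient

print("estimated weights:", np.round(w, 3))
print("final cost:", round(cost(w), 5))
```

Each iteration only needs the gradient vector, so the same loop works unchanged whether the cost function has three parameters or three million, which is why the iterative approach scales better than inverting large matrices for a closed-form solution.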

In the next lecture, we will learn how the value of the learning rate, i.e. η, may affect the optimal solution. Let’s look at this in detail.

Note

At 2:05, the professor writes -1.2. Please note that it is a slight calculation error; it should be 0.4.

To summarise, a large learning rate may cause your solution to oscillate, and you may overshoot the optimal solution (the global minimum). So it is always good practice to choose a small learning rate and move slowly in the direction of the negative gradient.
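The short sketch below illustrates this behaviour on the 1D cost f(x) = x², whose gradient is 2x and whose minimum is at x = 0. The starting point, step count and the three η values are illustrative assumptions chosen to show convergence, oscillation and divergence.

```python
# Minimal sketch: how the learning rate η affects gradient descent on f(x) = x**2.
# Starting point, step count and η values are illustrative assumptions.
def gradient_descent(eta, x0=5.0, steps=10):
    x = x0
    path = [x]
    for _ in range(steps):
        x = x - eta * 2 * x   # update rule: step against the gradient 2x
        path.append(x)
    return path

for eta in (0.1, 0.9, 1.1):
    path = gradient_descent(eta)
    print(f"eta = {eta}: {[round(p, 3) for p in path[:6]]} ...")

# eta = 0.1 moves smoothly towards the minimum at 0,
# eta = 0.9 jumps back and forth across 0 but still shrinks,
# eta = 1.1 oscillates with growing magnitude and diverges.
```

Running it shows that each update multiplies x by (1 − 2η), so once η is large enough to make that factor smaller than −1, every step overshoots the minimum by more than the last, which is exactly the oscillation described above.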
