In the previous segment, you learned about the Adagrad optimiser. Adagrad can effectively shut down updates to some of the parameters, because its accumulated sum of squared gradients keeps growing and drives the effective learning rate towards zero. In this segment, you'll understand why this happens and how RMS Prop solves this problem.

**RMS Prop** is the abbreviated form of the **Root Mean Square Prop** optimiser. It was proposed by Geoff Hinton. Let us hear about this optimiser from Usha in the upcoming video.

RMS Prop aims to solve the problem faced in Adagrad. Let us understand how this is done by revisiting the implementation you saw in the video above. The algorithm for RMS Prop can be written as follows:

- On iteration t:
    - Compute dW, db for the given mini-batch
    - S_dW = β · S_dW + (1 − β) · (dW)²
    - S_db = β · S_db + (1 − β) · (db)²
    - Update the parameters: W_new = W_old − α · dW/√(S_dW) and b_new = b_old − α · db/√(S_db)

where α and β are hyperparameters.
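The update rule above can be sketched in NumPy as follows. This is an illustrative implementation, not the course's reference code: the function name, the default values of α and β, and the toy objective f(w) = w² are all chosen for this sketch, and a small constant ε is added to the denominator for numerical stability, as is standard practice even though it is not shown in the equations above.

```python
import numpy as np

def rmsprop_update(w, dw, s_dw, alpha=0.01, beta=0.9, eps=1e-8):
    """One RMS Prop step for a single parameter tensor.

    s_dw is the running average of squared gradients carried over
    from the previous iteration; eps is the standard small constant
    that prevents division by zero (an addition to the equations above).
    """
    s_dw = beta * s_dw + (1 - beta) * dw ** 2
    w = w - alpha * dw / (np.sqrt(s_dw) + eps)
    return w, s_dw

# Toy usage: minimise f(w) = w^2, whose gradient is dw = 2w.
w, s = 5.0, 0.0
for _ in range(200):
    w, s = rmsprop_update(w, 2 * w, s, alpha=0.1)
print(w)
```

Because the step is divided by the root of the running average, its size stays close to α regardless of the raw gradient magnitude, so w steadily approaches the minimum at 0.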

Instead of directly accumulating the squares of the gradients as you saw in Adagrad, the RMS Prop optimiser takes care of the diminishing learning rate problem by maintaining an exponentially weighted average of the previous squared gradients in the S_dW and S_db terms.
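The difference between accumulating and averaging can be seen with a small numerical comparison. Assuming a constant gradient of 1.0 at every step (a deliberately artificial setting for illustration), Adagrad's accumulator grows without bound, shrinking the effective learning rate towards zero, while the RMS Prop moving average settles near the squared gradient:

```python
# Denominator terms after 100 steps with a constant gradient of 1.0.
beta = 0.9
adagrad_acc, rms_avg = 0.0, 0.0
for t in range(100):
    g2 = 1.0 ** 2
    adagrad_acc += g2                            # Adagrad: cumulative sum
    rms_avg = beta * rms_avg + (1 - beta) * g2   # RMS Prop: moving average

print(adagrad_acc)        # 100.0 — keeps growing with every step
print(round(rms_avg, 4))  # ~1.0 — bounded, tracks the recent squared gradient
```

Since the parameter update divides by the square root of these terms, Adagrad's step size keeps shrinking even when the gradients do not, whereas RMS Prop's step size stabilises.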

However, RMS Prop still requires manually tuning its hyperparameters α and β, and a poor choice can lead to slow convergence. Let us see how to address this issue by learning about our last optimiser: **Adam**.
