Artificial intelligence is growing rapidly, particularly in the domain of language understanding, and will continue to expand. With the voluminous amount of data being generated daily, building a model that gives good results is no longer the challenge; inventing new techniques to understand the complexities of human language itself is.
As discussed in the previous module, where you learnt about the attention-based encoder-decoder architecture, the biggest challenge when it comes to human language is understanding the context. Attention models were able to decode it; however, they are not efficient when we pass long sentences. In such a scenario, transformer models are a game changer! They can not only understand all the complexities of language but also process all the information faster than any other model. Therefore, the architecture of a transformer is both innovative and disruptive, leaving previously known models such as the RNN (Recurrent Neural Network) and the CNN (Convolutional Neural Network) in the past.
Natural language understanding is a core part of the NLP tasks you have learnt about so far, such as text classification, text generation and machine translation. The transformer architecture can solve all these problems and give you state-of-the-art results. You may have heard of many of its variants, such as GPT-3 (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers), and seen their mind-boggling performance. The architecture of transformers is path-breaking and, thus, has led to the development of many innovative models such as CLIP (Contrastive Language-Image Pretraining) and DALL·E, which, as their creators claim, are 'a step toward systems with a deeper understanding of the world'.
The architecture of a transformer has also been adapted to modalities other than text, such as image, video and audio. However, in this module, you will learn about this architecture in relation to the domain of language understanding only. The content of this module will help you understand the transformer architecture in detail and will also help change your perspective on NLP entirely.
In the following video, Ankush will give you an overview of what this module has in store for you.
This module consists of two sessions. The first session will introduce you to the architecture of a transformer and will help you assemble the individual components inside it. By the end of this session, you will be able to understand how the transformer model works and what makes it one of the most innovative architectures in the field of NLP.
The second session goes beyond the theoretical aspects of transformer architecture. It will help you understand how to use a model from the Hugging Face library and the various functions that the library provides. By the end of this session, you will be able to fine-tune a transformer model for a custom use case.
By the end of this module, you will be able to:
- Understand the inner workings of the transformer architecture and its variants.
- Apply the pipeline() function to solve NLP tasks such as text generation and classification (previewed in the sketch after this list).
- Configure both the tokenizer and the transformer model to perform an NLP task.
- Fine-tune a transformer model for a custom use case related to Quora question-pair similarity.
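To preview what these objectives look like in code, here is a minimal sketch of the Hugging Face transformers library mentioned above. It is illustrative only: it assumes the transformers package and a PyTorch backend are installed, and the checkpoint name shown is the library's widely used default for English sentiment analysis, not a model specific to this module.

```python
# A minimal sketch of the Hugging Face transformers API (illustrative only).
# Assumes: pip install transformers torch
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

# 1. pipeline() bundles the tokenizer, model and post-processing for a task.
classifier = pipeline("sentiment-analysis")
print(classifier("Transformers make NLP tasks remarkably simple."))
# -> [{'label': 'POSITIVE', 'score': ...}]

# 2. The same steps, configured explicitly with a tokenizer and a model.
#    The checkpoint below is the library's common default for this task.
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

inputs = tokenizer("Transformers make NLP tasks remarkably simple.",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)  # raw scores; apply softmax to get probabilities
```

Fine-tuning for a use case such as question-pair similarity follows the same pattern: you load a pretrained checkpoint and continue training it on your own labelled data, which the second session walks through in detail.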
In the next video, we will take a look at an interesting conversation between a human and a transformer model to understand how far these models have developed.
Question: What is your favourite animal?
Ans: My favourite animal is a dog.
Question: Why?
Ans: Because dogs are loyal and friendly.
Question: What are two reasons that a dog might be in a bad mood?
Ans: Two reasons that a dog might be in a bad mood are if it is hungry or if it is hot.
Question: How many eyes does a giraffe have?
Ans: A giraffe has two eyes.
Question: How many legs does a frog have?
Ans: A frog has four legs.
Question: Are there any animals with three legs?
Ans: No, there are no animals with three legs.
Question: Why don’t animals have three legs?
Ans: Animals don’t have three legs because they would fall over.
A snippet of a conversation between a human and a transformer model
Prerequisites of this module
This module covers an advanced language model architecture; therefore, it requires you to have a sound understanding of the following:
- Concepts taught earlier in the NLP course, such as text cleaning, tokenization and text preprocessing.
- Encoder-decoder architecture and how attention mechanisms help improve its performance.
Guidelines for in-module questions
The in-video and in-content questions for this module are not graded.
People you will hear from in this session
Subject Matter Expert
Ankush Chopra
Sr Manager, Tredence
Ankush has 10+ years of hands-on Data Science experience delivering data and ML solutions for Fortune 500 clients across the Telecom, Retail, Healthcare, and Finance industries. He currently works as a Senior Manager (AI Centre of Excellence) at Tredence.
To understand the architecture of transformers, let's go through the next segment and see how it was created.