Once you have transformed and formatted a data set into a tf.data.Dataset, you can feed it to the model to start training. First, however, we need to configure the model itself. Let's watch the next video to find out how to do this.
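As a quick recap, the tf_train_dataset used below could have been produced along these lines. This is a minimal sketch only: the column names, the batch size, and the raw_dataset carried over from the previous lesson are assumptions, not taken from this lesson.
Example
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)  # checkpoint from earlier

def tokenize_fn(batch):
    # Tokenize the two questions of each pair together as a sentence pair
    return tokenizer(batch["question1"], batch["question2"], truncation=True)

# raw_dataset: assumed Hugging Face Dataset from the previous lesson
tokenized = raw_dataset.map(tokenize_fn, batched=True)

collator = DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="tf")
tf_train_dataset = tokenized.to_tf_dataset(
    columns=["input_ids", "token_type_ids", "attention_mask"],
    label_cols=["label"],
    shuffle=True,
    batch_size=8,
    collate_fn=collator,
)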
The model can be downloaded using the checkpoint defined earlier.
Example
from transformers import TFAutoModelForSequenceClassification

model = TFAutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)
Output
After downloading the model, we define the hyperparameters for training.
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers.schedules import PolynomialDecay
from tensorflow.keras.losses import SparseCategoricalCrossentropy

num_epochs = 3
num_train_steps = len(tf_train_dataset) * num_epochs  # batches per epoch * epochs

# Quadratic decay from 5e-5 down to 0.0 over the full training run
lr_scheduler = PolynomialDecay(
    initial_learning_rate=5e-5, end_learning_rate=0.0, decay_steps=num_train_steps, power=2
)
opt = Adam(learning_rate=lr_scheduler)

# The model outputs raw logits, so the loss is computed from logits
loss = SparseCategoricalCrossentropy(from_logits=True)
Here, we have defined the total number of epochs, the loss function, and the optimizer, which uses a custom learning rate schedule.
The learning rate scheduler is a schedule object that the optimizer queries at every training step. Given the current step, it computes a new learning rate according to the polynomial decay and returns it to the optimizer.
You can see how the learning rate decays as training progresses. The rate starts at 5 × 10⁻⁵ and decays to 0.0 by the final training step (around step 35,000 here). The shape of this decay can be customized using the power parameter, which is set to 2 in this case.
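If you want to verify the curve yourself, a PolynomialDecay schedule can be called directly with a step index. A quick sketch using the objects defined above:
Example
# Sample the schedule at the start, midpoint, and end of training
for step in [0, num_train_steps // 2, num_train_steps]:
    print(step, float(lr_scheduler(step)))

# With power=2 the rate drops quickly early in training
# and flattens out as it approaches 0.0 at the final step.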
Once the optimizer is defined, we can compile the model with the desired performance metrics, as done below.
Example
model.compile(optimizer=opt, loss=loss, metrics=["accuracy"])
# Once the model is compiled, we can train it for 3 epochs.
model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=num_epochs)
Output
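The directory used in the next step suggests the fine-tuned model was saved to Google Drive during or after training. A minimal sketch, assuming .save_pretrained() was used (the path is illustrative and matches the one below):
Example
# Persist the fine-tuned weights so they can be reloaded later
model.save_pretrained('/content/drive/MyDrive/saved_model_epoch2/')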
You can load the trained model back with the .from_pretrained() function, pointing it at the directory the fine-tuned weights were saved to.
trained_model = TFAutoModelForSequenceClassification.from_pretrained(
    '/content/drive/MyDrive/saved_model_epoch2/', num_labels=num_labels
)
After loading the model, we can write a custom function to check the model's predictions on custom inputs.
import tensorflow as tf

def check_similarity(question1, question2):
    # tokenizer is the same one used during training
    tokenizer_output = tokenizer(
        question1, question2, truncation=True, return_token_type_ids=True,
        max_length=75, return_tensors='tf'
    )
    logits = trained_model(**tokenizer_output)["logits"]
    # Class 1 = duplicate pair, class 0 = different questions
    predicted_class_id = int(tf.math.argmax(logits, axis=-1)[0])
    if predicted_class_id == 1:
        return "Both questions mean the same."
    else:
        return "Both questions are different."
Once the function is defined, we can pass in custom inputs.
Example
check_similarity("Why are people so obsessed with cricket?", "Why are people so obsessed with football?")
Output
Although both sentences are the same length and share most of their words, they refer to two different sports. The trained model picks up on this difference in context and returns the following output.
Both questions are different.
When we change the input to:
Example
check_similarity("Why are people so obsessed with cricket?", "Why do people like cricket?").
Output
The model understands that both questions ask about the same thing and thus returns the following output.
Both questions mean the same.
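The same approach extends to scoring several question pairs at once, since the tokenizer accepts parallel lists of first and second questions. A minimal sketch, assuming the tokenizer and trained_model defined above (the pairs_a/pairs_b names are illustrative):
Example
import tensorflow as tf

pairs_a = ["Why are people so obsessed with cricket?"]
pairs_b = ["Why do people like cricket?"]

# Tokenize the whole batch of sentence pairs with padding to a common length
batch = tokenizer(pairs_a, pairs_b, padding=True, truncation=True,
                  max_length=75, return_tensors='tf')
logits = trained_model(**batch)["logits"]

# One prediction per pair: 1 = duplicate, 0 = different
predictions = tf.math.argmax(logits, axis=-1).numpy()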
Great! With this, we wrap up the use case of fine-tuning a BERT model for sentence-pair similarity.