Fine-Tuning BERT Model – Part 2

Once you have transformed and formatted a dataset into a tf.data.Dataset, we can apply it to the model to start training. However, we first need to configure the model to the desired standards. Let’s watch the next video to find out how to do this.
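
As a quick recap, here is a minimal sketch of how such a tf.data.Dataset might be produced with the Hugging Face datasets library. The checkpoint name, dataset, column names, and batch size below are illustrative assumptions rather than values taken from this use case.

Python
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding

# Assumed setup: the GLUE QQP question-pair dataset and a BERT base checkpoint.
model_checkpoint = "bert-base-uncased"
raw_dataset = load_dataset("glue", "qqp")
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

def tokenize_fn(batch):
    # Tokenize the two questions together as a sentence pair
    return tokenizer(batch["question1"], batch["question2"], truncation=True)

tokenized_dataset = raw_dataset.map(tokenize_fn, batched=True)

# Pad each batch dynamically and return TensorFlow tensors
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="tf")
tf_train_dataset = tokenized_dataset["train"].to_tf_dataset(
    columns=["attention_mask", "input_ids", "token_type_ids"],
    label_cols=["label"],
    shuffle=True,
    collate_fn=data_collator,
    batch_size=16,
)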

The model can be downloaded using the checkpoint defined earlier.

Example

Python
from transformers import TFAutoModelForSequenceClassification

model = TFAutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)

After downloading the model, we define the hyperparameters for it.

Python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers.schedules import PolynomialDecay
from tensorflow.keras.losses import SparseCategoricalCrossentropy

num_epochs = 3
# Total optimizer steps = batches per epoch * number of epochs
num_train_steps = len(tf_train_dataset) * num_epochs
lr_scheduler = PolynomialDecay(
    initial_learning_rate=5e-5, end_learning_rate=0.0, decay_steps=num_train_steps, power=2
)
opt = Adam(learning_rate=lr_scheduler)
# The model outputs raw logits, so the loss must be computed from logits
loss = SparseCategoricalCrossentropy(from_logits=True)

Here, we have defined the total number of epochs and an Adam optimizer that uses a custom learning rate scheduler.

The learning rate scheduler is a schedule object that computes a new learning rate for every training step according to a specified decay rule. When attached to the optimizer, the optimizer queries the schedule at each step and applies the learning rate it returns.

You can see how the learning rate decays as training progresses.

You may have noticed how a learning rate that started at 5 × 10^-5 decays to 0 by the final training step (roughly step 35,000 here). The degree of this decay can be customized using the power parameter, which is set to 2 in this case.
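
If you want to verify this decay numerically rather than visually, you can call the schedule directly at a few step values. This quick check is not part of the original notebook:

Python
# Query the schedule at a few steps to watch the polynomial decay from 5e-5 toward 0.
for step in [0, num_train_steps // 4, num_train_steps // 2, num_train_steps]:
    print(f"step {step}: lr = {float(lr_scheduler(step)):.2e}")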

Once the optimizer is defined, we can compile the model with the desired performance metrics, as done below.

Example

Python
model.compile(optimizer=opt, loss=loss, metrics=["accuracy"])

# Once the model is compiled, we can train it for 3 epochs.
model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=num_epochs)

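Before it can be reloaded, the fine-tuned model has to be saved. The original notebook stores it in Google Drive; here is a minimal sketch of that step, assuming the same path used for loading below.

Python
# Persist the fine-tuned weights so they can be reloaded later with .from_pretrained().
model.save_pretrained('/content/drive/MyDrive/saved_model_epoch2/')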

You can load the trained model back from the saved directory using the .from_pretrained() method.

Python
trained_model = TFAutoModelForSequenceClassification.from_pretrained('/content/drive/MyDrive/saved_model_epoch2/', num_labels=num_labels)

After loading the model, we can write a small helper function to check the model’s prediction on a custom pair of inputs.

Python
import tensorflow as tf

def check_similarity(question1, question2):
  # Tokenize the two questions together as a single sentence pair
  tokenizer_output = tokenizer(question1, question2, truncation=True, return_token_type_ids=True, max_length=75, return_tensors='tf')
  logits = trained_model(**tokenizer_output)["logits"]
  # Class 1 means the pair is a duplicate; class 0 means it is not
  predicted_class_id = int(tf.math.argmax(logits, axis=-1)[0])
  if predicted_class_id == 1:
    return "Both questions mean the same."
  else:
    return "Both questions are different."

Once the function is defined, we can pass in custom inputs.

Example

Python
check_similarity("Why are people so obsessed with cricket?", "Why are people so obsessed with football?")

Output

Although both questions are of similar length and share most of their words, they refer to two different sports. The trained model still picks up on this difference in context and returns the following output.

Python
Both questions are different.

When we change the input to:

Example

Python
check_similarity("Why are people so obsessed with cricket?", "Why do people like cricket?")

Output

The model understands that both questions ask about the same thing and thus returns the following output.

Python
Both questions mean the same.

Great! With this, we wrap up the use case of fine-tuning a BERT model for sentence-pair similarity.
