To customise your model and how it processes your data set, you need to choose an NLU pipeline.
The two most important pipelines are:
- supervised_embeddings
- pretrained_embeddings_spacy
The biggest difference between them is that the pretrained_embeddings_spacy pipeline, as the name suggests, uses pre-trained word vectors from either GloVe or fastText, whereas the supervised_embeddings pipeline does not use any pre-trained word vectors and instead learns them specifically from your data set.
It is therefore generally recommended to use the pretrained_embeddings_spacy pipeline if you have fewer than 1,000 total training examples and a spaCy model exists for your language. If you have 1,000 or more examples, use the supervised_embeddings pipeline.
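The rule of thumb above can be written as a small Python function. This is only a sketch, not part of Rasa: `choose_pipeline` and its parameters are hypothetical, and it assumes that when you have fewer than 1,000 examples but no spaCy model for your language, you fall back to supervised_embeddings, since that pipeline needs no pre-trained vectors.

```python
def choose_pipeline(num_examples: int, has_spacy_model: bool) -> str:
    """Return an NLU pipeline name using the 1,000-example rule of thumb.

    Hypothetical helper. Assumption: without a spaCy model for the language,
    supervised_embeddings is used regardless of data set size.
    """
    # Small data set and pre-trained vectors available: reuse them.
    if num_examples < 1000 and has_spacy_model:
        return "pretrained_embeddings_spacy"
    # Large data set, or no spaCy model: learn embeddings from your own data.
    return "supervised_embeddings"
```

For example, `choose_pipeline(300, True)` picks pretrained_embeddings_spacy, while `choose_pipeline(5000, True)` picks supervised_embeddings.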
You specify the pipeline in your config.yml file.
To use the supervised_embeddings pipeline, the template is:
```yaml
language: "en"
pipeline: "supervised_embeddings"
```
To use the pretrained_embeddings_spacy pipeline, the template is:
```yaml
language: "en"
pipeline: "pretrained_embeddings_spacy"
```
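If you generate config.yml from a script, for example to switch pipelines between experiments, a minimal Python sketch could look like the following. The helpers `render_config` and `write_config` are hypothetical and not part of Rasa; only the two pipeline names and the `language`/`pipeline` keys come from the templates above.

```python
# Hypothetical helpers: only the pipeline names and the two keys come from
# the templates in this post; the function names are illustrative.
SUPPORTED_PIPELINES = {"supervised_embeddings", "pretrained_embeddings_spacy"}


def render_config(language: str, pipeline: str) -> str:
    """Build the text of a minimal config.yml from one of the two templates."""
    if pipeline not in SUPPORTED_PIPELINES:
        raise ValueError(f"unknown pipeline template: {pipeline!r}")
    # Same two keys, in the same order, as the templates.
    return f'language: "{language}"\npipeline: "{pipeline}"\n'


def write_config(path: str, language: str, pipeline: str) -> None:
    """Write the rendered template to disk, e.g. as config.yml."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(render_config(language, pipeline))
```

Calling `write_config("config.yml", "en", "supervised_embeddings")` would produce the first template. Rejecting unknown names early catches typos such as `pretrained_embedding _spacy` before training starts.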
You can read more here on choosing the pipeline for your NLU model.