In the previous segment, you obtained the final data set. In this segment, you will build the model using this DataFrame. Let’s hear from Ajay about how the model can be built:
The first step in the model-building process is splitting the data set into training and test sets. This can be done using the following code:
Example
training, test = model_data.randomSplit([.7, .3])
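Because the split is random, the resulting proportions are only approximately equal to the weights. The following is a rough pure-Python analogy of weighted random assignment, not Spark's actual implementation; the helper name `random_split` and the sample data are hypothetical and exist only for illustration.

```python
import random


def random_split(rows, weights, seed=None):
    """Assign each row to exactly one split using a uniform random draw."""
    rng = random.Random(seed)
    total = sum(weights)
    # Cumulative, normalized upper bounds, e.g. [0.7, 1.0] for weights [.7, .3].
    bounds = []
    running = 0.0
    for w in weights:
        running += w / total
        bounds.append(running)
    bounds[-1] = 1.0  # guard against floating-point drift
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()  # uniform in [0, 1)
        for i, upper in enumerate(bounds):
            if draw < upper:
                splits[i].append(row)
                break
    return splits


rows = list(range(1000))
training_rows, test_rows = random_split(rows, [.7, .3], seed=42)
# The sizes land close to 700 and 300 but are rarely exact.
print(len(training_rows), len(test_rows))
```

Each row is assigned independently, which is why the split sizes fluctuate around the requested weights instead of matching them exactly.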
With these weights, roughly 70% of the rows go to the training set and the remaining 30% or so to the test set. Because rows are assigned at random, the exact proportions vary slightly from run to run. To build the model, first import the LogisticRegression class from the pyspark.ml.classification module:
Example
from pyspark.ml.classification import LogisticRegression
After importing, the logistic regression object can be created using the following code:
Example
lr = LogisticRegression(featuresCol='features', labelCol='label')
Now, the model can be trained on the training data using the following code:
Example
model = lr.fit(training)
You have now trained the model on the training data. In the next segment, Ajay will explain how to evaluate the model.