Model Building

In the previous segment, you obtained the final data set. In this segment, you will build a logistic regression model using this DataFrame. Let’s hear from Ajay on how the model can be built.

The first step in the model-building process is to split the data set into training and test sets. This can be done using the following code:

Example

Python

training, test = model_data.randomSplit([.7, .3])

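To see why the proportions produced by a random split are only approximate, the following plain-Python sketch mimics the idea behind a weighted random split: each row draws a random number and lands in the split whose cumulative-weight range contains it. This is an illustration only, not Spark's implementation, and the helper name `random_split` is hypothetical. In PySpark itself, `randomSplit` accepts an optional `seed` argument if you need a repeatable split.

```python
import random

def random_split(rows, weights, seed=None):
    """Hypothetical helper: assign each row to a split at random, by weight."""
    rng = random.Random(seed)
    total = sum(weights)
    # Cumulative upper bounds after normalizing the weights,
    # e.g. [0.7, 0.3] -> [0.7, 1.0]
    bounds = []
    running = 0.0
    for w in weights:
        running += w / total
        bounds.append(running)
    bounds[-1] = 1.0  # guard against floating-point rounding

    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()  # uniform in [0, 1)
        for i, upper in enumerate(bounds):
            if draw < upper:
                splits[i].append(row)
                break
    return splits

rows = list(range(1000))
training, test = random_split(rows, [0.7, 0.3], seed=42)
print(len(training), len(test))  # close to 700 and 300, but rarely exact
```

Every row ends up in exactly one split, and reusing the same seed on the same input reproduces the same split.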

This way, approximately 70% of the rows are used to train the model and the remaining 30% are held out to test it. Because the split is random, the exact proportions can vary slightly from run to run.

Now, in order to build the model, first import the LogisticRegression class from the classification module of the pyspark.ml package.

Example

Python

from pyspark.ml.classification import LogisticRegression


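After importing the class, you can create the logistic regression object using the following code. The featuresCol and labelCol arguments name the DataFrame columns that hold the feature vector and the label; 'features' and 'label' are also their default values.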

Example

Python

lr = LogisticRegression(featuresCol='features', labelCol='label')

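For intuition about what this object will learn once it is trained: logistic regression models the probability of the positive class as the sigmoid of a weighted sum of the features. The plain-Python sketch below uses made-up weights purely for illustration; the function names and numbers are hypothetical and are not part of pyspark.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, intercept):
    """Probability of label 1 under a logistic regression model."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Made-up parameters; a fitted model would learn these from the training data
weights = [0.8, -0.5]
intercept = 0.1

p = predict_probability([1.0, 2.0], weights, intercept)
label = 1 if p >= 0.5 else 0  # 0.5 is the conventional decision threshold
print(round(p, 3), label)
```

Training amounts to choosing the weights and intercept so that these probabilities match the labels in the training data as closely as possible.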

Now, the model can be trained on the training data using the following code:

Example

Python

model = lr.fit(training)


You have now trained the model on the training data. In the next segment, Ajay will explain how to evaluate the model.
