Implementing a Simple Machine Learning Model in Python
Introduction
So, you’ve decided to build a machine learning model in Python, whether for your final-year
project or out of curiosity about Python and artificial intelligence. The next step in your
journey is to learn enough about Python and machine learning to implement a simple yet
useful model. This guide walks you through what machine learning models are, how to build
one in Python, and tools such as Google Colab.
What is Machine Learning?
Before building anything, we need to be clear about what machine learning is and why we
are building an ML model. Machine learning tries to mimic the way humans learn: a computer
uses statistics to find patterns in data and then recognizes those patterns in completely
new data, improving as it sees more examples. This field of AI keeps growing, with new ML
algorithms taking over tedious calculations and freeing humans to focus on harder problems.
So what is the ML model we keep mentioning? Let’s decode that.
A machine learning model is a program that, after being trained on example data, makes
decisions or predictions about data it has not seen before. In this guide we are going to
implement a simple machine learning model in Python, one of the most widely used languages
for the job. But why? Let’s understand!
Why use Python for Machine Learning?
We almost always associate machine learning with Python and say that we will build a
machine learning model using Python, but why is that?
Python is simple and readable, and it comes with a large collection of ML libraries that
make building models much easier. Developers around the world find it easy to express their
ideas in Python because the code is readable and easy to maintain. This helps beginners who
are entering the field of AI: when they start studying ML models, the fact that most
examples are written in Python makes them easier to learn and implement.
It also means beginners can add features or fix bugs early in their journey as ML
engineers, because learning Python’s ML libraries gives them the confidence they need to
create something new.
Implementing a Simple Machine Learning Model in Python
To implement a machine learning model in Python, we follow a series of steps that help
ensure the model works and is reasonably accurate. Let’s start by importing a few libraries
that will make our work easier:
Importing the necessary libraries
We’ll import pandas to read the dataset, matplotlib to plot graphs that show how the data
is distributed, and scikit-learn to use its prebuilt model implementations, which would be
tricky to write ourselves. Run the code below to import all the necessary libraries.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
Loading the dataset
The next step is to learn as much as you can about the data so that you can feed the right
data to your machine learning model for training. To do that, we first have to load the
data. Run the code below to load the House Rent dataset, which contains some interesting
trends.
df = pd.read_csv('House_Rent_Dataset.csv')
In the above code we use the read_csv function of pandas to read our dataset file. Let’s
look at the first few rows using the code below:
df.head()
In the output you’ll find various factors that influence house rent. We’ll study a few of
these factors and train our model to predict Rent from them.
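If you don’t have the CSV file at hand, the loading step can be tried on a tiny in-memory CSV. This is only a sketch: the rows below are invented for illustration and are not taken from the real House Rent dataset, and the name sample_df is hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample rows; the values are made up for illustration only.
csv_text = """BHK,Rent,Size,Furnishing Status
2,10000,1100,Unfurnished
2,20000,800,Semi-Furnished
3,35000,1500,Furnished
"""

# read_csv accepts a file path or any file-like object such as StringIO.
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.head(2))  # first two rows
print(sample_df.shape)    # (number of rows, number of columns)
```

With the real file, the only change is passing the path 'House_Rent_Dataset.csv' instead of the StringIO object.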
Understanding the dataset
This is a crucial step when building a simple machine learning model in Python. The better
you understand the dataset yourself, the better you can prepare it for your machine
learning model. Let’s see how many rows and columns the dataset contains using the shape
attribute.
df.shape
As you can see, there are 4746 rows and 12 columns of data. To get more information about
the dataset, we’ll use the info method.
df.info()
There are no missing values in the dataset. To be sure, you can look for null values using
the isnull() method.
df.isnull()
In the output, every cell that holds a value shows False, because it is not null. To get
the exact count of missing values in each column, you can use the following code:
df.isnull().sum()
Now that we have confirmed there is no missing data, we can look at the type of data in
each column, which matters a lot when training an ML model. We are going to use the dtypes
attribute of pandas to do that.
df.dtypes
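To see what these checks report when data really is missing, here is a minimal sketch on a small made-up data frame. The frame toy and its values are assumptions for illustration; the real dataset has no missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two deliberately missing cells.
toy = pd.DataFrame({
    "Rent": [10000, 20000, np.nan],
    "Size": [1100, 800, 1500],
    "Furnishing Status": ["Unfurnished", None, "Furnished"],
})

missing_mask = toy.isnull()              # True where a value is missing
missing_per_column = toy.isnull().sum()  # count of missing values per column

print(missing_mask)
print(missing_per_column)
print(toy.dtypes)  # Rent becomes a float column because it holds NaN
```

Note that shape and dtypes are attributes, so they are written without parentheses, while info(), isnull() and sum() are methods.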
Data preprocessing
To implement machine learning models in Python effectively, we have to make the data
simpler for the model to learn from. One way is to convert the object (text) columns we
care about to the category type. In this model, we are going to focus on how house rent is
influenced by three factors, namely Furnishing Status, Tenant Preferred and Area Type, so
we convert only those columns to the category type.
df['Furnishing Status'] = df['Furnishing Status'].astype('category')
df['Tenant Preferred'] = df['Tenant Preferred'].astype('category')
df['Area Type'] = df['Area Type'].astype('category')
Let’s verify if we were actually able to change the types by using dtypes
df.dtypes
As you can see, we’ve successfully converted the three factors to the category type. We
don’t need the remaining object-type columns, and leaving them in would interfere with the
training of our model, so let’s remove them.
df.pop('Posted On')
df.pop('Floor')
df.pop('Area Locality')
df.pop('City')
df.pop('Point of Contact')
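The same two preprocessing steps, converting a column to the category type and removing a column we don’t need, can be sketched on a small made-up frame. The data below is hypothetical, and drop(columns=...) is shown as an alternative to calling pop once per column.

```python
import pandas as pd

# Hypothetical rows for illustration only.
toy = pd.DataFrame({
    "Rent": [10000, 20000, 35000],
    "Furnishing Status": ["Unfurnished", "Semi-Furnished", "Furnished"],
    "City": ["Kolkata", "Mumbai", "Delhi"],
})

# Convert the text column we want to keep into a categorical column.
toy["Furnishing Status"] = toy["Furnishing Status"].astype("category")

# Remove the text column we do not want the model to see.
toy = toy.drop(columns=["City"])

print(toy.dtypes)
```

pop removes one column in place and returns it, while drop returns a new data frame and can remove several columns in a single call.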
Now let’s look at some summary statistics of the dataset using the describe method of
pandas.
df.describe().T
We transpose the result with .T so that each column’s statistics appear on a single row,
which is easier to read.
Data visualization
Visualizing the data helps us notice underlying patterns and understand how the values are
distributed. We are going to plot a bar chart using the Python library matplotlib.
plt.figure(figsize = (8, 5))
colors = ['#FF1E00', '#A66CFF', '#EAE509', '#D61C4E', '#3CCF4E', '#3AB4F2']
df["Furnishing Status"].value_counts().plot(kind = 'bar', color = colors, rot = 0)
The chart shows how the rows are divided across the Furnishing Status column, with a clear
difference between the counts of the three statuses: semi-furnished, unfurnished and
furnished. Before we train our machine learning model in Python, we need to perform one
last step, one-hot encoding, so that the model receives the categorical columns as numbers.
One Hot Encoding
We are going to apply one-hot encoding to the categorical variables in the dataset using
the get_dummies function in pandas.
df = pd.get_dummies(df)
Using the columns attribute, we can see that each categorical column has been split into
one column per category.
df.columns
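To make the effect of one-hot encoding concrete, here is a small sketch on made-up data (the frame toy is hypothetical). Each category of Furnishing Status becomes its own indicator column, and every row has exactly one of those indicators set.

```python
import pandas as pd

# Hypothetical rows for illustration only.
toy = pd.DataFrame({
    "Rent": [10000, 20000, 35000],
    "Furnishing Status": pd.Categorical(
        ["Unfurnished", "Semi-Furnished", "Furnished"]
    ),
})

# Numeric columns are left alone; categorical columns are expanded.
encoded = pd.get_dummies(toy)

print(list(encoded.columns))
print(encoded)
```

The new column names are built from the original column name and the category, joined by an underscore.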
Building a regression model
We have finally reached the step where we can build our simple machine learning model in
Python. We are going to build a Linear Regression model, but first we need to understand
input and output variables. When an ML model is trained, it takes some features (columns)
as input, called input variables, learns the underlying patterns in them, and then predicts
another feature, called the output variable. Here we treat Rent as the output variable and
the other columns as input variables. So we assign the Rent column to the variable y as the
output variable.
y = df["Rent"]
To create the input variable X, we have to remove the Rent column from the main data
frame. To do that, we’ll use the drop method of pandas.
X = df.drop("Rent", axis = 1)
We also have to split the dataset into two parts: training data, which the model learns
from, and testing data, which we use to evaluate the model on rows it has not seen. To do
that, we’ll use the train_test_split function provided by the scikit-learn library.
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.6, random_state=1)
Using the above code, we have split the dataset into 60 percent training data and 40
percent testing data. Now you can create the Linear Regression model by creating an
instance of the LinearRegression class.
lr = LinearRegression()
Now, let’s train the model using the training data only. The testing data is kept aside for evaluation.
lr.fit(X_train,y_train)
With the completion of this step, we have successfully built our first ever simple machine
learning model in python.
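The split-then-fit pattern can also be tried without the dataset. The sketch below uses synthetic data in which a made-up target grows linearly with one feature plus noise; the names X_demo, y_demo and model, and all the numbers, are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: one hypothetical feature and a noisy linear target.
rng = np.random.default_rng(0)
X_demo = rng.uniform(300, 2000, size=(100, 1))
y_demo = 15.0 * X_demo[:, 0] + 2000.0 + rng.normal(0, 500, size=100)

# 60 percent of the rows go to training, the remaining 40 percent to testing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, train_size=0.6, random_state=1
)
print(X_tr.shape, X_te.shape)

# Fit on the training rows only.
model = LinearRegression()
model.fit(X_tr, y_tr)
print(model.coef_, model.intercept_)  # learned slope and intercept
```

Because random_state is fixed, the same rows land in the training set every time the code runs, which makes results reproducible.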
Model evaluation
It’s time to look at the performance of our first ever machine learning model. Aren’t you
excited? Let’s decode the results using the following code.
lr.score(X_test, y_test)
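For a regression model, the score method returns the coefficient of determination (R²),
not a classification accuracy. Our model has an R² of about 0.4152 on the test data,
meaning it explains roughly 41.5% of the variation in rent. That is not very promising, but
it is a good start towards building an accurate machine learning model in Python.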
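To see what the value returned by score means, the sketch below computes R² by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares, and compares it with score. The five data points are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical, almost perfectly linear data.
X_demo = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_demo = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

model = LinearRegression().fit(X_demo, y_demo)
pred = model.predict(X_demo)

# R^2 = 1 - (sum of squared errors) / (total variation around the mean)
ss_res = np.sum((y_demo - pred) ** 2)
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

r2_sklearn = model.score(X_demo, y_demo)
print(r2_manual, r2_sklearn)  # the two values agree
```

An R² of 1 means the predictions match the actual values exactly, and an R² of 0 means the model does no better than always predicting the mean.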
Model prediction
Now let’s try out our model on an example. We are going to check how well the model
predicts the Rent for the first row of the training data. Keep in mind that the model has
already seen this row during training, so this is a sanity check rather than a test on new
data. To do that, we create a new data frame that contains only the first row of the
training data.
df_new = X_train[:1]
Now let’s see if our model can predict the Rent of our new data frame.
lr.predict(df_new)
Let’s compare it with the actual value using the code below
y_train[:1]
As you can see, the prediction is fairly close to the actual value, although it is not
exact. Moving forward, we can build upon this knowledge and create more accurate models
using other ML algorithms that may work better in this case, such as the Random Forest
algorithm.
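As a rough sketch of that next step, scikit-learn’s RandomForestRegressor follows the same fit, predict and score pattern as LinearRegression, so swapping it in needs very little new code. The data below is synthetic and deliberately non-linear; it is an assumption for illustration and says nothing about how well a forest would do on the House Rent dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data with a non-linear relationship (hypothetical values).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 2))
y_demo = X_demo[:, 0] ** 2 + 3.0 * X_demo[:, 1] + rng.normal(0, 1.0, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, train_size=0.6, random_state=1
)

# Same workflow as before: create the model, fit on training data, score on test data.
forest = RandomForestRegressor(n_estimators=100, random_state=1)
forest.fit(X_tr, y_tr)

forest_r2 = forest.score(X_te, y_te)  # R^2 on the held-out rows
print(forest_r2)
```

On the real dataset you would pass the X_train, y_train, X_test and y_test variables from the earlier split instead of the synthetic arrays.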
Role of online learning platforms
Implementing a simple machine learning model in Python is easy, as we saw above, but
building accurate real-world models that can make human lives better requires deeper
knowledge of machine learning. This is where Great Learning’s Machine Learning course comes
in. If you want a thorough understanding of how machine learning algorithms work and how
accurate real-world models are built, this course is for you. You can start with the free
version, called Basics of Machine Learning, and then build upon that knowledge with their
paid course, called PGP in Machine Learning.
In summary
We implemented a simple machine learning model in Python from start to finish and learnt a
lot about how machine learning works and how models are built. Machine learning is a great
field to build your career in. With the right guidance, you can become an ML engineer and
work towards a brighter future where humans and AI thrive together.