ML Polynomial Regression

Last updated on Dec 15 2021
Paresha Dudhedia

• Polynomial Regression is a regression algorithm that models the relationship between a dependent variable (y) and an independent variable (x) as an nth-degree polynomial. The Polynomial Regression equation is given below:
$$y = b_0 + b_1 x_1 + b_2 x_1^2 + b_3 x_1^3 + \dots + b_n x_1^n$$
• It is also called a special case of Multiple Linear Regression, because we add polynomial terms to the Multiple Linear Regression equation to convert it into Polynomial Regression.
• It is a linear model with one modification, the added polynomial terms, which lets it fit curved relationships more accurately.
• It is used when the training data have a non-linear relationship between x and y.
• It uses a linear regression model to fit complicated, non-linear functions and datasets.
Hence, “In Polynomial Regression, the original features are converted into polynomial features of the required degree (2, 3, ..., n) and then modeled using a linear model.”
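The conversion described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a single feature and made-up data generated from y = 1 + 2x + 3x^2; it uses numpy.vander to build the columns 1, x, x^2 and numpy.linalg.lstsq for the linear least-squares fit, whereas the tutorial below uses scikit-learn for the same two steps:

```python
import numpy as np

# Hypothetical single feature x and a target that follows y = 1 + 2x + 3x^2 exactly
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 1 + 2 * x + 3 * x ** 2

degree = 2
# Convert x into polynomial features: columns x^0, x^1, x^2
X_poly = np.vander(x, N=degree + 1, increasing=True)

# Fit an ordinary linear least-squares model on the new features
coeffs, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print(np.round(coeffs, 6))  # coefficients b0, b1, b2: approximately 1, 2, 3
```

Because the expanded problem is an ordinary linear least-squares fit, the coefficients b0, b1 and b2 come out as approximately 1, 2 and 3.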

Need for Polynomial Regression:

The need for Polynomial Regression in ML can be understood from the following points:
• If we apply a linear model to a linear dataset, it gives good results, as we have seen in Simple Linear Regression. But if we apply the same model without any modification to a non-linear dataset, the results will be poor: the loss function increases, the error rate is high, and accuracy drops.
• So, where data points are arranged in a non-linear fashion, we need the Polynomial Regression model. We can understand this better from the comparison diagram below of a linear dataset and a non-linear dataset.

[Figure: comparison of a linear dataset and a non-linear dataset]

• In the above image, the dataset is arranged non-linearly. If we try to fit it with a linear model, we can clearly see that the line hardly passes near any data point. A curve, which is what the Polynomial model produces, passes close to most of the data points.
• Hence, if a dataset is arranged in a non-linear fashion, we should use the Polynomial Regression model instead of Simple Linear Regression.
Note: Polynomial Regression is also called Polynomial Linear Regression because "linear" refers not to the variables but to the coefficients: the model is a linear combination of the coefficients b0, b1, ..., bn, even though x appears raised to higher powers.
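The effect described in this section, a straight line leaving a large error on non-linearly arranged data while a curve fits it, can be checked numerically. A minimal sketch, assuming made-up data that follow y = x^2 exactly and using numpy.polyfit (itself a linear least-squares fit in the coefficients) in place of scikit-learn:

```python
import numpy as np

# Hypothetical non-linear data: y grows with the square of x
x = np.linspace(0, 10, 21)
y = x ** 2

def mse_of_fit(x, y, degree):
    """Fit a polynomial of the given degree and return its mean squared error."""
    coeffs = np.polyfit(x, y, degree)   # least-squares polynomial fit
    y_hat = np.polyval(coeffs, x)       # predictions on the same points
    return np.mean((y - y_hat) ** 2)

mse_line = mse_of_fit(x, y, 1)    # straight line (Simple Linear Regression)
mse_curve = mse_of_fit(x, y, 2)   # degree-2 curve (Polynomial Regression)
print(f"degree 1 MSE: {mse_line:.2f}, degree 2 MSE: {mse_curve:.2e}")
```

The degree-1 fit leaves a mean squared error far above zero, while the degree-2 fit reproduces the data almost exactly (its error is at floating-point rounding level).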

Equation of the Polynomial Regression Model:

Simple Linear Regression equation: $y = b_0 + b_1 x$   (a)
Multiple Linear Regression equation: $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_n x_n$   (b)
Polynomial Regression equation: $y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \dots + b_n x^n$   (c)

Comparing the three equations above, we can see that all of them are polynomial equations that differ in the degree of the variables. The Simple and Multiple Linear Regression equations are polynomial equations of degree 1, while the Polynomial Regression equation is a linear equation (in its coefficients) of degree n. So if we add higher-degree terms to our linear equations, they become Polynomial Linear equations.
Note: To better understand Polynomial Regression, you should be familiar with Simple Linear Regression.
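The link between equations (b) and (c) can be made explicit by treating each power of x as its own feature. A short worked substitution, using the symbols of equations (b) and (c):

$$x_1 = x,\quad x_2 = x^2,\quad \dots,\quad x_n = x^n$$
$$y = b_0 + b_1 x + b_2 x^2 + \dots + b_n x^n = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$$

The right-hand side has the form of (b), and y is linear in the unknowns $b_0, \dots, b_n$, which is why the model can be fitted with ordinary linear regression. For example, with n = 2 an observation at x = 3 becomes the feature pair $(x_1, x_2) = (3, 9)$.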

Implementation of Polynomial Regression using Python:

Here we will implement Polynomial Regression using Python and understand it by comparing the Polynomial Regression model with the Simple Linear Regression model. First, let’s understand the problem for which we are going to build the model.
Problem Description: A Human Resources company is going to hire a new candidate. The candidate has stated that his previous salary was 160K per annum, and HR has to check whether he is telling the truth or bluffing. To do this, they only have a dataset from his previous company that lists the salaries of the top 10 positions along with their levels. By checking the dataset, we have found that there is a non-linear relationship between position levels and salaries. Our goal is to build a bluffing-detector regression model so that HR can hire an honest candidate. Below are the steps to build such a model.


Steps for Polynomial Regression:

The main steps involved in Polynomial Regression are given below:
• Data Pre-processing
• Build a Linear Regression model and fit it to the dataset
• Build a Polynomial Regression model and fit it to the dataset
• Visualize the results of the Linear Regression and Polynomial Regression models.
• Predict the output.
Note: Here, we build a Linear Regression model as well as a Polynomial Regression model so that we can compare their predictions; the Linear Regression model serves as a reference.
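The model-building steps can also be sketched end to end with scikit-learn's make_pipeline, which chains PolynomialFeatures and LinearRegression so that the feature expansion is applied automatically when predicting. This is a minimal sketch on made-up data (y = 5 + x^2 on levels 1 to 10), not the tutorial's dataset; make_pipeline is a convenience substitute for the separate poly_regs and lin_reg_2 objects used in this tutorial, and the visualization step is left out:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data (not the tutorial's dataset): y = 5 + x^2 on levels 1..10
x = np.arange(1, 11).reshape(-1, 1)   # 2-D feature matrix, shape (10, 1)
y = 5 + x.ravel() ** 2                # 1-D target vector

# Reference model: plain linear regression on x
linear_model = LinearRegression().fit(x, y)

# Polynomial model: expand x to [1, x, x^2], then fit a linear regression
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(x, y)

# Compare both predictions at a level between two training points
level = np.array([[6.5]])
print("linear:", linear_model.predict(level)[0])
print("polynomial:", poly_model.predict(level)[0])
```

On this data the pipeline predicts about 47.25 at level 6.5 (that is, 5 + 6.5²), while the straight line predicts about 54.5.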

Data Pre-processing Step:

The data pre-processing step remains the same as in previous regression models, with a few changes. In the Polynomial Regression model, we will not use feature scaling, and we will not split our dataset into training and test sets, for two reasons:
• The dataset contains very little data, too little to divide into training and test sets; otherwise our model would not be able to find the relationship between salaries and levels.
• In this model, we want very accurate predictions for salary, so the model should use all of the available information.
The code for the pre-processing step is given below:

```python
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# importing the dataset
data_set = pd.read_csv('Position_Salaries.csv')

# extracting the independent and dependent variables
x = data_set.iloc[:, 1:2].values   # Levels column, kept as a 2-D matrix
y = data_set.iloc[:, 2].values     # Salary column, a 1-D vector
```

Explanation:
• In the above lines of code, we have imported the Python libraries needed to load the dataset and work with it.
• Next, we have imported the dataset ‘Position_Salaries.csv’, which contains three columns (Position, Levels, and Salary), but we will consider only two columns (Salary and Levels).
• After that, we have extracted the dependent variable (y) and the independent variable (x) from the dataset. For x, we have used [:, 1:2] because we want the column at index 1 (Levels); writing 1:2 instead of 1 keeps x as a two-dimensional matrix, which is the shape scikit-learn expects for features.
Output:
By executing the above code, we can read our dataset as:

[Figure: the Position_Salaries.csv dataset with Position, Levels and Salary columns]

As we can see in the above output, there are three columns present (Positions, Levels, and Salaries). But we are only considering two columns, because the Levels are effectively an encoded form of the Positions.
Here we will predict the output for level 6.5, because the candidate has 4+ years’ experience as a regional manager, so he must be somewhere between levels 6 and 7.
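Returning to the [:, 1:2] slice used for x in the pre-processing code, the difference from [:, 1] is easiest to see in the array shapes. A minimal sketch, assuming a tiny hypothetical DataFrame with the same three-column layout (the position names and salaries are made up):

```python
import pandas as pd

# Hypothetical three-row stand-in for Position_Salaries.csv (values made up)
data_set = pd.DataFrame({
    "Position": ["Analyst", "Consultant", "Manager"],
    "Level": [1, 2, 3],
    "Salary": [45000, 50000, 60000],
})

x_matrix = data_set.iloc[:, 1:2].values   # slice 1:2 keeps two dimensions
x_vector = data_set.iloc[:, 1].values     # a single index drops to one dimension
y = data_set.iloc[:, 2].values            # target stays a 1-D vector

print(x_matrix.shape)  # (3, 1)
print(x_vector.shape)  # (3,)
print(y.shape)         # (3,)
```

scikit-learn's fit() expects a two-dimensional feature array, so a one-dimensional x_vector would first need x_vector.reshape(-1, 1).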
Building the Linear regression model:
Now, we will build and fit the Linear Regression model to the dataset. While building the Polynomial Regression model, we will use the Linear Regression model as a reference and compare both results. The code is given below:

```python
# Fitting Linear Regression to the dataset
from sklearn.linear_model import LinearRegression
lin_regs = LinearRegression()
lin_regs.fit(x, y)
```

In the above code, we have created the Simple Linear model as the lin_regs object of the LinearRegression class and fitted it to the dataset variables (x and y).
Output:
Out[5]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
Building the Polynomial regression model:
Now we will build the Polynomial Regression model. It is a little different from the Simple Linear model, because here we use the PolynomialFeatures class from scikit-learn's preprocessing module to add extra (polynomial) features to our dataset.

```python
# Fitting Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures
poly_regs = PolynomialFeatures(degree=2)
x_poly = poly_regs.fit_transform(x)   # each row becomes [1, x, x^2]
lin_reg_2 = LinearRegression()
lin_reg_2.fit(x_poly, y)
```

In the above lines of code, we have used poly_regs.fit_transform(x) to first convert our feature matrix into a polynomial feature matrix, which is then fitted with the linear model. The value degree= 2 is our choice; we can change it depending on how many polynomial terms we want.
After executing the code, we get a new matrix, x_poly, which can be seen in the variable explorer:

[Figure: the x_poly matrix shown in the variable explorer]

Next, we have used another LinearRegression object, lin_reg_2, to fit the x_poly matrix to the linear model.
Output:
Out[11]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
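To see what PolynomialFeatures produces, and that lin_reg_2 is an ordinary linear model over those features, the sketch below expands a small feature matrix and inspects the fitted intercept_ and coef_ attributes. It assumes made-up data generated from y = 1 + 2x + 3x^2 rather than the salary dataset; the variable names mirror the tutorial's:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical levels 1..7 and a made-up target y = 1 + 2x + 3x^2
x = np.arange(1, 8, dtype=float).reshape(-1, 1)
y = 1 + 2 * x.ravel() + 3 * x.ravel() ** 2

poly_regs = PolynomialFeatures(degree=2)   # adds a bias column by default
x_poly = poly_regs.fit_transform(x)        # each row becomes [1, x, x^2]
print(x_poly[:2])
# [[1. 1. 1.]
#  [1. 2. 4.]]

lin_reg_2 = LinearRegression().fit(x_poly, y)

# intercept_ plays the role of b0; coef_ has one weight per column of x_poly.
# The weight on the constant column is ~0 because fit_intercept=True already models it.
print(lin_reg_2.intercept_)   # approximately 1.0
print(lin_reg_2.coef_)        # approximately [0, 2, 3]
```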
Visualizing the result for Linear regression:
Now we will visualize the result of the Linear Regression model, as we did in Simple Linear Regression. Below is the code for it:

```python
# Visualizing the result for the Linear Regression model
mtp.scatter(x, y, color="blue")                 # actual salaries
mtp.plot(x, lin_regs.predict(x), color="red")   # straight-line predictions
mtp.title("Bluff detection model(Linear Regression)")
mtp.xlabel("Position Levels")
mtp.ylabel("Salary")
mtp.show()
```
Output:
[Figure: Bluff detection model (Linear Regression), Salary vs. Position Levels]

In the above output image, we can clearly see that the regression line is far from the data points. The predictions are shown by the red straight line, and the blue points are the actual values. If we use this model to predict the CEO's salary, it gives approximately $600,000, which is far from the real value.
So we need a curved model, rather than a straight line, to fit the dataset.
Visualizing the result for Polynomial Regression
Here we will visualize the result of the Polynomial Regression model. The code is a little different from that of the Linear model and is given below:

```python
# Visualizing the result for Polynomial Regression
mtp.scatter(x, y, color="blue")                                       # actual salaries
mtp.plot(x, lin_reg_2.predict(poly_regs.transform(x)), color="red")   # polynomial predictions
mtp.title("Bluff detection model(Polynomial Regression)")
mtp.xlabel("Position Levels")
mtp.ylabel("Salary")
mtp.show()
```

In the above code, we have passed poly_regs.transform(x) to lin_reg_2.predict() instead of x_poly. Both give the same predictions here, but because lin_reg_2 was trained on polynomial features, expanding the input through poly_regs means the same line also works for any new x values. poly_regs was already fitted in the previous step, so transform() is sufficient.
Output:

[Figure: Bluff detection model (Polynomial Regression), degree= 2, Salary vs. Position Levels]

As we can see in the above output image, the predictions are close to the real values. The plot changes as we change the degree.
For degree= 3:
If we set degree= 3, we get a closer fit, as shown in the image below.
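The plotting code above evaluates the model only at the ten integer levels in x, so mtp.plot joins the predictions with straight segments. A common refinement, sketched here as an assumption rather than as part of the original tutorial, is to predict on a finer grid of levels with numpy.linspace and plot that grid instead. The data below are made up so that the block runs on its own:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical levels 1..10 with a made-up curved target (not the salary data)
x = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = 1000 * x.ravel() ** 2

poly_regs = PolynomialFeatures(degree=2)
lin_reg_2 = LinearRegression().fit(poly_regs.fit_transform(x), y)

# 91 evenly spaced levels from 1 to 10 (a step of 0.1) for a smooth curve
x_grid = np.linspace(x.min(), x.max(), 91).reshape(-1, 1)
y_grid = lin_reg_2.predict(poly_regs.transform(x_grid))

print(x_grid.shape)  # (91, 1)
# To plot with the tutorial's alias: mtp.plot(x_grid, y_grid, color="red")
```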

[Figure: Bluff detection model (Polynomial Regression), degree= 3]

So, as we can see in the above output image, the predicted salary for level 6.5 is roughly $170K to $190K, which suggests that the candidate is telling the truth about his salary.
Degree= 4: Let’s change the degree to 4. Now the curve passes closest to the data points of all the degrees we have tried, so on this dataset a higher polynomial degree gives a closer fit to the training points. With only 10 points, however, a much higher degree would start to overfit.

[Figure: Bluff detection model (Polynomial Regression), degree= 4]
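The trend across degrees can be quantified by measuring the error on the training points for each degree. A minimal sketch, assuming made-up noisy data (the random seed, noise level and cubic shape are arbitrary choices, not the salary dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noisy, curved data on levels 1..10 (made up for illustration)
rng = np.random.default_rng(0)
x = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = 2000 * x.ravel() ** 3 + rng.normal(0, 5000, size=10)

train_mse = {}
for degree in range(1, 5):
    poly = PolynomialFeatures(degree=degree)
    x_poly = poly.fit_transform(x)
    model = LinearRegression().fit(x_poly, y)
    # Error measured on the same points the model was trained on
    train_mse[degree] = mean_squared_error(y, model.predict(x_poly))

for degree, mse in train_mse.items():
    print(f"degree {degree}: training MSE = {mse:,.0f}")
```

Training error can only stay the same or fall as the degree grows, because each higher degree adds a feature to the same least-squares problem. A lower training error therefore does not by itself show better predictions for unseen levels such as 6.5.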

Predicting the final result with the Linear Regression model:
Now, we will predict the final output using the Linear Regression model to see whether the candidate is telling the truth or bluffing. For this, we will use the predict() method and pass the value 6.5. Below is the code for it:

```python
lin_pred = lin_regs.predict([[6.5]])
print(lin_pred)
```

Output:
[330378.78787879]
Predicting the final result with the Polynomial Regression model:
Now, we will predict the final output using the Polynomial Regression model, with degree= 4 as in the last plot, to compare it with the Linear model. Below is the code for it:

```python
poly_pred = lin_reg_2.predict(poly_regs.transform([[6.5]]))
print(poly_pred)
```

Output:
[158862.45265153]
As we can see, the predicted output of the Polynomial Regression model is [158862.45265153], which is much closer to the claimed salary of 160K. Hence, we can say that the candidate is telling the truth.
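The comparison in the last step is simple arithmetic on the two predictions reported above. A small sketch that computes the relative gap between each prediction and the claimed 160K salary (no bluff threshold is defined in this tutorial, so none is assumed here):

```python
# Claimed previous salary and the two predictions reported in this tutorial
claimed_salary = 160_000
predictions = {
    "Linear Regression": 330378.78787879,
    "Polynomial Regression (degree 4)": 158862.45265153,
}

gaps = {}
for name, predicted in predictions.items():
    # Relative gap between the model's estimate and the candidate's claim
    gaps[name] = abs(predicted - claimed_salary) / claimed_salary
    print(f"{name}: predicted {predicted:,.0f}, gap {gaps[name]:.1%}")

# Linear Regression: predicted 330,379, gap 106.5%
# Polynomial Regression (degree 4): predicted 158,862, gap 0.7%
```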
So, this brings us to the end of the blog.
