• The Naïve Bayes algorithm is a supervised learning algorithm that is based on Bayes’ theorem and used for solving classification problems.
• It is mainly used in text classification, which involves high-dimensional training datasets.
• The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it helps build fast machine learning models that can make quick predictions.
• It is a probabilistic classifier, which means it predicts the class of an object on the basis of probability.
• Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.

Why is it called Naïve Bayes?

The name Naïve Bayes combines two words, Naïve and Bayes, which can be described as:
• Naïve: It is called Naïve because it assumes that, given the class, the occurrence of a certain feature is independent of the occurrence of the other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Each feature individually contributes to identifying it as an apple, without depending on the others.
• Bayes: It is called Bayes because it depends on the principle of Bayes’ Theorem.
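
The independence assumption can be written compactly. As a sketch, with C denoting the class and x_1, ..., x_n the features (symbols introduced here only for illustration), the probability of the features given the class is taken to factorize as:

```latex
P(x_1, x_2, \dots, x_n \mid C) = \prod_{i=1}^{n} P(x_i \mid C)
```

For the fruit example, this means P(red, spherical, sweet | apple) is treated as P(red | apple) × P(spherical | apple) × P(sweet | apple).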

Bayes’ Theorem:

• Bayes’ theorem, also known as Bayes’ rule or Bayes’ law, is used to determine the probability of a hypothesis given prior knowledge. It depends on conditional probability.
• The formula for Bayes’ theorem is given as:

P(A|B) = P(B|A) × P(A) / P(B)

Where,
P(A|B) is Posterior probability: Probability of hypothesis A given the observed event B.
P(B|A) is Likelihood probability: Probability of the evidence B given that hypothesis A is true.
P(A) is Prior Probability: Probability of the hypothesis before observing the evidence.
P(B) is Marginal Probability: Probability of Evidence.
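
As a rough illustration of the formula, the sketch below computes a posterior in Python. The function name and the spam numbers are hypothetical, chosen only to show the arithmetic; the marginal P(B) is expanded with the law of total probability.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A|B) using Bayes' theorem.

    The marginal is expanded as
    P(B) = P(B|A) * P(A) + P(B|not A) * P(not A).
    """
    marginal_b = (likelihood_b_given_a * prior_a
                  + likelihood_b_given_not_a * (1.0 - prior_a))
    return likelihood_b_given_a * prior_a / marginal_b


# Hypothetical numbers, for illustration only:
# 20% of emails are spam (prior); the word "offer" appears in
# 50% of spam emails and in 5% of non-spam emails (likelihoods).
p_spam_given_offer = posterior(0.2, 0.5, 0.05)
print(f"P(spam | 'offer') = {p_spam_given_offer:.3f}")
```

Even with a fairly low prior, strong evidence raises the posterior well above the prior, which is the behaviour the classifier relies on.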

Working of Naïve Bayes’ Classifier:

The working of the Naïve Bayes’ classifier can be understood with the help of the example below:
Suppose we have a dataset of weather conditions and a corresponding target variable “Play”. Using this dataset, we need to decide whether or not we should play on a particular day according to the weather conditions. To solve this problem, we follow the steps below:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes’ theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?
Solution: To solve this, first consider the dataset below:

	Outlook	Play
0	Rainy	Yes
1	Sunny	Yes
2	Overcast	Yes
3	Overcast	Yes
4	Sunny	No
5	Rainy	Yes
6	Sunny	Yes
7	Overcast	Yes
8	Rainy	No
9	Sunny	No
10	Sunny	Yes
11	Rainy	No
12	Overcast	Yes
13	Overcast	Yes

Frequency table for the Weather Conditions:

Weather	Yes	No
Overcast	5	0
Rainy	2	2
Sunny	3	2
Total	10	4

Likelihood table for the weather conditions:

Weather	No	Yes	P(Weather)
Overcast	0	5	5/14 = 0.36
Rainy	2	2	4/14 = 0.29
Sunny	2	3	5/14 = 0.36
All	4/14 = 0.29	10/14 = 0.71

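Applying Bayes’ theorem:

P(Yes|Sunny) = P(Sunny|Yes) × P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.30
P(Sunny) = 5/14 = 0.36
P(Yes) = 10/14 = 0.71
So P(Yes|Sunny) = (3/10 × 10/14) / (5/14) = 3/5 = 0.60

P(No|Sunny) = P(Sunny|No) × P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.50
P(No) = 4/14 = 0.29
So P(No|Sunny) = (2/4 × 4/14) / (5/14) = 2/5 = 0.40

Since P(Yes|Sunny) > P(No|Sunny), the player can play on a sunny day.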
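
The three steps listed earlier (frequency tables, likelihoods, posterior) can be sketched in plain Python for the “Sunny” question. This is only an illustrative sketch: the variable and function names are made up here, and the data is copied from the Outlook/Play dataset above. Exact fractions are used so that rounding does not hide the result.

```python
from collections import Counter
from fractions import Fraction

# (Outlook, Play) pairs copied from the dataset above, rows 0 to 13.
data = [
    ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"), ("Overcast", "Yes"),
    ("Sunny", "No"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Rainy", "No"), ("Sunny", "No"), ("Sunny", "Yes"), ("Rainy", "No"),
    ("Overcast", "Yes"), ("Overcast", "Yes"),
]

# Step 1: frequency tables.
joint_counts = Counter(data)                              # (outlook, play) counts
play_counts = Counter(play for _, play in data)           # class counts
outlook_counts = Counter(outlook for outlook, _ in data)  # feature-value counts
total = len(data)


def posterior(play, outlook):
    """P(play | outlook) = P(outlook | play) * P(play) / P(outlook)."""
    # Step 2: likelihood, prior and evidence from the frequency tables.
    likelihood = Fraction(joint_counts[(outlook, play)], play_counts[play])
    prior = Fraction(play_counts[play], total)
    evidence = Fraction(outlook_counts[outlook], total)
    # Step 3: Bayes' theorem.
    return likelihood * prior / evidence


p_yes = posterior("Yes", "Sunny")
p_no = posterior("No", "Sunny")
print(f"P(Yes|Sunny) = {p_yes} = {float(p_yes):.2f}")
print(f"P(No|Sunny)  = {p_no} = {float(p_no):.2f}")
print("Prediction:", "Yes" if p_yes > p_no else "No")
```

Because there is only one feature here, the two posteriors sum to 1; with several features, the per-feature likelihoods would be multiplied together before comparing classes.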

Advantages of Naïve Bayes Classifier:
• Naïve Bayes is a fast and easy ML algorithm for predicting the class of a data point.
• It can be used for binary as well as multi-class classification.
• It often performs well in multi-class prediction compared with other algorithms.
• It is a popular choice for text classification problems.
Disadvantages of Naïve Bayes Classifier:
• Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the relationship between features.
Applications of Naïve Bayes Classifier:
• It is used for Credit Scoring.
• It is used in medical data classification.
• It can be used in real-time predictions because Naïve Bayes Classifier is an eager learner.
• It is used in Text classification such as Spam filtering and Sentiment analysis.

Types of Naïve Bayes Model:

There are three types of Naive Bayes model, which are given below:
• Gaussian: The Gaussian model assumes that features follow a normal distribution. This means that if predictors take continuous values instead of discrete ones, the model assumes that these values are sampled from a Gaussian distribution.
• Multinomial: The Multinomial Naïve Bayes classifier is used when the data is multinomially distributed. It is primarily used for document classification problems, that is, deciding which category (such as sports, politics, or education) a particular document belongs to. The classifier uses the frequency of words as the predictors.
• Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present in a document or not. This model is also well known for document classification tasks.
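
To make the Gaussian variant more concrete, here is a minimal from-scratch sketch in plain Python. It is a simplified illustration and not scikit-learn's implementation: the function names, the toy data, and the constant 1e-9 added to each variance for numerical stability are all assumptions made for this example. Each class gets a prior plus a mean and variance per feature, and prediction picks the class with the highest log-posterior score.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(rows, labels):
    """Estimate each class's prior and per-feature mean and variance."""
    model = {}
    for cls in sorted(set(labels)):
        cls_rows = [r for r, l in zip(rows, labels) if l == cls]
        columns = list(zip(*cls_rows))  # one tuple of values per feature
        model[cls] = {
            "prior": len(cls_rows) / len(rows),
            "means": [mean(c) for c in columns],
            # Small constant avoids division by zero (an assumption here).
            "vars": [pvariance(c) + 1e-9 for c in columns],
        }
    return model


def log_gaussian(x, mu, var):
    """Log of the normal density N(x; mu, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def predict(model, row):
    """Return the class with the highest log prior + summed log likelihoods."""
    scores = {}
    for cls, params in model.items():
        scores[cls] = math.log(params["prior"]) + sum(
            log_gaussian(x, mu, var)
            for x, mu, var in zip(row, params["means"], params["vars"])
        )
    return max(scores, key=scores.get)


# Hypothetical toy data: [age, salary in thousands]; label 1 = purchased.
rows = [[22, 20], [25, 30], [28, 25], [45, 90], [50, 110], [48, 100]]
labels = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(rows, labels)
print(predict(model, [24, 28]), predict(model, [47, 95]))
```

Summing log-probabilities instead of multiplying raw probabilities is a common design choice, since products of many small numbers can underflow.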

Python Implementation of the Naïve Bayes algorithm:

Now we will implement the Naive Bayes algorithm using Python. For this, we will use the “user_data” dataset, which we have used in our other classification models, so we can easily compare the Naive Bayes model with them.
Steps to implement:
• Data pre-processing step
• Fitting Naive Bayes to the training set
• Predicting the test result
• Testing the accuracy of the result (creation of the confusion matrix)
• Visualizing the test set result
1) Data Pre-processing step:
In this step, we will pre-process/prepare the data so that we can use it efficiently in our code. It is similar to what we did in data pre-processing. The code for this is given below:

# Importing the libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('user_data.csv')
x = dataset.iloc[:, [2, 3]].values  # feature columns (plotted later as Age and Estimated Salary)
y = dataset.iloc[:, 4].values       # target column

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)

# Feature Scaling: fit the scaler on the training set only, then apply it to both sets
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

In the above code, we have loaded the dataset into our program using “dataset = pd.read_csv('user_data.csv')”. The loaded dataset is divided into training and test sets, and then the feature variables are scaled.
[Figure: output showing the loaded dataset]
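
The feature-scaling step can be illustrated without any library. The sketch below standardizes one column the way a standard scaler does: the mean and the population standard deviation are learned from the training values only and then reused for the test values. The helper names and the age values are hypothetical.

```python
from statistics import mean, pstdev


def fit_scaler(column):
    """Learn the mean and population standard deviation from training values."""
    return mean(column), pstdev(column)


def transform(column, mu, sigma):
    """Standardize values: subtract the mean, divide by the standard deviation."""
    return [(value - mu) / sigma for value in column]


# Hypothetical ages, for illustration only.
train_ages = [20, 30, 40, 50]
test_ages = [35, 60]

mu, sigma = fit_scaler(train_ages)             # statistics come from training data only
scaled_train = transform(train_ages, mu, sigma)
scaled_test = transform(test_ages, mu, sigma)  # reuse the training statistics

print([round(v, 2) for v in scaled_train])
print([round(v, 2) for v in scaled_test])
```

Reusing the training statistics on the test set keeps information about the test data from leaking into the pre-processing.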

2) Fitting Naive Bayes to the Training Set:
After the pre-processing step, now we will fit the Naive Bayes model to the Training set. Below is the code for it:

# Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)

In the above code, we have used the GaussianNB classifier to fit it to the training dataset. We can also use other classifiers as per our requirement.
Output:
Out[6]: GaussianNB(priors=None, var_smoothing=1e-09)
3) Prediction of the test set result:
Now we will predict the test set result. For this, we will create a new predictor variable y_pred, and will use the predict function to make the predictions.

# Predicting the Test set results
y_pred = classifier.predict(x_test)

Output: [Figure: prediction vector y_pred and real vector y_test]

The above output shows the prediction vector y_pred and the real vector y_test. Some predictions differ from the real values; these are the incorrect predictions.
4) Creating Confusion Matrix:
Now we will check the accuracy of the Naive Bayes classifier using the Confusion matrix. Below is the code for it:

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

Output: [Figure: confusion matrix]

As we can see in the above confusion matrix output, there are 7 + 3 = 10 incorrect predictions and 65 + 25 = 90 correct predictions.
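
The accuracy follows directly from these counts. The sketch below uses only the four numbers quoted above; which off-diagonal cell holds 7 and which holds 3 is an assumption made here, and it does not affect the accuracy.

```python
# Counts quoted in the text: 65 + 25 correct, 7 + 3 incorrect.
# The placement of 3 and 7 in the off-diagonal cells is assumed.
cm = [[65, 3],
      [7, 25]]

correct = cm[0][0] + cm[1][1]         # diagonal = correct predictions
total = sum(sum(row) for row in cm)   # all test samples
accuracy = correct / total

print(f"accuracy = {correct}/{total} = {accuracy:.2f}")
```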
5) Visualizing the training set result:
Next we will visualize the training set result using Naïve Bayes Classifier. Below is the code for it:

# Visualising the Training set results
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
# Build a dense grid covering the (scaled) feature space
X1, X2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step = 0.01),
                     nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
# Colour each grid point by its predicted class to show the decision regions
mtp.contourf(X1, X2, classifier.predict(nm.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('purple', 'green')))
mtp.xlim(X1.min(), X1.max())
mtp.ylim(X2.min(), X2.max())
# Overlay the actual data points, one colour per class
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                color = ListedColormap(('purple', 'green'))(i), label = j)
mtp.title('Naive Bayes (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
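Output: [Figure: Naive Bayes (Training set), Age vs. Estimated Salary]

In the above output we can see that the Naïve Bayes classifier has separated the data points with a smooth boundary. The boundary is curved because we used the GaussianNB classifier, which models each class with its own Gaussian distribution.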
6) Visualizing the Test set result:

# Visualising the Test set results
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
# Build a dense grid covering the (scaled) feature space
X1, X2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step = 0.01),
                     nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
# Colour each grid point by its predicted class to show the decision regions
mtp.contourf(X1, X2, classifier.predict(nm.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('purple', 'green')))
mtp.xlim(X1.min(), X1.max())
mtp.ylim(X2.min(), X2.max())
# Overlay the actual data points, one colour per class
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                color = ListedColormap(('purple', 'green'))(i), label = j)
mtp.title('Naive Bayes (Test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
Output: [Figure: Naive Bayes (Test set), Age vs. Estimated Salary]

The above output is the final output for the test set data. As we can see, the classifier has created a curved boundary to divide the “purchased” and “not purchased” classes. There are some wrong predictions, which we counted in the confusion matrix, but it is still a reasonably good classifier.
So, this brings us to the end of the blog. This Tecklearn ‘Naïve Bayes Classifier Algorithm’ blog helps you with commonly asked questions if you are looking for a job in Data Science.