As we know, Supervised Machine Learning algorithms can be broadly classified into Regression and Classification algorithms. Regression algorithms predict continuous output values, but to predict categorical values, we need Classification algorithms.

What is the Classification Algorithm?

A Classification algorithm is a Supervised Learning technique used to identify the category of new observations on the basis of training data. In classification, a program learns from a given dataset of observations and then assigns each new observation to one of a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes are also called targets, labels, or categories.
Unlike regression, the output variable of classification is a category, not a numeric value, such as “Green or Blue”, “fruit or animal”, etc. Since classification is a Supervised Learning technique, it takes labeled input data, meaning each input comes with its corresponding output.
In a classification algorithm, the input variable (x) is mapped to a discrete output (y) by a function:
y = f(x), where y = categorical output
A well-known example of an ML classification algorithm is an email spam detector.
The main goal of a classification algorithm is to identify the category to which a given observation belongs, so these algorithms are mainly used to predict categorical outputs.
Classification algorithms can be better understood using the below diagram. In the below diagram, there are two classes, Class A and Class B. The members of each class have features that are similar to each other and dissimilar to those of the other class.

The algorithm that performs classification on a dataset is known as a classifier. There are two types of classifiers:
• Binary Classifier: If the classification problem has only two possible outcomes, the classifier is called a Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
• Multi-class Classifier: If the classification problem has more than two possible outcomes, the classifier is called a Multi-class Classifier.
Examples: Classification of types of crops, classification of types of music.
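
The difference between the two types can be illustrated with a minimal Python sketch. The function names, the input features, and the cut-off values below are hypothetical and chosen only for illustration; a real classifier would learn its rules from labeled data rather than have them written by hand.

```python
def classify_email(num_suspicious_words: int) -> str:
    """Binary classifier: exactly two possible outputs."""
    # Hypothetical rule: three or more suspicious words means spam.
    return "SPAM" if num_suspicious_words >= 3 else "NOT SPAM"


def classify_music(tempo_bpm: float) -> str:
    """Multi-class classifier: more than two possible outputs."""
    # Hypothetical tempo cut-offs, used only to show several classes.
    if tempo_bpm < 80:
        return "ambient"
    if tempo_bpm < 130:
        return "pop"
    return "techno"


print(classify_email(5))    # one of two labels
print(classify_music(100))  # one of three labels
```

In both cases the function plays the role of y = f(x): the input x is a feature and the output y is always one label from a fixed set.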

Learners in Classification Problems:

In the classification problems, there are two types of learners:
1. Lazy Learners: A lazy learner first stores the training dataset and waits until it receives the test dataset. Classification is then done on the basis of the most closely related data stored in the training dataset. It takes less time in training but more time for predictions.
Example: K-NN algorithm, Case-based reasoning
2. Eager Learners: Eager learners develop a classification model from the training dataset before receiving a test dataset. Opposite to lazy learners, eager learners take more time in training and less time in prediction.
Example: Decision Trees, Naïve Bayes, ANN.
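
The contrast between the two kinds of learner can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy dataset is made up, the class names LazyKNN and EagerCentroid are hypothetical, and the eager learner here is a simple nearest-centroid model, swapped in for brevity in place of the decision tree, Naïve Bayes, or ANN examples named above.

```python
import math
from collections import Counter

# Toy labeled data (made up for illustration): (feature vector, label).
TRAIN = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
         ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B")]


class LazyKNN:
    """Lazy learner: fit() only stores the data; the work happens in predict()."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, data):
        self.data = list(data)  # no model is built here
        return self

    def predict(self, x):
        # Every prediction scans the whole stored training set.
        nearest = sorted(self.data, key=lambda item: math.dist(item[0], x))[:self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


class EagerCentroid:
    """Eager learner: fit() builds a compact model (one centroid per class)."""

    def fit(self, data):
        groups = {}
        for features, label in data:
            groups.setdefault(label, []).append(features)
        # The model is the per-class mean of the training points.
        self.centroids = {
            label: tuple(sum(col) / len(col) for col in zip(*points))
            for label, points in groups.items()
        }
        return self

    def predict(self, x):
        # Prediction only compares x with one centroid per class.
        return min(self.centroids, key=lambda label: math.dist(self.centroids[label], x))


lazy = LazyKNN(k=3).fit(TRAIN)
eager = EagerCentroid().fit(TRAIN)
print(lazy.predict((2.0, 2.0)), eager.predict((2.0, 2.0)))
print(lazy.predict((6.0, 7.0)), eager.predict((6.0, 7.0)))
```

The lazy learner does almost nothing in fit() and pays the cost at prediction time, while the eager learner does its work up front and can discard the training data afterwards.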

Types of ML Classification Algorithms:

Classification algorithms can be further divided into two main categories:

• Linear Models
o Logistic Regression
o Support Vector Machines
• Non-linear Models
o K-Nearest Neighbours
o Kernel SVM
o Naïve Bayes
o Decision Tree Classification
o Random Forest Classification
Note: We will learn the above algorithms in later chapters.

Evaluating a Classification model:

Once our model is built, it is necessary to evaluate its performance, whether it is a Classification or a Regression model. For evaluating a Classification model, we have the following methods:
1. Log Loss or Cross-Entropy Loss:
• It is used for evaluating the performance of a classifier whose output is a probability value between 0 and 1.
• For a good binary classification model, the value of log loss should be near 0.
• The value of log loss increases as the predicted probability deviates from the actual value.
• A lower log loss indicates that the predicted probabilities are closer to the actual labels, which generally corresponds to a better model.
• For binary classification, cross-entropy can be calculated as:
-(y log(p) + (1 - y) log(1 - p))
Where y = actual output (0 or 1), p = predicted probability of the positive class.
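
The binary cross-entropy described above can be computed with a short Python sketch. The function name binary_log_loss is hypothetical, the natural logarithm is assumed, and averaging the per-sample loss over the dataset and clipping probabilities with a small eps are common conventions assumed here rather than taken from the text.

```python
import math


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Average of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip p away from exactly 0 or 1 so that log() is always defined.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 0]
good = binary_log_loss(labels, [0.9, 0.1])  # confident and correct
bad = binary_log_loss(labels, [0.1, 0.9])   # confident and wrong
print(round(good, 4), round(bad, 4))
```

The confident, correct predictions give a loss near 0, while the confident, wrong predictions give a much larger loss, matching the behaviour listed in the points above.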
2. Confusion Matrix:
• The confusion matrix is a table that describes the performance of the model.
• It is also known as the error matrix.
• The matrix summarizes the prediction results, giving the total number of correct and incorrect predictions for each class. The matrix looks like the table below:

	Actual Positive	Actual Negative
Predicted Positive	True Positive	False Positive
Predicted Negative	False Negative	True Negative
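
As a minimal sketch, the four cells of the table above can be counted from a list of actual and predicted labels. The function name confusion_counts and the sample labels are made up for illustration, and 1 is assumed to be the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count the four cells of a binary confusion matrix."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, actually positive
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
cells = confusion_counts(y_true, y_pred)
accuracy = (cells["TP"] + cells["TN"]) / len(y_true)
print(cells, accuracy)
```

Accuracy is the share of correct predictions (true positives plus true negatives) out of all predictions.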

3. AUC-ROC curve:
• ROC stands for Receiver Operating Characteristic, and AUC stands for Area Under the Curve.
• The ROC curve is a graph that shows the performance of the classification model at different thresholds.
• The ROC curve is defined for binary classification; to visualize the performance of a multi-class classification model, it is typically drawn per class using a one-vs-rest approach.
• The ROC curve is plotted with TPR (True Positive Rate) on the Y-axis and FPR (False Positive Rate) on the X-axis.
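
A rough sketch of how the curve and its area can be computed for a binary problem follows. The function names roc_points and auc, and the sample labels and scores, are hypothetical; each distinct score is used as a threshold, and the area is approximated with the trapezoidal rule, which is one common choice and assumed here.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    # Use each distinct score as a threshold, from highest to lowest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # predicted probability of the positive class
curve = roc_points(y_true, scores)
print(curve)
print(auc(curve))
```

An AUC of 1.0 would mean the scores rank every positive above every negative, while 0.5 corresponds to random ranking.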

Use cases of Classification Algorithms

Classification algorithms are used in many domains. Below are some popular use cases of Classification Algorithms:
• Email Spam Detection
• Speech Recognition
• Identification of cancer tumor cells
• Drugs Classification
• Biometric Identification, etc.

So, this brings us to the end of the blog. This Tecklearn ‘Classification Algorithm in Machine Learning’ blog helps you with commonly asked questions if you are looking for a job in Data Science. If you wish to learn Data Science and build a career in the Data Science domain, then check out our interactive Data Science using R Language Training, which comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

https://www.tecklearn.com/course/data-science-training-using-r-language/

Data Science using R Language Training

About the Course

Tecklearn’s Data Science using R Language Training develops knowledge and skills to visualize, transform, and model data in R language. It helps you to master the Data Science with R concepts such as data visualization, data manipulation, machine learning algorithms, charts, hypothesis testing, etc. through industry use cases, and real-time examples. Data Science course certification training lets you master data analysis, R statistical computing, connecting R with Hadoop framework, Machine Learning algorithms, time-series analysis, K-Means Clustering, Naïve Bayes, business analytics and more. This course will help you gain hands-on experience in deploying Recommender using R, Evaluation, Data Transformation etc.

Why Should you take Data Science Using R Training?

• The Average salary of a Data Scientist in R is $123k per annum – Glassdoor.com
• A recent market study shows that the Data Analytics Market is expected to grow at a CAGR of 30.08% from 2020 to 2023, which would equate to $77.6 billion.
• IBM, Amazon, Apple, Google, Facebook, Microsoft, Oracle & other MNCs worldwide are using data science for their Data analysis.

What you will Learn in this Course?

Introduction to Data Science
• Need for Data Science
• What is Data Science
• Life Cycle of Data Science
• Applications of Data Science
• Introduction to Big Data
• Introduction to Machine Learning
• Introduction to Deep Learning
• Introduction to R & RStudio
• Project Based Data Science
Introduction to R
• Introduction to R
• Data Exploration
• Operators in R
• Inbuilt Functions in R
• Flow Control Statements & User Defined Functions
• Data Structures in R
Data Manipulation
• Need for Data Manipulation
• Introduction to dplyr package
• select(), filter(), mutate(), sample_n(), sample_frac() & count() functions
• Getting summarized results with the summarise() function
• Combining different functions with the pipe operator
• Implementing SQL-like operations with sqldf()
Visualization of Data
• Loading different types of datasets in R
• Arranging the data
• Plotting the graphs
Introduction to Statistics
• Types of Data
• Probability
• Correlation and Co-variance
• Hypothesis Testing
• Standardization and Normalization
Introduction to Machine Learning
• What is Machine Learning?
• Machine Learning Use-Cases
• Machine Learning Process Flow
• Machine Learning Categories
• Supervised Learning algorithm: Linear Regression and Logistic Regression
Logistic Regression
• Intro to Logistic Regression
• Simple Logistic Regression in R
• Multiple Logistic Regression in R
• Confusion Matrix
• ROC Curve
Classification Techniques
• What is classification and what are its use cases?
• What is Decision Tree?
• Algorithm for Decision Tree Induction
• Creating a Perfect Decision Tree
• Confusion Matrix
• What is Random Forest?
• What is Naive Bayes?
• Support Vector Machine: Classification
Decision Tree
• Decision Tree in R
• Information Gain
• Gini Index
• Pruning
Recommender Engines
• What are Association Rules & their use cases?
• What is a Recommendation Engine & how does it work?
• Types of Recommendations
• User-Based Recommendation
• Item-Based Recommendation
• Difference: User-Based and Item-Based Recommendation
• Recommendation use cases
Time Series Analysis
• What is Time Series data?
• Time Series variables
• Different components of Time Series data
• Visualize the data to identify Time Series Components
• Implement ARIMA model for forecasting
• Exponential smoothing models
• Identifying the time series scenarios in which different Exponential Smoothing models can be applied

Got a question for us? Please mention it in the comments section and we will get back to you.
