A decision tree is a graph that represents choices and their results in the form of a tree. The nodes in the graph represent an event or choice, and the edges of the graph represent the decision rules or conditions. Decision trees are widely used in Machine Learning and Data Mining applications, and R has good support for building them.

Examples of the use of decision trees include predicting whether an email is spam or not spam, predicting whether a tumor is cancerous, and predicting whether a loan is a good or bad credit risk, based on the factors in each case. Generally, a model is created with observed data, also called training data. Then a set of validation data is used to verify and improve the model. R has packages which are used to create and visualize decision trees. For a new set of predictor variables, we use this model to arrive at a decision on the category (yes/no, spam/not spam) of the data.

The R package “party” is used to create decision trees.

Install R Package

Use the command below in the R console to install the package. By default, install.packages() also installs the packages that party depends on.

install.packages("party")

The package “party” has the function ctree(), which is used to create and analyze decision trees.

Syntax

The basic syntax for creating a decision tree in R is:

ctree(formula, data)

Following is the description of the parameters used:

formula is a formula describing the predictor and response variables.
data is the name of the data set used.
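
As a small illustration of the formula argument, the sketch below builds the formula used later in this blog and inspects its parts with base R only. The variable name f is just a placeholder chosen for this example; nothing here fits a tree yet.

```r
# Response on the left of ~, predictors on the right joined by +.
f <- nativeSpeaker ~ age + shoeSize + score

# all.vars() lists every variable named in the formula, response first.
print(all.vars(f))

# Left-hand side (the response variable).
print(f[[2]])

# Right-hand side terms (the predictor variables).
print(attr(terms(f), "term.labels"))
```

The same formula object can be passed as the first argument of ctree(), together with a data frame that contains all four columns.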

Input Data

We will use the data set named readingSkills, which ships with the party package, to create a decision tree. Each record holds a person's reading test score (“score”) together with the variables “age” and “shoeSize” and whether the person is a native speaker or not (“nativeSpeaker”).

Here is the sample data.

# Load the party package. It will automatically load other
# dependent packages.
library(party)

# Print some records from data set readingSkills.
print(head(readingSkills))

When we execute the above code, it produces the following result:

  nativeSpeaker   age   shoeSize      score
1           yes     5   24.83189   32.29385
2           yes     6   25.95238   36.63105
3            no    11   30.42170   49.60593
4           yes     7   28.66450   40.28456
5           yes    11   31.88207   55.46085
6           yes    10   30.07843   52.83124
Loading required package: methods
Loading required package: grid
………………………….
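
Before fitting a model it can help to look at the column types and the class balance of the response. The following is a minimal sketch using only standard inspection functions; it assumes the party package is installed so that readingSkills is available.

```r
library(party)

# Show the type of each column in the data set.
str(readingSkills)

# Number of rows and columns.
print(dim(readingSkills))

# How many records fall into each class of the response variable.
print(table(readingSkills$nativeSpeaker))
```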

Example

We will use the ctree() function to create the decision tree and see its graph.

# Load the party package. It will automatically load other
# dependent packages.
library(party)

# Create the input data frame.
input.dat <- readingSkills[c(1:105),]

# Give the chart file a name.
png(file = "decision_tree.png")

# Create the tree.
output.tree <- ctree(
   nativeSpeaker ~ age + shoeSize + score,
   data = input.dat)

# Plot the tree.
plot(output.tree)

# Save the file.
dev.off()

When we execute the above code, it writes the chart to decision_tree.png and produces the following result:

null device
          1
Loading required package: methods
Loading required package: grid
Loading required package: mvtnorm
Loading required package: modeltools
Loading required package: stats4
Loading required package: strucchange
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

Loading required package: sandwich
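
The introduction mentions checking a model against validation data. The sketch below shows one way this could look for the tree in this example. Treating the rows after 105 as the validation set is an assumption made here for illustration (the blog itself only defines the training rows), and the names train, validation, fit, predicted, confusion and accuracy are placeholders.

```r
library(party)

# Same training rows as in the example above.
train <- readingSkills[1:105, ]

# Assumption: use the remaining rows as validation data.
validation <- readingSkills[106:nrow(readingSkills), ]

# Fit the tree on the training rows only.
fit <- ctree(nativeSpeaker ~ age + shoeSize + score, data = train)

# Predict the class of each validation row.
predicted <- predict(fit, newdata = validation)

# Cross-tabulate predictions against the known answers.
confusion <- table(predicted = predicted, actual = validation$nativeSpeaker)
print(confusion)

# Share of validation rows that were classified correctly.
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```

A random split is usually preferable to taking the first rows, since it avoids depending on the order in which the data happens to be stored.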

Conclusion

From the decision tree saved in decision_tree.png we can conclude that anyone whose readingSkills score is less than 38.3 and whose age is more than 6 is not a native speaker.
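
The split rules can also be read as text instead of from the chart. The following is a minimal sketch that refits the same tree and prints it; the names fit and node_ids are placeholders for this example, and the exact thresholds printed may depend on the installed version of party.

```r
library(party)

# Refit the tree from the example on the same 105 training rows.
fit <- ctree(nativeSpeaker ~ age + shoeSize + score,
             data = readingSkills[1:105, ])

# Printing the fitted tree lists each split variable and threshold as text.
print(fit)

# Terminal node that each training row ends up in.
node_ids <- predict(fit, type = "node")
print(table(node_ids))
```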

This brings us to the end of the blog. This Tecklearn ‘Decision Tree in R’ blog helps you with commonly asked questions if you are looking for a job in Data Science. If you wish to learn the R language and build a career in the Data Science domain, check out our interactive Data Science using R Language Training, which comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

https://www.tecklearn.com/course/data-science-training-using-r-language/
