Binomial Distribution and Poisson Regression in R

Last updated on Sep 26 2022
Murugan Swamy

The binomial distribution models the number of successes in a fixed series of independent experiments, each of which has only two possible outcomes and the same probability of success. For example, tossing a coin always gives a head or a tail. The probability of getting exactly 3 heads when a coin is tossed 10 times is given by the binomial distribution.

R has four built-in functions for working with the binomial distribution. They are described below.
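As a small worked sketch of the coin example, assuming a fair coin (probability of a head 0.5), the probability of exactly 3 heads in 10 tosses can be computed both from the binomial formula and with dbinom(). The variable names here are illustrative only.

```r
# Probability of exactly 3 heads in 10 tosses of a fair coin (assumed prob = 0.5).
n_tosses <- 10
n_heads <- 3
p_head <- 0.5

# Direct evaluation of the binomial formula: C(n, k) * p^k * (1 - p)^(n - k).
by_formula <- choose(n_tosses, n_heads) * p_head^n_heads *
  (1 - p_head)^(n_tosses - n_heads)

# The same quantity from R's built-in function.
by_dbinom <- dbinom(n_heads, size = n_tosses, prob = p_head)

print(by_formula)  # 120 / 1024 = 0.1171875
print(by_dbinom)
```

Both lines print the same value, which is a quick check that dbinom() implements the formula.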

dbinom(x, size, prob)

pbinom(x, size, prob)

qbinom(p, size, prob)

rbinom(n, size, prob)

Following is the description of the parameters used −

  • x is a vector of numbers (counts of successes).
  • p is a vector of probabilities.
  • n is the number of observations.
  • size is the number of trials.
  • prob is the probability of success on each trial.

dbinom()

This function gives the probability of each possible number of successes, that is, the probability mass function evaluated at each point.

# Create a sequence of the integers from 0 to 50, in steps of 1.
x <- seq(0, 50, by = 1)
# Compute the binomial probability of each value for 50 trials with success probability 0.5.
y <- dbinom(x, 50, 0.5)
# Give the chart file a name.
png(file = "dbinom.png")

# Plot the graph for this sample.
plot(x, y)
# Save the file.
dev.off()

When we execute the above code, it produces the following result −

[Figure: plot of the binomial probabilities y against x, saved as dbinom.png]
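Without looking at the chart, a few properties of the dbinom() values can be checked numerically. The following is a minimal sketch using the same 50 trials and probability 0.5 as above; the names total and mode_x are illustrative.

```r
# Binomial probabilities for 0..50 successes in 50 trials with prob = 0.5.
x <- 0:50
y <- dbinom(x, size = 50, prob = 0.5)

# The probabilities over all possible outcomes add up to 1
# (up to floating-point rounding).
total <- sum(y)

# The most likely number of successes is where the probability peaks.
mode_x <- x[which.max(y)]

print(total)
print(mode_x)  # 25, the centre of the symmetric distribution
```

Because prob is 0.5, the distribution is symmetric around 25, which is why the plotted points form a bell shape centred there.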

pbinom()

This function gives the cumulative probability of an event, P(X ≤ x). It is a single value representing the probability of getting x or fewer successes.

# Probability of getting 26 or fewer heads from 51 tosses of a coin.
x <- pbinom(26, 51, 0.5)
print(x)

When we execute the above code, it produces the following result −

[1] 0.610116
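To make the link between pbinom() and dbinom() concrete, the sketch below recomputes the same cumulative probability by summing individual probabilities, and also shows the upper tail via the lower.tail argument. The variable names are illustrative.

```r
# Cumulative probability of 26 or fewer heads in 51 tosses.
cum_direct <- pbinom(26, size = 51, prob = 0.5)

# The same value as a sum of the individual probabilities for 0..26 heads.
cum_summed <- sum(dbinom(0:26, size = 51, prob = 0.5))

# Upper tail: probability of MORE than 26 heads, P(X > 26).
upper_tail <- pbinom(26, size = 51, prob = 0.5, lower.tail = FALSE)

print(cum_direct)
print(cum_summed)
print(cum_direct + upper_tail)  # the two tails together cover everything
```

Using lower.tail = FALSE is usually preferable to computing 1 - pbinom(...) by hand when the tail probability is very small.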

qbinom()

This function takes a probability value and gives the smallest number whose cumulative probability is at least that value.

# The smallest number of heads whose cumulative probability reaches 0.25
# when a coin is tossed 51 times.
x <- qbinom(0.25, 51, 1/2)
print(x)

When we execute the above code, it produces the following result −

[1] 23
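One way to read this result is that qbinom() inverts pbinom(). The sketch below, with illustrative variable names, checks that the returned value is the smallest count whose cumulative probability reaches 0.25.

```r
# Quantile: smallest number of heads q with P(X <= q) >= 0.25 in 51 tosses.
q <- qbinom(0.25, size = 51, prob = 0.5)

# Cumulative probability at q and at the value just below it.
p_at_q <- pbinom(q, size = 51, prob = 0.5)
p_below_q <- pbinom(q - 1, size = 51, prob = 0.5)

print(q)
print(p_at_q >= 0.25)    # TRUE: q reaches the requested probability
print(p_below_q < 0.25)  # TRUE: q - 1 does not
```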

rbinom()

This function generates the required number of random values from a binomial distribution with a given number of trials and probability of success.

# Generate 8 random values, each the number of successes in 150 trials with success probability 0.4.
x <- rbinom(8, 150, 0.4)
print(x)

When we execute the above code, it produces a result like the following (the values are random, so they differ from run to run) −

[1] 58 61 59 66 55 60 61 67
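Since rbinom() draws random values, the output changes on every run. A minimal sketch of how to make the draws repeatable with set.seed() follows; the seed value 123 and the variable names are arbitrary choices for illustration.

```r
# Fix the random number generator state so the draws can be reproduced.
set.seed(123)
first_draw <- rbinom(8, size = 150, prob = 0.4)

# Resetting the same seed gives exactly the same values again.
set.seed(123)
second_draw <- rbinom(8, size = 150, prob = 0.4)
print(identical(first_draw, second_draw))  # TRUE

# With many draws the sample mean settles near size * prob = 60.
many_draws <- rbinom(10000, size = 150, prob = 0.4)
print(mean(many_draws))
```

The values cluster around 60 because the expected number of successes is size * prob = 150 * 0.4.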

 

R – Poisson Regression

Poisson regression involves regression models in which the response variable is a count rather than a fractional number, for example the number of births or the number of wins in a football match series. The values of the response variable are also assumed to follow a Poisson distribution.

The general mathematical equation for Poisson regression is −

log(y) = a + b1x1 + b2x2 + ... + bnxn

Following is the description of the parameters used −

  • y is the response variable (the model describes the log of its expected value).
  • a and b1, b2, ..., bn are the numeric coefficients.
  • x1, x2, ..., xn are the predictor variables.
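To illustrate what the equation means in practice, the sketch below simulates counts whose expected value is exp(a + b1x1) and then recovers the coefficients with glm(). The values 0.5 and 1.2, the sample size and the seed are all assumptions made up for this illustration, not part of the example that follows.

```r
# Simulated data: one predictor and a count response (illustrative values only).
set.seed(1)
n_obs <- 5000
x1 <- runif(n_obs)
true_a <- 0.5
true_b1 <- 1.2

# The log of the expected count is linear in x1, so the expected count
# itself is exp(a + b1 * x1).
y <- rpois(n_obs, lambda = exp(true_a + true_b1 * x1))

# Fit a Poisson regression; the default link for this family is the log.
fit <- glm(y ~ x1, family = poisson)

# The estimates should be close to the values used in the simulation.
print(coef(fit))
```

The estimated coefficients are on the log scale, so they are compared directly with true_a and true_b1 rather than with their exponentials.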

The function used to create the Poisson regression model is the glm() function.

Syntax

The basic syntax for the glm() function in Poisson regression is −

glm(formula,data,family)

Following is the description of the parameters used in the above function −

  • formula is the symbolic description of the relationship between the variables.
  • data is the data set giving the values of these variables.
  • family is an R object that specifies the details of the model. Its value is poisson for Poisson regression.

Example

We have the built-in data set “warpbreaks”, which describes the effect of wool type (A or B) and tension (low, medium or high) on the number of warp breaks per loom. Let’s consider “breaks” as the response variable, which is a count of the number of breaks. The variables “wool” and “tension” are taken as predictor variables.

Input Data

input <- warpbreaks
print(head(input))

When we execute the above code, it produces the following result −

      breaks   wool  tension
1     26       A     L
2     30       A     L
3     54       A     L
4     25       A     L
5     70       A     L
6     52       A     L
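Before fitting a model it can help to look at the average number of breaks for each combination of the two predictors. The following is a minimal sketch using aggregate(); the name group_means is illustrative.

```r
# Structure of the built-in data set: one count column and two factors.
str(warpbreaks)

# Mean number of breaks for each combination of wool type and tension.
group_means <- aggregate(breaks ~ wool + tension, data = warpbreaks, FUN = mean)
print(group_means)
```

With two wool types and three tension levels this gives six rows, one per combination.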

Create Regression Model

output <- glm(formula = breaks ~ wool + tension, data = warpbreaks,
   family = poisson)
print(summary(output))

When we execute the above code, it produces the following result −

Call:
glm(formula = breaks ~ wool + tension, family = poisson, data = warpbreaks)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.6871  -1.6503  -0.4269   1.1902   4.2616

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.69196    0.04541  81.302  < 2e-16 ***
woolB       -0.20599    0.05157  -3.994 6.49e-05 ***
tensionM    -0.32132    0.06027  -5.332 9.73e-08 ***
tensionH    -0.51849    0.06396  -8.107 5.21e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 297.37  on 53  degrees of freedom
Residual deviance: 210.39  on 50  degrees of freedom
AIC: 493.06

Number of Fisher Scoring iterations: 4

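In the summary, we look for a p-value in the last column below 0.05 to conclude that a predictor has an effect on the response variable. Here woolB, tensionM and tensionH all have p-values far below 0.05, so both wool type and tension affect the count of breaks. The coefficients are on the log scale and are all negative, so wool B and tensions M and H are each associated with fewer breaks than the baseline of wool A at tension L.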
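Because the coefficients are on the log scale, exponentiating them gives multiplicative effects on the expected count. The sketch below refits the same model and computes these ratios, along with the expected number of breaks for each combination; the names rate_ratios and new_data are illustrative.

```r
# Refit the model from the example above.
output <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Exponentiated coefficients: the factor by which the expected count changes
# relative to the baseline (wool A, tension L).
rate_ratios <- exp(coef(output))
print(round(rate_ratios, 3))

# Expected number of breaks for every wool and tension combination.
new_data <- expand.grid(wool = c("A", "B"), tension = c("L", "M", "H"))
new_data$expected_breaks <- predict(output, newdata = new_data, type = "response")
print(new_data)
```

For instance, the ratio for woolB is about 0.81, which suggests roughly 19% fewer breaks for wool B than for wool A at the same tension, under this model.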

So, this brings us to the end of the blog. This Tecklearn ‘Binomial Distribution and Poisson Regression in R’ blog helps you with commonly asked questions if you are looking for a job in Data Science. If you wish to learn the R language and build a career in the Data Science domain, check out our interactive Data Science using R Language Training, which comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

https://www.tecklearn.com/course/data-science-training-using-r-language/
