Data Frames in R Language

Last updated on Dec 13 2021
Murugan Swamy

A data frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.

A data frame has the following characteristics.

  • The column names should be non-empty.
  • The row names should be unique.
  • A column can hold numeric, factor or character data, among other types (the examples in this post also use a Date column).
  • Each column should contain the same number of data items.
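
The last two rules can be checked directly. The following is a minimal sketch, not part of the original examples; the names `ok`, `recycled` and `bad` are made up for illustration. One detail worth knowing: base R recycles a shorter column when its length divides the longer one evenly, and only signals an error when the lengths cannot be reconciled.

```r
# Columns of equal length: the data frame is created as expected.
ok <- data.frame(x = 1:4, y = c("a", "b", "c", "d"))
print(dim(ok))

# A length-2 column is recycled to fill 4 rows.
recycled <- data.frame(x = 1:4, y = c("a", "b"))
print(nrow(recycled))

# Lengths 3 and 2 cannot be reconciled, so data.frame() signals an error.
# tryCatch() captures the error message instead of stopping the script.
bad <- tryCatch(
   data.frame(x = 1:3, y = c("a", "b")),
   error = function(e) conditionMessage(e)
)
print(bad)

# The automatically assigned row names contain no duplicates.
print(anyDuplicated(rownames(ok)) == 0)
```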

Create Data Frame

 

# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
   salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)

# Print the data frame.
print(emp.data)

When we execute the above code, it produces the following result:

  emp_id emp_name salary start_date
1      1     Rick 623.30 2012-01-01
2      2      Dan 515.20 2013-09-23
3      3 Michelle 611.00 2014-11-15
4      4     Ryan 729.00 2014-05-11
5      5     Gary 843.25 2015-03-27

Get the Structure of the Data Frame

The structure of the data frame can be seen by using the str() function.

# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
   salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)

# Get the structure of the data frame.
str(emp.data)

When we execute the above code, it produces the following result:

'data.frame':   5 obs. of  4 variables:
 $ emp_id    : int  1 2 3 4 5
 $ emp_name  : chr  "Rick" "Dan" "Michelle" "Ryan" ...
 $ salary    : num  623 515 611 729 843
 $ start_date: Date, format: "2012-01-01" "2013-09-23" "2014-11-15" "2014-05-11" ...
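
When only one piece of that structure is needed, base R has smaller helpers. The sketch below is an addition to the original examples and uses dim(), nrow(), ncol(), names() and sapply() with class() on the same emp.data data frame.

```r
# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
   salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)

# Number of rows and columns, together and separately.
print(dim(emp.data))
print(nrow(emp.data))
print(ncol(emp.data))

# Column names.
print(names(emp.data))

# Class of each column, returned as a named character vector.
col_classes <- sapply(emp.data, class)
print(col_classes)
```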

Summary of Data in Data Frame

The statistical summary and nature of the data can be obtained by applying the summary() function.

# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
   salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)

# Print the summary.
print(summary(emp.data))

When we execute the above code, it produces the following result:

     emp_id    emp_name             salary        start_date
 Min.   :1   Length:5           Min.   :515.2   Min.   :2012-01-01
 1st Qu.:2   Class :character   1st Qu.:611.0   1st Qu.:2013-09-23
 Median :3   Mode  :character   Median :623.3   Median :2014-05-11
 Mean   :3                      Mean   :664.4   Mean   :2014-01-14
 3rd Qu.:4                      3rd Qu.:729.0   3rd Qu.:2014-11-15
 Max.   :5                      Max.   :843.2   Max.   :2015-03-27
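
summary() rounds the values it displays, which is why the salary column shows 664.4 and 843.2 above. As a small sketch added to the original examples, the individual statistics can also be computed on a single column with mean(), median() and range(), which return the unrounded values.

```r
# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
   salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)

# Summary of one column only.
print(summary(emp.data$salary))

# Individual statistics on the same column.
salary_mean <- mean(emp.data$salary)
salary_median <- median(emp.data$salary)
salary_range <- range(emp.data$salary)   # minimum and maximum

print(salary_mean)
print(salary_median)
print(salary_range)
```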

Extract Data from Data Frame

Extract specific columns from a data frame using the column names.

# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
   salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)

# Extract specific columns.
result <- data.frame(emp.data$emp_name, emp.data$salary)
print(result)

When we execute the above code, it produces the following result:

  emp.data.emp_name emp.data.salary
1              Rick          623.30
2               Dan          515.20
3          Michelle          611.00
4              Ryan          729.00
5              Gary          843.25
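
Note that the result above carries the prefixed column names emp.data.emp_name and emp.data.salary. As an alternative sketch (not from the original post), indexing with a character vector of column names keeps the original names. The variable `one_col` is only illustrative.

```r
# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
   salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)

# Select columns by name; the original column names are preserved.
result <- emp.data[, c("emp_name", "salary")]
print(result)

# Selecting a single column returns a plain vector by default.
# drop = FALSE keeps the result as a one-column data frame.
one_col <- emp.data[, "salary", drop = FALSE]
print(class(one_col))
```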

Extract the first two rows with all columns.

# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
   salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)

# Extract first two rows.
result <- emp.data[1:2,]
print(result)

When we execute the above code, it produces the following result:

  emp_id emp_name salary start_date
1      1     Rick  623.3 2012-01-01
2      2      Dan  515.2 2013-09-23
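
Rows can also be picked by a condition instead of by position. The sketch below is an addition to the original examples: head() returns the first rows, and a logical vector in the row position keeps only the rows where it is TRUE. The 620 threshold is an arbitrary value chosen for illustration.

```r
# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
   salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)

# First two rows, using head() instead of an index range.
first_two <- head(emp.data, 2)
print(first_two)

# Rows whose salary exceeds 620 (an arbitrary threshold).
high_paid <- emp.data[emp.data$salary > 620, ]
print(high_paid)
```

The base R function subset(emp.data, salary > 620) expresses the same filter.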

Extract 3rd and 5th row with 2nd and 4th column

  

# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
   salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)

# Extract 3rd and 5th row with 2nd and 4th column.
result <- emp.data[c(3,5),c(2,4)]
print(result)

When we execute the above code, it produces the following result:

  emp_name start_date
3 Michelle 2014-11-15
5     Gary 2015-03-27
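
The same selection can be written by stating what to leave out. In this sketch, added to the original examples, negative indices drop rows 1, 2 and 4 and columns 1 and 3, which leaves the 3rd and 5th rows with the 2nd and 4th columns.

```r
# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
   salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)

# Keep rows 3 and 5, columns 2 and 4, by position.
by_position <- emp.data[c(3, 5), c(2, 4)]

# Drop rows 1, 2 and 4 and columns 1 and 3 with negative indices.
by_exclusion <- emp.data[-c(1, 2, 4), -c(1, 3)]

print(by_exclusion)

# Both approaches select the same cells.
print(isTRUE(all.equal(by_position, by_exclusion)))
```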

Expand Data Frame

A data frame can be expanded by adding columns and rows.

Add Column

To add a column, assign a vector to a new column name.

# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
   salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)

# Add the "dept" column.
emp.data$dept <- c("IT","Operations","IT","HR","Finance")
v <- emp.data
print(v)

When we execute the above code, it produces the following result:

  emp_id emp_name salary start_date       dept
1      1     Rick 623.30 2012-01-01         IT
2      2      Dan 515.20 2013-09-23 Operations
3      3 Michelle 611.00 2014-11-15         IT
4      4     Ryan 729.00 2014-05-11         HR
5      5     Gary 843.25 2015-03-27    Finance
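
A new column can also be computed from existing ones, and a column can be removed again by assigning NULL to it. The following sketch is an addition to the original examples; the `bonus` column and its 10% rate are invented purely for illustration.

```r
# Create the data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
   salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)

# Add a column derived from an existing column (hypothetical 10% bonus).
with_bonus <- emp.data
with_bonus$bonus <- with_bonus$salary * 0.10
print(with_bonus[, c("emp_name", "salary", "bonus")])

# Remove the column again by assigning NULL.
without_bonus <- with_bonus
without_bonus$bonus <- NULL
print(names(without_bonus))
```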

Add Row

To add rows to an existing data frame, the new rows must have the same structure (the same columns) as the existing data frame. The two are then combined with the rbind() function.

In the example below we create a second data frame holding the new rows and bind it to the existing data frame to create the final data frame.

 

# Create the first data frame.
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
   salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   dept = c("IT", "Operations", "IT", "HR", "Finance"),
   stringsAsFactors = FALSE
)




# Create the second data frame.
emp.newdata <- data.frame(
   emp_id = 6:8,
   emp_name = c("Rasmi", "Pranab", "Tusar"),
   salary = c(578.0, 722.5, 632.8),
   start_date = as.Date(c("2013-05-21", "2013-07-30", "2014-06-17")),
   dept = c("IT", "Operations", "Finance"),
   stringsAsFactors = FALSE
)

# Bind the two data frames.
emp.finaldata <- rbind(emp.data, emp.newdata)
print(emp.finaldata)

When we execute the above code, it produces the following result:

  emp_id emp_name salary start_date       dept
1      1     Rick 623.30 2012-01-01         IT
2      2      Dan 515.20 2013-09-23 Operations
3      3 Michelle 611.00 2014-11-15         IT
4      4     Ryan 729.00 2014-05-11         HR
5      5     Gary 843.25 2015-03-27    Finance
6      6    Rasmi 578.00 2013-05-21         IT
7      7   Pranab 722.50 2013-07-30 Operations
8      8    Tusar 632.80 2014-06-17    Finance
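
The "same structure" requirement can be seen with a smaller example. This is a sketch added to the original post, and the data frames `a`, `b` and `mismatched` are made up for illustration. rbind() matches data frame columns by name, so a different column order is fine, but a different column name makes it signal an error.

```r
a <- data.frame(emp_id = 1:2, dept = c("IT", "HR"), stringsAsFactors = FALSE)

# Same column names in a different order: rbind() matches them by name.
b <- data.frame(dept = "Finance", emp_id = 3L, stringsAsFactors = FALSE)
combined <- rbind(a, b)
print(combined)

# A differently named column ("department" instead of "dept") is rejected.
mismatched <- data.frame(emp_id = 4L, department = "IT", stringsAsFactors = FALSE)
msg <- tryCatch(
   rbind(a, mismatched),
   error = function(e) conditionMessage(e)
)
print(msg)
```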

 

So, this brings us to the end of the blog. This Tecklearn ‘Data Frames in R Language’ blog helps you with commonly asked questions if you are looking for a job in Data Science. If you wish to learn the R language and build a career in the Data Science domain, check out our interactive Data Science using R Language Training, which comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

https://www.tecklearn.com/course/data-science-training-using-r-language/
