Deep Dive into Apache Pig Functions: Load & Store, Bag & Tuple, String, Date-time, Math

Last updated on May 30 2022
Inderjeet Chopra


Apache Pig – Load & Store Functions

The Load and Store functions in Apache Pig determine how data goes into and comes out of Pig. These functions are used with the load and store operators. Given below is the list of load and store functions available in Pig.

S.N. Function & Description
1 PigStorage()

To load and store structured files.

2 TextLoader()

To load unstructured data into Pig.

3 BinStorage()

To load and store data in Pig using a machine-readable format.

4 Handling Compression

In Pig Latin, we can load and store compressed data.
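
The following is a minimal Pig Latin sketch of the load and store functions listed above. The file names (student_data.txt, server.log, and the output directories) and the field schema are hypothetical and chosen only for illustration.

```pig
-- Assumed input: comma-separated lines such as 1,Rajiv,Hyderabad (hypothetical file).
students = LOAD 'student_data.txt' USING PigStorage(',')
           AS (id:int, name:chararray, city:chararray);

-- TextLoader() reads each line of unstructured text as a single chararray field.
log_lines = LOAD 'server.log' USING TextLoader() AS (line:chararray);

-- Write the relation back out with a different field delimiter.
STORE students INTO 'student_out' USING PigStorage('|');

-- BinStorage() writes the same relation in Pig's machine-readable binary format.
STORE students INTO 'student_bin' USING BinStorage();

-- Compression: PigStorage and TextLoader handle gzip data when the
-- input or output name carries a .gz extension.
zipped = LOAD 'student_data.txt.gz' USING PigStorage(',')
         AS (id:int, name:chararray, city:chararray);
STORE zipped INTO 'student_out.gz' USING PigStorage(',');
```

Note that a STORE statement normally fails if the output location already exists, so each run of a script like this needs fresh output paths.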

Apache Pig – Bag & Tuple Functions

Given below is the list of Bag and Tuple functions.

S.N. Function & Description
1 TOBAG()

To convert one or more expressions into a bag.

2 TOP()

To get the top N tuples from a bag of tuples.

3 TOTUPLE()

To convert one or more expressions into a tuple.

4 TOMAP()

To convert the key-value pairs into a Map.
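
As a rough illustration of the bag and tuple functions above, the sketch below builds complex types from simple fields and then picks the two oldest employees per city. The input file employee_details.txt and its schema are assumptions made for this example.

```pig
-- Assumed input: id,name,age,city per line (hypothetical file and schema).
emp = LOAD 'employee_details.txt' USING PigStorage(',')
      AS (id:int, name:chararray, age:int, city:chararray);

-- Build a tuple, a bag, and a map from simple fields.
emp_complex = FOREACH emp GENERATE
                TOTUPLE(id, name) AS id_name,
                TOBAG(id, age)    AS id_age_bag,
                TOMAP(name, age)  AS name_to_age;

-- TOP(n, column, bag) compares tuples on the 0-based column index;
-- column 2 is age, so this keeps the 2 highest ages in each city.
by_city    = GROUP emp BY city;
oldest_two = FOREACH by_city GENERATE group AS city, FLATTEN(TOP(2, 2, emp));

DUMP emp_complex;
DUMP oldest_two;
```

TOMAP expects its arguments in key, value order, and the keys must be chararray values, which is why name is used as the key here.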

Apache Pig – String Functions

We have the following String functions in Apache Pig.

S.N. Functions & Description
1 ENDSWITH(string, testAgainst)

To verify whether a given string ends with a particular substring.

2 STARTSWITH(string, substring)

Accepts two string parameters and verifies whether the first string starts with the second.

3 SUBSTRING(string, startIndex, stopIndex)

Returns a substring from a given string.

4 EqualsIgnoreCase(string1, string2)

To compare two strings ignoring the case.

5 INDEXOF(string, 'character', startIndex)

Returns the index of the first occurrence of a character in a string, searching forward from a start index.

6 LAST_INDEX_OF(string, 'character')

Returns the index of the last occurrence of a character in a string, searching backward from the end of the string.

7 LCFIRST(expression)

Converts the first character in a string to lower case.

8 UCFIRST(expression)

Returns a string with the first character converted to upper case.

9 UPPER(expression)

Returns a string converted to upper case.

10 LOWER(expression)

Converts all characters in a string to lower case.

11 REPLACE(string, 'regExp', 'newChar')

To replace the parts of a string that match a regular expression with new characters.

12 STRSPLIT(string, regex, limit)

To split a string around matches of a given regular expression.

13 STRSPLITTOBAG(string, regex, limit)

Similar to the STRSPLIT() function, it splits a string around matches of a given regular expression, but returns the result in a bag instead of a tuple.

14 TRIM(expression)

Returns a copy of a string with leading and trailing whitespace removed.

15 LTRIM(expression)

Returns a copy of a string with leading whitespace removed.

16 RTRIM(expression)

Returns a copy of a string with trailing whitespace removed.
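
A short Pig Latin sketch showing several of the string functions above in one script. The input file emp.txt, its schema, and the sample literals are hypothetical. Some of these functions were added in later Pig releases, so availability depends on the installed version.

```pig
-- Assumed input: id,name,city per line (hypothetical file and schema).
emp = LOAD 'emp.txt' USING PigStorage(',')
      AS (id:int, name:chararray, city:chararray);

emp_strings = FOREACH emp GENERATE
    id,
    UPPER(name)                        AS upper_name,
    LCFIRST(name)                      AS lc_first_name,
    SUBSTRING(name, 0, 3)              AS name_prefix,   -- characters at indices 0, 1 and 2
    INDEXOF(name, 'a', 0)              AS first_a,       -- search forward from index 0
    LAST_INDEX_OF(name, 'a')           AS last_a,
    TRIM(city)                         AS clean_city,
    REPLACE(city, 'Bombay', 'Mumbai')  AS new_city,      -- second argument is a regular expression
    STRSPLIT(name, ' ', 2)             AS name_parts;    -- tuple with at most 2 parts

-- Boolean string functions are typically used in FILTER conditions.
r_names = FILTER emp BY STARTSWITH(name, 'R') AND EqualsIgnoreCase(city, 'delhi');

DUMP emp_strings;
DUMP r_names;
```

SUBSTRING treats stopIndex as exclusive, so SUBSTRING(name, 0, 3) yields at most the first three characters.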

Apache Pig – Date-time Functions

Apache Pig provides the following Date and Time functions:

S.N. Functions & Description
1 ToDate(milliseconds)

Returns a date-time object according to the given parameters. The alternative forms of this function are ToDate(isostring), ToDate(userstring, format), and ToDate(userstring, format, timezone).

2 CurrentTime()

Returns the date-time object of the current time.

3 GetDay(datetime)

Returns the day of a month from the date-time object.

4 GetHour(datetime)

Returns the hour of a day from the date-time object.

5 GetMilliSecond(datetime)

Returns the millisecond of a second from the date-time object.

6 GetMinute(datetime)

Returns the minute of an hour from the date-time object.

7 GetMonth(datetime)

Returns the month of a year from the date-time object.

8 GetSecond(datetime)

Returns the second of a minute from the date-time object.

9 GetWeek(datetime)

Returns the week of a year from the date-time object.

10 GetWeekYear(datetime)

Returns the week year from the date-time object.

11 GetYear(datetime)

Returns the year from the date-time object.

12 AddDuration(datetime, duration)

Adds the duration object to the date-time object and returns the result.

13 SubtractDuration(datetime, duration)

Subtracts the Duration object from the Date-Time object and returns the result.

14 DaysBetween(datetime1, datetime2)

Returns the number of days between the two date-time objects.

15 HoursBetween(datetime1, datetime2)

Returns the number of hours between two date-time objects.

16 MilliSecondsBetween(datetime1, datetime2)

Returns the number of milliseconds between two date-time objects.

17 MinutesBetween(datetime1, datetime2)

Returns the number of minutes between two date-time objects.

18 MonthsBetween(datetime1, datetime2)

Returns the number of months between two date-time objects.

19 SecondsBetween(datetime1, datetime2)

Returns the number of seconds between two date-time objects.

20 WeeksBetween(datetime1, datetime2)

Returns the number of weeks between two date-time objects.

21 YearsBetween(datetime1, datetime2)

Returns the number of years between two date-time objects.
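
The date-time functions above can be combined as in the sketch below. The input file date.txt, its schema, and the date pattern are assumptions for illustration; the duration 'P1Y' is an ISO 8601 duration string meaning one year.

```pig
-- Assumed input: id,birth per line, e.g. 1,1989/09/26 09:00:00 (hypothetical file).
dob = LOAD 'date.txt' USING PigStorage(',') AS (id:int, birth:chararray);

-- Parse the string into a date-time object using an explicit format.
dob_parsed = FOREACH dob GENERATE
    id,
    ToDate(birth, 'yyyy/MM/dd HH:mm:ss') AS birth_dt;

dob_parts = FOREACH dob_parsed GENERATE
    id,
    GetYear(birth_dt)                     AS birth_year,
    GetMonth(birth_dt)                    AS birth_month,
    GetDay(birth_dt)                      AS birth_day,
    YearsBetween(CurrentTime(), birth_dt) AS age_years,
    DaysBetween(CurrentTime(), birth_dt)  AS age_days,
    AddDuration(birth_dt, 'P1Y')          AS first_birthday;

DUMP dob_parts;
```

The ...Between functions subtract the second argument from the first, so the later date-time goes first when a positive result is wanted.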

Apache Pig – Math Functions

We have the following Math functions in Apache Pig:

S.N. Functions & Description
1 ABS(expression)

To get the absolute value of an expression.

2 ACOS(expression)

To get the arc cosine of an expression.

3 ASIN(expression)

To get the arc sine of an expression.

4 ATAN(expression)

This function is used to get the arc tangent of an expression.

5 CBRT(expression)

This function is used to get the cube root of an expression.

6 CEIL(expression)

This function is used to get the value of an expression rounded up to the nearest integer.

7 COS(expression)

This function is used to get the trigonometric cosine of an expression.

8 COSH(expression)

This function is used to get the hyperbolic cosine of an expression.

9 EXP(expression)

This function is used to get Euler's number e raised to the power of the expression.

10 FLOOR(expression)

To get the value of an expression rounded down to the nearest integer.

11 LOG(expression)

To get the natural logarithm (base e) of an expression.

12 LOG10(expression)

To get the base 10 logarithm of an expression.

13 RANDOM( )

To get a pseudo random number (type double) greater than or equal to 0.0 and less than 1.0.

14 ROUND(expression)

To get the value of an expression rounded to an integer (if the result type is float) or rounded to a long (if the result type is double).

15 SIN(expression)

To get the sine of an expression.

16 SINH(expression)

To get the hyperbolic sine of an expression.

17 SQRT(expression)

To get the positive square root of an expression.

18 TAN(expression)

To get the trigonometric tangent of an angle.

19 TANH(expression)

To get the hyperbolic tangent of an expression.
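
A brief Pig Latin sketch applying a few of the math functions above. The input file math.txt and its schema are hypothetical, and the values are assumed to be positive so that LOG and SQRT are well defined.

```pig
-- Assumed input: id,val per line with positive values, e.g. 1,2.5 (hypothetical file).
nums = LOAD 'math.txt' USING PigStorage(',') AS (id:int, val:double);

nums_math = FOREACH nums GENERATE
    id,
    ABS(val)   AS abs_val,
    CEIL(val)  AS ceil_val,     -- rounded up
    FLOOR(val) AS floor_val,    -- rounded down
    ROUND(val) AS round_val,    -- a long, because val is a double
    SQRT(val)  AS sqrt_val,
    EXP(val)   AS exp_val,      -- e raised to the power of val
    LOG(val)   AS ln_val,       -- natural logarithm
    LOG10(val) AS log10_val;

-- RANDOM() returns a double in [0.0, 1.0), so this keeps roughly 10% of the rows.
nums_sample = FILTER nums BY RANDOM() < 0.1;

DUMP nums_math;
DUMP nums_sample;
```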

So, this brings us to the end of the blog. This Tecklearn 'Deep Dive into Apache Pig Functions: Load & Store, Bag & Tuple, String, Date-time, Math' blog helps you with commonly asked questions if you are looking for a job in the Apache Pig and Big Data domain.
If you wish to learn Apache Pig and build a career in the Apache Pig or Big Data domain, check out our interactive Big Data Hadoop Analyst Training, which comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

Big Data Hadoop Analyst

Big Data Hadoop Analyst Training

About the Course

Big Data analysis is emerging as a key advantage in business intelligence for many organizations. Our Big Data and Hadoop training course lets you deep-dive into the concepts of Big Data, equipping you with the skills required for Hadoop Analyst roles. This course will enable an Analyst to work on Big Data and Hadoop which takes into consideration the burgeoning demands of the industry to process and analyse data at high speeds. This training course will give you the right skills to deploy various tools and techniques to be a Hadoop Analyst working with Big Data.

Why Should You Take Hadoop Analyst Training?

• Average salary for a Big Data Hadoop Analyst is $115,819 – ZipRecruiter.com.
• Hadoop Market is expected to reach $99.31B by 2022, growing at a CAGR of 42.1% from 2015 – Forbes.
• Amazon, Cloudera, DataStax, DELL, EMC2, IBM, Microsoft & other MNCs worldwide use Hadoop.

What Will You Learn in This Course?

Hadoop Fundamentals
• The Motivation for Hadoop
• Hadoop Overview
• Data Storage: HDFS
• Distributed Data Processing: YARN, MapReduce, and Spark
• Data Processing and Analysis: Pig, Hive, and Impala
• Data Integration: Sqoop
• Other Hadoop Data Tools
• Exercise Scenarios Explanation
Introduction to Pig
• What Is Pig?
• Pig’s Features
• Pig Use Cases
• Interacting with Pig
Basic Data Analysis with Pig
• Pig Latin Syntax
• Loading Data
• Simple Data Types
• Field Definitions
• Data Output
• Viewing the Schema
• Filtering and Sorting Data
• Commonly-Used Functions
Processing Complex Data with Pig
• Storage Formats
• Complex/Nested Data Types
• Grouping
• Built-In Functions for Complex Data
• Iterating Grouped Data
Multi-Dataset Operations with Pig
• Techniques for Combining Data Sets
• Joining Data Sets in Pig
• Set Operations
• Splitting Data Sets
Pig Troubleshooting and Optimization
• Troubleshooting Pig
• Logging
• Using Hadoop’s Web UI
• Data Sampling and Debugging
• Performance Overview
• Understanding the Execution Plan
• Tips for Improving the Performance of Your Pig Jobs
Introduction to Hive and Impala
• What Is Hive?
• What Is Impala?
• Schema and Data Storage
• Comparing Hive to Traditional Databases
• Hive Use Cases
Querying with Hive and Impala
• Databases and Tables
• Basic Hive and Impala Query Language Syntax
• Data Types
• Differences Between Hive and Impala Query Syntax
• Using Hue to Execute Queries
• Using the Impala Shell
Data Management
• Data Storage
• Creating Databases and Tables
• Loading Data
• Altering Databases and Tables
• Simplifying Queries with Views
• Storing Query Results
Data Storage and Performance
• Partitioning Tables
• Choosing a File Format
• Managing Metadata
• Controlling Access to Data
Relational Data Analysis with Hive and Impala
• Joining Datasets
• Common Built-In Functions
• Aggregation and Windowing
Working with Impala
• How Impala Executes Queries
• Extending Impala with User-Defined Functions
• Improving Impala Performance
Analyzing Text and Complex Data with Hive
• Complex Values in Hive
• Using Regular Expressions in Hive
• Sentiment Analysis and N-Grams
• Conclusion
Hive Optimization
• Understanding Query Performance
• Controlling Job Execution Plan
• Bucketing
• Indexing Data
Extending Hive
• SerDes
• Data Transformation with Custom Scripts
• User-Defined Functions
• Parameterized Queries
Choosing the Best Tool for the Job
• Comparing MapReduce, Pig, Hive, Impala, and Relational Databases

Got a question for us? Please mention it in the comments section and we will get back to you.

 
