The Hive Query Language (HiveQL) is a query language for Hive to process and analyze structured data in a Metastore. This blog explains way to use the SELECT statement with WHERE clause.

SELECT statement is employed to retrieve the data from a table. WHERE clause works similar to a condition. It filters the data using the condition and gives you a finite result. The built-in operators and functions generate an expression, which fulfils the condition.

Syntax

Given below is the syntax of the SELECT query:

SELECT [ALL | DISTINCT] select_expr, select_expr, …

FROM table_reference

[WHERE where_condition]

[GROUP BY col_list]

[HAVING having_condition]

[CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY col_list]]

[LIMIT number];

Example

Let us take an example for SELECT…WHERE clause. Assume we have the employee table as given below, with fields named Id, Name, Salary, Designation, and Dept. Generate a query to retrieve the employee details who earn a salary of more than Rs 30000.

+——+————–+————-+——————-+——–+

+——+————–+————-+——————-+——–+

|1201 | Gopal | 45000 | Technical manager | TP |

|1202 | Manisha | 45000 | Proofreader | PR |

|1203 | Masthanvali | 40000 | Technical writer | TP |

|1204 | Krian | 40000 | Hr Admin | HR |

+——+————–+————-+——————-+——–+

The subsequent query retrieves the employee details using the above scenario:

hive> SELECT * FROM employee WHERE salary>30000;

On successful execution of the query, you get to see the subsequent response:

+——+————–+————-+——————-+——–+

+——+————–+————-+——————-+——–+

|1201 | Gopal | 45000 | Technical manager | TP |

|1202 | Manisha | 45000 | Proofreader | PR |

|1203 | Masthanvali | 40000 | Technical writer | TP |

|1204 | Krian | 40000 | Hr Admin | HR |

+——+————–+————-+——————-+——–+

JDBC Program

The JDBC program to apply where clause for the given example is as follows.

import java.sql.SQLException;

import java.sql.Connection;

import java.sql.ResultSet;

import java.sql.Statement;

import java.sql.DriverManager;

public class HiveQLWhere {

private static String driverName =

“org.apache.hadoop.hive.jdbc.HiveDriver”;

public static void main(String[] args) throws SQLException {

// Register driver and create driver instance

Class.forName(driverName);

// get connection

Connection con = DriverManager.

getConnection(“jdbc:hive://localhost:10000/userdb”, “”, “”);

// create statement

Statement stmt = con.createStatement();

// execute statement

Resultset res = stmt.executeQuery(“SELECT * FROM employee WHERE

salary>30000;”);

System.out.println(“Result:”);

System.out.println(” ID \t Name \t Salary \t Designation \t Dept “);

while (res.next()) {

System.out.println(res.getInt(1)+” “+ res.getString(2)+” “+

res.getDouble(3)+” “+ res.getString(4)+” “+ res.getString(5));

}

con.close();

}

Save the program in a file named HiveQLWhere.java. Use the subsequent commands to compile and execute this program.

$ javac HiveQLWhere.java

$ java HiveQLWhere

Output:

ID Name Salary Designation Dept

1201 Gopal 45000 Technical manager TP

1202 Manisha 45000 Proofreader PR

1203 Masthanvali 40000 Technical writer TP

1204 Krian 40000 Hr Admin HR

Hiveql Select…Order By

This section of blog explains way to use the ORDER BY clause in a SELECT statement. The ORDER BY clause is employed to retrieve the details based on one column and sort the result set by ascending or descending order.

Syntax

Given below is the syntax of the ORDER BY clause:

SELECT [ALL | DISTINCT] select_expr, select_expr, …

FROM table_reference

[WHERE where_condition]

[GROUP BY col_list]

[HAVING having_condition]

[ORDER BY col_list]]

[LIMIT number];

Example

Let us take an example for SELECT…ORDER BY clause. Assume employee table as given below, with the fields named Id, Name, Salary, Designation, and Dept. Generate a query to retrieve the employee details in order by using Department name.

+——+————–+————-+——————-+——–+

+——+————–+————-+——————-+——–+

|1201 | Gopal | 45000 | Technical manager | TP |

|1202 | Manisha | 45000 | Proofreader | PR |

|1203 | Masthanvali | 40000 | Technical writer | TP |

|1204 | Krian | 40000 | Hr Admin | HR |

+——+————–+————-+——————-+——–+

The subsequent query retrieves the employee details using the above scenario:

hive> SELECT Id, Name, Dept FROM employee ORDER BY DEPT;

On successful execution of the query, you get to see the subsequent response:

+——+————–+————-+——————-+——–+

+——+————–+————-+——————-+——–+

|1204 | Krian | 40000 | Hr Admin | HR |

|1202 | Manisha | 45000 | Proofreader | PR |

|1201 | Gopal | 45000 | Technical manager | TP |

|1203 | Masthanvali | 40000 | Technical writer | TP |

+——+————–+————-+——————-+——–+

JDBC Program

Here is the JDBC program to apply Order By clause for the given example.

import java.sql.SQLException;

import java.sql.Connection;

import java.sql.ResultSet;

import java.sql.Statement;

import java.sql.DriverManager;

public class HiveQLOrderBy {

private static String driverName =

“org.apache.hadoop.hive.jdbc.HiveDriver”;

public static void main(String[] args) throws SQLException {

// Register driver and create driver instance

Class.forName(driverName);

// get connection

Connection con = DriverManager.

getConnection(“jdbc:hive://localhost:10000/userdb”, “”, “”);

// create statement

Statement stmt = con.createStatement();

// execute statement

Resultset res = stmt.executeQuery(“SELECT * FROM employee ORDER BY

DEPT;”);

System.out.println(” ID \t Name \t Salary \t Designation \t Dept “);

while (res.next()) {

System.out.println(res.getInt(1)+” “+ res.getString(2)+” “+

res.getDouble(3)+” “+ res.getString(4)+” “+ res.getString(5));

}

con.close();

}

Save the program in a file named HiveQLOrderBy.java. Use the subsequent commands to compile and execute this program.

$ javac HiveQLOrderBy.java

$ java HiveQLOrderBy

Output:

ID Name Salary Designation Dept

1205 Kranthi 30000 Op Admin Admin

1204 Krian 40000 Hr Admin HR

1202 Manisha 45000 Proofreader PR

1201 Gopal 45000 Technical manager TP

1203 Masthanvali 40000 Technical writer TP

1204 Krian 40000 Hr Admin HR

Hiveql Group By

This section of blog explains the details of GROUP BY clause in a SELECT statement. The GROUP BY clause is employed to group all the records in a result set using a particular collection column. It’s employed to query a group of records.

Syntax

The syntax of GROUP BY clause is as follows:

SELECT [ALL | DISTINCT] select_expr, select_expr, …

FROM table_reference

[WHERE where_condition]

[GROUP BY col_list]

[HAVING having_condition]

[ORDER BY col_list]]

[LIMIT number];

Example

Let us take an example of SELECT…GROUP BY clause. Assume employee table as given below, with Id, Name, Salary, Designation, and Dept fields. Generate a query to retrieve the number of employees in each department.

+——+————–+————-+——————-+——–+

+——+————–+————-+——————-+——–+

|1201 | Gopal | 45000 | Technical manager | TP |

|1202 | Manisha | 45000 | Proofreader | PR |

|1203 | Masthanvali | 40000 | Technical writer | TP |

|1204 | Krian | 45000 | Proofreader | PR |

+——+————–+————-+——————-+——–+

The subsequent query retrieves the employee details using the above scenario.

hive> SELECT Dept,count(*) FROM employee GROUP BY DEPT;

On successful execution of the query, you get to see the subsequent response:

+——+————–+

| Dept | Count(*) |

+——+————–+

|Admin | 1 |

|PR | 2 |

|TP | 3 |

+——+————–+

JDBC Program

Given below is the JDBC program to apply the Group By clause for the given example.

import java.sql.SQLException;

import java.sql.Connection;

import java.sql.ResultSet;

import java.sql.Statement;

import java.sql.DriverManager;

public class HiveQLGroupBy {

private static String driverName =

“org.apache.hadoop.hive.jdbc.HiveDriver”;

public static void main(String[] args) throws SQLException {

// Register driver and create driver instance

Class.forName(driverName);

// get connection

Connection con = DriverManager.

getConnection(“jdbc:hive://localhost:10000/userdb”, “”, “”);

// create statement

Statement stmt = con.createStatement();

// execute statement

Resultset res = stmt.executeQuery(“SELECT Dept,count(*) ”

+“FROM employee GROUP BY DEPT; ”);

System.out.println(” Dept \t count(*)”);

while (res.next()) {

System.out.println(res.getString(1)+” “+ res.getInt(2));

}

con.close();

}

Save the program in a file named HiveQLGroupBy.java. Use the subsequent commands to compile and execute this program.

$ javac HiveQLGroupBy.java

$ java HiveQLGroupBy

Output:

Dept Count(*)

Admin 1

PR 2

TP 3

So, this brings us to the end of blog. This Tecklearn ‘How to use the select statement in Hive Query Language’ helps you with commonly asked questions if you are looking out for a job in Big Data and Hadoop Domain.

If you wish to learn Hive and build a career in Big Data or Hadoop domain, then check out our interactive, Big Data Hadoop-Architect (All in 1) Combo Training, that comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

https://www.tecklearn.com/course/bigdata-hadoop-architect-all-in-1-combo-course/

Big Data Hadoop-Architect (All in 1) Combo Training

About the Course

Tecklearn’s Big Data Hadoop-Architect (All in 1) combo includes the following Courses:

BigData Hadoop Analyst
BigData Hadoop Developer
BigData Hadoop Administrator
BigData Hadoop Tester
Big Data Security with Kerberos

Why Should you take BigData Hadoop Combo Training?

Average salary for a Hadoop Administrator ranges from approximately $104,528 to $141,391 per annum – Indeed.com
Average salary for a Spark and Hadoop Developer ranges from approximately $106,366 to $127,619 per annum – Indeed.com
Average salary for a Big Data Hadoop Analyst is $115,819– ZipRecruiter.com

What you will Learn in this Course?

Introduction

The Case for Apache Hadoop
Why Hadoop?
Core Hadoop Components
Fundamental Concepts

HDFS

HDFS Features
Writing and Reading Files
NameNode Memory Considerations
Overview of HDFS Security
Using the Namenode Web UI
Using the Hadoop File Shell

Getting Data into HDFS

Ingesting Data from External Sources with Flume
Ingesting Data from Relational Databases with Sqoop
REST Interfaces
Best Practices for Importing Data

YARN and MapReduce

What Is MapReduce?
Basic MapReduce Concepts
YARN Cluster Architecture
Resource Allocation
Failure Recovery
Using the YARN Web UI
MapReduce Version 1

Planning Your Hadoop Cluster

General Planning Considerations
Choosing the Right Hardware
Network Considerations
Configuring Nodes
Planning for Cluster Management

Hadoop Installation and Initial Configuration

Deployment Types
Installing Hadoop
Specifying the Hadoop Configuration
Performing Initial HDFS Configuration
Performing Initial YARN and MapReduce Configuration
Hadoop Logging

Installing and Configuring Hive, Impala, and Pig

Hive
Impala
Pig

Hadoop Clients

What is a Hadoop Client?
Installing and Configuring Hadoop Clients
Installing and Configuring Hue
Hue Authentication and Authorization

Cloudera Manager

The Motivation for Cloudera Manager
Cloudera Manager Features
Express and Enterprise Versions
Cloudera Manager Topology
Installing Cloudera Manager
Installing Hadoop Using Cloudera Manager
Performing Basic Administration Tasks Using Cloudera Manager

Advanced Cluster Configuration

Advanced Configuration Parameters
Configuring Hadoop Ports
Explicitly Including and Excluding Hosts
Configuring HDFS for Rack Awareness
Configuring HDFS High Availability

Hadoop Security

Why Hadoop Security Is Important
Hadoop’s Security System Concepts
What Kerberos Is and How it Works
Securing a Hadoop Cluster with Kerberos

Managing and Scheduling Jobs

Managing Running Jobs
Scheduling Hadoop Jobs
Configuring the Fair Scheduler
Impala Query Scheduling

Cluster Maintenance

Checking HDFS Status
Copying Data Between Clusters
Adding and Removing Cluster Nodes
Rebalancing the Cluster
Cluster Upgrading

Cluster Monitoring and Troubleshooting

General System Monitoring
Monitoring Hadoop Clusters
Common Troubleshooting Hadoop Clusters
Common Misconfigurations

Introduction to Pig

What Is Pig?
Pig’s Features
Pig Use Cases
Interacting with Pig

Basic Data Analysis with Pig

Pig Latin Syntax
Loading Data
Simple Data Types
Field Definitions
Data Output
Viewing the Schema
Filtering and Sorting Data
Commonly-Used Functions

Processing Complex Data with Pig

Storage Formats
Complex/Nested Data Types
Grouping
Built-In Functions for Complex Data
Iterating Grouped Data

Multi-Dataset Operations with Pig

Techniques for Combining Data Sets
Joining Data Sets in Pig
Set Operations
Splitting Data Sets

Pig Troubleshooting and Optimization

Troubleshooting Pig
Logging
Using Hadoop’s Web UI
Data Sampling and Debugging
Performance Overview
Understanding the Execution Plan
Tips for Improving the Performance of Your Pig Jobs

Introduction to Hive and Impala

What Is Hive?
What Is Impala?
Schema and Data Storage
Comparing Hive to Traditional Databases
Hive Use Cases

Querying with Hive and Impala

Databases and Tables
Basic Hive and Impala Query Language Syntax
Data Types
Differences Between Hive and Impala Query Syntax
Using Hue to Execute Queries
Using the Impala Shell

Data Management

Data Storage
Creating Databases and Tables
Loading Data
Altering Databases and Tables
Simplifying Queries with Views
Storing Query Results

Data Storage and Performance

Partitioning Tables
Choosing a File Format
Managing Metadata
Controlling Access to Data

Relational Data Analysis with Hive and Impala

Joining Datasets
Common Built-In Functions
Aggregation and Windowing

Working with Impala

How Impala Executes Queries
Extending Impala with User-Defined Functions
Improving Impala Performance

Analyzing Text and Complex Data with Hive

Complex Values in Hive
Using Regular Expressions in Hive
Sentiment Analysis and N-Grams
Conclusion

Hive Optimization

Understanding Query Performance
Controlling Job Execution Plan
Bucketing
Indexing Data

Extending Hive

SerDes
Data Transformation with Custom Scripts
User-Defined Functions
Parameterized Queries

Importing Relational Data with Apache Sqoop

Sqoop Overview
Basic Imports and Exports
Limiting Results
Improving Sqoop’s Performance
Sqoop 2

Introduction to Impala and Hive

Introduction to Impala and Hive
Why Use Impala and Hive?
Comparing Hive to Traditional Databases
Hive Use Cases

Modelling and Managing Data with Impala and Hive

Data Storage Overview
Creating Databases and Tables
Loading Data into Tables
HCatalog
Impala Metadata Caching

Data Formats

Selecting a File Format
Hadoop Tool Support for File Formats
Avro Schemas
Using Avro with Hive and Sqoop
Avro Schema Evolution
Compression

Data Partitioning

Partitioning Overview
Partitioning in Impala and Hive

Capturing Data with Apache Flume

What is Apache Flume?
Basic Flume Architecture
Flume Sources
Flume Sinks
Flume Channels
Flume Configuration

Spark Basics

What is Apache Spark?
Using the Spark Shell
RDDs (Resilient Distributed Datasets)
Functional Programming in Spark

Working with RDDs in Spark

A Closer Look at RDDs
Key-Value Pair RDDs
MapReduce
Other Pair RDD Operations

Writing and Deploying Spark Applications

Spark Applications vs. Spark Shell
Creating the SparkContext
Building a Spark Application (Scala and Java)
Running a Spark Application
The Spark Application Web UI
Configuring Spark Properties
Logging

Parallel Programming with Spark

Review: Spark on a Cluster
RDD Partitions
Partitioning of File-based RDDs
HDFS and Data Locality
Executing Parallel Operations
Stages and Tasks

Spark Caching and Persistence

RDD Lineage
Caching Overview
Distributed Persistence

Common Patterns in Spark Data Processing

Common Spark Use Cases
Iterative Algorithms in Spark
Graph Processing and Analysis
Machine Learning
Example: k-means

Preview: Spark SQL

Spark SQL and the SQL Context
Creating DataFrames
Transforming and Querying DataFrames
Saving DataFrames
Comparing Spark SQL with Impala

Hadoop Testing

Hadoop Application Testing
Roles and Responsibilities of Hadoop Testing Professional
Framework MRUnit for Testing of MapReduce Programs
Unit Testing
Test Execution
Test Plan Strategy and Writing Test Cases for Testing Hadoop Application

Big Data Testing

BigData Testing
Unit Testing
Integration Testing
Functional Testing
Non-Functional Testing
Golden Data Set

System Testing

Building and Set up
Testing SetUp
Solary Server
Non-Functional Testing
Longevity Testing
Volumetric Testing

Security Testing

Security Testing
Non-Functional Testing
Hadoop Cluster
Security-Authorization RBA
IBM Project

Automation Testing

Query Surge Tool

Oozie

Why Oozie
Installation Engine
Oozie Workflow Engine
Oozie security
Oozie Job Process
Oozie terminology
Oozie bundle

Got a question for us? Please mention it in the comments section and we will get back to you.

555