How to use the select statement in Hive Query Language

Last updated on May 30 2022
Abhimanyu Joshi

Table of Contents

How to use the select statement in Hive Query Language

Hiveql Select…Where

The Hive Query Language (HiveQL) is a query language for Hive to process and analyze structured data in a Metastore. This blog explains way to use the SELECT statement with WHERE clause.

SELECT statement is employed to retrieve the data from a table. WHERE clause works similar to a condition. It filters the data using the condition and gives you a finite result. The built-in operators and functions generate an expression, which fulfils the condition.

Syntax

Given below is the syntax of the SELECT query:

SELECT [ALL | DISTINCT] select_expr, select_expr, …

FROM table_reference

[WHERE where_condition]

[GROUP BY col_list]

[HAVING having_condition]

[CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY col_list]]

[LIMIT number];

Example

Let us take an example for SELECT…WHERE clause. Assume we have the employee table as given below, with fields named Id, Name, Salary, Designation, and Dept. Generate a query to retrieve the employee details who earn a salary of more than Rs 30000.

+——+————–+————-+——————-+——–+

| ID   | Name         | Salary      | Designation       | Dept   |

+——+————–+————-+——————-+——–+

|1201  | Gopal        | 45000       | Technical manager | TP     |

|1202  | Manisha      | 45000       | Proofreader       | PR     |

|1203  | Masthanvali  | 40000       | Technical writer  | TP     |

|1204  | Krian        | 40000       | Hr Admin          | HR     |

|1205  | Kranthi      | 30000       | Op Admin          | Admin  |

+——+————–+————-+——————-+——–+

The subsequent query retrieves the employee details using the above scenario:

hive> SELECT * FROM employee WHERE salary>30000;

On successful execution of the query, you get to see the subsequent response:

+——+————–+————-+——————-+——–+

| ID   | Name         | Salary      | Designation       | Dept   |

+——+————–+————-+——————-+——–+

|1201  | Gopal        | 45000       | Technical manager | TP     |

|1202  | Manisha      | 45000       | Proofreader       | PR     |

|1203  | Masthanvali  | 40000       | Technical writer  | TP     |

|1204  | Krian        | 40000       | Hr Admin          | HR     |

+——+————–+————-+——————-+——–+

JDBC Program

The JDBC program to apply where clause for the given example is as follows.

import java.sql.SQLException;

import java.sql.Connection;

import java.sql.ResultSet;

import java.sql.Statement;

import java.sql.DriverManager;

 

public class HiveQLWhere {

private static String driverName =

“org.apache.hadoop.hive.jdbc.HiveDriver”;

public static void main(String[] args) throws SQLException {

// Register driver and create driver instance

Class.forName(driverName);

// get connection

Connection con = DriverManager.

getConnection(“jdbc:hive://localhost:10000/userdb”, “”, “”);

// create statement

Statement stmt = con.createStatement();

// execute statement

Resultset res = stmt.executeQuery(“SELECT * FROM employee WHERE

salary>30000;”);

System.out.println(“Result:”);

System.out.println(” ID \t Name \t Salary \t Designation \t Dept “);

while (res.next()) {

System.out.println(res.getInt(1)+” “+ res.getString(2)+” “+

res.getDouble(3)+” “+ res.getString(4)+” “+ res.getString(5));

}

con.close();

}

}

Save the program in a file named HiveQLWhere.java. Use the subsequent commands to compile and execute this program.

$ javac HiveQLWhere.java

$ java HiveQLWhere

Output:

ID       Name           Salary      Designation          Dept

1201     Gopal          45000       Technical manager    TP

1202     Manisha        45000       Proofreader          PR

1203     Masthanvali    40000       Technical writer     TP

1204     Krian          40000       Hr Admin             HR

 

 

Hiveql Select…Order By

This section of blog explains way to use the ORDER BY clause in a SELECT statement. The ORDER BY clause is employed to retrieve the details based on one column and sort the result set by ascending or descending order.

Syntax

Given below is the syntax of the ORDER BY clause:

SELECT [ALL | DISTINCT] select_expr, select_expr, …

FROM table_reference

[WHERE where_condition]

[GROUP BY col_list]

[HAVING having_condition]

[ORDER BY col_list]]

[LIMIT number];

Example

Let us take an example for SELECT…ORDER BY clause. Assume employee table as given below, with the fields named Id, Name, Salary, Designation, and Dept. Generate a query to retrieve the employee details in order by using Department name.

+——+————–+————-+——————-+——–+

| ID   | Name         | Salary      | Designation       | Dept   |

+——+————–+————-+——————-+——–+

|1201  | Gopal        | 45000       | Technical manager | TP     |

|1202  | Manisha      | 45000       | Proofreader       | PR     |

|1203  | Masthanvali  | 40000       | Technical writer  | TP     |

|1204  | Krian        | 40000       | Hr Admin          | HR     |

|1205  | Kranthi      | 30000       | Op Admin          | Admin  |

+——+————–+————-+——————-+——–+

The subsequent query retrieves the employee details using the above scenario:

hive> SELECT Id, Name, Dept FROM employee ORDER BY DEPT;

On successful execution of the query, you get to see the subsequent response:

+——+————–+————-+——————-+——–+

| ID   | Name         | Salary      | Designation       | Dept   |

+——+————–+————-+——————-+——–+

|1205  | Kranthi      | 30000       | Op Admin          | Admin  |

|1204  | Krian        | 40000       | Hr Admin          | HR     |

|1202  | Manisha      | 45000       | Proofreader       | PR     |

|1201  | Gopal        | 45000       | Technical manager | TP     |

|1203  | Masthanvali  | 40000       | Technical writer  | TP     |

+——+————–+————-+——————-+——–+

JDBC Program

Here is the JDBC program to apply Order By clause for the given example.

import java.sql.SQLException;

import java.sql.Connection;

import java.sql.ResultSet;

import java.sql.Statement;

import java.sql.DriverManager;

 

public class HiveQLOrderBy {

private static String driverName =

“org.apache.hadoop.hive.jdbc.HiveDriver”;

public static void main(String[] args) throws SQLException {

// Register driver and create driver instance

Class.forName(driverName);

// get connection

Connection con = DriverManager.

getConnection(“jdbc:hive://localhost:10000/userdb”, “”, “”);

// create statement

Statement stmt = con.createStatement();

// execute statement

Resultset res = stmt.executeQuery(“SELECT * FROM employee ORDER BY

DEPT;”);

System.out.println(” ID \t Name \t Salary \t Designation \t Dept “);

while (res.next()) {

System.out.println(res.getInt(1)+” “+ res.getString(2)+” “+

res.getDouble(3)+” “+ res.getString(4)+” “+ res.getString(5));

}

con.close();

}

}

Save the program in a file named HiveQLOrderBy.java. Use the subsequent commands to compile and execute this program.

$ javac HiveQLOrderBy.java

$ java HiveQLOrderBy

Output:

ID       Name           Salary      Designation          Dept

1205     Kranthi        30000       Op Admin             Admin

1204     Krian          40000       Hr Admin             HR

1202     Manisha        45000       Proofreader          PR

1201     Gopal          45000       Technical manager    TP

1203     Masthanvali    40000       Technical writer     TP

1204     Krian          40000       Hr Admin             HR

 

 

Hiveql Group By

This section of blog explains the details of GROUP BY clause in a SELECT statement. The GROUP BY clause is employed to group all the records in a result set using a particular collection column. It’s employed to query a group of records.

Syntax

The syntax of GROUP BY clause is as follows:

SELECT [ALL | DISTINCT] select_expr, select_expr, …

FROM table_reference

[WHERE where_condition]

[GROUP BY col_list]

[HAVING having_condition]

[ORDER BY col_list]]

[LIMIT number];

Example

Let us take an example of SELECT…GROUP BY clause. Assume employee table as given below, with Id, Name, Salary, Designation, and Dept fields. Generate a query to retrieve the number of employees in each department.

+——+————–+————-+——————-+——–+

| ID   | Name         | Salary      | Designation       | Dept   |

+——+————–+————-+——————-+——–+

|1201  | Gopal        | 45000       | Technical manager | TP     |

|1202  | Manisha      | 45000       | Proofreader       | PR     |

|1203  | Masthanvali  | 40000       | Technical writer  | TP     |

|1204  | Krian        | 45000       | Proofreader       | PR     |

|1205  | Kranthi      | 30000       | Op Admin          | Admin  |

+——+————–+————-+——————-+——–+

The subsequent query retrieves the employee details using the above scenario.

hive> SELECT Dept,count(*) FROM employee GROUP BY DEPT;

On successful execution of the query, you get to see the subsequent response:

+——+————–+

| Dept | Count(*)     |

+——+————–+

|Admin |    1         |

|PR    |    2         |

|TP    |    3         |

+——+————–+

JDBC Program

Given below is the JDBC program to apply the Group By clause for the given example.

import java.sql.SQLException;

import java.sql.Connection;

import java.sql.ResultSet;

import java.sql.Statement;

import java.sql.DriverManager;

 

public class HiveQLGroupBy {

private static String driverName =

“org.apache.hadoop.hive.jdbc.HiveDriver”;

public static void main(String[] args) throws SQLException {

// Register driver and create driver instance

Class.forName(driverName);

// get connection

Connection con = DriverManager.

getConnection(“jdbc:hive://localhost:10000/userdb”, “”, “”);

// create statement

Statement stmt = con.createStatement();

// execute statement

Resultset res = stmt.executeQuery(“SELECT Dept,count(*) ”

+“FROM employee GROUP BY DEPT; ”);

System.out.println(” Dept \t count(*)”);

while (res.next()) {

System.out.println(res.getString(1)+” “+ res.getInt(2));

}

con.close();

}

}

Save the program in a file named HiveQLGroupBy.java. Use the subsequent commands to compile and execute this program.

$ javac HiveQLGroupBy.java

$ java HiveQLGroupBy

Output:

Dept     Count(*)

Admin       1

PR          2

TP          3

 

So, this brings us to the end of blog. This Tecklearn ‘How to use the select statement in Hive Query Language’ helps you with commonly asked questions if you are looking out for a job in Big Data and Hadoop Domain.

If you wish to learn Hive and build a career in Big Data or Hadoop domain, then check out our interactive, Big Data Hadoop-Architect (All in 1) Combo Training, that comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

 

https://www.tecklearn.com/course/bigdata-hadoop-architect-all-in-1-combo-course/

Big Data Hadoop-Architect (All in 1) Combo Training

About the Course

Tecklearn’s Big Data Hadoop-Architect (All in 1) combo includes the following Courses:

  • BigData Hadoop Analyst
  • BigData Hadoop Developer
  • BigData Hadoop Administrator
  • BigData Hadoop Tester
  • Big Data Security with Kerberos

Why Should you take BigData Hadoop Combo Training?

  • Average salary for a Hadoop Administrator ranges from approximately $104,528 to $141,391 per annum – Indeed.com
  • Average salary for a Spark and Hadoop Developer ranges from approximately $106,366 to $127,619 per annum – Indeed.com
  • Average salary for a Big Data Hadoop Analyst is $115,819– ZipRecruiter.com

What you will Learn in this Course?

Introduction

  • The Case for Apache Hadoop
  • Why Hadoop?
  • Core Hadoop Components
  • Fundamental Concepts

HDFS

  • HDFS Features
  • Writing and Reading Files
  • NameNode Memory Considerations
  • Overview of HDFS Security
  • Using the Namenode Web UI
  • Using the Hadoop File Shell

Getting Data into HDFS

  • Ingesting Data from External Sources with Flume
  • Ingesting Data from Relational Databases with Sqoop
  • REST Interfaces
  • Best Practices for Importing Data

YARN and MapReduce

  • What Is MapReduce?
  • Basic MapReduce Concepts
  • YARN Cluster Architecture
  • Resource Allocation
  • Failure Recovery
  • Using the YARN Web UI
  • MapReduce Version 1

Planning Your Hadoop Cluster

  • General Planning Considerations
  • Choosing the Right Hardware
  • Network Considerations
  • Configuring Nodes
  • Planning for Cluster Management

Hadoop Installation and Initial Configuration

  • Deployment Types
  • Installing Hadoop
  • Specifying the Hadoop Configuration
  • Performing Initial HDFS Configuration
  • Performing Initial YARN and MapReduce Configuration
  • Hadoop Logging

Installing and Configuring Hive, Impala, and Pig

  • Hive
  • Impala
  • Pig

Hadoop Clients

  • What is a Hadoop Client?
  • Installing and Configuring Hadoop Clients
  • Installing and Configuring Hue
  • Hue Authentication and Authorization

Cloudera Manager

  • The Motivation for Cloudera Manager
  • Cloudera Manager Features
  • Express and Enterprise Versions
  • Cloudera Manager Topology
  • Installing Cloudera Manager
  • Installing Hadoop Using Cloudera Manager
  • Performing Basic Administration Tasks Using Cloudera Manager

Advanced Cluster Configuration

  • Advanced Configuration Parameters
  • Configuring Hadoop Ports
  • Explicitly Including and Excluding Hosts
  • Configuring HDFS for Rack Awareness
  • Configuring HDFS High Availability

Hadoop Security

  • Why Hadoop Security Is Important
  • Hadoop’s Security System Concepts
  • What Kerberos Is and How it Works
  • Securing a Hadoop Cluster with Kerberos

Managing and Scheduling Jobs

  • Managing Running Jobs
  • Scheduling Hadoop Jobs
  • Configuring the Fair Scheduler
  • Impala Query Scheduling

Cluster Maintenance

  • Checking HDFS Status
  • Copying Data Between Clusters
  • Adding and Removing Cluster Nodes
  • Rebalancing the Cluster
  • Cluster Upgrading

Cluster Monitoring and Troubleshooting

  • General System Monitoring
  • Monitoring Hadoop Clusters
  • Common Troubleshooting Hadoop Clusters
  • Common Misconfigurations

Introduction to Pig

  • What Is Pig?
  • Pig’s Features
  • Pig Use Cases
  • Interacting with Pig

Basic Data Analysis with Pig

  • Pig Latin Syntax
  • Loading Data
  • Simple Data Types
  • Field Definitions
  • Data Output
  • Viewing the Schema
  • Filtering and Sorting Data
  • Commonly-Used Functions

Processing Complex Data with Pig

  • Storage Formats
  • Complex/Nested Data Types
  • Grouping
  • Built-In Functions for Complex Data
  • Iterating Grouped Data

Multi-Dataset Operations with Pig

  • Techniques for Combining Data Sets
  • Joining Data Sets in Pig
  • Set Operations
  • Splitting Data Sets

Pig Troubleshooting and Optimization

  • Troubleshooting Pig
  • Logging
  • Using Hadoop’s Web UI
  • Data Sampling and Debugging
  • Performance Overview
  • Understanding the Execution Plan
  • Tips for Improving the Performance of Your Pig Jobs

Introduction to Hive and Impala

  • What Is Hive?
  • What Is Impala?
  • Schema and Data Storage
  • Comparing Hive to Traditional Databases
  • Hive Use Cases

Querying with Hive and Impala

  • Databases and Tables
  • Basic Hive and Impala Query Language Syntax
  • Data Types
  • Differences Between Hive and Impala Query Syntax
  • Using Hue to Execute Queries
  • Using the Impala Shell

Data Management

  • Data Storage
  • Creating Databases and Tables
  • Loading Data
  • Altering Databases and Tables
  • Simplifying Queries with Views
  • Storing Query Results

Data Storage and Performance

  • Partitioning Tables
  • Choosing a File Format
  • Managing Metadata
  • Controlling Access to Data

Relational Data Analysis with Hive and Impala

  • Joining Datasets
  • Common Built-In Functions
  • Aggregation and Windowing

Working with Impala 

  • How Impala Executes Queries
  • Extending Impala with User-Defined Functions
  • Improving Impala Performance

Analyzing Text and Complex Data with Hive

  • Complex Values in Hive
  • Using Regular Expressions in Hive
  • Sentiment Analysis and N-Grams
  • Conclusion

Hive Optimization 

  • Understanding Query Performance
  • Controlling Job Execution Plan
  • Bucketing
  • Indexing Data

Extending Hive 

  • SerDes
  • Data Transformation with Custom Scripts
  • User-Defined Functions
  • Parameterized Queries

Importing Relational Data with Apache Sqoop

  • Sqoop Overview
  • Basic Imports and Exports
  • Limiting Results
  • Improving Sqoop’s Performance
  • Sqoop 2

Introduction to Impala and Hive

  • Introduction to Impala and Hive
  • Why Use Impala and Hive?
  • Comparing Hive to Traditional Databases
  • Hive Use Cases

Modelling and Managing Data with Impala and Hive

  • Data Storage Overview
  • Creating Databases and Tables
  • Loading Data into Tables
  • HCatalog
  • Impala Metadata Caching

Data Formats

  • Selecting a File Format
  • Hadoop Tool Support for File Formats
  • Avro Schemas
  • Using Avro with Hive and Sqoop
  • Avro Schema Evolution
  • Compression

Data Partitioning

  • Partitioning Overview
  • Partitioning in Impala and Hive

Capturing Data with Apache Flume

  • What is Apache Flume?
  • Basic Flume Architecture
  • Flume Sources
  • Flume Sinks
  • Flume Channels
  • Flume Configuration

Spark Basics

  • What is Apache Spark?
  • Using the Spark Shell
  • RDDs (Resilient Distributed Datasets)
  • Functional Programming in Spark

Working with RDDs in Spark

  • A Closer Look at RDDs
  • Key-Value Pair RDDs
  • MapReduce
  • Other Pair RDD Operations

Writing and Deploying Spark Applications

  • Spark Applications vs. Spark Shell
  • Creating the SparkContext
  • Building a Spark Application (Scala and Java)
  • Running a Spark Application
  • The Spark Application Web UI
  • Configuring Spark Properties
  • Logging

Parallel Programming with Spark

  • Review: Spark on a Cluster
  • RDD Partitions
  • Partitioning of File-based RDDs
  • HDFS and Data Locality
  • Executing Parallel Operations
  • Stages and Tasks

Spark Caching and Persistence

  • RDD Lineage
  • Caching Overview
  • Distributed Persistence

Common Patterns in Spark Data Processing

  • Common Spark Use Cases
  • Iterative Algorithms in Spark
  • Graph Processing and Analysis
  • Machine Learning
  • Example: k-means

Preview: Spark SQL

  • Spark SQL and the SQL Context
  • Creating DataFrames
  • Transforming and Querying DataFrames
  • Saving DataFrames
  • Comparing Spark SQL with Impala

Hadoop Testing

  • Hadoop Application Testing
  • Roles and Responsibilities of Hadoop Testing Professional
  • Framework MRUnit for Testing of MapReduce Programs
  • Unit Testing
  • Test Execution
  • Test Plan Strategy and Writing Test Cases for Testing Hadoop Application

Big Data Testing

  • BigData Testing
  • Unit Testing
  • Integration Testing
  • Functional Testing
  • Non-Functional Testing
  • Golden Data Set

System Testing

  • Building and Set up
  • Testing SetUp
  • Solary Server
  • Non-Functional Testing
  • Longevity Testing
  • Volumetric Testing

Security Testing

  • Security Testing
  • Non-Functional Testing
  • Hadoop Cluster
  • Security-Authorization RBA
  • IBM Project

Automation Testing

  • Query Surge Tool

Oozie

  • Why Oozie
  • Installation Engine
  • Oozie Workflow Engine
  • Oozie security
  • Oozie Job Process
  • Oozie terminology
  • Oozie bundle

Got a question for us? Please mention it in the comments section and we will get back to you.

0 responses on "How to use the select statement in Hive Query Language"

Leave a Message

Your email address will not be published. Required fields are marked *