How to perform changes on a given table and how to delete table in Impala

Last updated on May 30 2022
Swati Dogra


Impala – Alter Table

The Alter table statement in Impala is used to perform changes on a given table. Using this statement, we can add, delete, or modify columns in an existing table and we can also rename it.

This chapter explains the various types of ALTER statements with syntax and examples. First of all, assume that we have a table named customers in the my_db database in Impala, with the following data −

ID   NAME      AGE   ADDRESS     SALARY
---  --------  ----  ----------  ------
1    Ramesh    32    Ahmedabad   20000
2    Khilan    25    Delhi       15000
3    Hardik    27    Bhopal      40000
4    Chaitali  25    Mumbai      35000
5    kaushik   23    Kota        30000
6    Komal     22    Mp          32000
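For readers who want to follow along, the customers table could be created and loaded with something like the following sketch (the column types are assumed from the data shown above):

```sql
-- Sketch: create and populate the sample table (types assumed from the data)
CREATE TABLE my_db.customers (
  id INT,
  name STRING,
  age INT,
  address STRING,
  salary BIGINT
);

INSERT INTO my_db.customers VALUES
  (1, 'Ramesh', 32, 'Ahmedabad', 20000),
  (2, 'Khilan', 25, 'Delhi', 15000),
  (3, 'Hardik', 27, 'Bhopal', 40000),
  (4, 'Chaitali', 25, 'Mumbai', 35000),
  (5, 'kaushik', 23, 'Kota', 30000),
  (6, 'Komal', 22, 'Mp', 32000);
```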

If you get the list of tables in the database my_db, you can find the customers table among them, as shown below.

[quickstart.cloudera:21000] > show tables;
Query: show tables
+-----------+
| name      |
+-----------+
| customers |
| employee  |
| student   |
| student1  |
+-----------+

Altering the name of a table

Syntax

The basic syntax of ALTER TABLE to rename an existing table is as follows −

ALTER TABLE [old_db_name.]old_table_name RENAME TO [new_db_name.]new_table_name

Example

Following is an example of changing the name of the table using the alter statement. Here we are changing the name of the table customers to users.

[quickstart.cloudera:21000] > ALTER TABLE my_db.customers RENAME TO my_db.users;

After executing the above query, Impala changes the name of the table as required, displaying the following message.

Query: alter TABLE my_db.customers RENAME TO my_db.users

You can verify the list of tables in the current database using the show tables statement. You can find the table named users instead of customers.

Query: show tables
+----------+
| name     |
+----------+
| employee |
| student  |
| student1 |
| users    |
+----------+
Fetched 4 row(s) in 0.10s

Adding columns to a table

Syntax

The basic syntax of ALTER TABLE to add columns to an existing table is as follows −

ALTER TABLE name ADD COLUMNS (col_spec[, col_spec …])

Example

The following query is an example demonstrating how to add columns to an existing table. Here we are adding two columns, account_no and phone_no (both of bigint data type), to the users table.

[quickstart.cloudera:21000] > ALTER TABLE users ADD COLUMNS (account_no BIGINT, phone_no BIGINT);

On executing the above query, Impala adds the specified columns to the users table, displaying the following message.

Query: alter TABLE users ADD COLUMNS (account_no BIGINT, phone_no BIGINT)

If you verify the schema of the table users, you can find the newly added columns in it as shown below.

[quickstart.cloudera:21000] > describe users;
Query: describe users
+------------+--------+---------+
| name       | type   | comment |
+------------+--------+---------+
| id         | int    |         |
| name       | string |         |
| age        | int    |         |
| address    | string |         |
| salary     | bigint |         |
| account_no | bigint |         |
| phone_no   | bigint |         |
+------------+--------+---------+
Fetched 7 row(s) in 0.20s

Dropping columns from a table

Syntax

The basic syntax of ALTER TABLE to DROP COLUMN in an existing table is as follows −

ALTER TABLE name DROP [COLUMN] column_name

Example

The following query is an example of deleting columns from an existing table. Here we are deleting the column named account_no.

[quickstart.cloudera:21000] > ALTER TABLE users DROP account_no;

On executing the above query, Impala deletes the column named account_no, displaying the following message.

Query: alter TABLE users DROP account_no

If you verify the schema of the table users, you cannot find the column named account_no since it was deleted.

[quickstart.cloudera:21000] > describe users;
Query: describe users
+----------+--------+---------+
| name     | type   | comment |
+----------+--------+---------+
| id       | int    |         |
| name     | string |         |
| age      | int    |         |
| address  | string |         |
| salary   | bigint |         |
| phone_no | bigint |         |
+----------+--------+---------+
Fetched 6 row(s) in 0.11s

Changing the name and type of a column

Syntax

The basic syntax of ALTER TABLE to change the name and datatype of a column in an existing table is as follows −

ALTER TABLE name CHANGE column_name new_name new_type

Example

Following is an example of changing the name and datatype of a column using the alter statement. Here we are changing the name of the column phone_no to e_mail and its data type to string.

[quickstart.cloudera:21000] > ALTER TABLE users CHANGE phone_no e_mail string;

On executing the above query, Impala does the specified changes, displaying the following message.

Query: alter TABLE users CHANGE phone_no e_mail string

You can verify the metadata of the table users using the describe statement. You can observe that Impala has done the required changes to the specified column.

[quickstart.cloudera:21000] > describe users;
Query: describe users
+---------+--------+---------+
| name    | type   | comment |
+---------+--------+---------+
| id      | int    |         |
| name    | string |         |
| age     | int    |         |
| address | string |         |
| salary  | bigint |         |
| e_mail  | string |         |
+---------+--------+---------+
Fetched 6 row(s) in 0.11s

Altering a Table using Hue

Open the Impala Query editor, type the alter statement in it, and click on the execute button as shown in the following screenshot.

[Screenshot: executing an ALTER TABLE statement in the Hue Impala Query editor]

On executing the above query, it will change the name of the table customers to users. In the same way, we can execute all the alter queries.
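For reference, the statement typed into the editor in this example is the same rename shown earlier in the shell:

```sql
-- Rename the customers table to users from the Hue Impala Query editor
ALTER TABLE my_db.customers RENAME TO my_db.users;
```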

Impala – Drop a Table

The Impala DROP TABLE statement is used to delete an existing table in Impala. This statement also deletes the underlying HDFS files for internal tables.

NOTE − You have to be careful while using this command because once a table is deleted, then all the information available in the table would also be lost forever.

Syntax

Following is the syntax of the DROP TABLE Statement. Here, IF EXISTS is an optional clause. If we use this clause, a table with the given name is deleted, only if it exists. Otherwise, no operation will be carried out.

DROP TABLE [IF EXISTS] [database_name.]table_name;

If you try to delete a table that doesn’t exist without the IF EXISTS clause, an error will be generated. Optionally you can specify database_name along with table_name.
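To illustrate the difference, the following sketch shows both forms. The first statement raises an error if my_db.student does not exist; the second is a no-op in that case:

```sql
-- Fails with an error if my_db.student does not exist
DROP TABLE my_db.student;

-- Safe: does nothing if the table is absent
DROP TABLE IF EXISTS my_db.student;
```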

Example

Let us first verify the list of tables in the database my_db as shown below.

[quickstart.cloudera:21000] > show tables;
Query: show tables
+-----------+
| name      |
+-----------+
| customers |
| employee  |
| student   |
+-----------+
Fetched 3 row(s) in 0.11s

From the above result, you can observe that the database my_db contains 3 tables.

Following is an example of the drop table statement. In this example, we are deleting the table named student from the database my_db.

[quickstart.cloudera:21000] > drop table if exists my_db.student;

On executing the above query, a table with the specified name will be deleted, displaying the following output.

Query: drop table if exists my_db.student

Verification

The show tables query gives a list of the tables in the current database in Impala. Therefore, you can verify whether a table has been deleted using the show tables statement.

First of all, you need to switch the context to the database in which the required table exists, as shown below.

[quickstart.cloudera:21000] > use my_db;

Query: use my_db

Then, if you get the list of tables using the show tables query, you can observe the table named student is not in the list.

[quickstart.cloudera:21000] > show tables;
Query: show tables
+-----------+
| name      |
+-----------+
| customers |
| employee  |
+-----------+
Fetched 2 row(s) in 0.11s

Dropping a Table using Hue Browser

Open the Impala Query editor, type the DROP TABLE statement in it, and click on the execute button as shown in the following screenshot.

[Screenshot: executing a DROP TABLE statement in the Hue Impala Query editor]

After executing the query, move the cursor to the top of the dropdown menu and you will find a refresh symbol. If you click on the refresh symbol, the list of databases will be refreshed and the recent changes will be applied to it.

[Screenshot: refreshing the database list in Hue]

Verification

Click on the drop-down under the heading DATABASE on the left-hand side of the editor. There you can see a list of databases; select the database my_db as shown below.

On selecting the database my_db, you can see its list of tables. The deleted table student is no longer in the list.

[Screenshot: table list of my_db without the deleted student table]

So, this brings us to the end of this blog. This Tecklearn blog on ‘How to perform changes on a given table and how to delete table in Impala’ helps you with commonly asked questions if you are looking out for a job in the Big Data and Hadoop domain.

If you wish to learn Impala and build a career in Big Data or Hadoop domain, then check out our interactive, Big Data Hadoop-Architect (All in 1) Combo Training, that comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

https://www.tecklearn.com/course/bigdata-hadoop-architect-all-in-1-combo-course/

Big Data Hadoop-Architect (All in 1) Combo Training

About the Course

Tecklearn’s Big Data Hadoop-Architect (All in 1) combo includes the following Courses:

  • BigData Hadoop Analyst
  • BigData Hadoop Developer
  • BigData Hadoop Administrator
  • BigData Hadoop Tester
  • Big Data Security with Kerberos

Why Should you take BigData Hadoop Combo Training?

  • Average salary for a Hadoop Administrator ranges from approximately $104,528 to $141,391 per annum – Indeed.com
  • Average salary for a Spark and Hadoop Developer ranges from approximately $106,366 to $127,619 per annum – Indeed.com
  • Average salary for a Big Data Hadoop Analyst is $115,819– ZipRecruiter.com

What you will Learn in this Course?

Introduction

  • The Case for Apache Hadoop
  • Why Hadoop?
  • Core Hadoop Components
  • Fundamental Concepts

HDFS

  • HDFS Features
  • Writing and Reading Files
  • NameNode Memory Considerations
  • Overview of HDFS Security
  • Using the Namenode Web UI
  • Using the Hadoop File Shell

Getting Data into HDFS

  • Ingesting Data from External Sources with Flume
  • Ingesting Data from Relational Databases with Sqoop
  • REST Interfaces
  • Best Practices for Importing Data

YARN and MapReduce

  • What Is MapReduce?
  • Basic MapReduce Concepts
  • YARN Cluster Architecture
  • Resource Allocation
  • Failure Recovery
  • Using the YARN Web UI
  • MapReduce Version 1

Planning Your Hadoop Cluster

  • General Planning Considerations
  • Choosing the Right Hardware
  • Network Considerations
  • Configuring Nodes
  • Planning for Cluster Management

Hadoop Installation and Initial Configuration

  • Deployment Types
  • Installing Hadoop
  • Specifying the Hadoop Configuration
  • Performing Initial HDFS Configuration
  • Performing Initial YARN and MapReduce Configuration
  • Hadoop Logging

Installing and Configuring Hive, Impala, and Pig

  • Hive
  • Impala
  • Pig

Hadoop Clients

  • What is a Hadoop Client?
  • Installing and Configuring Hadoop Clients
  • Installing and Configuring Hue
  • Hue Authentication and Authorization

Cloudera Manager

  • The Motivation for Cloudera Manager
  • Cloudera Manager Features
  • Express and Enterprise Versions
  • Cloudera Manager Topology
  • Installing Cloudera Manager
  • Installing Hadoop Using Cloudera Manager
  • Performing Basic Administration Tasks Using Cloudera Manager

Advanced Cluster Configuration

  • Advanced Configuration Parameters
  • Configuring Hadoop Ports
  • Explicitly Including and Excluding Hosts
  • Configuring HDFS for Rack Awareness
  • Configuring HDFS High Availability

Hadoop Security

  • Why Hadoop Security Is Important
  • Hadoop’s Security System Concepts
  • What Kerberos Is and How it Works
  • Securing a Hadoop Cluster with Kerberos

Managing and Scheduling Jobs

  • Managing Running Jobs
  • Scheduling Hadoop Jobs
  • Configuring the Fair Scheduler
  • Impala Query Scheduling

Cluster Maintenance

  • Checking HDFS Status
  • Copying Data Between Clusters
  • Adding and Removing Cluster Nodes
  • Rebalancing the Cluster
  • Cluster Upgrading

Cluster Monitoring and Troubleshooting

  • General System Monitoring
  • Monitoring Hadoop Clusters
  • Troubleshooting Hadoop Clusters
  • Common Misconfigurations

Introduction to Pig

  • What Is Pig?
  • Pig’s Features
  • Pig Use Cases
  • Interacting with Pig

Basic Data Analysis with Pig

  • Pig Latin Syntax
  • Loading Data
  • Simple Data Types
  • Field Definitions
  • Data Output
  • Viewing the Schema
  • Filtering and Sorting Data
  • Commonly-Used Functions

Processing Complex Data with Pig

  • Storage Formats
  • Complex/Nested Data Types
  • Grouping
  • Built-In Functions for Complex Data
  • Iterating Grouped Data

Multi-Dataset Operations with Pig

  • Techniques for Combining Data Sets
  • Joining Data Sets in Pig
  • Set Operations
  • Splitting Data Sets

Pig Troubleshooting and Optimization

  • Troubleshooting Pig
  • Logging
  • Using Hadoop’s Web UI
  • Data Sampling and Debugging
  • Performance Overview
  • Understanding the Execution Plan
  • Tips for Improving the Performance of Your Pig Jobs

Introduction to Hive and Impala

  • What Is Hive?
  • What Is Impala?
  • Schema and Data Storage
  • Comparing Hive to Traditional Databases
  • Hive Use Cases

Querying with Hive and Impala

  • Databases and Tables
  • Basic Hive and Impala Query Language Syntax
  • Data Types
  • Differences Between Hive and Impala Query Syntax
  • Using Hue to Execute Queries
  • Using the Impala Shell

Data Management

  • Data Storage
  • Creating Databases and Tables
  • Loading Data
  • Altering Databases and Tables
  • Simplifying Queries with Views
  • Storing Query Results

Data Storage and Performance

  • Partitioning Tables
  • Choosing a File Format
  • Managing Metadata
  • Controlling Access to Data

Relational Data Analysis with Hive and Impala

  • Joining Datasets
  • Common Built-In Functions
  • Aggregation and Windowing

Working with Impala 

  • How Impala Executes Queries
  • Extending Impala with User-Defined Functions
  • Improving Impala Performance

Analyzing Text and Complex Data with Hive

  • Complex Values in Hive
  • Using Regular Expressions in Hive
  • Sentiment Analysis and N-Grams
  • Conclusion

Hive Optimization 

  • Understanding Query Performance
  • Controlling Job Execution Plan
  • Bucketing
  • Indexing Data

Extending Hive 

  • SerDes
  • Data Transformation with Custom Scripts
  • User-Defined Functions
  • Parameterized Queries

Importing Relational Data with Apache Sqoop

  • Sqoop Overview
  • Basic Imports and Exports
  • Limiting Results
  • Improving Sqoop’s Performance
  • Sqoop 2

Introduction to Impala and Hive

  • Introduction to Impala and Hive
  • Why Use Impala and Hive?
  • Comparing Hive to Traditional Databases
  • Hive Use Cases

Modelling and Managing Data with Impala and Hive

  • Data Storage Overview
  • Creating Databases and Tables
  • Loading Data into Tables
  • HCatalog
  • Impala Metadata Caching

Data Formats

  • Selecting a File Format
  • Hadoop Tool Support for File Formats
  • Avro Schemas
  • Using Avro with Hive and Sqoop
  • Avro Schema Evolution
  • Compression

Data Partitioning

  • Partitioning Overview
  • Partitioning in Impala and Hive

Capturing Data with Apache Flume

  • What is Apache Flume?
  • Basic Flume Architecture
  • Flume Sources
  • Flume Sinks
  • Flume Channels
  • Flume Configuration

Spark Basics

  • What is Apache Spark?
  • Using the Spark Shell
  • RDDs (Resilient Distributed Datasets)
  • Functional Programming in Spark

Working with RDDs in Spark

  • A Closer Look at RDDs
  • Key-Value Pair RDDs
  • MapReduce
  • Other Pair RDD Operations

Writing and Deploying Spark Applications

  • Spark Applications vs. Spark Shell
  • Creating the SparkContext
  • Building a Spark Application (Scala and Java)
  • Running a Spark Application
  • The Spark Application Web UI
  • Configuring Spark Properties
  • Logging

Parallel Programming with Spark

  • Review: Spark on a Cluster
  • RDD Partitions
  • Partitioning of File-based RDDs
  • HDFS and Data Locality
  • Executing Parallel Operations
  • Stages and Tasks

Spark Caching and Persistence

  • RDD Lineage
  • Caching Overview
  • Distributed Persistence

Common Patterns in Spark Data Processing

  • Common Spark Use Cases
  • Iterative Algorithms in Spark
  • Graph Processing and Analysis
  • Machine Learning
  • Example: k-means

Preview: Spark SQL

  • Spark SQL and the SQL Context
  • Creating DataFrames
  • Transforming and Querying DataFrames
  • Saving DataFrames
  • Comparing Spark SQL with Impala

Hadoop Testing

  • Hadoop Application Testing
  • Roles and Responsibilities of Hadoop Testing Professional
  • Framework MRUnit for Testing of MapReduce Programs
  • Unit Testing
  • Test Execution
  • Test Plan Strategy and Writing Test Cases for Testing Hadoop Application

Big Data Testing

  • BigData Testing
  • Unit Testing
  • Integration Testing
  • Functional Testing
  • Non-Functional Testing
  • Golden Data Set

System Testing

  • Building and Set up
  • Testing SetUp
  • Solr Server
  • Non-Functional Testing
  • Longevity Testing
  • Volumetric Testing

Security Testing

  • Security Testing
  • Non-Functional Testing
  • Hadoop Cluster
  • Security-Authorization RBA
  • IBM Project

Automation Testing

  • Query Surge Tool

Oozie

  • Why Oozie
  • Installation Engine
  • Oozie Workflow Engine
  • Oozie security
  • Oozie Job Process
  • Oozie terminology
  • Oozie bundle

Got a question for us? Please mention it in the comments section and we will get back to you.

 

 
