How to export data back from Hadoop HDFS to RDBMS and Create and maintain the Sqoop jobs

Last updated on May 30 2022
Swati Dogra


Sqoop – Export

Sqoop export transfers data from HDFS back into an RDBMS. The target table must already exist in the target database. The files given to Sqoop as input contain records, which become rows in the table. They are read and parsed into a set of records using a user-specified delimiter.

The default operation is to insert all records from the input files into the database table using the INSERT statement. In update mode, Sqoop instead generates UPDATE statements that modify existing records in the database.

Syntax

The following is the syntax for the export command.

$ sqoop export (generic-args) (export-args)

$ sqoop-export (generic-args) (export-args)

Example

Let us take an example of employee data stored in a file in HDFS. The employee data is available in the emp_data file in the '/emp/' directory in HDFS. The contents of emp_data are as follows.

1201,gopal,manager,50000,TP

1202,manisha,preader,50000,TP

1203,kalil,php dev,30000,AC

1204,prasanth,php dev,30000,AC

1205,kranthi,admin,20000,TP

1206,satish p,grp des,20000,GR
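
If you need to stage this file in HDFS yourself, a minimal sketch (assuming the records above are saved locally in a file named emp_data and that your user can write to the target directory) could look like this:

$ hdfs dfs -mkdir -p /emp                  # create the target directory if it does not exist

$ hdfs dfs -put emp_data /emp/emp_data     # copy the local file into HDFS

$ hdfs dfs -cat /emp/emp_data              # confirm the contents landed as expected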

It is mandatory that the table into which the data is exported already exists in the target database; it must be created manually before running the export.

The following statement is used to create the table 'employee' from the MySQL command line.

$ mysql

mysql> USE db;

mysql> CREATE TABLE employee (

   id INT NOT NULL PRIMARY KEY,

   name VARCHAR(20),

   deg VARCHAR(20),

   salary INT,

   dept VARCHAR(10));

The following command is used to export the table data (which is in the emp_data file on HDFS) to the employee table in the db database of the MySQL server.

$ sqoop export \

--connect jdbc:mysql://localhost/db \

--username root \

--table employee \

--export-dir /emp/emp_data
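
By default, Sqoop export expects comma-separated fields and newline-separated records. If your HDFS files use a different field delimiter, or you prefer to be prompted for the database password rather than rely on a passwordless account, a hedged variant of the same command could look like the sketch below (the comma given to --input-fields-terminated-by is only illustrative; substitute whatever delimiter your files actually use):

$ sqoop export \

--connect jdbc:mysql://localhost/db \

--username root -P \

--table employee \

--export-dir /emp/emp_data \

--input-fields-terminated-by ','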

The following command is used to verify the table from the MySQL command line.

mysql> SELECT * FROM employee;

If the data was exported successfully, the query returns the following employee rows.

+------+----------+---------+--------+------+
| id   | name     | deg     | salary | dept |
+------+----------+---------+--------+------+
| 1201 | gopal    | manager |  50000 | TP   |
| 1202 | manisha  | preader |  50000 | TP   |
| 1203 | kalil    | php dev |  30000 | AC   |
| 1204 | prasanth | php dev |  30000 | AC   |
| 1205 | kranthi  | admin   |  20000 | TP   |
| 1206 | satish p | grp des |  20000 | GR   |
+------+----------+---------+--------+------+
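
As mentioned earlier, Sqoop can also run the export in update mode instead of issuing plain INSERT statements. A minimal sketch, assuming the id column uniquely identifies each row and that only existing rows should be modified, could look like this:

$ sqoop export \

--connect jdbc:mysql://localhost/db \

--username root \

--table employee \

--export-dir /emp/emp_data \

--update-key id \

--update-mode updateonly

With --update-mode allowinsert, rows whose key does not match an existing record are inserted instead of being skipped.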

Sqoop – Job

This section of the blog describes how to create and maintain Sqoop jobs. A Sqoop job creates and saves import and export commands, along with the parameters needed to identify and recall the saved job later. Re-executing a saved job is particularly useful for incremental imports, which bring only the newly added or updated rows from an RDBMS table into HDFS.

Syntax

The following is the syntax for creating a Sqoop job.

$ sqoop job (generic-args) (job-args)

   [-- [subtool-name] (subtool-args)]

$ sqoop-job (generic-args) (job-args)

   [-- [subtool-name] (subtool-args)]

Create Job (--create)

Here we create a job named myjob, which imports table data from an RDBMS table into HDFS. The following command creates a job that imports data from the employee table in the db database into HDFS.

$ sqoop job --create myjob \

-- import \

--connect jdbc:mysql://localhost/db \

--username root \

--table employee -m 1
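
Saved jobs are particularly useful for incremental imports, because Sqoop records the last imported value and reuses it on the next run (see the incremental.last.value entry in the --show output below). A minimal sketch, assuming id is a monotonically increasing primary key and using the illustrative job name incjob, could look like this:

$ sqoop job --create incjob \

-- import \

--connect jdbc:mysql://localhost/db \

--username root \

--table employee \

--incremental append \

--check-column id \

--last-value 1206 \

-m 1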

Verify Job (--list)

The '--list' argument is used to verify the saved jobs. The following command lists the saved Sqoop jobs.

$ sqoop job --list

It shows the list of saved jobs.

Available jobs:

   myjob

Inspect Job (--show)

The '--show' argument is used to inspect or verify a particular saved job and its details. The following command and sample output verify a job called myjob.

$ sqoop job --show myjob

It shows the tool and the options used in myjob.

Job: myjob

 Tool: import
 Options:

 ----------------------------

 direct.import = true

 codegen.input.delimiters.record = 0

 hdfs.append.dir = false

 db.table = employee

 ...

 incremental.last.value = 1206

 ...

Execute Job (--exec)

The '--exec' option is used to execute a saved job. The following command executes the saved job called myjob.

$ sqoop job --exec myjob

It shows output similar to the following.

10/08/19 13:08:45 INFO tool.CodeGenTool: Beginning code generation

...
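
The options stored with a saved job can also be overridden at execution time by appending them after a -- separator. For example, the Sqoop documentation shows invocations of the following form to run a job as a different database user (someuser is only a placeholder):

$ sqoop job --exec myjob -- --username someuser -P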


So, this brings us to the end of the blog. This Tecklearn 'How to export data back from Hadoop HDFS to RDBMS and Create and maintain the Sqoop jobs' blog helps you with commonly asked questions if you are looking out for a job in Apache Sqoop and as a Big Data Hadoop Developer.

If you wish to learn Sqoop and build a career in the Big Data Hadoop domain, then check out our interactive Big Data Hadoop Developer Training, which comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

https://www.tecklearn.com/course/big-data-spark-and-hadoop-developer/

Big Data Spark and Hadoop Developer Training

About the Course

Big Data analysis is emerging as a key advantage in business intelligence for many organizations. In this Big Data course, you will master MapReduce, Hive, Pig, Sqoop, Oozie and Flume, Spark framework and RDD, Scala and Spark SQL, Machine Learning using Spark, Spark Streaming, etc. It is a comprehensive Hadoop Big Data training course designed by industry experts considering current industry job requirements to help you learn Big Data Hadoop and Spark modules. This Cloudera Hadoop and Spark training will prepare you to clear Cloudera CCA175 Big Data certification.

Why Should you take Spark and Hadoop Developer Training?

  • Average salary for a Spark and Hadoop Developer ranges from approximately $106,366 to $127,619 per annum – Indeed.com.
  • Hadoop Market is expected to reach $99.31B by 2022 growing at a CAGR of 42.1% from 2015 – Forbes.
  • Amazon, Cloudera, DataStax, Dell, EMC, IBM, Microsoft and other MNCs worldwide use Hadoop.

What you will Learn in this Course?

Introduction to Hadoop and the Hadoop Ecosystem

  • Problems with Traditional Large-scale Systems
  • Hadoop!
  • The Hadoop Ecosystem

Hadoop Architecture and HDFS

  • Distributed Processing on a Cluster
  • Storage: HDFS Architecture
  • Storage: Using HDFS
  • Resource Management: YARN Architecture
  • Resource Management: Working with YARN

Importing Relational Data with Apache Sqoop

  • Sqoop Overview
  • Basic Imports and Exports
  • Limiting Results
  • Improving Sqoop’s Performance
  • Sqoop 2

Introduction to Impala and Hive

  • Introduction to Impala and Hive
  • Why Use Impala and Hive?
  • Comparing Hive to Traditional Databases
  • Hive Use Cases

Modelling and Managing Data with Impala and Hive

  • Data Storage Overview
  • Creating Databases and Tables
  • Loading Data into Tables
  • HCatalog
  • Impala Metadata Caching

Data Formats

  • Selecting a File Format
  • Hadoop Tool Support for File Formats
  • Avro Schemas
  • Using Avro with Hive and Sqoop
  • Avro Schema Evolution
  • Compression

Data Partitioning

  • Partitioning Overview
  • Partitioning in Impala and Hive

Capturing Data with Apache Flume

  • What is Apache Flume?
  • Basic Flume Architecture
  • Flume Sources
  • Flume Sinks
  • Flume Channels
  • Flume Configuration

Spark Basics

  • What is Apache Spark?
  • Using the Spark Shell
  • RDDs (Resilient Distributed Datasets)
  • Functional Programming in Spark

Working with RDDs in Spark

  • A Closer Look at RDDs
  • Key-Value Pair RDDs
  • MapReduce
  • Other Pair RDD Operations

Writing and Deploying Spark Applications

  • Spark Applications vs. Spark Shell
  • Creating the Spark Context
  • Building a Spark Application (Scala and Java)
  • Running a Spark Application
  • The Spark Application Web UI
  • Configuring Spark Properties
  • Logging

Parallel Programming with Spark

  • Review: Spark on a Cluster
  • RDD Partitions
  • Partitioning of File-based RDDs
  • HDFS and Data Locality
  • Executing Parallel Operations
  • Stages and Tasks

Spark Caching and Persistence

  • RDD Lineage
  • Caching Overview
  • Distributed Persistence

Common Patterns in Spark Data Processing

  • Common Spark Use Cases
  • Iterative Algorithms in Spark
  • Graph Processing and Analysis
  • Machine Learning
  • Example: k-means

Preview: Spark SQL

  • Spark SQL and the SQL Context
  • Creating DataFrames
  • Transforming and Querying DataFrames
  • Saving DataFrames
  • Comparing Spark SQL with Impala

 

Got a question for us? Please mention it in the comments section and we will get back to you.
