Deep Dive into Oozie Bundle System and CLI & Extensions

Last updated on Jun 19 2021
Avinash M

Apache Oozie – Bundle

The Oozie Bundle system allows the user to define and execute a set of coordinator applications, often called a data pipeline. There is no explicit dependency among the coordinator applications in a bundle. However, a user could use the data dependency of coordinator applications to create an implicit data application pipeline.

The user can start/stop/suspend/resume/rerun jobs at the bundle level, resulting in better and easier operational control.

Bundle

Let’s extend our workflow and coordinator example to a bundle.

<bundle-app xmlns = 'uri:oozie:bundle:0.1'
            name = 'bundle_copydata_from_external_orc'>
   <controls>
      <kick-off-time>${kickOffTime}</kick-off-time>
   </controls>
   <coordinator name = 'coord_copydata_from_external_orc'>
      <app-path>pathof_coordinator_xml</app-path>
      <configuration>
         <property>
            <name>startTime1</name>
            <value>time to start</value>
         </property>
      </configuration>
   </coordinator>
</bundle-app>

Kick-off-time − The time when a bundle should start and submit coordinator applications.

There can be more than one coordinator in a bundle.
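For instance, a bundle holding two coordinators could be sketched as below. The coordinator names and app paths here are illustrative placeholders, not part of the example above:

```xml
<bundle-app xmlns = 'uri:oozie:bundle:0.1' name = 'bundle_two_pipelines'>
   <controls>
      <kick-off-time>${kickOffTime}</kick-off-time>
   </controls>
   <!-- First coordinator of the pipeline (illustrative name/path) -->
   <coordinator name = 'coord_ingest'>
      <app-path>path_of_ingest_coordinator_xml</app-path>
   </coordinator>
   <!-- Second coordinator; both are submitted when the bundle starts -->
   <coordinator name = 'coord_aggregate'>
      <app-path>path_of_aggregate_coordinator_xml</app-path>
   </coordinator>
</bundle-app>
```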

Bundle Job Status

At any time, a bundle job is in one of the following statuses: PREP, RUNNING, PREPSUSPENDED, SUSPENDED, PREPPAUSED, PAUSED, SUCCEEDED, DONEWITHERROR, KILLED, FAILED.

Valid bundle job status transitions are −

  • PREP − PREPSUSPENDED | PREPPAUSED | RUNNING | KILLED
  • RUNNING − SUSPENDED | PAUSED | SUCCEEDED | DONEWITHERROR | KILLED | FAILED
  • PREPSUSPENDED − PREP | KILLED
  • SUSPENDED − RUNNING | KILLED
  • PREPPAUSED − PREP | KILLED
  • PAUSED − SUSPENDED | RUNNING | KILLED
  • When a bundle job is submitted, Oozie parses the bundle job XML. Oozie then creates a record for the bundle with status PREP and returns a unique ID.
  • When a user requests to suspend a bundle job that is in PREP state, Oozie puts the job in status PREPSUSPENDED. Similarly, when the pause time is reached for a bundle job in PREP status, Oozie puts the job in status PREPPAUSED.
  • Conversely, when a user requests to resume a PREPSUSPENDED bundle job, Oozie puts the job in status PREP. And when the pause time is reset for a bundle job in PREPPAUSED state, Oozie puts the job in status PREP.
  • There are two ways a bundle job can be started: when the kick-off-time (defined in the bundle XML) is reached (the default value is null, which means start the coordinators NOW), or when the user sends a START request for the bundle.
  • When a bundle job starts, Oozie puts the job in status RUNNING and submits all the coordinator jobs.
  • When a user requests to kill a bundle job, Oozie puts the job in status KILLED and sends a kill to all submitted coordinator jobs.
  • When a user requests to suspend a bundle job that is not in PREP status, Oozie puts the job in status SUSPENDED and suspends all submitted coordinator jobs.
  • When the pause time is reached for a bundle job that is not in PREP status, Oozie puts the job in status PAUSED. When the pause time is reset, Oozie puts the job back in status RUNNING.

When all the coordinator jobs finish, Oozie updates the bundle status accordingly. If all coordinators reach the same terminal state, the bundle job moves to that status as well. For example, if all coordinators are SUCCEEDED, Oozie puts the bundle job into the SUCCEEDED status. However, if the coordinator jobs do not all finish with the same status, Oozie puts the bundle job into DONEWITHERROR.
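This terminal-status rule can be sketched as a small function. This is an illustrative sketch of the rule described above, not Oozie's actual implementation:

```python
# Illustrative sketch of the bundle terminal-status rule; not Oozie's code.
TERMINAL = {"SUCCEEDED", "KILLED", "FAILED", "DONEWITHERROR"}

def bundle_terminal_status(coordinator_statuses):
    """Return the bundle status once every coordinator has finished."""
    statuses = set(coordinator_statuses)
    if not statuses.issubset(TERMINAL):
        raise ValueError("some coordinators have not finished yet")
    # All coordinators ended the same way -> the bundle takes that status.
    if len(statuses) == 1:
        return statuses.pop()
    # Mixed terminal states -> DONEWITHERROR.
    return "DONEWITHERROR"

print(bundle_terminal_status(["SUCCEEDED", "SUCCEEDED"]))  # SUCCEEDED
print(bundle_terminal_status(["SUCCEEDED", "KILLED"]))     # DONEWITHERROR
```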

Apache Oozie – CLI and Extensions

In the last part of this blog, let's touch on some of the other important concepts in Oozie.

Command Line Tools

We have seen a few commands earlier to run workflow, coordinator and bundle jobs. Oozie provides a command-line utility, oozie, to perform job and admin tasks.

oozie version : show client version

Following are some of the other job operations −

oozie job <OPTIONS> :

   -action <arg>         coordinator rerun on action ids (requires -rerun); coordinator log retrieval on action ids (requires -log)
   -auth <arg>           select authentication type [SIMPLE|KERBEROS]
   -change <arg>         change a coordinator/bundle job
   -config <arg>         job configuration file '.xml' or '.properties'
   -D <property=value>   set/override value for given property
   -date <arg>           coordinator/bundle rerun on action dates (requires -rerun)
   -definition <arg>     job definition
   -doas <arg>           doAs user, impersonates as the specified user
   -dryrun               Supported in Oozie-2.0 or later versions ONLY – dryrun or test run a coordinator job, job is not queued
   -info <arg>           info of a job
   -kill <arg>           kill a job
   -len <arg>            number of actions (default TOTAL ACTIONS, requires -info)
   -localtime            use local time (default GMT)
   -log <arg>            job log
   -nocleanup            do not clean up output-events of the coordinator rerun actions (requires -rerun)
   -offset <arg>         job info offset of actions (default '1', requires -info)
   -oozie <arg>          Oozie URL
   -refresh              re-materialize the coordinator rerun actions (requires -rerun)
   -rerun <arg>          rerun a job (coordinator requires -action or -date; bundle requires -coordinator or -date)
   -resume <arg>         resume a job
   -run                  run a job
   -start <arg>          start a job
   -submit               submit a job
   -suspend <arg>        suspend a job
   -value <arg>          new endtime/concurrency/pausetime value for changing a coordinator job; new pausetime value for changing a bundle job
   -verbose              verbose mode
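As a sketch of how these options fit together, the snippet below writes a minimal job.properties for a bundle submission and lists typical job operations. The property values, paths and the server URL are illustrative assumptions for this example:

```shell
# Write a minimal job.properties for a bundle submission.
# All values below are illustrative assumptions.
cat > job.properties <<'EOF'
nameNode=hdfs://localhost:8020
jobTracker=localhost:8032
oozie.bundle.application.path=${nameNode}/user/oozie/bundles/copydata
kickOffTime=2021-06-19T00:00Z
EOF

# Typical job operations (these need a running Oozie server):
#   oozie job -oozie http://localhost:8080/oozie -config job.properties -submit
#   oozie job -oozie http://localhost:8080/oozie -info <bundle-job-id>
#   oozie job -oozie http://localhost:8080/oozie -rerun <bundle-job-id> -coordinator coord_copydata_from_external_orc

# Sanity check: each of the 4 lines defines one property.
grep -c '=' job.properties   # prints 4
```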

To check the status of jobs, the oozie jobs <OPTIONS> command can be used with the following options −

   -auth <arg>       select authentication type [SIMPLE|KERBEROS]
   -doas <arg>       doAs user, impersonates as the specified user
   -filter <arg>     user=<U>; name=<N>; group=<G>; status=<S>; ...
   -jobtype <arg>    job type ('Supported in Oozie-2.0 or later versions ONLY – coordinator' or 'wf' (default))
   -len <arg>        number of jobs (default '100')
   -localtime        use local time (default GMT)
   -offset <arg>     jobs offset (default '1')
   -oozie <arg>      Oozie URL
   -verbose          verbose mode

For example − To check the status of the Oozie system you can run the following command −

oozie admin -oozie http://localhost:8080/oozie -status

Validating a Workflow XML −

oozie validate myApp/workflow.xml

It performs an XML Schema validation on the specified workflow XML file.

Action Extensions

We have seen Hive extensions. Similarly, Oozie provides more action extensions; a few of them are described below −

Email Action

The email action allows sending emails from a workflow application. An email action must provide "to" addresses, optional "cc" addresses, a subject and a body. Multiple recipients can be provided as comma-separated addresses.

All the values specified in the email action can be parameterized (templated) using EL expressions.

Example

<workflow-app name = "sample-wf" xmlns = "uri:oozie:workflow:0.1">
   ...
   <action name = "an-email">
      <email xmlns = "uri:oozie:email-action:0.1">
         <to>julie@xyz.com,max@abc.com</to>
         <cc>jax@xyz.com</cc>
         <subject>Email notifications for ${wf:id()}</subject>
         <body>The wf ${wf:id()} successfully completed.</body>
      </email>
      <ok to = "main_job"/>
      <error to = "kill_job"/>
   </action>
   ...
</workflow-app>

Shell Action

The shell action runs a Shell command. The workflow job will wait until the Shell command completes before continuing to the next action.

To run the Shell job, you have to configure the shell action with the job-tracker, name-node and exec elements, as well as the necessary arguments and configuration. A shell action can be configured to create or delete HDFS directories before starting the Shell job.

The shell launcher configuration can be specified with a file, using the job-xml element, and inline, using the configuration elements.
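As a sketch of the inline form, a shell action could carry a configuration element like the one below. The job-xml file name and the queue property shown are illustrative assumptions, not part of the example that follows:

```xml
<shell xmlns = "uri:oozie:shell-action:0.1">
   <job-tracker>${jobTracker}</job-tracker>
   <name-node>${nameNode}</name-node>
   <!-- launcher configuration from a file (illustrative file name) -->
   <job-xml>shell-site.xml</job-xml>
   <!-- inline launcher configuration (illustrative property) -->
   <configuration>
      <property>
         <name>mapred.job.queue.name</name>
         <value>${queueName}</value>
      </property>
   </configuration>
   <exec>script.sh</exec>
   <file>script.sh</file>
</shell>
```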

Example

The following workflow shows how to run a shell script −

<workflow-app xmlns = 'uri:oozie:workflow:0.3' name = 'shell-wf'>
   <start to = 'shell1'/>
   <action name = 'shell1'>
      <shell xmlns = "uri:oozie:shell-action:0.1">
         <job-tracker>${jobTracker}</job-tracker>
         <name-node>${nameNode}</name-node>
         <exec>script_name</exec>
         <file>path_of_file_name</file>
      </shell>
      <ok to = "end"/>
      <error to = "fail"/>
   </action>
   <kill name = "fail">
      <message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
   </kill>
   <end name = 'end'/>
</workflow-app>

Similarly, we can have many more actions like ssh, sqoop, java action, etc.
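As one more sketch, a minimal ssh action might look like the fragment below. The host, command path and argument are placeholders for this illustration:

```xml
<action name = "ssh_example">
   <ssh xmlns = "uri:oozie:ssh-action:0.1">
      <!-- remote user@host to connect to (placeholder) -->
      <host>user@remotehost</host>
      <!-- command executed on the remote host (placeholder path) -->
      <command>/home/user/run_job.sh</command>
      <args>arg1</args>
   </ssh>
   <ok to = "end"/>
   <error to = "fail"/>
</action>
```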

So, this brings us to the end of the blog. This Tecklearn 'Deep Dive into Oozie Bundle System and CLI & Extensions' blog helps you with commonly asked questions if you are looking out for a job in Apache Oozie and Big Data Hadoop Testing.

If you wish to learn Oozie and build a career in the Big Data Hadoop domain, then check out our interactive Big Data Hadoop Testing Training, which comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

https://www.tecklearn.com/course/big-data-hadoop-testing/

Big Data Hadoop Testing Training

About the Course

Big Data analysis is emerging as a key advantage in business intelligence for many organizations. Hadoop testing training will provide you with the right skills to detect, analyse and rectify errors in Hadoop framework. You will be trained in the Hadoop software, architecture, MapReduce, HDFS and various components like Sqoop, Flume and Oozie. With this Hadoop testing training you will also be fully equipped with experience in various test case scenarios, proof of concepts implementation and real-world scenarios. It is a comprehensive Hadoop Big Data training course designed by industry experts considering current industry job requirements to help you learn Big Data Hadoop Testing.

Why Should you take Big Data Hadoop Training?

  • The Average Salary for BigData Hadoop Tester ranges from approximately $34.65 per hour for Senior Tester to $124,599 per year for Senior Software Engineer. – Glassdoor.com
  • Hadoop Market is expected to reach $99.31B by 2022 growing at a CAGR of 42.1% from 2015 – Forbes.
  • Amazon, Cloudera, Data Stax, DELL, EMC2, IBM, Microsoft & other MNCs worldwide use Hadoop.

What you will Learn in this Course?

Introduction to Hadoop

  • The Case for Apache Hadoop
  • Why Hadoop?
  • Core Hadoop Components
  • Fundamental Concepts

HDFS

  • HDFS Features
  • Writing and Reading Files
  • NameNode Memory Considerations
  • Overview of HDFS Security
  • Using the Namenode Web UI
  • Using the Hadoop File Shell

Getting Data into HDFS

  • Ingesting Data from External Sources with Flume
  • Ingesting Data from Relational Databases with Sqoop
  • REST Interfaces
  • Best Practices for Importing Data

Hadoop Testing

  • Hadoop Application Testing
  • Roles and Responsibilities of Hadoop Testing Professional
  • Framework MRUnit for Testing of MapReduce Programs
  • Unit Testing
  • Test Execution
  • Test Plan Strategy and Writing Test Cases for Testing Hadoop Application

Big Data Testing

  • BigData Testing
  • Unit Testing
  • Integration Testing
  • Functional Testing
  • Non-Functional Testing
  • Golden Data Set

System Testing

  • Building and Set up
  • Testing SetUp
  • Solary Server
  • Non-Functional Testing
  • Longevity Testing
  • Volumetric Testing

Security Testing

  • Security Testing
  • Non-Functional Testing
  • Hadoop Cluster
  • Security-Authorization RBA
  • IBM Project

Automation Testing

  • Query Surge Tool

Oozie

  • Why Oozie
  • Installation Engine
  • Oozie Workflow Engine
  • Oozie security
  • Oozie Job Process
  • Oozie terminology
  • Oozie bundle

 

Got a question for us? Please mention it in the comments section and we will get back to you.

 
