Concept of Coordinator Applications using Apache Oozie

Last updated on Jun 19 2021
Avinash M


Apache Oozie – Coordinator

Coordinator applications allow users to schedule complex workflows, including workflows that need to run regularly. Oozie Coordinator models workflow execution triggers as time, data, or event predicates. The workflow job referenced inside the coordinator is started only after the given conditions are satisfied.

Coordinators

Let's learn the concepts of coordinators with an example.

The first two Hive actions of the workflow in our example create the tables. We don't need these steps each time the workflow runs at a given frequency under a coordinator. So, let's modify the workflow, which will then be called by our coordinator.

In a real-life scenario, data flows continuously into the external table, and as soon as new data is loaded there, it is processed into the ORC-backed managed table.

Modified Workflow

<workflow-app xmlns = "uri:oozie:workflow:0.4" name = "simple-Workflow">

   <start to = "Insert_into_Table" />

   <action name = "Insert_into_Table">
      <hive xmlns = "uri:oozie:hive-action:0.4">
         <job-tracker>${jobTracker}</job-tracker>
         <name-node>${nameNode}</name-node>
         <script>${script_name_copy}</script>
         <param>${database}</param>
      </hive>
      <ok to = "end" />
      <error to = "kill_job" />
   </action>

   <kill name = "kill_job">
      <message>Job failed</message>
   </kill>

   <end name = "end" />

</workflow-app>

Now let’s write a simple coordinator to use this workflow.

<coordinator-app xmlns = "uri:oozie:coordinator:0.2"
   name = "coord_copydata_from_external_orc"
   frequency = "5 * * * *"
   start = "2016-01-18T01:00Z" end = "2025-12-31T00:00Z"
   timezone = "America/Los_Angeles">

   <controls>
      <timeout>1</timeout>
      <concurrency>1</concurrency>
      <execution>FIFO</execution>
      <throttle>1</throttle>
   </controls>

   <action>
      <workflow>
         <app-path>pathof_workflow_xml/workflow.xml</app-path>
      </workflow>
   </action>

</coordinator-app>

The elements of the above code mean the following −

  • start − The start datetime for the job. Actions are materialized starting from this time.
  • end − The end datetime for the job. No actions are materialized after this time.
  • timezone − The timezone of the coordinator application.
  • frequency − The frequency, in minutes, at which to materialize actions.
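Besides a cron-like expression such as the one used above, the frequency can also be written with Oozie's coordinator EL functions, which account for timezone and daylight-saving shifts. A few illustrative values for the `frequency` attribute (the rest of the coordinator definition stays as shown above):

```
<!-- every 5 minutes -->
frequency = "${coord:minutes(5)}"

<!-- once a day, DST-aware -->
frequency = "${coord:days(1)}"

<!-- once a month -->
frequency = "${coord:months(1)}"
```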

Control Information

  • timeout − The maximum time, in minutes, that a materialized action will wait for its additional conditions to be satisfied before being discarded. A timeout of 0 indicates that all the input conditions must already be satisfied at the time of materialization; otherwise the action times out immediately. A timeout of -1 indicates no timeout: the materialized action will wait forever for the other conditions to be satisfied. The default value is -1.
  • concurrency − The maximum number of actions for this job that can be running at the same time. This setting allows Oozie to materialize and submit multiple instances of the coordinator app, and allows operations to catch up on delayed processing. The default value is 1.
  • execution − Specifies the execution order when multiple instances of the coordinator job have satisfied their execution criteria. Valid values are −
    • FIFO (oldest first), the default.
    • LIFO (newest first).
    • LAST_ONLY (discards all older materializations).
  • throttle − The maximum number of materialized actions for this job that are allowed to be in WAITING state at the same time (the <throttle> element used in the coordinator above).

(Ref of definitions − http://oozie.apache.org/docs/3.2.0-incubating/CoordinatorFunctionalSpec.html#a6.3._Synchronous_Coordinator_Application_Definition)

With the cron-like expression "5 * * * *", the above coordinator will run at the 5th minute of every hour (similar to a cron job).

To run this coordinator, use the following command.

oozie job -oozie http://host_name:8080/oozie -config edgenode_path/job1.properties -D oozie.coord.application.path=hdfs://Namenodepath/pathof_coordinator_xml/coordinator.xml -d "2 minute" -run

The -d "2 minute" option ensures that the coordinator starts only 2 minutes after the job was submitted.
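Once submitted, the coordinator can be monitored and controlled from the same CLI. The commands below are a sketch against the same hypothetical server endpoint; the coordinator job ID shown is a made-up placeholder (Oozie prints the real ID when you submit the job):

```
# list the coordinator jobs known to the server
oozie jobs -oozie http://host_name:8080/oozie -jobtype coordinator

# show the status and materialized actions of one coordinator job
oozie job -oozie http://host_name:8080/oozie -info 0000001-160118010101010-oozie-C

# kill the coordinator (this also kills its submitted workflow jobs)
oozie job -oozie http://host_name:8080/oozie -kill 0000001-160118010101010-oozie-C
```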

The above coordinator will call the workflow, which in turn will call the Hive script. This script inserts the data from the external table into the Hive managed table.
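The script referenced through ${script_name_copy} could look like the following. This is a hypothetical sketch: the table names external_table and orc_table are illustrative, and only the database value comes from the workflow's <param> element:

```sql
-- switch to the database passed in by the workflow
USE ${database};

-- copy data from the external (raw) table into the ORC-backed managed table
INSERT INTO TABLE orc_table
SELECT * FROM external_table;
```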

Coordinator Job Status


At any time, a coordinator job is in one of the following statuses − PREP, RUNNING, PREPSUSPENDED, SUSPENDED, PREPPAUSED, PAUSED, SUCCEEDED, DONEWITHERROR, KILLED, FAILED.

Valid coordinator job status transitions are −

  • PREP − PREPSUSPENDED | PREPPAUSED | RUNNING | KILLED
  • RUNNING − SUSPENDED | PAUSED | SUCCEEDED | DONEWITHERROR | KILLED | FAILED
  • PREPSUSPENDED − PREP | KILLED
  • SUSPENDED − RUNNING | KILLED
  • PREPPAUSED − PREP | KILLED
  • PAUSED − SUSPENDED | RUNNING | KILLED
  • When a coordinator job is submitted, Oozie parses the coordinator job XML. Oozie then creates a record for the coordinator with status PREP and returns a unique ID. The coordinator is also started immediately if the pause time is not set.
  • When a user requests to suspend a coordinator job that is in status PREP, Oozie puts the job in status PREPSUSPENDED. Similarly, when the pause time is reached for a coordinator job in status PREP, Oozie puts the job in status PREPPAUSED.
  • Conversely, when a user requests to resume a PREPSUSPENDED coordinator job, Oozie puts the job in status PREP. And when the pause time is reset for a coordinator job whose status is PREPPAUSED, Oozie puts the job in status PREP.
  • When a coordinator job starts, Oozie puts the job in status RUNNING and starts materializing workflow jobs based on the job frequency.
  • When a user requests to kill a coordinator job, Oozie puts the job in status KILLED and sends a kill to all submitted workflow jobs. If any coordinator action finishes in a state other than KILLED, Oozie puts the coordinator job into DONEWITHERROR.
  • When a user requests to suspend a coordinator job that is in status RUNNING, Oozie puts the job in status SUSPENDED and suspends all the submitted workflow jobs.
  • When the pause time is reached for a coordinator job that is in status RUNNING, Oozie puts the job in status PAUSED.

Conversely, when a user requests to resume a SUSPENDED coordinator job, Oozie puts the job in status RUNNING. Similarly, when the pause time is reset for a coordinator job whose status is PAUSED, Oozie puts the job in status RUNNING.

A coordinator job creates workflow jobs (commonly called coordinator actions) only for the duration of the coordinator job and only while the coordinator job is in RUNNING status. If the coordinator job has been suspended, then, when resumed, it will create all the coordinator actions that should have been created during the time it was suspended; actions will not be lost, only delayed.

When the coordinator job materialization finishes and all the workflow jobs finish, Oozie updates the coordinator status accordingly. For example, if all the workflows SUCCEEDED, Oozie puts the coordinator job into SUCCEEDED status. However, if any workflow job finishes in a state other than SUCCEEDED (e.g. KILLED or FAILED), Oozie puts the coordinator job into DONEWITHERROR. Likewise, if all coordinator actions are TIMEDOUT, Oozie puts the coordinator job into DONEWITHERROR.

(Reference − http://oozie.apache.org/docs/)

Parametrization of a Coordinator

The workflow parameters can be passed to a coordinator as well, using the .properties file. These parameters are resolved using the configuration properties of the job configuration used to submit the coordinator job.

If a configuration property used in the definition is not provided with the job configuration used to submit a coordinator job, the value of the parameter will be undefined and the job submission will fail.
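A minimal job.properties sketch for the coordinator above. The host names and paths are placeholders, and the file name copy_to_orc.hql is hypothetical; the property names jobTracker, nameNode, script_name_copy and database match the ${...} references in the workflow XML:

```properties
# cluster endpoints (placeholders)
nameNode=hdfs://Namenodepath
jobTracker=jobtracker_host:8032

# parameters referenced by the workflow
script_name_copy=hdfs://Namenodepath/scripts/copy_to_orc.hql
database=default

# tells Oozie which coordinator definition to run
oozie.coord.application.path=hdfs://Namenodepath/pathof_coordinator_xml/coordinator.xml
```

If any of these properties is omitted at submission time, the corresponding parameter stays undefined and, as noted above, the job submission fails.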

 

So, this brings us to the end of the blog. This Tecklearn 'Concept of Coordinator Applications using Apache Oozie' blog helps you with commonly asked questions if you are looking out for a job in Apache Oozie and Big Data Hadoop Testing.

If you wish to learn Oozie and build a career in Big Data Hadoop domain, then check out our interactive, Big Data Hadoop Testing Training, that comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

https://www.tecklearn.com/course/big-data-hadoop-testing/

Big Data Hadoop Testing Training

About the Course

Big Data analysis is emerging as a key advantage in business intelligence for many organizations. Hadoop testing training will provide you with the right skills to detect, analyse and rectify errors in Hadoop framework. You will be trained in the Hadoop software, architecture, MapReduce, HDFS and various components like Sqoop, Flume and Oozie. With this Hadoop testing training you will also be fully equipped with experience in various test case scenarios, proof of concepts implementation and real-world scenarios. It is a comprehensive Hadoop Big Data training course designed by industry experts considering current industry job requirements to help you learn Big Data Hadoop Testing.

Why Should you take Big Data Hadoop Training?

  • The Average Salary for BigData Hadoop Tester ranges from approximately $34.65 per hour for Senior Tester to $124,599 per year for Senior Software Engineer. – Glassdoor.com
  • Hadoop Market is expected to reach $99.31B by 2022 growing at a CAGR of 42.1% from 2015 – Forbes.
  • Amazon, Cloudera, Data Stax, DELL, EMC2, IBM, Microsoft & other MNCs worldwide use Hadoop.

What you will Learn in this Course?

Introduction to Hadoop

  • The Case for Apache Hadoop
  • Why Hadoop?
  • Core Hadoop Components
  • Fundamental Concepts

HDFS

  • HDFS Features
  • Writing and Reading Files
  • NameNode Memory Considerations
  • Overview of HDFS Security
  • Using the Namenode Web UI
  • Using the Hadoop File Shell

Getting Data into HDFS

  • Ingesting Data from External Sources with Flume
  • Ingesting Data from Relational Databases with Sqoop
  • REST Interfaces
  • Best Practices for Importing Data

Hadoop Testing

  • Hadoop Application Testing
  • Roles and Responsibilities of Hadoop Testing Professional
  • Framework MRUnit for Testing of MapReduce Programs
  • Unit Testing
  • Test Execution
  • Test Plan Strategy and Writing Test Cases for Testing Hadoop Application

Big Data Testing

  • BigData Testing
  • Unit Testing
  • Integration Testing
  • Functional Testing
  • Non-Functional Testing
  • Golden Data Set

System Testing

  • Building and Set up
  • Testing SetUp
  • Solary Server
  • Non-Functional Testing
  • Longevity Testing
  • Volumetric Testing

Security Testing

  • Security Testing
  • Non-Functional Testing
  • Hadoop Cluster
  • Security-Authorization RBA
  • IBM Project

Automation Testing

  • Query Surge Tool

Oozie

  • Why Oozie
  • Installation Engine
  • Oozie Workflow Engine
  • Oozie security
  • Oozie Job Process
  • Oozie terminology
  • Oozie bundle

 

Got a question for us? Please mention it in the comments section and we will get back to you.

 

 
