
Top Apache Oozie Interview Questions and Answers You Must Prepare

Last updated on Feb 18, 2022
Gaurav S
Gaurav is a technology enthusiast working as a Sr. Research Analyst. He has expertise in domains like Big Data, Artificial Intelligence, and Cloud Computing.


Explain Apache Oozie.

Oozie is a workflow scheduler for Hadoop. It allows a user to create Directed Acyclic Graphs (DAGs) of workflows, which can be run in parallel and sequentially in Hadoop. It can also run plain Java classes and Pig workflows, and interact with HDFS.

Figure: Acyclic graph of an Apache Oozie workflow

What is the need for Apache Oozie?

Apache Oozie provides a great way to handle multiple jobs. Users often have different types of jobs that they want to schedule to run later, or tasks that need to follow a specific sequence during execution. These kinds of executions are made easy with the help of Apache Oozie. Using Apache Oozie, an administrator or user can execute various independent jobs in parallel, run jobs back-to-back in a particular sequence, or control jobs from anywhere, which makes it very useful.

What are the main components of the Apache Oozie workflow?

The Apache Oozie workflow consists of control flow nodes and action nodes.

• Control flow nodes. These nodes define the start and end of the workflow (start, end, and kill). They also provide the mechanism that manages the execution path within the workflow (decision, fork, and join).
• Action nodes. These nodes provide the mechanism that triggers the execution of a processing or computation task. Oozie supports different actions, including Hadoop MapReduce, Pig, and file system actions, as well as system-specific jobs like HTTP, SSH, and email.
A minimal workflow sketch combining both node types follows the figure below.

Figure: Control flow nodes in an Apache Oozie workflow
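A minimal sketch combining both node types (the node names and the bare-bones MapReduce body are illustrative, not from any real application):

<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="mr-node"/>                          <!-- control flow node: entry point -->
    <action name="mr-node">                        <!-- action node: computation task -->
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
        </map-reduce>
        <ok to="end"/>                             <!-- transition on success -->
        <error to="fail"/>                         <!-- transition on error -->
    </action>
    <kill name="fail">                             <!-- control flow node: abort with a message -->
        <message>Job failed at ${wf:lastErrorNode()}</message>
    </kill>
    <end name="end"/>                              <!-- control flow node: normal completion -->
</workflow-app>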

What is the use of Fork and Join nodes in Oozie?

The fork and join nodes in Oozie are used in pairs. The fork node splits the execution path into multiple concurrent paths. The join node merges two or more concurrent paths into one. All the paths that meet at a join node must be children of the same fork node.
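A sketch of the paired syntax (the node names are illustrative):

<fork name="forking">
    <path start="first-parallel-action"/>
    <path start="second-parallel-action"/>
</fork>
...
<join name="joining" to="next-action"/>

Both parallel actions transition to the join node ("joining" here) on success; the workflow moves on to "next-action" only once every forked path has arrived.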

What are some of the useful EL functions in the Oozie workflow?

Below is a list of some useful EL functions of the Oozie workflow.

• wf:name() – It returns the application name of the workflow.
• wf:id() – This function returns the job ID of the currently running workflow job.
• wf:errorCode(String node) – It returns the error code of the executed action node.
• wf:lastErrorNode() – This function returns the name of the last executed action node in the workflow that exited with an error.
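A typical place these EL functions appear is the message of a kill node, for example (a sketch, with an illustrative node name):

<kill name="fail">
    <message>Workflow ${wf:id()} failed at ${wf:lastErrorNode()}, error code [${wf:errorCode(wf:lastErrorNode())}]</message>
</kill>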

Explain the various nodes supported in the Oozie workflow.

• Map-Reduce Action. This action node initiates a Hadoop MapReduce job.
• Pig Action. This node is used to start a Pig job from the Apache Oozie workflow.
• Java Action. This action node executes the public static void main(String[] args) method of the specified main Java class in the Oozie workflow.
• FS (HDFS) Action. This action node allows the Oozie workflow to manipulate HDFS files and directories. It supports commands like mkdir, move, chmod, delete, chgrp, and touchz. A sketch of this action follows the figure below.

Figure: Various nodes in an Oozie workflow
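A sketch of the FS action using a few of those commands (the node name and paths are illustrative):

<action name="hdfs-housekeeping">
    <fs>
        <delete path="${nameNode}/user/demo/output"/>    <!-- remove stale output -->
        <mkdir path="${nameNode}/user/demo/staging"/>    <!-- create a staging dir -->
        <chmod path="${nameNode}/user/demo/staging" permissions="755"/>
    </fs>
    <ok to="end"/>
    <error to="fail"/>
</action>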

What is an Oozie Bundle?

An Oozie bundle allows the user to execute jobs in batches. Oozie bundle jobs are started, stopped, suspended, resumed, re-run, or killed in batches, thus providing better operational control.

Figure: Flowchart depicting an Oozie Bundle

Explain how the pipeline works in Oozie.

The pipeline in Oozie connects multiple workflow jobs that execute regularly but at different intervals. In this pipeline, the output of multiple runs of one workflow becomes the input of the next scheduled workflow job, which is executed back-to-back in the pipeline. This chain of joined workflows forms the Oozie pipeline of jobs.

Explain the life cycle of the Oozie workflow job

Figure: Life cycle of an Oozie workflow job

• PREP – This is the state when the user creates the workflow job. In the PREP state, the job is only defined and is not running.
• RUNNING – When the job starts, it changes to the RUNNING state and remains in this state until the job reaches the end state, an error occurs, or the job is suspended.
• SUSPENDED – The state of the job in the Oozie workflow changes to SUSPENDED if the job is suspended in between. The job will remain in this state until it is killed or resumed.
• SUCCEEDED – The workflow job becomes SUCCEEDED when the job reaches the end node.
• KILLED – The workflow job transitions to the KILLED state when the administrator kills a job in the PREP, RUNNING, or SUSPENDED states.
• FAILED – The job state changes to FAILED when the running job fails due to an unexpected error.

How long are the Oozie log files retained before being deleted?

They are retained for up to 30 days or until a total of 720 log files have been generated.

What is the command-line option to check the status of a workflow, coordinator, or bundle action in Oozie?

oozie job -oozie http://localhost:8080/oozie -info <job-id>

What is a bundle in Oozie and what controls does it have?

A bundle is a higher-level abstraction in Oozie that batches a set of coordinator applications. With a bundle, the user has the control to start/stop/suspend/resume/rerun jobs at the bundle level, resulting in better and easier operational control.

List the various statuses a bundle job can transition through.

PREP, RUNNING, RUNNINGWITHERROR, SUSPENDED, PREPSUSPENDED, SUSPENDEDWITHERROR, PAUSED, PAUSEDWITHERROR, PREPPAUSED, SUCCEEDED, DONEWITHERROR, KILLED, FAILED

Name the Spark action element tag where the details about the driver and executor memory and extra configuration properties are specified.

It is the <spark-opts> element within the Spark action.
Example:
<master>yarn-cluster</master>
<name>Spark Example</name>
<jar>pi.py</jar>
<spark-opts>--executor-memory 20G --num-executors 50
--conf spark.executor.extraJavaOptions="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"</spark-opts>
<arg>100</arg>
A fuller sketch of the surrounding <spark> action element follows.
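For context, here is how <spark-opts> sits inside the full <spark> action (the schema version, node name, and transitions are illustrative):

<action name="spark-pi">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>Spark Example</name>
        <jar>pi.py</jar>
        <spark-opts>--executor-memory 20G --num-executors 50</spark-opts>
        <arg>100</arg>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>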

Describe the use of the <prepare> element tag in the action nodes in Oozie.

The prepare element, if present, indicates a list of paths to delete or create before starting the job. The paths to be created or deleted should be HDFS locations.

<prepare>
    <delete path="[PATH]"/>
    <mkdir path="[PATH]"/>
</prepare>

Is it mandatory to provide a password when the Oozie workflow contains a Hive2 action node and connects to HiveServer2?

Yes, a password is required for a secured HiveServer2 that is backed by authentication such as LDAP.

Name the Oozie action extension that is used to copy files from one cluster to another, or within the same cluster.

DistCp action
Example:
<action name="[NODE-NAME]">
    <distcp xmlns="uri:oozie:distcp-action:0.1">
    ...

For comparison, a sub-workflow action (which runs a child workflow job) has the following structure:

<action name="[NODE-NAME]">
    <sub-workflow>
        <app-path>[WF-APPLICATION-PATH]</app-path>
        <propagate-configuration/>
        <configuration>
            <property>
                <name>[PROPERTY-NAME]</name>
                <value>[PROPERTY-VALUE]</value>
            </property>
        </configuration>
    </sub-workflow>
    ...
</action>
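A fuller sketch of the DistCp action itself (the paths, the second name-node variable, and the transitions are illustrative):

<action name="distcp-node">
    <distcp xmlns="uri:oozie:distcp-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <arg>${nameNode}/user/demo/input.txt</arg>      <!-- source -->
        <arg>${nameNode2}/user/demo/output.txt</arg>    <!-- target -->
    </distcp>
    <ok to="end"/>
    <error to="fail"/>
</action>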

What kind of application is Oozie?

Oozie is a server-based application which holds an embedded Tomcat server.

How do you check and verify whether the Oozie workflow.xml file is syntactically correct and parses correctly according to the XML format?

This can be checked via the Oozie command-line tool called "validate". It performs an XML Schema validation on the specified workflow XML file.
Example: $ oozie validate <app-name>/workflow.xml

What is the command-line syntax to submit and start a Workflow, Coordinator, or Bundle job?

Submitting a job
----------------
$ oozie job -oozie http://localhost:8080/oozie -config job.properties -submit
job: <job-id>

The parameters for the job must be provided in either a properties file or an XML file, which must be specified with the -config option.
The workflow/coordinator/bundle application path must be specified in that file with the oozie.wf.application.path (or oozie.coord.application.path / oozie.bundle.application.path) property. The specified path must be an HDFS path.
The job will be created, but it will not be started; it will be in PREP status.

Starting a job
--------------
$ oozie job -oozie http://localhost:8080/oozie -start <job-id>
The -start option starts a previously submitted workflow job, coordinator job, or bundle job that is in PREP status. The status is then changed to RUNNING.

List the various types of Oozie jobs.

Oozie Workflow – a collection of actions arranged in a Directed Acyclic Graph (DAG).
Oozie Coordinator – recurrent Oozie Workflow jobs that are triggered by time and data availability.
Oozie Bundle – a higher-level Oozie abstraction that batches a set of coordinator applications. The user has the control to start/stop/suspend/resume/rerun at the bundle level, resulting in better and easier operational control.

How are workflow nodes classified?

Workflow nodes are classified into two types.
Control flow nodes: nodes that control the start and end of the workflow and the workflow job execution path.
Action nodes: nodes that trigger the execution of a computation/processing task.

Name the action node which acts like a switch-case statement in the Oozie workflow.

Decision Control Node – a decision node enables a workflow to make a selection on the execution path to follow.

A decision node consists of a list of predicate-transition pairs plus a default transition. Predicates are evaluated in order of appearance until one of them evaluates to true, and the corresponding transition is taken. If none of the predicates evaluates to true, the default transition is taken.
Predicates are JSP Expression Language (EL) expressions.
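A sketch of the standard decision node syntax (the predicates and target node names are illustrative):

<decision name="size-check">
    <switch>
        <case to="big-input-path">${fs:fileSize('/user/demo/input') gt 10 * GB}</case>
        <case to="small-input-path">${fs:fileSize('/user/demo/input') lt 1 * MB}</case>
        <default to="normal-path"/>
    </switch>
</decision>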

Why use Oozie rather than just cascading jobs one after another?

Major flexibility: Oozie lets us start, stop, re-run, and suspend jobs, and allows us to restart from the point of failure.

 How to make a workflow?

First, make a Hadoop job and make sure that it works. Make a JAR out of the classes, then create a workflow.xml file and copy all of the job configuration properties into the XML file: input files, output files, input readers and writers, mappers and reducers, and job-specific arguments. Finally, create the job.properties file.

What are the properties that we have to mention in job.properties?

• Name Node
• Job Tracker
• oozie.wf.application.path
• Lib Path
• Jar Path
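A minimal job.properties sketch covering these entries (the host names and paths are illustrative):

nameNode=hdfs://namenode.example.com:8020
jobTracker=jobtracker.example.com:8021
oozie.wf.application.path=${nameNode}/user/${user.name}/my-workflow
oozie.libpath=${nameNode}/user/${user.name}/share/lib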

What is an application pipeline in Oozie?

It is necessary to connect workflow jobs that run regularly but at different time intervals. The outputs of multiple subsequent runs of a workflow become the input to the next workflow. Chaining these workflows together is referred to as a data application pipeline.

What are the extra files we need when we run a Hive action in Oozie?

hive.hql
hive-site.xml

How to run Oozie?

$ oozie job -oozie http://172.20.95.107:11000/oozie -config job.properties -run
(where 172.20.95.107:11000 is the Oozie server node)
This will give the job ID.
To know the status: $ oozie job -oozie http://172.20.95.107:11000/oozie -info <job-id>

What are all the actions that can be performed in Oozie?

• Email Action
• Hive Action
• Shell Action
• SSH Action
• Sqoop Action
• Writing a custom Action Executor

Why do we use Fork and Join nodes in Oozie?

• A fork node splits one path of execution into multiple concurrent paths of execution.
• A join node waits until every concurrent execution path of a previous fork node arrives at it.
• The fork and join nodes must be used in pairs. The join node assumes the concurrent execution paths are children of the same fork node.

What is a Decision Node in Oozie?

Decision Nodes are switch statements that will run different jobs based on the outcome of an expression.

Explain the working of Oozie.

Apache Oozie is a workflow scheduler for Hadoop. It is a system which runs the workflow of dependent jobs. Here, users are permitted to create Directed Acyclic Graphs of workflows, which can be run in parallel and sequentially in Hadoop.

It consists of two parts.

• Workflow engine. The responsibility of the workflow engine is to store and run workflows composed of Hadoop jobs, e.g., MapReduce, Pig, Hive.
• Coordinator engine. It runs workflow jobs based on predefined schedules and the availability of data.
Oozie is scalable and can manage the timely execution of thousands of workflows (each consisting of dozens of jobs) in a Hadoop cluster.

Oozie is very flexible as well. One can easily start, stop, suspend, and rerun jobs. Oozie makes it very easy to rerun failed workflows. One can easily understand how difficult it can be to catch up on missed or failed jobs due to downtime or failure. It is even possible to skip a specific failed node.

How does Oozie work?

Oozie runs as a service in the cluster, and clients submit workflow definitions for immediate or later processing.
An Oozie workflow consists of action nodes and control-flow nodes.
An action node represents a workflow task, e.g., moving files into HDFS, running a MapReduce, Pig, or Hive job, importing data using Sqoop, or running a shell script of a program written in Java.
A control-flow node controls the workflow execution between actions by allowing constructs like conditional logic, wherein different branches may be followed depending on the result of an earlier action node.
The Start Node, End Node, and Error Node fall into this category of nodes.
The Start Node designates the beginning of the workflow job.
The End Node signals the end of the job.
The Error Node designates the occurrence of an error and the corresponding error message to be printed.
At the end of the execution of a workflow, an HTTP callback is used by Oozie to update the client with the workflow status. Entry to or exit from an action node may also trigger the callback.
Example Workflow Diagram

Packaging and deploying an Oozie workflow application
A workflow application consists of the workflow definition and all the associated resources such as MapReduce JAR files, Pig scripts, etc. Applications need to follow a simple directory structure and are deployed to HDFS so that Oozie can access them.
An example directory structure is shown below:
<name-of-workflow>/
├── lib/
│   └── hadoop-examples.jar
└── workflow.xml
It is necessary to keep workflow.xml (the workflow definition file) in the top-level directory (the parent directory with the workflow name). The lib directory contains JAR files containing the MapReduce classes. A workflow application conforming to this layout can be built with any build tool, e.g., Ant or Maven.
Such a build then needs to be copied to HDFS using a command, for instance:
% hadoop fs -put hadoop-examples/target/<name-of-workflow> <name-of-workflow>

Steps for Running an Oozie workflow job
In this section, we will see how to run a workflow job. To run it, we will use the Oozie command-line tool (a client program which communicates with the Oozie server).
1. Export the OOZIE_URL environment variable, which tells the oozie command which Oozie server to use (here we are using one running locally).
% export OOZIE_URL="http://localhost:11000/oozie"
2. Run the workflow job using:
% oozie job -config ch05/src/main/resources/max-temp-workflow.properties -run
The -config option refers to a local Java properties file containing definitions for the parameters in the workflow XML file, as well as oozie.wf.application.path, which tells Oozie the location of the workflow application in HDFS.
Example contents of the properties file:
nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
oozie.wf.application.path=${nameNode}/user/${user.name}/
3. Get the status of the workflow job.
The status of a workflow job can be seen using the subcommand 'job' with the '-info' option, specifying the job ID after '-info'.
e.g., % oozie job -info <job-id>
The output shows the status, which is one of RUNNING, KILLED, or SUCCEEDED.
4. The results of a successful workflow execution can be seen using a Hadoop command such as:
% hadoop fs -cat <output-path>

Why use Oozie?

The main purpose of using Oozie is to manage different types of jobs being processed in the Hadoop system.
Dependencies between jobs are specified by the user in the form of Directed Acyclic Graphs. Oozie consumes this information and takes care of their execution in the correct order as specified in the workflow. That way the user's time to manage the complete workflow is saved. In addition, Oozie has a provision to specify the frequency of execution of a particular job.
Features of Oozie
• Oozie has a client API and command-line interface which can be used to launch, control, and monitor jobs from a Java application.
• Using its Web Service APIs, one can control jobs from anywhere.
• Oozie has a provision to execute jobs which are scheduled to run periodically.
• Oozie has a provision to send email notifications upon completion of jobs.

Explain Apache Oozie.

Apache Oozie is a scheduler that lets users schedule and execute Hadoop jobs. Users can execute multiple tasks in parallel so that more than one job can be executed simultaneously. It is a scalable, extensible, and reliable system that supports different types of Hadoop jobs, including MapReduce, Hive, Streaming jobs, Sqoop, and Pig.


What is Apache Oozie?

Apache Oozie is a Java web application used to schedule Apache Hadoop jobs. It is integrated with the Hadoop stack and supports Hadoop jobs for Apache MapReduce, Apache Pig, Apache Hive, and Apache Sqoop. Oozie is a scalable, reliable, and extensible system. Oozie is used in production at Yahoo!, running more than 200,000 jobs a day.

Mention Some Features of Oozie?

o Oozie has a client API and command-line interface which can be used to launch, control, and monitor jobs from a Java application.
o Using its Web Service APIs, one can control jobs from anywhere.
o Oozie has a provision to execute jobs which are scheduled to run periodically.
o Oozie has a provision to send email notifications upon completion of jobs.

Explain the need for Oozie.

With Apache Hadoop becoming the open-source de-facto standard for processing and storing Big Data, many other languages like Pig and Hive have followed, simplifying the process of writing big data applications based on Hadoop.
Although Pig, Hive, and many others have simplified the process of writing Hadoop jobs, often one Hadoop job is not sufficient to get the desired output. Many Hadoop jobs have to be chained and data has to be shared between the jobs, which makes the whole process very complicated.

What Are the Alternatives to Oozie Workflow Scheduler?

o Azkaban is a batch workflow job scheduler.
o Apache NiFi is an easy to use, powerful, and reliable system to process and distribute data.
o Apache Falcon is a feed management and processing platform.

Explain the types of Oozie jobs.

Oozie supports job scheduling for the complete Hadoop stack, like Apache MapReduce, Apache Hive, Apache Sqoop, and Apache Pig.
It consists of two parts.
Workflow engine: The responsibility of the workflow engine is to store and run workflows composed of Hadoop jobs, e.g., MapReduce, Pig, Hive.
Coordinator engine: It runs workflow jobs based on predefined schedules and the availability of data.

Explain Oozie Workflow?

An Oozie Workflow is a collection of actions arranged in a Directed Acyclic Graph (DAG). Control nodes define job chronology, setting rules for beginning and ending a workflow, and control the workflow execution path with decision, fork, and join nodes. Action nodes trigger the execution of tasks.
Workflow nodes are classified into control flow nodes and action nodes.
Control flow nodes: nodes that control the start and end of the workflow and the workflow job execution path.
Action nodes: nodes that trigger the execution of a computation/processing task.
Workflow definitions can be parameterized. The parameterization of workflow definitions is done using JSP Expression Language syntax, allowing not only variables as parameters but also functions and complex expressions.
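For instance, a parameterized MapReduce action body might mix plain variables with EL functions like this (a sketch; the inputDir and outputDir parameters are illustrative and would be supplied via job.properties or the -config option):

<map-reduce>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
        <property>
            <name>mapred.input.dir</name>
            <value>${inputDir}</value>                <!-- plain parameter -->
        </property>
        <property>
            <name>mapred.output.dir</name>
            <value>${outputDir}/${wf:id()}</value>    <!-- EL function inside a value -->
        </property>
    </configuration>
</map-reduce>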

What is an Oozie Workflow Application?

A workflow application is a ZIP file that includes the workflow definition and the necessary files to run all the actions.
It contains the following files.
o Configuration file – config-default.xml
o App files – lib/ directory with JAR and SO files
o Pig scripts

What are the properties that we have to mention in job.properties?

o Name Node
o Job Tracker
o oozie.wf.application.path
o Lib Path
o Jar Path

What are the additional files we need when we run a Hive action in Oozie?

o hive.hql
o hive-site.xml


Explain Oozie Coordinator?

Oozie Coordinator jobs are recurrent Oozie Workflow jobs that are triggered by time and data availability. An Oozie Coordinator can also manage multiple workflows that are dependent on the outcome of other workflows: the outputs of earlier workflows become the input to the next workflow. This chain is called a 'data application pipeline'.
Oozie processes coordinator jobs in a fixed timezone with no DST (typically UTC); this timezone is referred to as the 'Oozie processing timezone'. The Oozie processing timezone is used to resolve coordinator jobs' start/end times, job pause times, and the initial instance of datasets. Also, all coordinator dataset instance URI templates are resolved to a datetime in the Oozie processing timezone.
The usage of the Oozie Coordinator can be categorized into 3 different segments.
Small: consisting of a single coordinator application with embedded dataset definitions
Medium: consisting of a single shared dataset definition and a few coordinator applications
Large: consisting of one or multiple shared dataset definitions and several coordinator applications
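A sketch of a small, time-triggered coordinator application (the frequency, dates, and paths are illustrative):

<coordinator-app name="daily-coord" frequency="${coord:days(1)}"
                 start="2022-01-01T00:00Z" end="2022-12-31T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <app-path>${nameNode}/user/${user.name}/my-workflow</app-path>
        </workflow>
    </action>
</coordinator-app>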

Explain Briefly About Oozie Bundle?

Oozie Bundle is a higher-level Oozie abstraction that can batch a set of coordinator applications. The user is able to start/stop/suspend/resume/rerun at the bundle level, resulting in better and easier operational control.
More specifically, the Oozie Bundle system allows the user to define and execute a bunch of coordinator applications, often called a data pipeline. There is no explicit dependency among the coordinator applications in a bundle. However, a user could use the data dependency of coordinator applications to create an implicit data application pipeline.
Oozie executes workflows based on:
o Time Dependency (Frequency)
o Data Dependency
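A sketch of a bundle application batching two coordinators (the names and paths are illustrative):

<bundle-app name="demo-bundle" xmlns="uri:oozie:bundle:0.2">
    <coordinator name="ingest-coord">
        <app-path>${nameNode}/user/${user.name}/ingest-coordinator</app-path>
    </coordinator>
    <coordinator name="report-coord">
        <app-path>${nameNode}/user/${user.name}/report-coordinator</app-path>
    </coordinator>
</bundle-app>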

What is an Application Pipeline in Oozie?

It is necessary to connect workflow jobs that run regularly, but at different time intervals. The outputs of multiple subsequent runs of a workflow become the input to the next workflow. Chaining these workflows together is referred to as a data application pipeline.

How Does Oozie Work?

  • Oozie runs as a service in the cluster, and clients submit workflow definitions for immediate or later processing. An Oozie workflow consists of action nodes and control-flow nodes.
  • An action node represents a workflow task, e.g., moving files into HDFS, running a MapReduce, Pig, or Hive job, importing data using Sqoop, or running a shell script of a program written in Java.
  • A control-flow node controls the workflow execution between actions by allowing constructs like conditional logic, wherein different branches may be followed depending on the result of an earlier action node. The Start Node, End Node, and Error Node fall into this category of nodes.
  • The Start Node designates the start of the workflow job.
  • The End Node signals the end of the job.
  • The Error Node designates an occurrence of error and the corresponding error message to be printed.
    At the end of the execution of a workflow, an HTTP callback is used by Oozie to update the client with the workflow status. Entry to or exit from an action node may also trigger the callback.

How to deploy an application?

$ hadoop fs -put wordcount-wf hdfs://bar.com:9000/usr/abc/wordcount

Mention Workflow Job Parameters?

$ cat job.properties
oozie.wf.application.path=hdfs://bar.com:9000/usr/abc/wordcount
input=/usr/abc/input-data
output=/usr/abc/output-data

How to execute a job?

$ oozie job -run -config job.properties
job: 1-20090525161321-oozie-xyz-W

What are all the actions that can be performed in Oozie?

  • Email Action
  • Hive Action
  • Shell Action
  • SSH Action
  • Sqoop Action
  • Writing a custom Action Executor

Why do we use Fork and Join nodes in Oozie?

o A fork node splits one path of execution into multiple concurrent paths of execution.
o A join node waits until every concurrent execution path of a previous fork node arrives at it.
o The fork and join nodes must be used in pairs. The join node assumes the concurrent execution paths are children of the same fork node.

Why Oozie Security?

o Users are not allowed to modify the jobs of another user.
o Hadoop does not support the authentication of users.
o Oozie has to verify and confirm its user before transferring the job to Hadoop.

What additional configuration is required for the Oozie email action?

The SMTP server configuration has to be present in oozie-site.xml:
oozie.email.smtp.host – the host where the email action may find the SMTP server
oozie.email.smtp.port – the port to connect to for the SMTP server (25 by default)
oozie.email.from.address – the from address to be used for mailing all emails
oozie.email.smtp.auth – boolean property that toggles whether authentication is to be performed (false by default)
oozie.email.smtp.username – if authentication is enabled, the username to log in as (empty by default)
oozie.email.smtp.password – if authentication is enabled, the username's password (empty by default)
oozie.email.attachment.enabled – boolean property that toggles whether configured attachments are to be placed into the emails (false by default)

<email xmlns="uri:oozie:email-action:0.1">
    <to>bob@initech.com,the.other.bob@initech.com</to>
    <cc>will@initech.com</cc>
    <bcc>yet.another.bob@initech.com</bcc>
    <subject>Email notifications for ${wf:id()}</subject>
    <body>The wf ${wf:id()} successfully completed.</body>
</email>

How will you define Oozie?

Oozie is a workflow scheduler system to manage Apache Hadoop jobs. It is a scalable, reliable, and extensible system. It supports several types of Hadoop jobs (such as Java map-reduce, streaming map-reduce, Pig, Hive, Sqoop, and DistCp) as well as system-specific jobs like Java programs and shell scripts.

List a few EL functions in the Oozie workflow.

wf:id() – returns the workflow job ID for the current workflow job.
wf:name() – returns the workflow application name.
wf:lastErrorNode() – returns the name of the last workflow action node that exited with an error exit state.
wf:errorCode(String node) – returns the error code for the specified action node.

So, this brings us to the end of the Apache Oozie interview questions blog.
This Tecklearn 'Apache Oozie Interview Questions and Answers' post helps you with commonly asked questions if you are looking for a job in the Big Data domain.
If you wish to learn Oozie and build a career in the Big Data Testing domain, then check out our interactive Big Data Hadoop Testing Training, which comes with 24*7 support to guide you throughout your learning period.

What you will Learn in this Course?

Introduction to Hadoop

• The Case for Apache Hadoop
• Why Hadoop?
• Core Hadoop Components
• Fundamental Concepts

HDFS

• HDFS Features
• Writing and Reading Files
• NameNode Memory Considerations
• Overview of HDFS Security
• Using the Namenode Web UI
• Using the Hadoop File Shell

Getting Data into HDFS

• Ingesting Data from External Sources with Flume
• Ingesting Data from Relational Databases with Sqoop
• REST Interfaces
• Best Practices for Importing Data

Hadoop Testing

• Hadoop Application Testing
• Roles and Responsibilities of Hadoop Testing Professional
• Framework MRUnit for Testing of MapReduce Programs
• Unit Testing
• Test Execution
• Test Plan Strategy and Writing Test Cases for Testing Hadoop Application

Big Data Testing

• BigData Testing
• Unit Testing
• Integration Testing
• Functional Testing
• Non-Functional Testing
• Golden Data Set

System Testing

• Building and Set up
• Testing SetUp
• Solr Server
• Non-Functional Testing
• Longevity Testing
• Volumetric Testing

Security Testing

• Security Testing
• Non-Functional Testing
• Hadoop Cluster
• Security-Authorization RBA
• IBM Project

Automation Testing

• Query Surge Tool

Oozie

• Why Oozie
• Installation Engine
• Oozie Workflow Engine
• Oozie security
• Oozie Job Process
• Oozie terminology
• Oozie bundle

Got a question for us? Please mention it in the comments section and we will get back to you.
