A working Storm cluster should have one nimbus and one or more supervisors. Another important node is Apache ZooKeeper, which is used for coordination between the nimbus and the supervisors.

Let us now take a close look at the workflow of Apache Storm −

Initially, the nimbus waits for a “Storm Topology” to be submitted to it.
Once a topology is submitted, the nimbus processes it and gathers all the tasks that are to be carried out, along with the order in which they are to be executed.
Then, the nimbus evenly distributes the tasks to all the available supervisors.
At a fixed interval, every supervisor sends a heartbeat to the nimbus to signal that it is still alive.
When a supervisor dies and stops sending heartbeats to the nimbus, the nimbus reassigns its tasks to another supervisor.
When the nimbus itself dies, the supervisors keep working on their already assigned tasks without any issue.
Once all the tasks are completed, a supervisor waits for a new task to come in.
In the meantime, the dead nimbus is restarted automatically by service monitoring tools.
The restarted nimbus continues from where it stopped. Similarly, a dead supervisor can also be restarted automatically. Since both the nimbus and the supervisors can be restarted automatically and both continue as before, Storm is guaranteed to process every task at least once.
Once all the topologies are processed, the nimbus waits for a new topology to arrive, and similarly the supervisors wait for new tasks.
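The distribution and heartbeat steps above can be sketched as a small toy model. This is only an illustration of the described behaviour, not Storm's actual scheduler or API: the class name ToyNimbus, the supervisor names, the task names and the timeout value are all assumptions made for this example.

```python
from collections import defaultdict


class ToyNimbus:
    """Toy model of the workflow above. NOT Storm's real scheduler."""

    def __init__(self, supervisors, heartbeat_timeout=3):
        # heartbeat_timeout is an illustrative value, not a Storm default.
        self.heartbeat_timeout = heartbeat_timeout
        self.last_heartbeat = {name: 0 for name in supervisors}
        self.assignments = defaultdict(list)

    def alive(self, now):
        """Supervisors whose last heartbeat is recent enough."""
        return [s for s, t in self.last_heartbeat.items()
                if now - t <= self.heartbeat_timeout]

    def heartbeat(self, supervisor, now):
        """Record that a supervisor reported in at time `now`."""
        self.last_heartbeat[supervisor] = now

    def submit(self, tasks, now=0):
        """Spread the tasks evenly (round-robin) over live supervisors."""
        live = self.alive(now)
        if not live:
            raise RuntimeError("no live supervisors")
        for i, task in enumerate(tasks):
            self.assignments[live[i % len(live)]].append(task)

    def reassign_dead(self, now):
        """Move the tasks of silent supervisors to the ones still alive."""
        live = self.alive(now)
        if not live:
            raise RuntimeError("no live supervisors")
        dead = [s for s in self.last_heartbeat if s not in live]
        for supervisor in dead:
            orphaned = self.assignments.pop(supervisor, [])
            for i, task in enumerate(orphaned):
                self.assignments[live[i % len(live)]].append(task)
        return dead


nimbus = ToyNimbus(["sup-a", "sup-b", "sup-c"])
nimbus.submit(["t1", "t2", "t3", "t4", "t5", "t6"])

# At time 5 only sup-a and sup-b report in; sup-c has gone silent.
nimbus.heartbeat("sup-a", now=5)
nimbus.heartbeat("sup-b", now=5)
dead = nimbus.reassign_dead(now=5)

print(dead)
print(dict(nimbus.assignments))
```

In this sketch the tasks of the silent supervisor are handed to the survivors, which mirrors the reassignment step in the list. A real cluster also has to persist this state (Storm relies on ZooKeeper for coordination), which the toy model leaves out.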

By default, there are two modes in a Storm cluster −

Local mode − This mode is used for development, testing, and debugging because it is the easiest way to see all the topology components working together. In this mode, we can adjust parameters that enable us to see how our topology runs in different Storm configuration environments. In Local mode, Storm topologies run on the local machine in a single JVM.
Production mode − In this mode, we submit our topology to the working Storm cluster, which is composed of many processes, usually running on different machines. As discussed in the workflow of Storm, a working cluster will run indefinitely until it is shut down.

Storm – Distributed Messaging System

Apache Storm processes real-time data, and the input normally comes from a message queuing system. An external distributed messaging system provides the input necessary for the real-time computation. A spout reads the data from the messaging system, converts it into tuples, and feeds them into Apache Storm. Interestingly, Apache Storm uses its own distributed messaging system internally for the communication between its nimbus and supervisors.
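As a rough illustration of the spout's role, the sketch below drains an in-memory queue and turns each message into a tuple of named fields. It uses only the Python standard library and is not Storm's spout API. The function name toy_spout, the comma-separated message format and the field names are assumptions made for this example.

```python
import queue


def toy_spout(source, fields):
    """Read messages from a queue and yield one tuple per message.

    Each tuple is represented as a dict mapping field names to values.
    This is a stand-in for a spout, not Storm's real interface.
    """
    while True:
        try:
            message = source.get_nowait()
        except queue.Empty:
            return  # nothing left to read in this toy example
        values = message.split(",")
        yield dict(zip(fields, values))


# Hypothetical call-log messages standing in for an external message queue.
messages = queue.Queue()
messages.put("alice,bob,42")
messages.put("carol,dave,7")

tuples = list(toy_spout(messages, ("caller", "callee", "duration")))
print(tuples)
```

A real spout keeps polling its source indefinitely instead of stopping when the queue is empty; the early return here only keeps the example finite.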

What is a Distributed Messaging System?

Distributed messaging is based on the concept of reliable message queuing. Messages are queued asynchronously between client applications and messaging systems. A distributed messaging system provides the advantages of reliability, scalability, and persistence.

Most messaging patterns follow the publish-subscribe model (simply Pub-Sub), where the senders of the messages are called publishers and those who want to receive the messages are called subscribers.

Once a message has been published by the sender, the subscribers can receive the messages they are interested in with the help of a filtering option. There are usually two types of filtering: topic-based filtering and content-based filtering.

Note that the pub-sub model can communicate only via messages. It is a very loosely coupled architecture; even the senders don’t know who their subscribers are. Many messaging patterns rely on a message broker to exchange published messages so that many subscribers can access them in a timely manner. A real-life example is Dish TV, which publishes different channels like sports, movies, music, etc., and anyone can subscribe to their own set of channels and receive them whenever their subscribed channels are available.
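The publish-subscribe idea, including both kinds of filtering, can be sketched with a tiny in-memory broker. This is a minimal illustration only, not the API of any of the messaging systems discussed here. The class name ToyBroker, the topic names and the sample messages are assumptions made for this example.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory broker illustrating topic- and content-based filtering."""

    def __init__(self):
        # topic -> list of (callback, optional content predicate)
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback, predicate=None):
        """Topic-based filter via `topic`; content-based via `predicate`."""
        self.subscribers[topic].append((callback, predicate))

    def publish(self, topic, message):
        """Deliver to matching subscribers; return the delivery count."""
        delivered = 0
        for callback, predicate in self.subscribers.get(topic, []):
            if predicate is None or predicate(message):
                callback(message)
                delivered += 1
        return delivered


broker = ToyBroker()
alice_inbox, bob_inbox = [], []

broker.subscribe("sports", alice_inbox.append)   # topic-based
broker.subscribe("movies", bob_inbox.append)     # topic-based
# Content-based: bob only wants sports messages that mention a final.
broker.subscribe("sports", bob_inbox.append, predicate=lambda m: "final" in m)

counts = [
    broker.publish("sports", "cricket final tonight"),
    broker.publish("sports", "league highlights"),
    broker.publish("movies", "new release"),
    broker.publish("music", "top 10"),  # nobody subscribed to this topic
]
print(counts, alice_inbox, bob_inbox)
```

The publisher only names a topic and never refers to a subscriber directly, which is the loose coupling described above.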

The following table describes some of the popular high throughput messaging systems −

Distributed messaging system	Description
Apache Kafka	Kafka was developed at LinkedIn and later became an Apache project. Apache Kafka is based on a broker-enabled, persistent, distributed publish-subscribe model. Kafka is fast, scalable, and highly efficient.
RabbitMQ	RabbitMQ is an open-source, robust, distributed messaging application. It is easy to use and runs on all platforms.
JMS (Java Message Service)	JMS is a Java API that supports creating, reading, and sending messages from one application to another. It provides guaranteed message delivery and follows the publish-subscribe model.
ActiveMQ	ActiveMQ is an open-source messaging system that implements the JMS API.
ZeroMQ	ZeroMQ is a brokerless, peer-to-peer message processing system. It provides push-pull and router-dealer message patterns.
Kestrel	Kestrel is a fast, reliable, and simple distributed message queue.

Thrift Protocol

Thrift was built at Facebook for cross-language services development and remote procedure call (RPC). Later, it became an open-source Apache project. Apache Thrift provides an Interface Definition Language that lets you define new data types, and service implementations on top of those data types, in a simple manner.

Apache Thrift is also a communication framework that supports embedded systems, mobile applications, web applications, and many programming languages. Some of the key features of Apache Thrift are its modularity, flexibility, and high performance. In addition, it can perform streaming, messaging, and RPC in distributed applications.

Storm uses the Thrift protocol extensively for its internal communication and data definition. A Storm topology is simply a set of Thrift structs. The Storm nimbus, which runs the topology in Apache Storm, is a Thrift service.

So, this brings us to the end of the blog. This Tecklearn ‘Detailed understanding of Workflow of Apache Storm’ blog helps you with commonly asked questions if you are looking for a job in the Apache Storm and Big Data domain.

If you wish to learn Apache Storm and build a career in the Apache Storm or Big Data domain, then check out our interactive Apache Storm Training, which comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

https://www.tecklearn.com/course/apache-strom-training/

Apache Storm Training

About the Course

Tecklearn Apache Storm training will give you a working knowledge of the open-source computational engine, Apache Storm. You will be able to do distributed real-time data processing and come up with valuable insights. You will learn about the deployment and development of Apache Storm applications in real world for handling Big Data and implementing various analytical tools for powerful enterprise-grade solutions. Upon completion of this online training, you will hold a solid understanding and hands-on experience with Apache Storm.

Why Should You Take Apache Storm Training?

The average pay of an Apache Storm professional stands at $90,167 per annum (Indeed.com).
Groupon, Twitter, and many other companies use Apache Storm for business purposes like real-time analytics and micro-batch processing.
Apache Storm is a free and open-source distributed real-time computation system for processing fast, large streams of data.

What Will You Learn in this Course?

Introduction to Apache Storm

Apache Storm
Apache Storm Data Model

Architecture of Storm

Apache Storm Architecture
Hadoop distributed computing
Apache Storm features

Installation and Configuration

Pre-requisites for Installation
Installation and Configuration

Storm UI

Zookeeper
Storm UI

Storm Topology Patterns

Got a question for us? Please mention it in the comments section and we will get back to you.
