Apache Cassandra is a free and open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

What are the features of Apache Cassandra?

Apache Cassandra has many features; some of those that make it stand out from the crowd are:

Explain what Cassandra is.

Cassandra is an open-source data storage system developed at Facebook for inbox search and designed for storing and managing large amounts of data across commodity servers. It can serve as both:

A real-time data store for online applications
A read-intensive database for business intelligence systems

Explain what a composite type is in Cassandra.

In Cassandra, a composite type lets you define a key or a column name as a concatenation of values of different types. A composite type can be used in two places:

Row Key
Column Name

Mention the main components of the Cassandra data model.

The main components of the Cassandra data model are:

Cluster
Keyspace
Column
Column Family

Explain what a column family is in Cassandra.

A column family in Cassandra is a collection of rows.

Explain what a cluster is in Cassandra.

A cluster is a container for keyspaces. A Cassandra database is distributed over several machines that operate together. The cluster is the outermost container; it arranges the nodes in a ring and assigns data to them. Data is replicated across nodes, so a replica on another node can take over if one node fails.

Mention when you can use ALTER KEYSPACE.

ALTER KEYSPACE can be used to change properties of a keyspace, such as the number of replicas and the durable_writes setting.

Explain what Cassandra cqlsh is.

Cqlsh is a command-line shell that lets users communicate with the database using CQL (Cassandra Query Language). With cqlsh, you can do the following:

Define a schema
Insert data
Execute a query

Mention what the shell commands “Capture” and “Consistency” do.

Cqlsh provides various shell commands. The “Capture” command captures the output of a command and appends it to a file, while the “Consistency” command displays the current consistency level or sets a new one.

What is mandatory when creating a table in Cassandra?

A primary key is mandatory when creating a table. It is made up of one or more columns of the table.

Explain how Cassandra writes data.

Cassandra writes data in three stages:

Commit log write
Memtable write
SSTable write

Cassandra first writes data to a commit log, then to an in-memory table structure called the memtable, and finally to an SSTable on disk when the memtable is flushed.
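
The three stages above can be sketched with a toy model. This is only an illustration under stated assumptions: the class name TinyNode, the memtable_limit parameter, and the flush-by-entry-count rule are hypothetical and are not Cassandra's actual implementation or API.

```python
class TinyNode:
    """Toy model of the write path: commit log, then memtable, then SSTable."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []      # append-only record of every write
        self.memtable = {}        # in-memory table, key -> value
        self.sstables = []        # flushed, immutable tables (oldest first)
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. commit log write
        self.memtable[key] = value             # 2. memtable write
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                       # 3. SSTable write

    def flush(self):
        # Freeze the memtable as a key-sorted, read-only tuple of pairs.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replaying


node = TinyNode(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)   # reaches the limit and triggers a flush
node.write("c", 3)   # stays in the commit log and the new memtable
```

After these three writes, the first two keys sit in one flushed table while the third is still only in the commit log and the memtable.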

Explain what a Memtable is in Cassandra.

Cassandra writes data to an in-memory structure known as the Memtable
It is an in-memory cache with content stored as key/column
Memtable data is sorted by key
There is a separate Memtable for each ColumnFamily, and column data is retrieved from it by key

Explain what an SSTable consists of.

An SSTable consists mainly of two files:

Index file (Bloom filter and key-offset pairs)
Data file (actual column data)

Explain how Cassandra deletes data.

SSTables are immutable, so a row cannot be removed from an SSTable in place. When a row needs to be deleted, Cassandra writes a special marker called a tombstone in place of the column value. When the data is read, a value marked with a tombstone is treated as deleted.
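
A minimal sketch of this idea, assuming SSTables can be modelled as plain dictionaries ordered from oldest to newest. The TOMBSTONE sentinel and the read function are hypothetical names for illustration, not Cassandra internals.

```python
TOMBSTONE = object()  # sentinel standing in for Cassandra's deletion marker


def read(key, sstables):
    """Return the newest value for key, or None if absent or deleted.

    sstables is a list of dicts ordered oldest to newest.
    """
    for table in reversed(sstables):       # the newest table wins
        if key in table:
            value = table[key]
            return None if value is TOMBSTONE else value
    return None


older = {"user:1": "alice", "user:2": "bob"}
newer = {"user:1": TOMBSTONE}              # the delete is written as new data
tables = [older, newer]
```

Note that the older table still physically holds the value for "user:1"; the read hides it because a newer tombstone exists.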

Compare MongoDB with Cassandra.

Criteria	MongoDB	Cassandra
Data Model	Document	Google Bigtable like
Database scalability	Read	Write
Querying of data	Multi-indexed	Using Key or Scan

What is Cassandra?

Cassandra is a widely used NoSQL distributed database management system maintained by Apache. It is open source and designed to store and manage large volumes of data while tolerating the failure of individual nodes. Highly scalable for Big Data workloads and originally designed at Facebook, Apache Cassandra is written in Java and supports flexible schemas. Apache Cassandra has no single point of failure. There are various types of NoSQL databases, and Cassandra is a hybrid of a column-oriented and a key–value store database. The keyspace is the outermost container for an application, and the table or column family in Cassandra is the keyspace entity.

List the benefits of using Cassandra.

Apache Cassandra delivers near real-time performance, which simplifies the work of developers, administrators, data analysts, and software engineers.

Instead of a master–slave architecture, Cassandra is built on a peer-to-peer architecture, which avoids a single point of failure.
It is also flexible: nodes can be added to any Cassandra cluster in any data center, and any client can send its request to any server.
Cassandra scales elastically and can be scaled up or down as requirements change. It offers high throughput for read and write operations and does not need to be restarted while scaling.
Cassandra is also valued for its data replication: data is stored on multiple nodes, so users can retrieve it from another location if one node fails. Users can choose the number of replicas to create.
It performs well on massive datasets, which makes it a preferred NoSQL database for many organizations.
It operates on a column-oriented structure, which speeds up and simplifies slicing. Data access and retrieval also become more efficient with a column-based data model.
Further, Apache Cassandra supports a schema-free/schema-optional data model, which removes the need to declare up front all the columns required by your application.

Define the management tools in Cassandra.

DataStax OpsCenter: It is a web-based management and monitoring solution for Cassandra clusters and DataStax. It is free to download and includes an additional edition of OpsCenter.

SPM: It primarily monitors Cassandra metrics and various OS and JVM metrics. Besides Cassandra, SPM also monitors Hadoop, Spark, Solr, Storm, ZooKeeper, and other Big Data platforms. The main features of SPM include correlation of events and metrics, distributed transaction tracing, real-time graphs with zooming, anomaly detection, and heartbeat alerting.

Define memtable.

A memtable is an in-memory, write-back cache that holds content in a key and column format. The data in a memtable is sorted by key, and each column family has its own memtable, from which column data is retrieved by key. It stores writes until it is full, and then flushes them to disk as an SSTable.

What is SSTable? How is it different from other relational tables?

SSTable stands for ‘Sorted String Table’. It is an important data file in Cassandra that receives the contents of memtables when they are flushed. SSTables are stored on disk and exist for each Cassandra table. SSTables are immutable: once written, data items can be neither added nor removed. For each SSTable, Cassandra also creates separate files such as a partition index, a partition summary, and a bloom filter.

Explain the concept of Bloom Filter.

Associated with SSTable, Bloom filter is an off-heap (off the Java heap to native memory) data structure to check whether there is any data available in the SSTable before performing any I/O disk operation.
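
The membership check can be illustrated with a small Bloom filter. This is a sketch only: the class, the sizes, and the use of salted SHA-256 are assumptions chosen for clarity and are not how Cassandra implements its filter.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive several positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
for k in ("row-1", "row-2", "row-3"):
    bf.add(k)
```

A key that was added always reports True. A key that was never added usually reports False, which lets a read skip that SSTable without any disk I/O; occasional false positives only cost an unnecessary lookup.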

State the differences between a node, a cluster, and a data center in Cassandra.

There are various components of Cassandra. While a node is a single machine running Cassandra, cluster is a collection of nodes that have similar types of data grouped together. Data centers are useful components when serving customers in different geographical areas. You can group different nodes of a cluster into different data centers.

How to write a query in Cassandra?

Using CQL (Cassandra Query Language) we can write queries in Cassandra. Cqlsh is used for interacting with the database.

What OS does Cassandra support?

Cassandra supports both Windows and Linux.

What is Cassandra Data Model?

The Cassandra data model consists of four main components:
Cluster: Made up of multiple nodes and keyspaces
Keyspace: A namespace that groups multiple column families, typically one per application
Column: Consists of a column name, value, and timestamp
Column Family: Multiple columns referenced by a row key

What is CQL?

CQL is the Cassandra Query Language, used to access and query the Apache Cassandra distributed database. It relies on a CQL parser that leaves the implementation details to the server. The syntax of CQL is similar to SQL, but it does not alter the Cassandra data model.

Explain the concept of compaction in Cassandra.

Compaction is a maintenance process in Cassandra in which SSTables are merged and reorganized to optimize the data structures on disk. It is needed because each memtable flush creates a new SSTable, so the number of SSTables grows over time. There are two types of compaction in Cassandra.

Minor compaction: It starts automatically when a new SSTable is created. Here, Cassandra merges equally sized SSTables into one.

Major compaction: It is triggered manually using nodetool. It compacts all SSTables of a column family into one.
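
The merge step can be sketched as follows, assuming each SSTable is modelled as a dictionary of key to (timestamp, value) and a deleted key is marked with None. The compact function is a hypothetical illustration; in particular it drops deletion markers immediately, whereas Cassandra keeps them for a grace period (gc_grace_seconds) before purging.

```python
TOMBSTONE = None  # in this sketch, a None value marks a deleted key


def compact(sstables):
    """Merge SSTables into one, keeping the newest cell per key.

    Each SSTable is a dict of key -> (timestamp, value).
    """
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    # Drop deleted keys and return the result sorted by key.
    return {k: merged[k] for k in sorted(merged) if merged[k][1] is not TOMBSTONE}


t1 = {"a": (1, "old"), "b": (1, "keep")}
t2 = {"a": (2, "new"), "c": (2, "gone")}
t3 = {"c": (3, TOMBSTONE)}
result = compact([t1, t2, t3])
```

Three input tables collapse into one: the stale value of "a" is discarded, "b" is carried over, and "c" disappears because its newest cell is a deletion marker.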

Explain Cqlsh.

Cqlsh stands for Cassandra Query Language Shell; it is the interactive terminal for CQL. It is a Python-based command-line prompt used on Linux or Windows that executes CQL statements and shell commands such as ASSUME, CAPTURE, CONSISTENCY, COPY, DESCRIBE, and many others. With cqlsh, users can define a schema, insert data, and execute a query.

What is Super Column in Cassandra?

A Cassandra super column is an element that groups similar collections of data. Super columns are key–value pairs whose values are columns. A super column is a sorted array of columns, and the hierarchy is: keyspace > column family > super column > column data structure in JSON.
Similar to row keys, super column entries hold no independent value of their own but are used to collect other columns. Note that the super column keys appearing in different rows do not necessarily match.

Define the consistency levels in Cassandra.

The levels below are described in terms of writes; for reads, most of the same levels set how many replicas must respond.

ALL: Highly consistent. A write must be written to the commit log and memtable on all replica nodes in the cluster.
EACH_QUORUM: A write must be written to the commit log and memtable on a quorum of replica nodes in every data center.
LOCAL_QUORUM: A write must be written to the commit log and memtable on a quorum of replica nodes in the local data center.
ONE: A write must be written to the commit log and memtable of at least one replica node.
TWO, THREE: Same as ONE but with at least two or three replica nodes, respectively.
LOCAL_ONE: A write must be written to at least one replica node in the local data center.
ANY: A write must be written to at least one node; a stored hint counts, so the write can succeed even if no replica is reachable. It applies to writes only.
SERIAL: Linearizable consistency, used to prevent unconditional updates.
LOCAL_SERIAL: Same as SERIAL but restricted to the local data center.
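
The quorum levels rest on simple arithmetic: a quorum is a majority of the replicas, and a read is guaranteed to see the latest write when the read and write replica counts together exceed the replication factor. A minimal sketch (the function names are hypothetical):

```python
def quorum(replication_factor):
    """Number of replicas that form a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """True when every read set overlaps every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)  # replicas needed for QUORUM with three replicas
```

With a replication factor of 3, a quorum is 2 replicas, so QUORUM reads combined with QUORUM writes overlap on at least one replica, while ONE reads combined with ONE writes do not.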

Explain Tombstone in Cassandra.

A tombstone is a marker indicating that a column has been deleted. Marked columns are removed during compaction. Tombstones matter because Cassandra is eventually consistent: the deletion must reach every replica before the data can be safely dropped.

On what platforms does Cassandra run?

Since Cassandra is a Java application, it can run on any platform that provides a Java Runtime Environment (JRE) and Java Virtual Machine (JVM). Cassandra also runs on Red Hat, CentOS, Debian, and Ubuntu Linux platforms.

Name the ports that Cassandra uses.

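By default, Cassandra uses port 7000 for cluster management (inter-node communication), 9160 for Thrift clients, and 8080 for JMX. These are all TCP ports and can be edited in the configuration file: bin/cassandra.in.sh. Later releases changed some of these defaults: JMX moved to port 7199, and CQL native clients connect on port 9042.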

What are the applications of Cassandra?

Ans: Cassandra has become a common choice for many companies when it comes to app development and data management. New start-ups also favor it because it is easy to operate.

Cassandra is a good fit where data is collected at high speed from many kinds of sources, as in Internet of Things applications. It can also be used in product and retail apps, messaging, social media analytics, and recommendation engines.

Explain Apache Cassandra vs Traditional Databases

Ans: Although traditional databases provide many other features, here are some highlights and benefits specific to a database like Cassandra:

Traditional databases	Cassandra database
Data is written mostly to one location.	Data is written to many locations.
Data volumes are moderate.	Data volumes are high.
Handles only moderate incoming data velocity.	Handles high incoming data volumes.
Supports complex transactions.	Supports simple transactions.
Offers mainly read scalability.	Supports both read and write scalability.

Name the features of Cassandra.

Ans: Cassandra is known for its technical features. Here are some features you should know:

Elastic scalability
Always-on architecture
Fast, linear-scale performance
Flexible data storage
Easy data distribution
Transaction support

What are the functions of Cassandra?

Ans: Cassandra supports two main categories of functions:

Scalar functions: A scalar function takes a number of values and produces an output from them.

Aggregate functions: An aggregate function produces a combined result from the multiple rows selected by a query.

What are the key terms in Cassandra?

Ans: They go as follows:

Nodes
Data centre
Rack
Cluster
Commit log
SSTable
MemTable
Replication

What is a node?

Ans: A node is the basic unit of Cassandra: a single system that is part of a cluster. The node is where the data is stored.

A node corresponds to one computer or server.

Describe what a memtable is.

Ans: A MemTable is a location where data is written and stored temporarily. Data is written to the memtable after it has been recorded in the commit log.

The memtable is part of Cassandra's storage engine. Data in a MemTable is organized by key and retrieved using the key, and each column family has its own MemTable. When the memtable is full, its contents are flushed to disk automatically.

What is SSTable?

Ans: SSTable stands for ‘Sorted String Table’. An SSTable is a data file in Cassandra, and its main function is to save data that is flushed from a memtable. Unlike a MemTable, an SSTable does not allow data to be deleted or added once it is written.

What is the difference between memtable and SSTable?

Ans: A MemTable does not store data permanently. It temporarily accumulates writes in memory, and does not itself persist them to disk.

An SSTable, in contrast, stores the data flushed from a MemTable on disk. The data stored in an SSTable is permanent and cannot be changed.

What is a direct request?

Ans: A direct request in Cassandra is part of the read operation. In it, the coordinator node contacts one replica node and asks for the full data.

Define digest request?

Ans: When the coordinator node contacts replicas, it prefers the nodes that reply fastest. The other contacted replicas respond with a digest of the requested data rather than the data itself.

Explain read repair request?

Ans: When the coordinator node compares the replies, it checks whether any replica holds outdated data. Outdated replicas are repaired in the background and receive the updated data. Read repair is a method of keeping data up to date, and it ensures that the requested row is consistent on all replicas.
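
The interplay of direct requests, digest requests, and read repair can be sketched with a toy coordinator. Everything here is a simplifying assumption for illustration: replicas are plain dictionaries of key to (timestamp, value), every replica holds the key, the digest is an MD5 of the cell, and the repair happens inline rather than in the background.

```python
import hashlib


def digest(cell):
    """Hash of a (timestamp, value) cell, standing in for a digest reply."""
    return hashlib.md5(repr(cell).encode()).hexdigest()


def coordinated_read(key, replicas):
    """Read key from replicas (a list of dicts of key -> (timestamp, value)).

    The first replica gets the direct (full data) request, the rest get
    digest requests. On a mismatch the newest cell is written back to all.
    """
    data = replicas[0][key]
    digests = [digest(r[key]) for r in replicas[1:]]
    if any(d != digest(data) for d in digests):
        newest = max((r[key] for r in replicas), key=lambda cell: cell[0])
        for r in replicas:
            r[key] = newest            # read repair
        data = newest
    return data[1]


r1 = {"k": (1, "stale")}
r2 = {"k": (2, "fresh")}
r3 = {"k": (2, "fresh")}
value = coordinated_read("k", [r1, r2, r3])
```

The read returns the newest value even though the replica that served the direct request was stale, and that replica is updated as a side effect.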

What is the CAP theorem in relation to Cassandra?

Ans: The CAP theorem, also known as Brewer’s theorem, states that a distributed computer system cannot simultaneously guarantee all three of the following properties:

Consistency,
Availability,
Partition-tolerance.

What do you mean by ACID?

Ans: ACID stands for

Atomicity: A transaction either commits completely or fails completely.

Consistency: Its exact definition varies from one system or application to another, but in general it means that data has to stay consistent.

Isolation: Concurrent transactions are isolated from each other and do not interfere.

Durability: Once the database accepts data, it guarantees that the data is persisted, so the data is not lost if the database fails.

What is BASE?

Ans: Not every application needs strong consistency, and this is where BASE comes in. BASE stands for Basically Available, Soft state, Eventually consistent. Many NoSQL databases follow this model.

Explain what tunable consistency is.

Ans: Consistency refers to keeping a row of Cassandra data up to date and synchronized on all of its replicas. Tunable consistency means the consistency level can be chosen for each operation (read or write), so the application decides how consistent the data must be.

What is the relation between tunable consistency and Cassandra?

Ans: Cassandra offers tunable consistency: clients choose the level of consistency for their reads and writes, trading consistency against latency and availability.

What is the NoSQL database?

Ans: NoSQL databases are used primarily because they handle large volumes of data smoothly. Simplicity of design, simple horizontal scaling to clusters, and fine control over availability are a few of the reasons why Cassandra follows the NoSQL model.

What are the objectives of NoSQL?

Ans: The primary objectives of NoSQL DB are:

To have the simplicity of design
Finer control over availability and
Horizontal scaling

Describe a bloom filter?

Ans: A bloom filter is a data structure used by Cassandra on the read path. A read goes through the Memtable and the row cache, and may then need to consult SSTables. The bloom filter's role in the read path is to avoid checking every SSTable when looking for one particular piece of data.

What is CQL?

Initially, Cassandra required an API to do basic tasks like insert, get, and delete. Over time, these basic queries were improved and then named Cassandra Query Language (CQL).

CQL provides a great set of built-in data types, and it also helps the applications to make their own custom data types. Cassandra is also classified as a NoSQL database.

What is a cluster in Cassandra?

A cluster is a collection of nodes. This collection of nodes represents a single system. It is the outermost structure of the ring in Cassandra.

What are CRUD operations?

These operations are used to make changes in the Cassandra database.

CRUD stands for

Create operation
Read operation
Update operation and
Delete/drop operation.

Describe Keyspace.

Ans: A keyspace is a part of the cluster that controls the replication of the data in a database. A cluster typically contains one keyspace per application.

Name the types of keyspace operations in Cassandra?

Ans: Cassandra supports 3 types of keyspace operations, as follows:

Create keyspace
Alter keyspace
Drop keyspace

Define column family in Cassandra?

Ans: A column family in Cassandra is defined as an ordered collection of rows. It is used to represent the stored data in a structured manner. Column families are contained in a keyspace, and a keyspace has at least one column family.

Explain about the super column in Cassandra?

Ans: A super column in Cassandra is a special kind of column. Its value is not a single item but a collection of sub-columns, so it acts as a map to all the sub-columns it groups.

Super columns are used to group related columns so that they can be accessed together.

What do you understand by Cassandra?

Cassandra is a widely adopted NoSQL database. It is maintained as a project of the Apache Software Foundation. Cassandra is popular because it can store and manage huge volumes of data across many nodes while tolerating node failures. It is written in Java. Its most notable feature is that it has no single point of failure. Cassandra is a hybrid of a key-value store and a column-oriented store: the keyspace is the outermost container for an application, and the column family is the entity inside a keyspace.

What are the main benefits of using Cassandra?

First of all, it has no single point of failure.
It delivers near real-time performance, which helps when analyzing data and makes the work of engineers, developers, and others easier.
It is designed as a peer-to-peer system rather than a master-slave one.
It is flexible: any number of nodes can be added to any Cassandra cluster in any data center.
Clients can send requests to any server.
It scales elastically: users can scale up or down as needed, and no restart is required while scaling.
The next strength is replication. Users can keep as many copies of the data as they want and store them on different nodes. If a node fails, the data can be retrieved from another node.
Many companies and organizations choose it as their NoSQL database because of its performance.
Slicing is simple in Cassandra because it is column-oriented, which also makes data access and retrieval more efficient.
Last but not least, it supports a schema-free or schema-optional data model.

Explain the concept of compaction in Cassandra.

Compaction is a maintenance process that reorganizes SSTables to optimize the data structures on disk. It is needed because each Memtable flush creates a new SSTable.

Generally, there are two kinds of compaction:

Minor compaction: Equally sized SSTables are merged into one. It does not need to be started manually; it starts automatically when a new SSTable is created.
Major compaction: It does not start automatically; it is triggered with nodetool. It merges all the SSTables of a column family into one.

Explain the concept of Cassandra Data Model?

The Cassandra data model is composed of four main components:

Cluster: It is made up of multiple nodes and keyspaces.

Keyspace: It is a namespace that groups multiple column families, typically one per application.

Column: It consists of a column name, a timestamp, and a value.

Column family: It consists of multiple columns referenced by a row key.

Discuss the precautions to take while adding a column.

The column name must not match an already existing column name
The table must not be defined with the compact storage option.

What do you understand by the Super Column in Cassandra?

A Cassandra super column is used to group similar kinds of data. Super columns are key-value pairs whose values are columns; in other words, a super column is a grouping of columns. They follow this hierarchy:

Keyspace > column family > super column > column data structure in JSON (JavaScript Object Notation).

Differentiate between the terms node, cluster, and data center in Cassandra.

These are all basic components of Cassandra. A node is a single machine running Cassandra. A cluster is a collection of nodes that hold similar kinds of data. Data centers are useful when serving customers in different geographical locations: the nodes of a cluster can be grouped into different data centers.

Explain the concept of the CAP theorem.

The CAP theorem matters when planning how to scale a distributed system. CAP stands for Consistency, Availability, and Partition tolerance. The theorem states that a distributed system such as Cassandra cannot provide all three properties at once: two can be chosen, and the third must be sacrificed.

These three properties are:

Consistency: Every read is guaranteed to return the most recent write.
Availability: Every request receives a reasonable response within a reasonable time.
Partition tolerance: The system keeps working even when network failures or partitions occur.

Explain what you understand by Cassandra CQL collections.

CQL collections let clients store multiple values in a single column. There are three kinds of CQL collections in Cassandra:

List: Used when the order of the elements must be maintained. A list can also store the same value multiple times.
Set: Used to store a group of unique elements, which are returned in sorted order.
Map: Used to store a set of key-value pairs.

Explain the Memtable in Cassandra.

As the name suggests, a Memtable is related to memory. The in-memory structure to which Cassandra writes data is called the Memtable. Its content is stored as key/column pairs, and the data is sorted by key. Every column family has its own Memtable, from which column data is retrieved by key.

What is the procedure of data storage in Cassandra?

Cassandra stores data as bytes. When the user or client specifies a validator, Cassandra ensures that those bytes are encoded as required. A comparator then orders the columns according to the ordering defined for that encoding.

Composites are byte arrays with a particular encoding. Each component is stored as a two-byte length, followed by the byte-encoded component, followed by a termination bit.
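
The layout described above (a two-byte length, the component bytes, then a terminator) can be sketched as follows. This is an illustration under assumptions rather than Cassandra's actual serializer: the function names are hypothetical, the length is assumed to be big-endian, and the terminator is modelled as a whole zero byte.

```python
import struct


def encode_composite(components):
    """Encode byte-string components as length + bytes + terminator."""
    out = bytearray()
    for comp in components:
        out += struct.pack(">H", len(comp))  # two-byte big-endian length
        out += comp                           # the encoded component itself
        out += b"\x00"                        # terminator (assumed zero byte)
    return bytes(out)


def decode_composite(data):
    """Inverse of encode_composite."""
    components, pos = [], 0
    while pos < len(data):
        (length,) = struct.unpack_from(">H", data, pos)
        pos += 2
        components.append(data[pos:pos + length])
        pos += length + 1                     # skip the terminator
    return components


encoded = encode_composite([b"us", b"2024"])
```

Because each component carries its own length, components of different types can be concatenated and split apart again without ambiguity.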

Describe the different consistency levels in Cassandra, as they apply to writes.

ALL

It is highly consistent. A write must be written to the memtable and commit log on all replica nodes in the cluster.

EACH_QUORUM

A write must be written to the memtable and commit log on a quorum of replica nodes in every data center.

LOCAL_QUORUM

A write must be written to the memtable and commit log on a quorum of replica nodes, but only in the local data center.

ONE

A write must be written to the memtable and commit log of at least one replica node.

TWO

A write must be written to the memtable and commit log of at least two replica nodes.

THREE

Same as above, but with at least three replica nodes.

Give the differences between the different types of Primary Keys in Cassandra?

- Single primary key

In this case, only one column is used as the primary key. This column is also the partition key, which is used to partition the data. The partition key determines how data is spread across the nodes.

- Compound Primary Key

In this case, the data is partitioned and then grouped within each partition. For example, with a primary key made of race_name and race_position, race_name is the partition key and race_position is the clustering key. The former decides how the data is partitioned, and the latter decides how it is ordered within a partition.
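
The roles of the two key parts can be illustrated with a small sketch that groups rows by race_name and orders each group by race_position. The store function and the sample rows are hypothetical; only the race_name and race_position column names come from the text above.

```python
from collections import defaultdict


def store(rows):
    """Group rows by partition key and sort each group by clustering key.

    Each row is (race_name, race_position, rider); race_name plays the
    partition key and race_position the clustering key.
    """
    partitions = defaultdict(list)
    for race_name, race_position, rider in rows:
        partitions[race_name].append((race_position, rider))
    return {name: sorted(cells) for name, cells in partitions.items()}


rows = [
    ("Tour A", 2, "rider-x"),   # sample data, made up for illustration
    ("Tour B", 1, "rider-y"),
    ("Tour A", 1, "rider-z"),
]
table = store(rows)
```

All rows for one race land in the same partition, and within it they come back ordered by position regardless of insertion order.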

What do you understand by Logging in Cassandra?

Logs are written to the system.log and debug.log files in the logging directory. Changing the logging level is the simplest way to check what is happening in the database. Logging can be configured programmatically or manually.

What are different logging levels in Cassandra?

There are several levels, described below:

ALL: Includes all levels, including custom levels.
DEBUG: Designates fine-grained informational events, useful for debugging an application.
WARN: Indicates potentially harmful conditions.
INFO: Indicates informational messages that show the progress of the application.
ERROR: Indicates error events.


What do you understand by the term snitch in Cassandra? Give some examples.

A snitch determines which data centers and racks the nodes belong to. It gives Cassandra information about the network topology, which the replication strategy uses to place replicas. Some examples of snitches are:

SimpleSnitch
PropertyFileSnitch
Ec2Snitch
CloudstackSnitch
Dynamic snitching
RackInferringSnitch
GossipingPropertyFileSnitch
GoogleCloudSnitch

Who developed Cassandra?

Cassandra (a NoSQL database management system) was initially developed at Facebook and is now developed as a project of the Apache Software Foundation.

What is a rack in Cassandra?

A rack in Apache Cassandra is a grouped set of servers. Cassandra's architecture takes racks into account so that replicas are not stored redundantly inside a single rack; instead, replicas are spread across different racks, so the data remains available if one rack stops working. Within a data center, there can be several racks, each with multiple servers.

What is a NoSQL Database?

NoSQL is also read as "Not only SQL" to emphasize that such systems may support SQL-like query languages, as used in relational databases.
A NoSQL database provides a mechanism to store and retrieve data that is modeled by means other than the tabular relations used in relational databases.

What are the different types of NoSQL Databases?

There are 4 major types of NoSQL databases:

Key Value Store
Document Store
Column Store
Graph Databases

What is Key-Value Store DB? Explain with an example.

All of the data within the database consists of an indexed key and a value. A key may correspond to one or multiple values (as in a hash table). This model provides high performance and can be scaled easily as business needs grow.

What is Document Store DB? Explain with an example.

Each data record is a JSON/XML representation of key-value pairs. Every record can have a different set of fields.
Document databases are similar to key-value stores, but the difference is that the key is associated with a document.

What is Column Store DB? Explain with an example.

Data is stored in cells that are grouped in columns rather than in rows. Columns are logically grouped into column families.
One row may have one or multiple data records and is indexed by a partition key.

What is Graph DB? Explain with an example.

A graph database is a type of NoSQL database that uses a flexible graph representation. Its key purpose is to store relationships between nodes.

In the example graph, the nodes have IDs 1, 2, and 3, and the properties of node 1 are Name and Age.
The edges have IDs 100, 101, 102, 103, 104, and 105.

What is Apache Cassandra?

Ans: Cassandra is an open-source, distributed, and decentralized database. It is used for managing large amounts of structured data spread across many servers.

Describe the benefits of using Cassandra?

Ans: Cassandra has features that make it easy to work with, including high performance, fault tolerance, predictable scaling, and a distributed design. It is also preferred because it is an open-source, distributed NoSQL database management system.

What are the main components of Cassandra?

The components of Cassandra include:

Node
Data center
Commit log
Cluster
Memtable
SSTable
Bloom filter

What is a data centre?

Ans: A data centre is a collection of related Cassandra nodes. A cluster, in turn, is made up of one or more data centres, so a cluster is also a collection of nodes.

Discuss and explain the various types of partitioners in Cassandra.

Murmur3 Partitioner: It is the default partitioner and generally performs better than the others. It is faster than Random Partitioner and distributes data evenly. It uses 64-bit hash values, with range -2^63 to 2^63-1.

Random Partitioner: Before Cassandra 1.2, Random Partitioner was the default. It can be used together with vnodes. Like Murmur3 Partitioner, it distributes data evenly. It applies an MD5 hash to the partition key, with range 0 to 2^127-1.

Byte Ordered Partitioner: It keeps the keys in order across the cluster. It uses the raw byte array value of the row key to decide which nodes store each row.

What are the key features of any NoSQL Database?

Features of NoSQL Database
Feature	Description
*Schema Agnostic*	Information can be stored without doing any upfront schema design
*Auto-Sharding & Elastic*	NoSQL allows the workload to automatically spread across any number of servers
*Highly Distributable*	A cluster of servers can be used to hold a single large database.
*Easily Scalable*	Allows easy scaling to adapt to the data volume and complexity of cloud applications
*Integrated Caching*	Cached data in system memory is transparent to the application developers & operations team.

What are the Different types of Data Model?

There are majorly 3 types/stages of Data Model

Conceptual Data Model
Logical Data Model
Physical Data Model

What is CQLSH? And why is it used?

cqlsh is a command-line shell that lets users communicate with the Cassandra database using CQL (Cassandra Query Language). By using cqlsh, you can do the following things:

Define a schema
Insert a data, and
Execute a query

What is a YAML file in Cassandra?

The cassandra.yaml file is the main configuration file for Cassandra. After changing properties in the cassandra.yaml file, you must restart the node for the changes to take effect.

What are Clusters in Cassandra?

The outermost structure in Cassandra is the cluster. A cluster is a container for keyspaces.
It is sometimes called the ring, because Cassandra assigns data to nodes in the cluster by arranging them in a ring.
Each node holds replicas for a different range of data.

What is a Keyspace in Cassandra?

A keyspace is the outermost container for data in Cassandra. Like a relational database, a keyspace has a name and a set of attributes that define keyspace-wide behaviour. The keyspace is used to group Column families together.

How is a Keyspace created in Cassandra? & What are the parameters used?

CREATE KEYSPACE ABC
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
AND durable_writes = true;

The parameters used while creating a keyspace are:

Keyspace Name
Replication Strategy
Replication Factor &
Durable Writes

What is Network Topology Strategy?

NetworkTopologyStrategy is used when a cluster is deployed across multiple datacenters. It lets you specify how many replicas to place in each datacenter. The main considerations when choosing these numbers are being able to satisfy reads locally, without incurring cross-datacenter latency, and being able to handle failure scenarios.

What is a Column Family?

A column family is a container for an ordered collection of rows, each of which is itself an ordered collection of columns. You can freely add any column to any column family at any time, depending on your needs. The comparator value indicates how columns will be sorted when they are returned to you in a query.

What is a Row in Cassandra? and What are the different elements of it?

A row is a collection of sorted columns. It is the smallest unit that stores related data in Cassandra. Any component of a Row can store data or metadata

The different elements/parts of a row are the

Row Key
Column Keys
Column Values

Differentiate between the various types of Primary Keys in Cassandra.

In the Single Primary Key, there is only a single column as a Primary Key.

The column is also called partitioning key. Data is partitioned on the basis of that column. Data is spread on different nodes on the basis of the partition key.

In Compound Primary Key, data is partitioned and then clustered

race_name is the partitioning key and race_position is the Clustering key. Data will be partitioned on the basis of race_name and data will be clustered on the basis of race_position. Clustering is the process that sorts data in the partition. Retrieval of rows is very efficient when rows for a partition key are stored in order, based on the clustering column.

Composite partitioning key is used to create multiple partitions for the data

race_year and race_name are the composite partition key and data will be partitioned on the basis of both columns. Data will be clustered on the basis of the rank. It is used when too much data is present on the single partition.
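
As a rough illustration of partitioning versus clustering, the following Python sketch models a table whose primary key is ((race_name), race_position). The class name, the sample rows, and the in-memory layout are assumptions made for illustration only; they are not how Cassandra stores data internally.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of PRIMARY KEY ((race_name), race_position)."""

    def __init__(self):
        # partition key -> {clustering key -> row value}
        self.partitions = defaultdict(dict)

    def insert(self, race_name, race_position, cyclist):
        # Rows with the same partition key land in the same partition;
        # a repeated (partition key, clustering key) pair overwrites the row.
        self.partitions[race_name][race_position] = cyclist

    def read_partition(self, race_name):
        # Rows come back ordered by the clustering column.
        rows = self.partitions.get(race_name, {})
        return [(pos, rows[pos]) for pos in sorted(rows)]


table = ToyTable()
table.insert("Tour A", 2, "rider_b")
table.insert("Tour A", 1, "rider_a")
table.insert("Tour B", 1, "rider_c")
print(table.read_partition("Tour A"))
```

Reading a single partition returns its rows already sorted by race_position, which is why queries that stay inside one partition are efficient.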

How does gossip Protocol help in Failure Detection?

The process of Acknowledging messages helps in failure detection. When a node is down/failing it is unable to send or receive messages and hence the Acknowledgements are not received.

What are partitions and Tokens in Cassandra?

Partitioner: A hash function, run on each node, that derives a token from the partition key of each row being added. It converts a variable-length input to a fixed-length value. A partition is the set of rows that share the same partition key.
Token: An integer value generated by the hashing algorithm, identifying the location of a partition within the cluster.
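
The idea of turning a variable-length key into a fixed-range token can be sketched in Python. This example uses MD5 from the standard library, in the spirit of RandomPartitioner; the function name is hypothetical, and the exact byte handling inside Cassandra may differ. Murmur3 is not used here only because it is not part of the Python standard library.

```python
import hashlib


def md5_token(partition_key: bytes) -> int:
    """Map a partition key of any length to a fixed-range integer token."""
    digest = hashlib.md5(partition_key).digest()         # always 16 bytes
    signed = int.from_bytes(digest, "big", signed=True)  # 128-bit signed value
    return abs(signed)                                   # between 0 and 2**127


for key in (b"alice", b"bob", b"a much longer partition key"):
    print(key, md5_token(key))
```

The same key always produces the same token, so every node can work out independently where a given row belongs.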

What are the different types of Partitioners in Cassandra? Explain.

Murmur3Partitioner is the default partitioner. It is faster than RandomPartitioner and distributes data uniformly based on the MurmurHash function.

It uses 64-bit hash values of the partition key, in the range -2^63 to 2^63-1.

RandomPartitioner was the default partitioner prior to Cassandra 1.2. It can be used with vnodes and distributes data uniformly.

It uses MD5 hash values in the range 0 to 2^127-1.

ByteOrderedPartitioner is used for ordered partitioning. It orders rows lexically by key bytes. Using the ordered partitioner allows ordered scans by primary key. This means we can scan rows as though we were moving a cursor through a traditional index.

How does Cassandra perform Read operation? Explain

What do you mean by Compaction?

Compaction is the process of freeing up space by merging accumulated data files (SSTables). It improves read performance by reducing the number of required seeks.
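
A minimal sketch of the merge step, assuming each data file is modeled as a Python dict of key to (timestamp, value) and that a value of None marks a deleted row. Both conventions are assumptions for illustration. Real compaction strategies are more involved, and deleted data is only purged after a grace period.

```python
def compact(*sstables):
    """Merge data files, keeping only the newest version of each key.

    Each file is modeled as a dict: key -> (timestamp, value).
    A value of None stands for a deleted row in this toy model.
    """
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    # Drop deleted rows and return the rest sorted by key, as one new file.
    return {k: merged[k] for k in sorted(merged) if merged[k][1] is not None}


old = {"a": (1, "v1"), "b": (1, "v1")}
new = {"a": (2, "v2"), "b": (3, None), "c": (2, "v1")}
print(compact(old, new))
```

After the merge a read needs to consult one file instead of two, which is where the reduction in seeks comes from.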

What is Anti-Entropy and How is it associated with Merkle Tree?

Anti-entropy is the replica synchronization mechanism that ensures data on different nodes is updated to the newest version.
Cassandra uses Merkle trees for anti-entropy repair. A Merkle tree is a hash tree whose leaves are hashes of the values of individual keys.
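
To make the hash tree idea concrete, here is a small Python sketch that builds a tree over key/value pairs and compares the root hashes of two replicas. The use of SHA-256, the leaf format, and the helper names are assumptions for illustration; the trees Cassandra builds during repair cover token ranges and differ in detail.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(rows: dict) -> bytes:
    """Build a hash tree over key/value pairs and return its root hash."""
    # Leaves: one hash per key/value pair, taken in key order.
    level = [_h(f"{k}={rows[k]}".encode()) for k in sorted(rows)]
    if not level:
        return _h(b"")
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node on odd levels
            level.append(level[-1])
        # Each parent is the hash of its two children.
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = {"k1": "v1", "k2": "stale", "k3": "v3"}
print(merkle_root(replica_a) == merkle_root(dict(replica_a)))  # identical data
print(merkle_root(replica_a) == merkle_root(replica_b))        # one value differs
```

Two replicas with identical data produce identical roots, so a single comparison is enough to tell whether any deeper comparison is needed.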

What is Hinted Handoff?

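Hinted Handoff is a mechanism to ensure availability, fault-tolerance and graceful degradation in Cassandra. When a replica node is unavailable during a write, the coordinator node stores a hint describing that write. The node holding the hint learns through Gossip when the unavailable node comes back online, and then hands the write off to it.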
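
The following Python sketch shows the general shape of hinted handoff under simplifying assumptions: replicas are plain dicts, node liveness is a flag set by hand rather than learned through gossip, and the class and method names are hypothetical.

```python
class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas                     # name -> dict holding that replica's data
        self.up = {name: True for name in replicas}  # liveness flags (gossip in real life)
        self.hints = []                              # (target, key, value) for down replicas

    def write(self, key, value):
        for name, store in self.replicas.items():
            if self.up[name]:
                store[key] = value
            else:
                self.hints.append((name, key, value))  # keep a hint instead

    def mark_up(self, name):
        # The node is back: replay its hints and keep the rest.
        self.up[name] = True
        remaining = []
        for target, key, value in self.hints:
            if target == name:
                self.replicas[name][key] = value
            else:
                remaining.append((target, key, value))
        self.hints = remaining


coord = Coordinator({"n1": {}, "n2": {}})
coord.up["n2"] = False
coord.write("k", "v")
hints_while_down = list(coord.hints)
coord.mark_up("n2")
print(hints_while_down, coord.replicas["n2"])
```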

What do you mean by Logging in Cassandra?

Logs are written to the system.log and debug.log file in the Cassandra logging directory
We can configure logging programmatically or manually. The simplest way to get a picture of what is happening in your database is to change the logging level to make the output more verbose. By default it is set to INFO.

Explain the different Logging levels in Cassandra.

ALL: All levels including custom levels
TRACE: Designates finer-grained informational events than the DEBUG
DEBUG: Designates fine-grained informational events that are most useful to debug an application
INFO: Designates informational messages that highlight the progress at a coarse-grained level
WARN: Designates potentially harmful situations
ERROR: Designates error events that might still allow the application to continue running
OFF: The highest possible rank and is intended to turn off logging

What is JMX? And How is it useful in Cassandra?

JMX (Java Management Extensions) is a Java technology that supplies tools for managing and monitoring Java applications and services. Cassandra makes use of JMX to enable remote management of the servers.

What are snapshots and how do you create one in Cassandra?

A snapshot represents the state of the data files at a particular point in time. The nodetool snapshot command is used to take a backup; it creates hard links to the SSTables in the snapshots folder, which can later be used to restore the node.

Why is JConsole used? What are its different elements?

JConsole is used to monitor and analyze server activity. Once you have connected to a server, the default view includes four major categories about the state of your server, which are updated constantly: heap memory usage, threads, classes, and CPU usage.

What is the use of Cassandra and why to use Cassandra?

Cassandra was designed to handle big data workloads across multiple nodes without any single point of failure. The various factors responsible for using Cassandra are

It is fault tolerant and consistent
Gigabytes to petabytes scalabilities
It is a column-oriented database
No single point of failure
No need for separate caching layer
Flexible schema design
It has flexible data storage, easy data distribution, and fast writes
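It offers atomicity, isolation, and durability for writes to a single partition, with tunable consistency, rather than full ACID (Atomicity, Consistency, Isolation, and Durability) transactions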
Multi-data center and cloud capable
Data compression

How Cassandra stores data?

All data stored as bytes
When you specify validator, Cassandra ensures those bytes are encoded as per requirement
Then a comparator orders the column based on the ordering specific to the encoding
Composites are just byte arrays with a specific encoding: for each component, Cassandra stores a two-byte length, followed by the byte-encoded component, followed by a one-byte end-of-component marker.
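
The layout described above can be sketched in Python as follows. The function names are hypothetical, the length is assumed to be big-endian, and the terminator is modeled as a single trailing byte with value 0; treat this as an approximation of the encoding rather than a byte-exact reference.

```python
import struct


def encode_composite(components, end_of_component=0):
    """Encode each component as: 2-byte length, raw bytes, 1 trailing byte."""
    out = bytearray()
    for comp in components:
        out += struct.pack(">H", len(comp))   # two-byte big-endian length
        out += comp                           # the encoded component itself
        out += bytes([end_of_component])      # terminator after each component
    return bytes(out)


def decode_composite(data):
    """Inverse of encode_composite; returns the list of components."""
    parts, i = [], 0
    while i < len(data):
        (length,) = struct.unpack_from(">H", data, i)
        parts.append(data[i + 2 : i + 2 + length])
        i += 2 + length + 1                   # skip the trailing byte
    return parts


encoded = encode_composite([b"2015", b"Tour A"])
print(encoded.hex())
print(decode_composite(encoded))
```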

List out the other components of Cassandra?

The other components of Cassandra are

Node
Data Center
Cluster
Commit log
Mem-table
SSTable
Bloom Filter

Explain what is a keyspace in Cassandra?

In Cassandra, a keyspace is a namespace that determines how data is replicated on nodes. A cluster typically contains one keyspace per application.

What is the syntax to create keyspace in Cassandra?

Syntax for creating keyspace in Cassandra is

CREATE KEYSPACE <identifier> WITH <properties>

Mention what are the values stored in the Cassandra Column?

In Cassandra Column, basically there are three values

Column Name
Value
Time Stamp

Mention what needs to be taken care while adding a Column?

While adding a column you need to take care that the

Column name is not conflicting with the existing column names
Table is not defined with compact storage option

Mention what is Cassandra- CQL collections?

Cassandra CQL collections help you to store multiple values in a single variable. In Cassandra, you can use CQL collections in following ways

List: It is used when the order of the data needs to be maintained and a value may be stored multiple times (holds elements in order, duplicates allowed)
SET: It is used to store a group of elements that are returned in sorted order (holds unique elements)
MAP: It is a data type used to store a key-value pair of elements

Explain what is Bloom Filter is used for in Cassandra?

A bloom filter is a space efficient data structure that is used to test whether an element is a member of a set. In other words, it is used to determine whether an SSTable has data for a particular row. In Cassandra it is used to save IO when performing a KEY LOOKUP.
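
A minimal Bloom filter can be sketched in Python as shown below. The bit-array size, the number of hash functions, and the use of salted SHA-256 are arbitrary choices for illustration and do not reflect the implementation inside Cassandra.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)   # one byte per bit, for clarity

    def _positions(self, key: str):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
for row_key in ("user:1", "user:2"):
    bf.add(row_key)
print(bf.might_contain("user:1"))
```

A negative answer is always reliable, so a file whose filter says "absent" can be skipped without any disk access. A positive answer may occasionally be a false positive, in which case the file is read and the key is simply not found.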

Explain how Cassandra writes changed data into commitlog?

Cassandra appends changed data to the commitlog
The commitlog acts as a crash-recovery log for data
A write operation is never considered successful until the changed data has been appended to the commitlog

Data will not be lost once the commitlog is flushed to a file on disk

Explain the concept of tunable consistency in Cassandra.

Tunable consistency is a distinguishing feature that makes Cassandra a favored database choice among developers, analysts, and big data architects. Consistency refers to how up to date and synchronized a row of data is across all of its replicas. Tunable consistency allows users to select the consistency level best suited to their use cases. It supports two kinds of consistency: eventual consistency and strong consistency.

The former guarantees consistency when no new updates are made on a given data item, i.e., all accesses return the last updated value eventually. Systems with eventual consistency are known to have achieved replica convergence.

For strong consistency, Cassandra supports the following condition:
R + W > N where,
N – Number of replicas
W – Number of nodes that need to agree for a successful write
R – Number of nodes that need to agree for a successful read
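
The condition can be checked with a few lines of Python. The function names are hypothetical, and the quorum formula (a majority of N) is included only to show a common way of satisfying the inequality.

```python
def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """Return True when read and write replica sets must overlap (R + W > N)."""
    return r + w > n


def quorum(n: int) -> int:
    """Majority of N replicas."""
    return n // 2 + 1


n = 3
print(is_strongly_consistent(n, w=quorum(n), r=quorum(n)))  # quorum writes and reads
print(is_strongly_consistent(n, w=1, r=1))                  # single-replica writes and reads
```

With N = 3, writing and reading at quorum gives 2 + 2 > 3, so every read overlaps the latest write on at least one replica. With W = 1 and R = 1 the sets may not overlap, which gives eventual consistency only.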

How does Cassandra write?

Cassandra performs a write in two steps: first it appends the write to a commit log on disk, and then it writes to an in-memory structure known as the memtable. Once both steps succeed, the write is acknowledged. Memtables are later flushed to disk as SSTables (sorted string tables). Because these writes are sequential appends, Cassandra offers fast write performance.

Explain CAP Theorem.

With a strong requirement to scale systems when additional resources are needed, CAP Theorem plays a major role in maintaining the scaling strategy. It is an efficient way to handle scaling in distributed systems. Consistency, availability, and partition tolerance (CAP) theorem states that in distributed systems like Cassandra, users can enjoy only two out of these three characteristics.

One of them needs to be sacrificed. Consistency guarantees that the client receives the most recent write; availability means every request receives a reasonable response within a short time; and partition tolerance means the system continues to operate when network partitions occur. The two options available are AP and CP.

Does Cassandra support ACID transactions?

Unlike relational databases, Cassandra does not support ACID transactions.

What is the difference between Column and Super Column?

Both elements work on the principle of tuples having name and value. However, the former’s value is a string, while the value of the latter is a map of columns with different data types.

Unlike Columns, Super Columns do not contain the third component of timestamp.

What is Column Family?

As the name suggests, a column family refers to a structure having an infinite number of rows. Those are referred by a key–value pair, where the key is the name of the column and the value represents the column data. It is much similar to a hashmap in Java or a dictionary in Python. Remember, the rows are not limited to a predefined list of columns here. Also, the column family is absolutely flexible with one row having 100 columns while the other having only 2 columns.

Define the use of the source command in Cassandra.

Source command is used to execute a file consisting of CQL statements.

What is Thrift?

Thrift is a legacy RPC framework that comes with a code generation tool. The original client API of Cassandra was built on Thrift to facilitate access to the database from many programming languages; it has since been superseded by CQL.

Can you add or remove column families in a working cluster?

Yes, but while doing that we need to keep in mind the following processes:

Do not forget to clear the commitlog with ‘nodetool drain’
Turn off Cassandra to ensure that there is no data left in the commitlog
Delete the SSTable files for the removed CFs

What is replication factor in Cassandra?

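Replication factor is the number of copies of the data that exist in the cluster. When authentication is enabled, it is important to increase the replication factor of the system_auth keyspace so that users can still log into the cluster when a node is down.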

Can we change the replication factor on a live cluster?

Yes, but it will require running repair to alter the replica count of the existing data.

How to iterate all rows in a Column Family?

Using get_range_slices. You can start iteration with an empty string, and after each iteration the last key read serves as the start key for the next iteration.

Why is Apache Cassandra developed?

Ans: Cassandra is a distributed database management system. It was initially developed at Facebook to power the Facebook inbox search feature. Because of its technical strengths, Cassandra became very popular and is now a top-level Apache project.

How is data distribution done?

Ans: Cassandra database is a highly-available database, and it stores data by evenly dividing the data around its nodes. For this, it uses the Murmur3 partitioning function to distribute given data in nodes evenly.

How does Cassandra store data?

Ans: The data storage path in Cassandra begins with the commit log, where each write is recorded for durability, and the memtable, where the data is held temporarily in memory. The memtable is then periodically flushed and written to disk as an SSTable.

What are the general operations of Cassandra CQL?

Ans: There are two types of operations carried by Cassandra:

Read operation and
Write operation

What is a write operation?

Ans: A write proceeds step by step, as follows.

Step 1: As soon as a node receives a write request, it records the data in the commit log.

Step 2: The data is then inserted into the memtable.

Step 3: When the memtable reaches its limit, the data is flushed to an SSTable.

What are the best monitor tools of Cassandra?

Ans: Although Cassandra comes with built-in fault tolerance, it still needs to be monitored for effective results. Here are some tools used to monitor Cassandra databases:

SolarWinds Server & Application Monitor
Instana
Instaclustr
AppDynamics
Dynatrace
ManageEngine Applications Manager

Name the key roles of CQL?

Different users need different roles depending on their requirements, which helps keep the database secure. The key role operations in CQL are as follows:

Create a role
Alter role
Drop role
Grant role
Revoke role
List role

What are the characteristics of a column family?

Ans: A column family has many characteristics; a few of them are as follows:

Key cached
Rows cached
Preload row cache

What do you mean by Tunable Consistency?

Tunable consistency lets a client choose, per operation, how many replicas must hold up-to-date, synchronized data. Clients can select the consistency level that suits their requirements, which is one of the features that makes Cassandra a primary choice for users, developers, and architects. It supports two kinds of consistency:

Eventual consistency: If no new updates are made to an item, all reads eventually return the last updated value; that is, the replicas converge.

Strong consistency: This is achieved when the following condition holds:

R + W > N, where

N stands for the number of replicas of the data

W stands for the number of nodes that must acknowledge a successful write

R stands for the number of nodes that must respond for a successful read.

What do you mean by Column Family?

A column family, as the name suggests, is a structure that holds a large number of rows. Each row is associated with a set of key-value pairs, where the key is the name of the column and the value is the column data. You can compare it to a hash map in Java. A column family is very flexible: one row can have a hundred columns while another has just 2 columns. Rows are not limited to a predefined list of columns.

What do you mean by SS Table and explain how it is different from the other original tables?

SSTable stands for Sorted String Table. It is an important data file in Cassandra to which memtables are periodically flushed. SSTables are stored on disk and exist for every Cassandra table. A main feature of the SSTable is that it is immutable: it does not allow any changes once the data is written, which keeps the data files stable. For each SSTable, Cassandra also generates supporting structures such as a bloom filter, a partition summary, and a partition index.

Is it possible to add or delete Column Families in a working cluster?

Yes, it is possible to add or delete Column Families in a working cluster, but before doing so the client has to follow some precautions:

First, users must make sure that the commit log is clear, which can be done with ‘nodetool drain’.
No data should be left in the commit log. For this, Cassandra has to be turned off.
Lastly, it is vital to delete the SSTable files for the removed CFs.

What are the Key Differences between Cassandra and Traditional RDBMS?

What are the different Database Elements of Cassandra?

There are 4 main Cassandra Database Elements:

What are durable writes?

Durable Writes provides a means to instruct Cassandra whether to use commitlog for updates on the current KeySpace or not.
This option is not mandatory. The default value for durable writes is TRUE.

What do you mean by replication factor?

Cassandra stores copies (called replicas) of each row based on the row key. The replication factor refers to the number of nodes that will act as copies (replicas) of each row of data.

What do you mean by replication Strategy?

The replica placement strategy refers to how the replicas will be placed in the ring
There are different strategies that ship with Cassandra for determining which nodes will get copies of which keys
There are mainly two types of Strategies:

Simple Strategy
Network Topology Strategy

What is Simple Strategy?

Simple Strategy is used for simple single-datacenter clusters. It places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring, without considering rack or datacenter location.
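
The clockwise placement can be sketched in Python under a few assumptions: the ring is a list of (token, node name) pairs, each node owns the range ending at its token, and the function name and sample tokens are made up for illustration.

```python
import bisect


def simple_strategy_replicas(ring, token, replication_factor):
    """ring: list of (node_token, node_name); returns replica node names."""
    ring = sorted(ring)
    tokens = [t for t, _ in ring]
    # First replica: the first node whose token is >= the key's token,
    # wrapping around to the start of the ring.
    start = bisect.bisect_left(tokens, token) % len(ring)
    count = min(replication_factor, len(ring))
    # Remaining replicas: the next nodes clockwise, ignoring rack and datacenter.
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


ring = [(0, "node-a"), (100, "node-b"), (200, "node-c"), (300, "node-d")]
print(simple_strategy_replicas(ring, token=150, replication_factor=3))
print(simple_strategy_replicas(ring, token=350, replication_factor=2))
```

A key with token 150 lands on node-c first and then on the next two nodes around the ring; a token past the last node wraps around to node-a.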

What is a Primary Key? And what are its different types?

The Primary Key is a column, or a set of columns, used to uniquely identify a row.

There are 3 types of Primary Keys:

- Single Primary Key
- Compound Primary Key
- Composite Partitioning Key

Differentiate between Static and Dynamic CQL Tables.

A Static Table uses a relatively static set of column names and is similar to Relational Database Table.
A dynamic table allows you to pre-compute result sets and stores them in a single row for efficient data retrieval.

Differentiate between Drop and Truncate in CQLSH

The Drop table command drops specified table including all the data from the keyspace.
The Truncate table command is used to truncate a table and deletes all the rows of the table permanently.

What is Gossip Protocol?

Gossip Protocol in Cassandra is a peer-to-peer communication protocol in which nodes can choose among themselves with whom they want to exchange their state information. The nodes exchange information about themselves and about the other nodes that they have gossiped about, so all nodes quickly learn about all other nodes in the cluster.

How does gossip Protocol Work?

What do you mean by Snitch? Name a few

A snitch determines which datacenters and racks nodes belong to. Snitches inform Cassandra about the network topology so that replicas can be distributed accordingly; the replication strategy places the replicas based on the information provided by the snitch.

There are many types of snitches, to name a few:

Dynamic snitching
SimpleSnitch
RackInferringSnitch
Ec2Snitch
PropertyFileSnitch
GossipingPropertyFileSnitch
Ec2MultiRegionSnitch
GoogleCloudSnitch
CloudstackSnitch

How does Cassandra perform write operations?

When write request comes to the node:

Firstly, it logs in the Commit Log. Data will be captured and stored in the Mem-Table.
When mem-table is full, data is flushed to the SSTable data file.

All writes are automatically partitioned and replicated throughout the cluster. Cassandra periodically consolidates the SSTables, discarding unnecessary data.
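
The write steps above can be sketched as a toy Python class. The class name, the tiny memtable limit, and the choice to clear the log on flush are simplifications for illustration; a real node manages commit log segments and SSTable files on disk.

```python
class ToyNode:
    def __init__(self, memtable_limit=2):
        self.commit_log = []      # append-only record, used for crash recovery
        self.memtable = {}        # in-memory writes
        self.sstables = []        # immutable, sorted snapshots standing in for disk files
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. log the write first
        self.memtable[key] = value             # 2. then update the mem-table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                       # 3. flush when the mem-table is full

    def flush(self):
        # An SSTable is written once, sorted by key, and never modified.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()                # flushed data no longer needs the log


node = ToyNode(memtable_limit=2)
node.write("b", 1)
node.write("a", 2)
node.write("c", 3)
print(node.sstables, node.memtable)
```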

Explain the terms Memtable, CommitLog and SSTables.

Commit log: The Commit log is a crash-recovery mechanism that supports Cassandra’s durability goals
MemTable: MemTable is an in-memory data structure that corresponds to a CQL table
SSTable: The contents of the memtable are flushed to disk in a file called an SSTable.

What is the use of Coordinator Node in Read?

Read Operation is easy because clients can connect to any node in the cluster to perform reads. If a client connects to a node that doesn’t have the data it’s trying to read, the node it’s connected to will act as the coordinator node.

Explain the different types of Repairs.

Anti Entropy: Anti-entropy Repair is a process of comparing the data of all replicas and updating them with the newest version of data using Merkle Tree. Anti-entropy repair is triggered manually. It has two phases to the process:
- Building a Merkle tree for each replica
- Comparing the Merkle trees to discover differences

Anti-entropy repair is very useful and is often recommended to run periodically to keep data in sync.

Read Repair: Read Repair is the process of fixing inconsistencies among the replica nodes at the time of a read request. In a read operation, if some nodes respond with data that is inconsistent with the response of nodes holding newer data, a Read Repair is performed on the out-of-date nodes. It ensures consistency throughout the node ring. It is done by pulling the data from the replicas, performing a merge, and then writing the result back to the nodes that were out of sync.
Nodetool Repair: Nodetool repair command against a node, initiates repair for some range of tokens. The range being repaired depends on what options are specified. The default options, just calling “nodetool repair”, initiate a repair of every token range owned by the node
Full Repair: Full Repairs operate over all of the data in the token range
Incremental Repair: Incremental Repair only repairs the data that has been written since the previous incremental repair. Incremental repairs are the default repair type in recent versions and, if run regularly, can significantly reduce the time and I/O cost of performing a repair. It splits the data into repaired and unrepaired SSTables, and only repairs unrepaired data. Once an incremental repair marks data as repaired, it will not try to repair it again. Incremental repair has had known problems in some releases, so running regular full repairs is sometimes recommended instead.
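
The read repair described above can be sketched in Python, assuming each replica is a dict of key to (timestamp, value) and that the highest timestamp wins. The function name and data layout are hypothetical.

```python
def read_with_repair(replicas, key):
    """replicas: list of dicts mapping key -> (timestamp, value)."""
    responses = [r[key] for r in replicas if key in r]
    if not responses:
        return None
    newest = max(responses, key=lambda tv: tv[0])   # highest timestamp wins
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest                   # write the newest version back
    return newest[1]


r1 = {"k": (2, "new")}
r2 = {"k": (1, "old")}
r3 = {}
result = read_with_repair([r1, r2, r3], "k")
print(result, r2, r3)
```

After the read, the stale replica and the replica that was missing the key both hold the newest version.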

Explain Nodetool Utility.

The Nodetool Utility is a command-line utility that comes out of the box with Cassandra and is a great tool for administration and monitoring. It communicates with JMX to perform operational and monitoring tasks exposed by MBeans.

What are Roles in CQLSH?

Roles enable authorization management on a larger scale than security per user can provide. A role is created and may be granted to other roles. Hierarchical sets of permissions can be created with the help of it.

What is Python Stress test in Cassandra?

Older Cassandra releases came with a Python-based utility called py_stress that could be used to run a stress test on a Cassandra cluster. Current releases ship the cassandra-stress tool, a Java-based stress testing utility for basic benchmarking and load testing of a Cassandra cluster. It is an effective tool for populating a cluster and stress testing CQL tables and queries.

So, this brings us to the end of the Apache Cassandra Interview Questions blog. This Tecklearn ‘Top Apache Cassandra Interview Questions and Answers’ helps you with commonly asked questions if you are looking for a job in the Apache Cassandra or Big Data domain. If you wish to learn Apache Cassandra and build a career in the Big Data domain, then check out our interactive Apache Cassandra Training, which comes with 24*7 support to guide you throughout your learning period.

https://www.tecklearn.com/course/apache-cassandra-training/

Apache Cassandra Training

About the Course

Take your career to the next level as a certified Apache Cassandra developer by acquiring all the skills through our hands-on training sessions. Tecklearn’s Apache Cassandra Certification Training is designed by professionals as per the industry requirements and demands. This Cassandra Certification Training helps you to master the concepts of Apache Cassandra including Cassandra Architecture, its features, Cassandra Data Model, and its Administration. Our Cassandra certification training course lets you master the high availability NoSQL distributed database.

Why Should you take Apache Cassandra Training?

The average salary of a Software Engineer with Apache Cassandra skill is $120,500 per year. – Payscale.com
Cassandra is in use at Constant Contact, CERN, Comcast, eBay, GitHub, GoDaddy, Hulu, Instagram, Intuit, Netflix, Reddit, The Weather Channel, and over 1500 more companies that have large, active data sets.
Apache Cassandra is one of the most widely used NoSQL database. It offers features such as Fault Tolerance, Scalability, Flexible Data Storage and its efficient writes, which makes it the perfect database for various purposes.

What you will Learn in this Course?

Introduction to Big Data, and Cassandra

What is Big Data
Limitations of RDBMS
NoSQL and its Characteristics
CAP Theorem
Basic concepts of Cassandra
Features of Cassandra

Cassandra Data model, Installation and setup

Installation of Cassandra
Key concepts and deployment of non-relational database, column-oriented database, Data Model – column, column family

Cassandra Architecture

Explain the Architecture of Cassandra
Different Layers of Cassandra Architecture
Partitioning and Snitches
Explain Vnodes and How Read and Write Path works
Understand Compaction, Anti-Entropy and Tombstone
Describe Repairs in Cassandra

Deep Dive into Cassandra Database

Describe Different Data Types Used in Cassandra
Explain Collection Types
Describe What are CRUD Operations
Implement Insert, Select, Update and Delete of various elements
Implement Various Functions Used in Cassandra
Describe Importance of Roles and Indexing

Backup & Restore and Performance Tuning

Learn backup and restore functionality and its importance
Create a snapshot using Nodetool utility
Restore a snapshot
Understand how to choose the right balance of the following resources: memory, CPU, disks, number of nodes, and network.
Understand all the logs created by Cassandra
Explain the purpose of different log files
Configure the log files
Learn about Performance Tuning
Integration with Spark and Kafka

Advance Modelling

Rules of Cassandra data modelling
Modelling data around queries
Creating table for data queries

Deploying the IDE for Cassandra applications

Learning key drivers
Deploying the IDE for Cassandra applications and cluster connection
Data query implementation

Cassandra Administration

Understanding Node Tool Utility
Cluster management using Command Line Interface
Management and Monitoring using DataStax Ops Center

Cassandra API and Summarization

Cassandra client connectivity
Connection pool internals
Cassandra API
Features and concepts of Hector client
Thrift, JAVA code and Summarization
