Overview of HBase, its Advantages, Features and history

Last updated on May 30 2022
Sonali Singh

Introduction to HBase

Since the 1970s, the RDBMS has been the standard solution for data storage and maintenance problems. After the advent of big data, companies realized the benefit of processing it and began adopting solutions like Hadoop.

Hadoop uses a distributed file system to store big data and MapReduce to process it. Hadoop excels at storing and processing huge volumes of data in various formats, whether arbitrary, semi-structured, or unstructured.

Limitations of Hadoop

Hadoop can perform only batch processing, and data is accessed only sequentially. That means one has to scan the entire dataset even for the simplest of jobs.

A huge dataset, when processed, results in another huge dataset, which must also be processed sequentially. At this point, a new solution is needed to access any point of data in a single unit of time (random access).
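
The difference between sequential and random access can be pictured with a small, self-contained sketch. It uses plain Python and a made-up in-memory dataset (the `user00000`-style keys are purely hypothetical); it is not Hadoop or HBase code, only an illustration of why a keyed lookup avoids scanning everything.

```python
# Toy dataset: 100,000 (key, value) records. Keys and values are invented.
records = [(f"user{i:05d}", i * 10) for i in range(100_000)]


def sequential_lookup(rows, key):
    """Scan records in order until the key is found; also count the steps."""
    steps = 0
    for k, v in rows:
        steps += 1
        if k == key:
            return v, steps
    return None, steps


# A keyed index built once; later lookups go straight to the record.
index = dict(records)


def random_access_lookup(idx, key):
    """Fetch one record by key without touching the others."""
    return idx.get(key)


value_scan, steps = sequential_lookup(records, "user99999")
value_keyed = random_access_lookup(index, "user99999")
print(value_scan, steps, value_keyed)
```

In this sketch the scan has to visit every record to reach the last key, while the keyed lookup returns the same value directly.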

Hadoop Random Access Databases

Applications such as HBase, Cassandra, CouchDB, Dynamo, and MongoDB are some of the databases that store huge amounts of data and access it in a random manner.

What is HBase?

HBase is a distributed, column-oriented database built on top of the Hadoop file system. It is an open-source project and is horizontally scalable.

HBase has a data model similar to Google’s Bigtable, designed to provide quick random access to huge amounts of structured data. It leverages the fault tolerance provided by the Hadoop File System (HDFS).

It is a part of the Hadoop ecosystem that provides random, real-time read/write access to data in the Hadoop File System.

One can store data in HDFS either directly or through HBase. Data consumers read and access the data in HDFS randomly using HBase. HBase sits on top of the Hadoop File System and provides read and write access.

[Figure: HBase]

HBase and HDFS

HDFS | HBase
HDFS is a distributed file system suitable for storing large files. | HBase is a database built on top of HDFS.
HDFS does not support fast individual record lookups. | HBase provides fast lookups for larger tables.
It provides high-latency batch processing. | It provides low-latency access to single rows from billions of records (random access).
It provides only sequential access to data. | HBase keeps rows sorted by key and provides random access; it stores the data in indexed HDFS files for faster lookups.
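
One way to picture fast lookups over indexed files is a binary search over sorted row keys. The sketch below is a simplification in plain Python, not HBase's actual file format; the row keys, values, and the `get`/`scan` helper names are assumptions made for illustration.

```python
import bisect

# Hypothetical sorted "file": row keys in sorted order, values alongside.
row_keys = ["row-001", "row-007", "row-015", "row-042", "row-100"]
row_values = ["a", "b", "c", "d", "e"]


def get(key):
    """Binary-search the sorted keys instead of scanning all of them."""
    i = bisect.bisect_left(row_keys, key)
    if i < len(row_keys) and row_keys[i] == key:
        return row_values[i]
    return None


def scan(start, stop):
    """Return (key, value) pairs with start <= key < stop, in key order."""
    lo = bisect.bisect_left(row_keys, start)
    hi = bisect.bisect_left(row_keys, stop)
    return list(zip(row_keys[lo:hi], row_values[lo:hi]))


print(get("row-042"))
print(scan("row-007", "row-042"))
```

Keeping keys sorted serves both single-row lookups and short range scans, which is why the same structure is used for `get` and `scan` here.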

Storage Mechanism in HBase

HBase is a column-oriented database, and the tables in it are sorted by row. The table schema defines only column families, which are the key-value pairs. A table can have multiple column families, and each column family can have any number of columns. Subsequent column values are stored contiguously on the disk. Each cell value of the table has a timestamp. In short, in HBase:

  • A table is a collection of rows.
  • A row is a collection of column families.
  • A column family is a collection of columns.
  • A column is a collection of key-value pairs.

Given below is an example schema of a table in HBase.

Rowid | Column Family    | Column Family    | Column Family    | Column Family
      | col1  col2  col3 | col1  col2  col3 | col1  col2  col3 | col1  col2  col3
1     |                  |                  |                  |
2     |                  |                  |                  |
3     |                  |                  |                  |
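
The hierarchy described above (table, row, column family, column, timestamped cell) can be sketched as nested dictionaries. This is a toy model in plain Python, not the HBase client API: the `ToyTable` class, its methods, and the sample family and column names are all hypothetical.

```python
class ToyTable:
    """Toy model: row key -> column family -> column -> {timestamp: value}."""

    def __init__(self, families):
        # Only column families are fixed up front, as in the schema above.
        self.families = set(families)
        self.rows = {}

    def put(self, row, family, column, value, ts):
        """Store one cell version; any column name is allowed in a known family."""
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        versions = (
            self.rows.setdefault(row, {})
            .setdefault(family, {})
            .setdefault(column, {})
        )
        versions[ts] = value

    def get(self, row, family, column):
        """Return the value with the newest timestamp, or None if absent."""
        versions = self.rows.get(row, {}).get(family, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def row_keys(self):
        """Row keys in sorted order, mirroring 'tables are sorted by row'."""
        return sorted(self.rows)


table = ToyTable(["personal", "professional"])
table.put("2", "personal", "name", "Ravi", ts=1)
table.put("1", "personal", "name", "Raju", ts=1)
table.put("1", "personal", "city", "Hyderabad", ts=1)
table.put("1", "personal", "city", "Chennai", ts=2)  # newer version of the cell

print(table.row_keys())
print(table.get("1", "personal", "city"))
```

Note that columns are created on first write, while writing to an undeclared column family is rejected; that mirrors the idea that the schema defines only column families.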

Column Oriented and Row Oriented

Column-oriented databases are those that store data tables as sections of columns of data, rather than as rows of data. In short, they have column families.

Row-Oriented Database | Column-Oriented Database
It’s suitable for Online Transaction Processing (OLTP). | It’s suitable for Online Analytical Processing (OLAP).
Such databases are designed for a small number of rows and columns. | Column-oriented databases are designed for huge tables.

The following image shows column families in a column-oriented database:

[Figure: Column Oriented and Row Oriented]
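
As a rough illustration of the two layouts, the sketch below arranges the same three made-up records first row by row and then column by column. It is plain Python with invented field names, intended only to show which values end up next to each other, not how any particular database stores bytes on disk.

```python
# Three hypothetical records with invented fields.
rows = [
    {"id": 1, "name": "a", "amount": 10},
    {"id": 2, "name": "b", "amount": 20},
    {"id": 3, "name": "c", "amount": 30},
]

# Row-oriented layout: all values of one record sit next to each other.
row_layout = [value for record in rows for value in record.values()]

# Column-oriented layout: all values of one column sit next to each other.
column_layout = {col: [record[col] for record in rows] for col in rows[0]}

# An analytical query over one column touches only that column's values.
total = sum(column_layout["amount"])

print(row_layout)
print(column_layout)
print(total)
```

Reading a whole record is natural in the row layout, while aggregating a single column is natural in the column layout, which matches the OLTP and OLAP split in the table above.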

HBase and RDBMS

HBase | RDBMS
HBase is schema-less; it doesn’t have the concept of a fixed column schema and defines only column families. | An RDBMS is governed by its schema, which describes the whole structure of its tables.
It’s built for wide tables. HBase is horizontally scalable. | It’s thin and built for small tables. It’s hard to scale.
There are no transactions in HBase. | An RDBMS is transactional.
It has de-normalized data. | It has normalized data.
It’s good for semi-structured as well as structured data. | It’s good for structured data.

Features of HBase

  • HBase is linearly scalable.
  • It has automatic failover support.
  • It provides consistent reads and writes.
  • It integrates with Hadoop, both as a source and a destination.
  • It has an easy-to-use Java API for clients.
  • It provides data replication across clusters.

Where to Use HBase

  • Apache HBase is used when you need random, real-time read/write access to big data.
  • It hosts very large tables on top of clusters of commodity hardware.
  • Apache HBase is a non-relational database modeled after Google’s Bigtable. Bigtable runs on top of the Google File System; likewise, Apache HBase works on top of Hadoop and HDFS.

Applications of HBase

  • It’s used whenever there is a need for write-heavy applications.
  • HBase is used whenever we need to provide fast random access to available data.
  • Companies such as Facebook, Twitter, Yahoo, and Adobe use HBase internally.

HBase History

Year | Event
Nov 2006 | Google released the paper on Bigtable.
Feb 2007 | The initial HBase prototype was created as a Hadoop contribution.
Oct 2007 | The first usable HBase was released along with Hadoop 0.15.0.
Jan 2008 | HBase became a subproject of Hadoop.
Oct 2008 | HBase 0.18.1 was released.
Jan 2009 | HBase 0.19.0 was released.
Sept 2009 | HBase 0.20.0 was released.
May 2010 | HBase became an Apache top-level project.

 

So, this brings us to the end of the blog. This Tecklearn ‘Overview of HBase, its Advantages, Features and history’ post helps you with commonly asked questions if you are looking for a job in the HBase and NoSQL database domain.

If you wish to learn HBase and build a career in the HBase or NoSQL database domain, check out our interactive Apache HBase Training, which comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

https://www.tecklearn.com/course/apache-hbase-training/

Apache HBase Training

About the Course

The Tecklearn Apache HBase training will help you master this powerful NoSQL distributed database. You will learn HBase architecture, data analytics using HBase, integration with Hive, and cluster monitoring using ZooKeeper, and you will work on real-life industry projects. Build your career as a certified HBase professional through our hands-on training with real-world examples. Upon completion of this online training, you will have a solid understanding of and hands-on experience with Apache HBase.

Why Should you take Apache HBase Training?

  • HBase is now the largest data-driven service serving top websites, including the Facebook Messaging Platform.
  • There is strong demand for HBase-qualified professionals, and they are paid well for the right skills.
  • According to indeed.com, the average pay of an HBase developer stands at $81,422 per annum.

What you will Learn in this Course?

Introduction to HBase and NoSQL

  • Introduction to HBase
  • Fundamentals of HBase
  • What is NoSQL
  • NoSQL Vs RDBMS
  • Why HBase
  • Where to use HBase

HBase Data Modelling

  • Data Modelling
  • HDFS vs. HBase
  • HBase Use Cases

HBase Architecture and Components

  • HBase Architecture
  • Components of HBase Cluster

HBase Installation

  • Prerequisites for HBase Installation
  • Installation Steps

Programming in HBase

  • Create an Eclipse Project for HBase
  • Simple Table Creation from Java in HBase
  • HBase API
  • HBase Shell
  • Primary operations and advanced operations

Integration of Hive with HBase

  • Create a table and insert data into it
  • Integration of Hive with HBase
  • HBase Mapping

Deep Dive into HBase

  • Input Data into HBase
  • File Loading
  • HDFS File
  • HBase handling files in File System
  • WAL
  • Seek Vs Transfer
  • HBase ACID Properties
