HBase

HBase is an open-source, horizontally scalable, distributed column-oriented database built on top of the Hadoop Distributed File System (HDFS). Its data model is similar to Google’s Bigtable and is designed to provide quick random access to huge amounts of structured data. It leverages the fault tolerance provided by HDFS. Data can be stored in HDFS either directly or through HBase, and data consumers use HBase for random read and write access to data that resides in HDFS.

HBase provides a dual approach to data access. Its row-key-based lookups and table scans provide consistent, real-time reads and writes, and it also integrates with Hadoop MapReduce for batch jobs. This makes it suitable for both real-time querying and batch analytics. HBase also manages sharding and failover automatically.
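To make the idea of row-key access more concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the HBase client API: the class `ToyTable`, its methods, and the sample row keys and column names are all hypothetical. It only illustrates how keeping rows sorted by row key can support both a point read and a scan over a key range.

```python
import bisect


class ToyTable:
    """Toy in-memory model of a table sorted by row key (not the HBase API)."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> {column name: value}

    def put(self, row_key, column, value):
        # Insert the key at its sorted position the first time it is seen.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        # Point read: fetch a single row by its key.
        return self._rows.get(row_key)

    def scan(self, start_key, stop_key):
        # Range scan: rows with start_key <= key < stop_key, in key order.
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(key, self._rows[key]) for key in self._keys[lo:hi]]


table = ToyTable()
table.put("user#0002", "info:name", "Bo")
table.put("user#0001", "info:name", "Ada")
table.put("video#0001", "info:title", "Intro")

point_read = table.get("user#0001")
range_scan = table.scan("user#", "user$")  # "$" sorts right after "#"
print(point_read)
print(range_scan)
```

In this sketch the scan touches only the rows whose keys start with `user#`, which is why the design of the row key matters so much for read performance in a store of this kind.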

Why do we need this technology and what is the problem that it is solving?

HBase is a NoSQL database that is seeing increased use as organizations deal with ever larger volumes of data. It is written in Java and can be programmed through a simple Java API, and it scales out by adding machines to the cluster. Many business scenarios involve sparse data, where the task is to find a handful of data fields matching certain criteria among fields that number in the billions. HBase is fault-tolerant and resilient and can work with multiple types of data, which makes it useful across varied business scenarios.
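As a rough illustration of what sparse data means here, the following sketch stores only the cells that actually have values, instead of a full grid with empty slots. The sensor rows, the column names, and the helper `rows_with` are made up for the example, and the lookup is a simple pass over an in-memory dictionary rather than anything HBase-specific.

```python
# Each row stores only the columns that have a value; absent cells cost nothing.
rows = {
    "sensor#001": {"m:temp": 21.5},
    "sensor#002": {"m:temp": 19.0, "m:humidity": 40},
    "sensor#003": {"m:pressure": 1013},
}

# Compare the number of stored cells with the size of a dense rows-by-columns grid.
all_columns = sorted({column for cells in rows.values() for column in cells})
dense_cells = len(rows) * len(all_columns)
stored_cells = sum(len(cells) for cells in rows.values())


def rows_with(column):
    """Return the keys of rows that actually contain the given column."""
    return [key for key, cells in rows.items() if column in cells]


humid = rows_with("m:humidity")
print(dense_cells, stored_cells, humid)
```

The gap between the dense grid and the cells actually stored grows quickly as the number of possible columns increases, which is the situation the paragraph above describes.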