In this article, we discuss the key elements of HBase and its history.
Apache HBase is a non-relational, open-source database designed to run on the Hadoop Distributed File System (HDFS). Written in Java, HBase is particularly known for providing access to large volumes of sparse data. Sparse data, which refers to small but valuable pieces of information scattered within a huge volume of unimportant data, has become extremely important for Big Data analytics. The features of HBase include in-memory computing, compression, and per-column-family Bloom filters. Along with these features, it brings transactional capabilities to Hadoop, facilitating updates, inserts, and deletes. HBase thus offers a useful platform for real-time read and write access in Big Data operations. The motivation behind the creation of HBase was hosting very large tables, comprising billions of rows and millions of columns, on clusters of commodity hardware. Google's Bigtable showed the way, and development of HBase began shortly afterwards.
HBase, modeled after Google's Bigtable, was started back in 2006 and officially released at the end of 2007. Written in Java, it was developed as part of the Apache Hadoop project at the Apache Software Foundation, and it provides Bigtable-like capabilities on top of Hadoop. This means it offers a fault-tolerant way of storing huge volumes of sparse data. Some of the key features of HBase include:
- Linear as well as modular scalability
- Highly fault-tolerant storage for large quantities of sparse data
- A highly flexible data model
- Automatic sharding, which distributes HBase tables across the cluster as regions; regions are split and redistributed as the data grows
- An easy-to-use, accessible Java API
- Integration with Hadoop and HDFS
- Support for parallel processing via MapReduce
- Near real-time lookups
- Automatic failover for high availability
- Block Cache and Bloom filters for query optimization under heavy load
- Filters and coprocessors for powerful server-side processing
- Replication across data centers
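The flexible, sparse data model in the list above can be pictured as a sorted map of row key to column family to column qualifier to value: each row stores only the columns it actually has, so sparsely populated tables waste no space on empty cells. The following is a minimal, self-contained Java sketch of that logical model; the class and method names are illustrative only and are not HBase's actual client API.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration of HBase's logical data model: a sorted map of
// row key -> column family -> column qualifier -> value.
// (Hypothetical class for explanation; not the real HBase client API.)
public class SparseTableSketch {
    // TreeMap keeps row keys sorted, mirroring HBase's ordered regions.
    private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    public void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(qualifier, value);
    }

    public String get(String rowKey, String family, String qualifier) {
        Map<String, Map<String, String>> families = rows.get(rowKey);
        if (families == null) return null;
        Map<String, String> columns = families.get(family);
        return columns == null ? null : columns.get(qualifier);
    }

    public static void main(String[] args) {
        SparseTableSketch table = new SparseTableSketch();
        // Two rows with entirely different columns: no storage is
        // consumed for the cells each row does not have.
        table.put("user1", "info", "email", "a@example.com");
        table.put("user2", "info", "phone", "555-0100");
        System.out.println(table.get("user1", "info", "email")); // a@example.com
        System.out.println(table.get("user2", "info", "email")); // null
    }
}
```

The nested-sorted-map view also explains automatic sharding: because rows are kept in key order, a table can be split into contiguous key ranges (regions) and spread across servers.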
The arrival of HBase as a highly scalable database system capable of handling massive volumes of sparse data marked a big change in the Big Data landscape. Relational database management systems (RDBMSes) have been around since as early as the 1970s and have helped many companies and organizations implement data-centric solutions to their problems. These relational databases remain useful today in a variety of environments and use cases. For many use cases the relational model is a perfect fit, but there are also classes of problems it cannot solve. HBase is the most advanced realization of the non-relational, Bigtable-style data model so far.
Google published its paper on Bigtable almost a decade ago, and HBase development started in 2006. In the beginning, an initial prototype of HBase was created as a contrib module for Hadoop. That was at the start of 2007, and at the end of the same year the first usable HBase was released. During 2008, Hadoop became a top-level Apache project and HBase became one of its subprojects. HBase versions 0.18 and 0.19 followed, starting in October 2008. In 2010 HBase itself became a top-level Apache project, and in 2011 HBase 0.92 was released. The latest version in use at the time of writing is HBase 0.96.