What is HBase and how is it useful in a Big Data environment?
HBase - Distributed database for Big Data
HBase is an open-source, non-relational, column-oriented, distributed database management system that runs on top of Hadoop HDFS. It is a NoSQL database modeled after Google's Bigtable. The HBase server is written in Java and depends on the Hadoop libraries.
HBase uses Hadoop's underlying HDFS file system to store its data. HBase is well suited to sparse data sets, which are very common in Big Data environments. HBase is a non-relational database and does not natively support SQL queries for data operations. It comes with the HBase shell (the hbase shell command), which can be used for inserting, updating, searching and deleting data in an HBase database.
HBase began as a subproject of Apache Hadoop and is now a top-level project of the Apache Software Foundation. The goal of the project is to provide Bigtable-like capabilities on top of Hadoop. HBase enables users to store data in a fault-tolerant way irrespective of data size and volume.
It also provides fast, random access to data by row key. Reads and writes are served directly by the region servers rather than by MapReduce jobs, although HBase tables can be used as the input or output of MapReduce jobs when large-scale batch processing is needed.
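The idea of fast random access can be illustrated with a small sketch. This is not HBase code: SortedTable and its put, get and scan methods are hypothetical names, and the only assumption carried over from HBase is that rows are kept sorted by row key, so a single row can be fetched by key and a contiguous key range can be read in order.

```python
from bisect import bisect_left, insort


class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, always in sorted order
        self._rows = {}   # row key -> row data

    def put(self, row_key, data):
        # Insert a new key at its sorted position; overwrite existing rows.
        if row_key not in self._rows:
            insort(self._keys, row_key)
        self._rows[row_key] = data

    def get(self, row_key):
        """Random access: fetch one row directly by its key."""
        return self._rows.get(row_key)

    def scan(self, start, stop):
        """Range read: yield rows with start <= key < stop, in key order."""
        i = bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1


table = SortedTable()
for key in ["user#003", "user#001", "order#17", "user#002"]:
    table.put(key, {"seen": True})

one_row = table.get("user#002")
# "$" sorts immediately after "#", so this range covers every "user#..." key.
user_keys = [k for k, _ in table.scan("user#", "user$")]
```

Because related rows share a key prefix, they end up next to each other, which is why row key design matters so much for read performance in this kind of store.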
Analytics systems built on top of HBase can then extract the real value of the data in near real time by generating meaningful reports. Various technologies and software systems are available for this kind of data processing and report generation.
How is data stored in HBase?
HBase is a column-oriented database (more precisely, a column-family-oriented one) that organizes data in rows and columns, much like a traditional database management system. Every row is stored under a primary key, known as the row key. The row key is used for every read, insert and update operation. A record in HBase is identified by its row key, and its data is grouped into column families. Within a column family, data is stored as key-value pairs, where the key is the column name.
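The layout described above can be pictured as nested maps: row key, then column family, then column name, then value. The following is a minimal in-memory sketch of that idea, not the HBase client API. ToyTable, its methods and the sample values are all hypothetical.

```python
from collections import defaultdict


class ToyTable:
    """In-memory picture of: row key -> column family -> column -> value."""

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        # Only column families declared for the table are accepted.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][column] = value

    def get(self, row_key, family=None):
        # .get() avoids creating empty rows as a side effect of reading.
        row = self.rows.get(row_key, {})
        if family is None:
            return {f: dict(cols) for f, cols in row.items()}
        return dict(row.get(family, {}))


table = ToyTable("testtable", ["cf1", "cf2"])
table.put("row1", "cf1", "name", "Alice")
table.put("row1", "cf2", "city", "Pune")
table.put("row2", "cf1", "name", "Bob")   # row2 stores nothing in cf2

row1 = table.get("row1")
row2_cf2 = table.get("row2", "cf2")
```

Note that row2 simply has no entry for cf2. Missing cells take no space in this model, which is the property that makes such a layout a good fit for sparse data.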
In HBase, we first define a schema, for example:
Table Name: testtable
| Row Key | Column Family 1 | Column Family 2 |
You can add new column families later at any time. The HBase schema is flexible and can change to adapt to application requirements.
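As a rough illustration of this flexibility, the sketch below models the schema as a list of column families that can grow after data has been written. The function names add_column_family and put are hypothetical and only mimic the idea. In HBase itself, only the column families belong to the schema, while the columns inside a family are created simply by writing to them.

```python
# Schema of the example table: only the column families are declared.
schema = {"testtable": ["cf1", "cf2"]}

# Existing data: row key -> column family -> column -> value.
rows = {"row1": {"cf1": {"a": "1"}, "cf2": {"b": "2"}}}


def add_column_family(schema, table, family):
    """Extend the table schema with a new column family."""
    if family in schema[table]:
        raise ValueError(f"column family already exists: {family}")
    schema[table].append(family)


def put(schema, rows, table, row_key, family, column, value):
    """Write one cell; the column name does not need to be declared."""
    if family not in schema[table]:
        raise KeyError(f"unknown column family: {family}")
    rows.setdefault(row_key, {}).setdefault(family, {})[column] = value


# Add a third family later, then write to a brand-new column inside it.
add_column_family(schema, "testtable", "cf3")
put(schema, rows, "testtable", "row2", "cf3", "any_new_column", "x")
```

Rows written before the change, such as row1, are left untouched: they just have no cells in the new family.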
What are the node types in HBase?
HBase has two types of nodes: one master node and many region servers. The master node manages all the region servers, while each region server stores part of the tables and does the work of managing that data. In a deployment with a single master, the master node is a single point of failure, so the system is sensitive to its loss. It is therefore important to run the master on reliable hardware. HBase also allows backup masters to be configured so that one of them can take over if the active master fails.
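The division of work between the two node types can be sketched as follows. This is a toy model under stated assumptions, not how HBase is implemented: ToyCluster, the split keys and the server names rs1 and rs2 are made up, regions are assigned round-robin, and each region is assumed to cover a contiguous range of row keys.

```python
from bisect import bisect_right


class ToyCluster:
    """Toy model: a table split into key ranges (regions) spread over servers."""

    def __init__(self, split_keys, servers):
        # Region i covers [starts[i], starts[i + 1]); the first starts at "".
        self.starts = [""] + sorted(split_keys)
        # Master's job (simplified): decide which server hosts which region.
        self.assignment = {
            i: servers[i % len(servers)] for i in range(len(self.starts))
        }

    def region_for(self, row_key):
        """Index of the region whose key range contains row_key."""
        return bisect_right(self.starts, row_key) - 1

    def server_for(self, row_key):
        """Region server that currently hosts the row."""
        return self.assignment[self.region_for(row_key)]

    def reassign_from(self, failed_server, replacement):
        """Master's job (simplified): move regions off a failed server."""
        for region, server in self.assignment.items():
            if server == failed_server:
                self.assignment[region] = replacement


cluster = ToyCluster(split_keys=["g", "p"], servers=["rs1", "rs2"])
before = cluster.server_for("kiwi")    # hosted by rs2 in this setup
cluster.reassign_from("rs2", "rs1")    # rs2 is lost, rs1 takes its regions
after = cluster.server_for("kiwi")
```

The sketch shows why the region servers carry the data load while the master mostly coordinates: lookups only need the region boundaries and the current assignment, and the master steps in when that assignment has to change.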