History of Hadoop

Posted in: Big Data | Posted on: May 3, 2016

In this article we explain the backdrop and history of Hadoop. Hadoop is a distributed storage and processing framework for Big Data whose file system can span thousands of computers.

The origins of Hadoop can be traced to the Google File System paper published in October 2003. That paper was soon followed by another Google research paper, "MapReduce: Simplified Data Processing on Large Clusters." Development began in the Apache Nutch project and was moved to the new Hadoop subproject in January 2006, almost a decade ago. Doug Cutting, the mastermind behind the project, who was working at Yahoo! at the time, named it Hadoop after his son's toy elephant. Hadoop 0.1.0 was released in April 2006 with the contribution of Owen O'Malley, and from then on it continued to evolve through the valuable work of many contributors to the now famous Apache Hadoop project.

We all know that Hadoop was created particularly for managing Big Data. Google, as the most popular search engine and repository of online information, served as the launching ground for the ideas behind Hadoop. To deliver an enormous variety of keyword-specific search results, Google needed storage for an overwhelming volume of data. Throughout the 1990s, Google kept searching for smarter, more scalable ways to store and process this ever-increasing amount of data. In 2003 it finally came up with a new Big Data storage system called GFS, or the Google File System, equipped to store huge amounts of data. The following year it made another breakthrough with a data processing technique called MapReduce, which made processing the data held in GFS easier than ever before.

At the time, these data processing techniques were massive breakthroughs that would shape the storage and processing of huge volumes of digital data for years to come. Initially, the new storage technologies were presented only theoretically, as white papers for interested readers. Building on the knowledge Google shared in those papers, in 2006-07 Yahoo, another major search engine, built the techniques that became HDFS and MapReduce. In the long run this made HDFS and MapReduce the two quintessential components of Hadoop.

To sum up the history, Hadoop was conceptualized and developed by Doug Cutting. The story behind the choice of name and logo is no less interesting: Cutting named the new data processing technology after his little son's toy elephant. Many people also believe the elephant logo symbolizes an effective, heavy-duty solution for Big Data. Name and logo apart, the history of Hadoop to the present day shows how, over the years, it has been the mainstay of the Big Data landscape, giving birth to an array of subsequent data-centric technologies and methodologies.
