The Apache Software Foundation announced the release of Apache Hadoop version 1.0 on January 4. Hadoop, the software framework for analyzing massive amounts of data, graduated to the 1.0 designation after six years of gestation. Hadoop’s principal strength is its ability to process massive amounts of data in parallel on clusters of computing nodes: it distributes data across the nodes, runs processing tasks against each node’s local data, and then synthesizes the results.
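The distribute-process-synthesize model described above is Hadoop’s MapReduce pattern. As a rough illustration only — real Hadoop jobs are written in Java against the MapReduce API and run across a cluster — the three phases can be sketched in a simplified, single-process word count:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: each node would emit (word, 1) pairs for its local chunk of data.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: intermediate pairs are grouped by key before reduction.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: per-key results are synthesized into the final answer.
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data", "big clusters process big data"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["big"])  # 3
```

In an actual Hadoop deployment the map and reduce phases run on many nodes at once, and the framework handles the shuffle, scheduling, and fault tolerance.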
Version 1.0 represents a milestone in terms of scale and enterprise-readiness. The release already features deployments as large as 50,000 nodes and marks the culmination of six years of development, testing and feedback from users and data scientists. Key features of the 1.0 release include the following:
• Integration with HBase, Hadoop’s big data table
• Performance-enhanced access to local files for HBase
• Kerberos-based authentication to ensure security
• Support for WebHDFS, a read/write HTTP access layer for HDFS
• Miscellaneous performance enhancements and bug fixes
• All features of version 0.20.205 and the prior 0.20.2xx releases
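The WebHDFS layer in the list above exposes HDFS over plain HTTP, so any HTTP client can read and write files without Hadoop’s Java libraries. A minimal sketch of how its REST URLs are formed (the hostname and path here are hypothetical; 50070 is the default NameNode HTTP port):

```python
def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL of the form
    http://<host>:<port>/webhdfs/v1/<path>?op=<OP>[&k=v...],
    where op is a WebHDFS operation such as OPEN, CREATE, or LISTSTATUS.
    """
    query = "&".join([f"op={op}"] + [f"{k}={v}" for k, v in params.items()])
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

# Read a file over plain HTTP, e.g. with curl or a browser:
url = webhdfs_url("namenode.example.com", 50070, "/user/logs/part-0", "OPEN")
print(url)
# -> http://namenode.example.com:50070/webhdfs/v1/user/logs/part-0?op=OPEN
```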
Hadoop represented the Big Data story of 2011 insofar as it was incorporated into almost every 2011 Big Data product released by enterprises and startups alike. The release of Apache Hadoop 1.0 promises to move Hadoop even closer to the enterprise, transitioning the software framework from a tool for handling web data into enterprise-grade Big Data infrastructure more generally.