Apache Hadoop 2.0.2-alpha Release Features Enhancements To HDFS HA & YARN

The Apache Hadoop developer community recently announced the second alpha release of Apache Hadoop, known as Apache Hadoop 2.0.2-alpha. The release features enhancements such as the following:

•HDFS HA enhancements including support for automated failover using ZooKeeper and support for security
•YARN testing and stabilization

YARN, which stands for “Yet Another Resource Negotiator,” provides a framework for creating distributed processing applications and infrastructures. YARN also provides a mechanism for scheduling requests for resources such as CPUs and manages the execution of those requests. Harsh Chouraria’s excellent blog post explains how YARN differs from MapReduce 2.0 by noting that “YARN is a generic platform for any form of distributed application to run on, while MR2 is one such distributed application that runs the MapReduce framework on top of YARN.” YARN has already been deployed on massive clusters totaling almost 6,000 nodes at Yahoo.

The Hadoop community is now close to releasing Hadoop-2.x, sometime early in 2013, with final tweaks to functionality such as:

•HDFS without shared storage
•YARN ResourceManager
•YARN scheduling enhancements

Developers can download the latest Hadoop release from the Apache Hadoop page or as part of Hortonworks Data Platform 2.0 Alpha, which integrates with additional frameworks such as Apache Pig and Apache Hive.


Metacloud Comes Out Of Stealth To Offer OpenStack-based IaaS Private Cloud Solution

Today, Metacloud emerged from stealth mode to announce the general availability of an enterprise-grade, OpenStack-based IaaS solution backed by Storm Ventures and former Yahoo CEO Jerry Yang’s AME Cloud Ventures. Metacloud’s offering targets enterprises seeking to deploy IaaS private clouds that afford them the performance and scalability of large public clouds while additionally providing the flexibility, security and privacy specific to a private cloud environment. The Metacloud solution is based on the Essex version of OpenStack although it borrows from Folsom, the latest OpenStack release, as well. The product claims the following key attributes and differentiators:

•High availability that automates failover in order to ensure continued operation and service delivery.
•Automated scalability features that allow IT executives to elastically add to existing resources at both the compute and storage levels.
•High performance based on an optimized path to the OpenStack infrastructure’s object storage component.
•Network optimization designed to maintain network performance while preserving the separation of secure tenants.

Metacloud’s CEO Steve Curry spoke of the vision for and place of Metacloud in the cloud landscape as follows:

We founded Metacloud with the firm belief that cloud is not a buzz word, nor should it be hard to deploy and manage. Businesses can’t wait for months or years for results and cloud implementations shouldn’t take that long. Despite popular misconceptions, IT organizations can have the best of both worlds — a cloud customized to their requirements, deployed quickly and cost effectively. IT organizations shouldn’t have to keep buying proprietary hardware or software that locks them into an increasingly expensive cycle. Nor should they have to put up with costly public cloud services that can’t scale with their data center operations or business needs.

Curry makes three points about Metacloud that are worthy of elaboration:

•Metacloud enables enterprises to build a cloud “customized to their requirements” that concomitantly allows for quick deployments and cost savings.
•Metacloud allows customers to avoid vendor lock-in and vendor requirements to purchase proprietary hardware.
•Metacloud allows customers to scale their deployments in ways that respect and recognize the specificity of their IT needs as unique enterprises with correspondingly unique IaaS needs.

Metacloud already boasts production-level enterprise deployments at customers such as Tableau Software and a Fortune 100 company. Metacloud was co-founded in 2011 by Steve Curry and Sean Lynch. Curry formerly managed global storage operations at Yahoo!, where he was responsible for “hundreds of petabytes of content and user data.” Prior to Metacloud, Lynch was Sr. VP of Technology Operations at Ticketmaster Entertainment, where he played a pivotal role in engineering the Ticketmaster platform that currently manages billions of dollars in transactions per year.

Midokura Enters U.S. Market For IaaS Virtualized Networking

Midokura today announced its official entry into the U.S. market with a network virtualization offering geared towards Infrastructure as a Service (IaaS) platforms. Midokura was founded in 2010 with the intent of building a public cloud in Japan. The company’s co-founders, Tatsuya Kato and Dan Mihai Dumitriu, quickly recognized the challenge of building cost-effective, efficient networking infrastructures for cloud platforms and decided to focus on networking solutions for IaaS. Midokura’s virtualization platform, MidoNet, decouples the cloud from network hardware by creating an abstraction layer that lies between the physical network and hosts. The resulting environment enables the following benefits:

•Simplified network infrastructure
•High availability
•High scalability
•Simplified and reduced network protocols for data transmission
•Network optimization
•Improved fault tolerance

MidoNet delivers a “distributed, de-centralized, multi-layer software defined virtual network solution” that overlays an IP-connected network and pushes network intelligence to the network’s edge. MidoNet is integrated with the Essex version of OpenStack and features a plugin for Quantum in addition to Nova network drivers. MidoNet’s tight integration with OpenStack means that the product is ready for deployment in enterprise-grade OpenStack installations all over the world.

Jonathan Bryce, Executive Director of the OpenStack Foundation, remarked on Midokura’s contribution to OpenStack as follows:

Companies like Midokura are growing the OpenStack user base by advancing the technology and offering real deployment options. The team that built MidoNet has been active in the OpenStack community, and have been consistently contributing to the project. The ability to extend the OpenStack platform with pluggable networking technologies allows greater innovation and choice for our users.

Because Midokura has “been consistently contributing” to the OpenStack project, it is well positioned to serve OpenStack deployments that have drawn on different versions of OpenStack for their own customized installations. Midokura’s team includes engineers from Amazon, DreamHost, Fulcrum Microsystems, Google, NEC and NTT. The company’s core product, MidoNet, will target both cloud service providers and enterprises.

Zenoss Survey Reveals Open Source Cloud Solutions Still Gaining Market Traction

Zenoss recently revealed the results of a survey on the state of open source cloud adoption, based on a poll of more than 100,000 “community members,” that covered topics such as:

•Open source clouds deployed today
•Current open source implementations
•Aspects of an open source cloud that are important
•Decision drivers to migrate to an open source cloud

The survey also featured questions on Platform as a Service adoption amongst Zenoss’s community members that more generally examined the state of PaaS within the cloud landscape, whether from an open source vendor or otherwise.

Highlights of Zenoss’s survey about open source cloud adoption include the following:

•Only 17.2% of all cloud deployments are currently open source. If true, this figure means that OpenStack has a lot of work to do if it hopes to challenge the market dominance of Amazon Web Services and other proprietary cloud solutions. 17.2% constitutes a slender percentage of cloud installations, and OpenStack claims roughly 50% of that percentage according to Zenoss, meaning that its total market share in the cloud space is approximately 8.6%.

•OpenStack leads the pack in terms of open source cloud deployments with 50.5% of surveyed installations. CloudStack and Eucalyptus take second and third place with 18.3% and 9.2%, respectively.

•Even though OpenStack claims more than 50% of all open source cloud deployments, the race for market leadership in the open source cloud space still remains wide open. Despite all of the press OpenStack has received both regarding product releases and its Foundation, it still has yet to command the open source cloud market.

•Open source product maturity and lack of support were designated as the key factors militating against open source cloud adoption.

•Zenoss claims Google Apps leads PaaS adoption with 51.6% of the market in comparison to 18.7% for Microsoft Azure, 15.4% for VMware’s Cloud Foundry and 14.3% for Red Hat’s OpenShift. These results indicate that many of the other well known PaaS players such as Heroku or Engine Yard have either yet to gain significant traction in terms of numbers of installations, or otherwise that the survey fails to provide as granular a picture of the PaaS space as one might hope.
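The percentages in the highlights above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the figures quoted in this article; the variable and function names are hypothetical and chosen purely for illustration, and nothing here comes from Zenoss itself.

```python
# Figures as quoted in the survey highlights above (all in percent).
OPEN_SOURCE_SHARE = 17.2  # open source share of all cloud deployments

# Shares *within* open source cloud deployments.
OSS_SHARES = {"OpenStack": 50.5, "CloudStack": 18.3, "Eucalyptus": 9.2}

# PaaS adoption shares reported by the survey.
PAAS_SHARES = {
    "Google Apps": 51.6,
    "Microsoft Azure": 18.7,
    "Cloud Foundry": 15.4,
    "OpenShift": 14.3,
}


def share_of_total(part_pct, whole_pct):
    """Convert a share of a segment into a share of the overall market."""
    return part_pct * whole_pct / 100.0


# OpenStack's share of all cloud deployments, using the unrounded 50.5%.
openstack_total = share_of_total(OSS_SHARES["OpenStack"], OPEN_SOURCE_SHARE)

# Portion of open source deployments not attributed to the top three.
other_oss = 100.0 - sum(OSS_SHARES.values())

# Total of the four named PaaS vendors.
paas_total = sum(PAAS_SHARES.values())

print(f"OpenStack share of all cloud deployments: {openstack_total:.1f}%")
print(f"Open source deployments outside the top three: {other_oss:.1f}%")
print(f"Sum of the four named PaaS shares: {paas_total:.1f}%")
```

With the rounded 50% figure, OpenStack's overall share works out to 8.6%; with the unrounded 50.5% it is closer to 8.7%. The four named PaaS shares sum to 100%, which leaves no room in the reported figures for players such as Heroku or Engine Yard, and roughly 22% of open source deployments fall outside the top three projects.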

Data about cloud deployments is tough to come by, so kudos to Zenoss for conducting and releasing the results of this survey. The results of the survey should serve as a welcome point of reference for future debates and discussions about the state of open source cloud products within the larger universe of cloud software and platforms.

See the results of the survey below:

Zenoss Survey On Open Source Cloud Adoption

William Blair Reports Rackspace Stands Poised To Close Wal-Mart Big Data Deal

Research firm William Blair reported that Rackspace is likely to win business from Wal-Mart for the purpose of Big Data analytics in the retail sector. According to William Blair analyst Jim Breen, Wal-Mart is hiring OpenStack technical resources and outsourcing cloud-related services to Rackspace. Wal-Mart is reportedly in the process of combining its EMC and IBM-based retail data platforms into one aggregated Big Data platform. Breen wrote that Wal-Mart’s ten online portals currently use segregated data silos and that the larger corporate vision is to combine these discrete platforms into one massive data repository that enables richer insights about consumer behavior and operations. Breen spoke of the significance of Rackspace’s collaboration with Wal-Mart by noting:

From a broad perspective, we believe Rackspace’s ability to gain traction with Wal-Mart for big data reflects early success of the OpenStack platform and foreshadows new market opportunities.

What is surprising about Breen’s report is that, while Rackspace is recognized as an OpenStack founder and visionary, the San Antonio-based company is less well known as a key player in the Big Data space. Rackspace may be counting on its ability to aggregate web-based data using OpenStack object and block storage as the infrastructure for a Big Data platform for Wal-Mart, but details of the Big Data analytic and querying tools it plans to use for the collaboration have yet to emerge. In any case, shares of Rackspace were up 2% in early trading on Wednesday after the William Blair announcement. Rackspace shares are up nearly 60% for the year.

Trifacta Closes $4.3 Million In Series A Funding; Seeks To Make Big Data Insights More Accessible

This Thursday, Trifacta came out of stealth mode by announcing $4.3 million in Series A funding led by Accel Partners, with additional participation from X/Seed Capital, Data Collective and angel investors Dave Goldberg, Venky Harinarayan and Anand Rajaraman. Trifacta’s mission is to “radically enhance productivity for data analysis” by delivering a solution tailored to the people responsible for gleaning business significance from data analysis. Based on the premise that the cost of skilled data analysts continues to rise while the costs of storage and computation become progressively lower, Trifacta intends to enhance the ability of analysts to manipulate, mine and derive insights from massive amounts of structured data more effectively. In an interview with VentureBeat, Trifacta’s co-founder and CEO Joe Hellerstein elaborated on the company’s mission as follows:

There is a lot of talk about engines and algorithms for unlocking value in data. But real value comes from the people who drive the analysis. The question is how you get data into the form where people can get some value out of it.

Similarly, Ping Li, head of Accel’s Big Data fund, elaborated on his fund’s interest in Trifacta by noting:

The world doesn’t need another Hadoop or SQL company. The biggest problem with big data is around the ability to get information out of it. That gap is huge, and it’s not going to be solved anytime soon. This is really the soft underbelly of big data right now.

Hellerstein and Ping Li both point to the importance of facilitating access to business insights from Big Data in contrast to merely delivering an enterprise-grade storage solution. Trifacta was founded as a result of collaborations between computer scientists at UC Berkeley and Stanford University. The company’s leadership team features co-founder and CEO Joe Hellerstein, former Professor of Computer Science at UC Berkeley; Jeffrey Heer, co-founder and Chief Experience Officer; and Sean Kandel, CTO, whose Ph.D. dissertation research at Stanford University examined interactive products for manipulating data. Heer is also an Assistant Professor of Computer Science at Stanford University, where he leads the Stanford Visualization Group. Specific details of the company’s solutions remain under wraps at present, though Trifacta’s website reports that the company is busily preparing details of solutions for public release while it gears up for a round of aggressive hiring.

RainStor Finalizes $12 Million In Series C Funding

This week, RainStor announced the finalization of $12 million in Series C funding from Credit Suisse and Rogers Venture Partners, with additional participation from existing investors Doughty Hanson Technology Ventures, Storm Ventures and The Dow Chemical Company. RainStor plans to use the funding to enhance product development and further develop its sales and marketing team. RainStor has two “editions” of a Big Data product that enables enterprises to more effectively store and conduct analytics on massive amounts of structured and unstructured data: Big Data Retention and Big Data Analytics On Hadoop. Big Data Retention allows enterprises to effectively store and access massive amounts of historical data that is used less frequently than mission critical data stores. RainStor’s Big Data Analytics On Hadoop empowers enterprises to perform analytics on petabytes of structured data.

RainStor’s Big Data solutions can be flexibly deployed across a number of IT infrastructures including SAN, NAS, CAS and cloud-based platforms. Moreover, its Big Data products allow for queries using SQL, popular BI products and MapReduce when running on the Hadoop Distributed File System. RainStor’s Big Data platform leverages patented compression technology to store and retrieve massive amounts of structured data at low cost. Peter Norley, Managing Director at Credit Suisse, remarked on the importance of RainStor’s Big Data platform to the financial services industry as follows:

Driven by compliance regulations, banks and financial institutions are now required to retain and analyze petabytes of data. Compounded by rapid growth, current needs exceed the capacity of existing database and data warehouse environments. RainStor has built a unique combination of database capabilities that have proven essential for financial institutions in order to sustain growth levels in the most cost effective way, while meeting regulatory needs.

Here, Norley elaborates on how RainStor’s offering enables financial institutions to comply with regulations that dictate the preservation of massive amounts of data. Because of these regulations, enterprises face data storage needs that exceed the capacities of current warehousing options and consequently require a Big Data offering such as RainStor’s. Used by over 100 enterprises for Big Data management and analytics, RainStor’s Big Data platform stands poised to build on its branding as a nimble, cost-effective, customer-centric Big Data platform with “the highest level of compression on the market” in addition to advanced querying capabilities.