MapR Technologies today announced a partnership with HP Vertica that integrates the HP Vertica Analytics Platform with MapR’s enterprise-grade distribution of Apache Hadoop. As a result of the partnership, users of the HP Vertica Analytics Platform on MapR can leverage HP Vertica’s SQL capabilities against data stored in Hadoop clusters. The HP Vertica Analytics Platform constitutes yet another “SQL-on-Hadoop” solution that competes with the likes of Apache Hive, Concurrent’s Lingual, Cloudera’s Impala, Hadapt and the Hortonworks Stinger initiative. As noted in GigaOm, MapR itself leads Apache Drill, an open source initiative to develop a highly scalable, SQL-based interactive query engine for Apache Hadoop, but clearly made a strategic decision to expand the range of users of its Hadoop distribution by partnering with HP Vertica. Today, MapR also announced the release of the latest version of its Hadoop distribution featuring support for Hadoop 2.2 and YARN. Notably, users running Hadoop 1.x can take advantage of YARN’s resource management capabilities and preview its functionality before upgrading to Hadoop 2.0. HP Vertica Analytics Platform on MapR is currently available in early access mode and will be generally available in March.
Dell is incubating a new Platform as a Service (PaaS) offering built upon the Cloud Foundry PaaS infrastructure. The product, Project Fast PaaS, claims enhancements to the Cloud Foundry project. Project Fast PaaS boasts compatibility with Ruby, Node.js, Java, PHP and Python, in addition to support for the MySQL, PostgreSQL, MongoDB and Redis databases as well as the RabbitMQ messaging system. An open-source solution, the product additionally features compatibility with application development frameworks such as Django, Grails, JavaWeb, Lift, Node, Play, Rack, Rails, Sinatra and Spring. Participants must already subscribe to Dell’s enterprise public IaaS cloud, Dell vCloud, in order to preview the Project Fast PaaS offering.
Dell’s investment in Project Fast PaaS illustrates the emerging currency of the VMware-EMC Cloud Foundry PaaS platform as the de facto standard infrastructure for Platform as a Service offerings. ActiveState’s Stackato, for example, which is similarly based upon the Cloud Foundry platform, has recently been licensed by HP for HP’s Cloud Application PaaS offering. The other trend represented by Dell’s PaaS offering is the willingness of tech behemoths such as Dell and HP to supplement their IaaS public cloud offerings with a PaaS solution of some kind. IaaS customers are likely to want a PaaS offering as well, and correspondingly, PaaS may well end up serving as an on-ramp that originates IaaS customers. The industry should expect to see more IaaS-PaaS combination offerings as public cloud vendors, in particular, strive to accommodate customer demands for preconfigured development frameworks alongside their IaaS platforms.
This week, HP made a number of significant announcements related to its HP Cloud Services platform. The company revealed an aggressive pricing strategy for its OpenStack-based Infrastructure as a Service (IaaS) public cloud platform, HP Cloud Compute, including a 50% promotion that lasts until January 1, 2013. The aggressive positioning of HP Cloud Compute underscores the technological viability of OpenStack as a key player in the commercial, public cloud IaaS space given that, separate from Rackspace and Red Hat, yet another technology giant has elected to build a public cloud infrastructure on the OpenStack platform.
HP Cloud Compute has now transitioned from Beta to general availability. Pricing starts at $0.04/hour for the “extra small” Linux-based HP instance, which features 1 HP Cloud Compute Unit on 1 virtual core, 1 GB RAM and 30 GB disk space. In comparison, the smallest Amazon Web Services Linux instance features a comparable 1.7 GB memory and a significantly larger storage allocation of 160 GB at the rate of $0.065/hour. HP Cloud Compute’s price of $0.04/hour versus $0.065/hour for Amazon Web Services amounts to a significant cost savings, particularly if instance disk space beyond 30 GB is not required.
When comparing the two medium-sized offerings, however, Amazon Web Services comes out on top not only in price but with respect to specifications as well. The medium HP Cloud Compute instance features 4 HP Cloud Compute Units containing a total of 2 virtual cores with 2 HP Cloud Compute Units each, 4 GB RAM and 120 GB of disk space. The medium-sized Amazon Web Services Linux instance, in comparison, contains an analogous 4 EC2 compute units via 2 virtual cores containing 2 EC2 compute units each, 7.5 GB memory and 850 GB of instance disk space. Pricing compares at $.16/hour for HP Cloud Compute versus $.13/hour for Amazon Web Services, with the AWS medium-sized offering surpassing HP on memory and storage metrics as well.
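The small- and medium-instance comparisons above reduce to straightforward per-hour arithmetic. A quick sketch makes the gap concrete, using the hourly rates quoted above and assuming a common 730-hour billing month (an assumption for illustration, not a figure from either vendor):

```python
# Hourly rates (USD) as quoted above; 730 hours approximates one month.
HOURS_PER_MONTH = 730

rates = {
    "small":  {"hp": 0.04, "aws": 0.065},
    "medium": {"hp": 0.16, "aws": 0.13},
}

for size, r in rates.items():
    hp_monthly = r["hp"] * HOURS_PER_MONTH
    aws_monthly = r["aws"] * HOURS_PER_MONTH
    # Positive delta means HP is cheaper; negative means AWS is cheaper.
    print(f"{size}: HP ${hp_monthly:.2f}/mo vs AWS ${aws_monthly:.2f}/mo "
          f"(HP saves ${aws_monthly - hp_monthly:.2f})")
```

On these rates, HP undercuts AWS by roughly $18 per month on the small instance, while AWS undercuts HP by roughly $22 per month on the medium instance, before accounting for the memory and storage differences noted above.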
Separate from HP Cloud Compute, HP announced the Beta launch of HP Cloud Block Storage. In addition, HP revealed details of its HP Cloud Application PaaS, which provides developers with access to pre-configured technology stacks that support Ruby, PHP, Java, Node.js, Python, and other languages. The platform is based on Vancouver-based ActiveState’s Stackato technology that boasts one of the industry’s leading polyglot PaaS platforms. HP Cloud Application PaaS is currently accepting applications from interested organizations as part of a private Beta launch.
These announcements reveal how HP is making an aggressive push into the IaaS space by luring customers into trying its HP Cloud Compute platform with the 50% discount promotion. Even apart from the promotion, pricing remains highly competitive and is backed by a 99.95% SLA. The SLA is guaranteed monthly, meaning HP is committing to a maximum of roughly 22 minutes of downtime per month, as reported in The Register. Customers frustrated with Amazon Web Services’s repeated outages and famed lack of customer support may well consider HP Cloud Compute as an option, particularly given the added allure of its interoperability within an increasingly rich commercial OpenStack ecosystem.
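The roughly 22-minute figure follows directly from the 99.95% monthly guarantee. A quick check, assuming a 30-day month:

```python
# Allowed downtime under a 99.95% monthly uptime SLA, assuming a 30-day month.
minutes_per_month = 30 * 24 * 60                     # 43,200 minutes
allowed_downtime = minutes_per_month * (1 - 0.9995)  # the 0.05% not covered
print(f"{allowed_downtime:.1f} minutes")             # → 21.6 minutes
```

Calendar months of 28 to 31 days shift the allowance between about 20 and 22 minutes, which matches the figure reported in The Register.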
According to a New York Times blog post, Hewlett-Packard is getting ready to deploy, within the next two months, an Infrastructure as a Service public cloud that parallels Amazon Web Services. The platform will differentiate itself from Amazon Web Services by providing a suite of services and business-oriented products that cater to the needs of enterprises. Speaking of the platform, Zorawar “Biri” Singh noted, “We’re not just building a cloud for infrastructure. Amazon has the lead there. We have to build a platform layer, with a lot of third-party services.” The platform will feature structured and unstructured databases that cater to the Big Data needs of enterprises. The HP public cloud will also contain tools that streamline the use of Ruby, Java and PHP, in addition to software that allows customers to automate provisioning and workflow. Moreover, HP will deploy the platform alongside an online store that offers HP-approved products to users of the cloud platform. The launch of the platform promises to serve up even more competition for Amazon Web Services, which already stands to encounter a significant threat to its market share lead from commercial OpenStack deployments. Just this week, Network World reported that Sony had migrated some of its products away from Amazon Web Services to OpenStack, though the reports have yet to be confirmed. What appears to be true is that Sony is using OpenStack alongside Amazon Web Services, though the extent of its use of OpenStack for production-ready deployments remains unclear.
On Tuesday, Oracle announced the availability of the Big Data appliance that it introduced at its Oracle OpenWorld conference in October. The appliance runs on Linux and features Cloudera’s distribution of Apache Hadoop (CDH), Cloudera Manager for managing the Hadoop distribution, the Oracle NoSQL database as well as an open source distribution of R, the statistical software package. Oracle’s partnership with Cloudera in delivering its Big Data appliance goes beyond the latter’s selection as its Hadoop distributor to include assistance with customer support. Oracle plans to deliver tier one customer support while Cloudera will provide assistance with tier two and tier three customer inquiries, including those beyond the domain of Hadoop.
Oracle will run its Big Data appliance on hardware featuring 864 GB main memory, 216 CPU cores, 648 TB of raw disk storage, 40 Gb/s InfiniBand connectivity and 10 Gb/s Ethernet data center connectivity. Oracle also revealed details of four connectors to its appliance with the following functionality:
• Oracle Loader for Hadoop, which loads massive amounts of data into the appliance using MapReduce parallel processing.
• Oracle Data Integrator Application Adapter for Hadoop, which provides a graphical interface that simplifies the creation of Hadoop MapReduce programs.
• Oracle Connector R, which provides users of R with streamlined access to the Hadoop Distributed File System (HDFS).
• Oracle Direct Connector for Hadoop Distributed File System (ODCH), which supports the integration of Oracle’s SQL database with the Hadoop Distributed File System.
Oracle’s announcement of the availability of its Big Data appliance comes as the battle for Big Data market share takes shape in a landscape dominated by the likes of Teradata, Microsoft, IBM, HP, EMC, Informatica, MarkLogic and Karmasphere. Oracle’s selection of Cloudera as its Hadoop distributor indicates that it intends to make a serious move into the world of Big Data. For one, the partnership with Cloudera gives Oracle increased access to Cloudera’s universe of customers. Secondly, the partnership enhances the credibility of Oracle’s Big Data offering given that Cloudera represents the most prominent distributor of Apache Hadoop in the U.S.
In October, Microsoft revealed plans for a Big Data appliance featuring Hadoop for Windows Server and Azure, and Hadoop connectors for SQL Server and SQL Parallel Data Warehouse. Whereas Oracle chose Cloudera for Hadoop distribution, Microsoft partnered with Yahoo spinoff Hortonworks to integrate Hadoop with Windows Server and Windows Azure. In late November, HP provided details of Autonomy IDOL (Integrated Data Operating Layer) 10, which features the ability to process large-scale structured data sets in addition to a NoSQL interface for loading and analyzing structured and unstructured data. In December, EMC released its Greenplum Unified Analytics Platform (UAP) marked by the ability to load structured data, enterprise-grade Hadoop for analyzing structured and unstructured data and Chorus, a collaboration and productivity software tool. Bolstered by its partnership with Cloudera, Oracle is set to compete squarely with HP’s Autonomy IDOL 10, EMC’s Greenplum Chorus and IBM’s BigInsights until Microsoft’s appliance officially enters the Big Data dohyō (土俵), or sumo ring, as well.
If 2011 was the year of Cloud Computing, then 2012 will surely be the year of Big Data. Big Data has yet to arrive in the way cloud computing has, but the framework for its widespread deployment as a commodity emerged with style and unmistakable promise. For the first time, Hadoop and NoSQL gained currency not only within the developer community, but also amongst bloggers and analysts. More importantly, Big Data garnered for itself a certain status and meaning in the technology community, even though few stopped to ask what counts as “big” in “Big Data” in a landscape where the circle drawn around the term is constantly being redrawn. Even as yesterday’s “big” morphed into today’s “small”, with consumer personal storage transitioning from gigabytes to terabytes, “Big Data” emerged as a term that everyone almost instantly understood. It was as if consumers and enterprises alike had been searching for years for a long-lost term to describe the explosion of data evinced by web searches, web content, Facebook and Twitter feeds, photographs, log files and miscellaneous structured and unstructured content. Having lacked the vocabulary to name the data explosion, the world suddenly embraced the term Big Data with passion.
Below are some of the highlights of 2011 with respect to Big Data:
• Teradata finalized a deal to acquire Big Data player Aster Data Systems for $263 million.
• Yahoo revealed plans to create Hortonworks, a spin-off dedicated to the commercialization of Apache Hadoop.
• Teradata announced the Teradata Aster MapReduce Platform that combines SQL with MapReduce. The Teradata Aster MapReduce Platform empowers business analysts who know SQL to leverage the power of MapReduce without having to write scripted queries in Java, Python, Perl or C.
• Oracle announced plans to launch a Big Data appliance featuring Apache Hadoop, Oracle NoSQL Database Enterprise Edition and an open source distribution of R. The company’s announcement of its plans to leverage a NoSQL database represented an abrupt about-face from an earlier Oracle position that discredited the significance of NoSQL.
• Microsoft revealed plans for a Big Data appliance featuring Hadoop for Windows Server and Azure, and Hadoop connectors for SQL Server and SQL Parallel Data Warehouse. Microsoft revealed a strategic partnership with Yahoo spinoff Hortonworks to integrate Hadoop with Windows Server and Windows Azure. Microsoft’s decision not to leverage NoSQL and instead use a Windows-based version of Hadoop for SQL Server 2012 constituted the key difference between Microsoft and Oracle’s Big Data platforms.
• IBM announced the release of the IBM Infosphere BigInsights application for analyzing “Big Data.” The SmartCloud release of IBM’s BigInsights application means that IBM beat competitors Oracle and Microsoft in the race to deploy an enterprise-grade, cloud-based Big Data analytics platform.
• Christophe Bisciglia, co-founder of Cloudera, the commercial distributor of Apache Hadoop, launched a startup called Odiago that features a Big Data product named WibiData. WibiData manages investigative and operational analytics on “consumer internet data” such as website traffic on traditional and mobile computing devices.
• Cloudera announced a partnership with NetApp, the storage and data management vendor. The partnership revealed the release of the NetApp Open Solution for Hadoop, a preconfigured Hadoop cluster that combines Cloudera’s distribution of Apache Hadoop (CDH) and Cloudera Enterprise with NetApp’s RAID architecture.
• Big Data player Karmasphere announced plans to join the Hortonworks Technology Partner Program. The partnership enables Karmasphere to offer its Big Data intelligence product Karmasphere Analytics on the Apache Hadoop software infrastructure that undergirds the Hortonworks Data Platform.
• Informatica released the world’s first Hadoop parser. Informatica HParser operates on virtually all versions of Apache Hadoop and specializes in transforming unstructured data into a structured format within a Hadoop installation.
• MarkLogic announced support for Hadoop, the Apache open source software framework for analyzing Big Data, with the release of MarkLogic 5.
• HP provided details of Autonomy IDOL (Integrated Data Operating Layer) 10, a Next Generation Information Platform that integrates two of its 2011 acquisitions, Vertica and Autonomy. Autonomy IDOL 10 features Autonomy’s capabilities for processing unstructured data, Vertica’s ability to rapidly process large-scale structured data sets, a NoSQL interface for loading and analyzing structured and unstructured data and solutions dedicated to the Data, Social Media, Risk Management, Cloud and Mobility verticals.
• EMC announced the release of its Greenplum Unified Analytics Platform (UAP). The EMC Greenplum UAP contains the EMC Greenplum platform for the analysis of structured data, enterprise-grade Hadoop for analyzing structured and unstructured data and EMC Greenplum Chorus, a collaboration and productivity software tool that enables social networking amongst constituents in an organization that are leveraging Big Data.
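Several items above, notably the Teradata Aster MapReduce Platform and Informatica HParser, are pitched as sparing analysts from hand-writing MapReduce jobs in Java or Python. For context, here is a minimal, self-contained sketch of the map/shuffle/reduce pattern those tools abstract away: a pure-Python word count, not tied to any actual Hadoop API.

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the line.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    # Shuffle phase: group values by key, as Hadoop does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped.items()

def reducer(key, values):
    # Reduce phase: sum the counts emitted for each word.
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(mapper(line) for line in lines)
    return dict(reducer(k, v) for k, v in shuffle(mapped))

print(word_count(["big data big hadoop", "hadoop big"]))
# → {'big': 3, 'data': 1, 'hadoop': 2}
```

In a real Hadoop deployment the map and reduce functions run in parallel across a cluster and the shuffle is handled by the framework; platforms like Teradata Aster let analysts express the same computation in SQL instead.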
The widespread adoption of Hadoop punctuated the Big Data story of the year. Hadoop featured in almost every major Big Data announcement of the year, from Oracle to Microsoft to HP and EMC, while NoSQL came in a close second. Going into 2012, one key question for the Big Data space concerns the ability of OpenStack to support Hadoop, NoSQL, MapReduce and other Big Data technologies. The other hinges on the user friendliness of Big Data applications for business analysts in addition to programmers. EMC’s Greenplum Chorus, for example, democratizes access to its platform via a user interface that promotes collaboration amongst multiple constituents in an organization by transforming questions into structured queries. Similarly, the Teradata Aster MapReduce Platform allows business analysts to make use of its MapReduce technology by using SQL. As Hadoop becomes increasingly mainstream, the tech startup and data-intensive spaces are likely to witness a greater number of data analysts trained in Apache Hadoop, in conjunction with efforts by vendors to render Hadoop more accessible to programmers and non-programmers alike.