Puppet Labs recently announced a collaboration with EMC Corporation that makes Puppet Labs’ DevOps technology more readily accessible to EMC customers. As a result of the partnership, Puppet Enterprise will be available as a component of EMC’s Federation Enterprise Hybrid Cloud, which delivers enterprise-grade hybrid cloud solutions that leverage public clouds from vendors such as EMC Cloud Service Providers, vCloud Air and Amazon Web Services. Puppet Enterprise provides a framework for managing infrastructure as code. This increases the operational agility of development and operations teams by making it easier to execute large numbers of changes to infrastructure and application deployments. EMC Federation Hybrid Cloud customers can now rely on Puppet Enterprise to bring enhanced IT automation and more consistent change management to their deployments. The product integration between Puppet Enterprise and the Federation Hybrid Cloud is the most important part of this announcement. However, EMC and Puppet Labs have also agreed to develop a DevOps readiness program to help customers accelerate their adoption of DevOps practices and their use of hybrid clouds. EMC customers can access Puppet Enterprise through the company’s service catalogue, the EMC Select Global Price List, and integrate it with any combination of EMC hardware and software. The collaboration represents a huge coup for Puppet Labs because it opens up Puppet Enterprise to EMC’s customer channel. EMC, for its part, gains a feather in its cap in Puppet Enterprise, along with the standardized IT automation it brings to Federation Hybrid Cloud deployments.
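"Managing infrastructure as code" describes a declarative, idempotent model. You declare the state a server should be in, and the tool applies only the changes needed to reach that state. The Python sketch below illustrates that model under stated assumptions. The `Node`, `DesiredState` and `converge` names are hypothetical illustrations, not Puppet's DSL or API. Real Puppet manifests describe resources such as packages, files and services in Puppet's own declarative language.

```python
"""Toy model of declarative, idempotent configuration management.

Hypothetical resource model for illustration only; this is NOT Puppet's DSL or API.
"""
from dataclasses import dataclass, field


@dataclass
class Node:
    """Observed state of a single server (hypothetical model)."""
    packages: set[str] = field(default_factory=set)
    services_running: set[str] = field(default_factory=set)


@dataclass(frozen=True)
class DesiredState:
    """Declared state that a node should converge to."""
    packages: frozenset[str]
    services: frozenset[str]


def converge(node: Node, desired: DesiredState) -> list[str]:
    """Apply only the changes needed to reach the desired state; return a change log."""
    changes = []
    # Install only the packages that are declared but missing.
    for pkg in sorted(desired.packages - node.packages):
        node.packages.add(pkg)
        changes.append(f"install {pkg}")
    # Start only the services that are declared but not yet running.
    for svc in sorted(desired.services - node.services_running):
        node.services_running.add(svc)
        changes.append(f"start {svc}")
    return changes


if __name__ == "__main__":
    web = DesiredState(packages=frozenset({"nginx", "openssl"}), services=frozenset({"nginx"}))
    node = Node(packages={"openssl"})
    print(converge(node, web))  # first run installs and starts nginx
    print(converge(node, web))  # second run changes nothing (idempotent)
```

A second call to `converge` makes no changes. That idempotence lets teams safely reapply the same declared configuration across many servers whenever something changes.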
On Wednesday, Pivotal announced the acquisition of Xtreme Labs, a Toronto-based mobile development and consulting firm. The acquisition complements Pivotal’s cloud and big data platforms by expanding its mobile capabilities and extending the reach of its emerging, sprawling technology platform even further. Pivotal, recall, is a platform as a service based on the Cloud Foundry project. It also boasts big data capabilities derived from its parent company EMC’s acquisition of Greenplum. The acquisition of Xtreme Labs “aligns with Pivotal’s strategy to capitalize on the nexus of converging forces in the industry.” It also illustrates how seriously Pivotal intends to build a technology platform, called Pivotal One, that brings the computing capabilities of Amazon Web Services, Facebook and Google to the enterprise. Specifically, the acquisition positions Pivotal to build a technology platform marked by the convergence of cloud, big data, mobile and social media applications.
In an April webcast announcing the launch of Pivotal One, Pivotal CEO Paul Maritz remarked on the divide between the IT infrastructures of select internet giants and traditional enterprise IT. Maritz noted that Amazon Web Services, Facebook and Google excel at storing massive amounts of data, extracting actionable business intelligence from that data, rapidly developing software applications and automating routine procedures. Pivotal One aims to deliver a platform as a service that extends to enterprise IT more generally the data storage, data analytics and agile application development capabilities currently held by a handful of internet giants. Recently, Pivotal has made news through two strategic partnerships. It is working with Piston Cloud to refine the integration of OpenStack with Cloud Foundry, and with IBM to develop a governance model for Cloud Foundry. Terms of the acquisition of Xtreme Labs were not disclosed, although AllThingsD reported that Pivotal paid $65 million in cash.
EMC and Juniper recently revealed details of updates to their software-defined platforms and strategies: Juniper in Software Defined Networking (SDN) and EMC in software-defined storage.
Juniper launched a suite of products branded JunosV Contrail featuring the following components:
•The JunosV controller decouples management of the network from the hardware that undergirds the network, enabling vendors to quickly deploy network services and more effectively manage the overall network infrastructure.
•JunosV Contrail virtualizes the entire network, enabling vendors to use a more flexible network topology with greater network scalability.
•The platform supports both OpenStack and CloudStack.
Meanwhile, EMC revealed details of the ViPR Software-Defined Storage Platform as follows:
•The EMC ViPR Software-Defined Storage Platform allows customers to manage both a software-defined storage infrastructure and the data stored within that infrastructure.
•The platform integrates with OpenStack via Swift.
•The platform integrates with VMware’s software-defined data center environment and provides APIs that interoperate with OpenStack and Microsoft.
•The EMC ViPR Controller allows customers to use their current storage platforms for existing data, while enabling the provisioning of ViPR Object Data Services for new storage platforms that have the option of leveraging Amazon S3 or HDFS APIs.
Compatibility with OpenStack is the key point of comparison between the two platforms. Other key players in the SDN space include VMware, by virtue of its acquisition of Nicira, as well as Cisco, Midokura, Nexenta Systems and Big Switch Networks. Customers should expect the SDN space to deliver wave upon wave of functionality enhancements as SDN technology matures. Those enhancements should also follow as SDN becomes increasingly compatible with cloud platforms from a wide range of vendors, and with IT automation software and DevOps platforms.
EMC’s Pivotal One Attempts To Bring The IT Infrastructures Of Facebook, Google and Amazon Web Services To The Enterprise
This week, EMC and its subsidiary VMware revealed details of the vision behind Pivotal, their spin-off company, which is financed in part by $105 million in capital from GE. Pivotal CEO Paul Maritz, who was CEO of VMware from 2008 to 2012, spoke in a webcast announcing the launch of Pivotal on Wednesday. He remarked that Pivotal aims to bring to enterprises the technology platforms that have allowed internet giants such as Facebook, Google and Amazon Web Services to operate IT infrastructures efficiently at massive scale. Those same platforms have let the giants achieve cost and performance efficiencies in application development and data analytics.
Referring specifically to Facebook, Google and Amazon Web Services, Maritz elaborated on the strengths of their IT infrastructure as follows:
If you look at the way they do IT, it is significantly different than the way enterprises do IT. Specifically, they are good at storing large amounts of data and drawing information from it in a cost-effective manner. They can develop applications very quickly. And they are good at automating routines. They used these three capabilities together to introduce new experiences and business processes that have yielded, depending on how you want to count it, a trillion dollars in market value.
According to Maritz, the internet giants are a cut above everyone else with respect to data storage, data analytics, application development and automation. Enterprises, in contrast, rely on comparatively archaic IT infrastructures. These are characterized by on-premise data centers, attempts to migrate to the cloud, meager data analytics capability and poor or non-existent IT automation and orchestration processes. As a result, the enterprise market represents an opportunity to deploy technology platforms that support efficient storage, data integration across disparate data sources and interactive applications that respond in real time to incoming data, as Maritz notes below:
It is clear that there is a widespread need emerging for new solutions that allow customers to drive new business value by cost-effectively reasoning over large datasets, ingesting information that is rapidly arriving from multiple sources, writing applications that allow real-time reactions, and doing all of this in a cloud-independent or portable manner. The need for these solutions can be found across a wide range of industries and it is our belief that these solutions will drive the need for new platforms. Pivotal aims to be a leading provider of such a platform. We are honored to work with GE, as they seek to drive new business value in the age of the Industrial Internet.
More specifically, Pivotal will provide a platform as a service infrastructure called Pivotal One that brings the capabilities currently enjoyed by the likes of Facebook and Google to enterprises. It will allow them to continue their transition to cloud-based IT infrastructures while gaining the benefits of advanced storage, analytics and agile application development. In other words, Pivotal One marks the confluence of Big Data, Cloud, Analytics and Application Development. It is a bold play to commoditize the IT capabilities held by a handful of internet giants and make them available to the enterprise through a PaaS platform.
Pivotal One’s key components include the following:
Pivotal Data Fabric
A platform for data storage and analytics based on Pivotal HD, which features an enterprise-grade distribution of Apache Hadoop in addition to Pivotal HD’s HAWQ analytics platform.
Pivotal Cloud and Application Platform
An enterprise application development framework for Java based on Cloud Foundry and Spring.
Pivotal Expert Services
Professional services for agile application development and data analytics.
Open Source Support
Active support of open source projects including Spring, Cloud Foundry, RabbitMQ™, Redis and OpenChorus™.
Pivotal currently claims Groupon, EMI and Salesforce.com among its customers. The company already has 1,250 employees. Given GE’s financing and interests, it is poised to take a leadership role in the industrial internet space. In that space, objects such as automobiles, washers, dryers and other appliances deliver real-time data to analytic dashboards that in turn provide feedback, automation and control. Pivotal One also represents a nascent trend within the Platform as a Service industry, in which PaaS is evolving into an “everything as a service” platform that sits atop various IaaS infrastructures. For example, CumuLogic recently announced a platform that allows customers to build infrastructures resembling Amazon Web Services on top of private clouds behind their enterprise firewalls. These infrastructures combine IaaS, Big Data, PaaS and application development services. EMC’s Pivotal One is expected to be generally available by the end of 2013.
This week, EMC launched its own distribution of Hadoop under the brand Pivotal HD. Pivotal HD is built on technology that EMC obtained through its acquisition of Greenplum in July 2010. It represents EMC’s next iteration of the Greenplum Unified Analytics Platform (UAP), which EMC launched in December 2011. The Greenplum UAP featured EMC Greenplum HD, an enterprise-grade distribution of Hadoop, alongside Greenplum’s database for structured data. The UAP launch also introduced Greenplum Chorus, an innovative platform for collaboration among data scientists in organizations that leverage Big Data. Pivotal HD, however, marks a significant new chapter in EMC’s Hadoop technology, as indicated by its array of features and architectural complexity.
Like many recent Hadoop distributions and technologies, Pivotal HD integrates with SQL so that developers and business analysts who are unfamiliar with MapReduce can make full use of it. But the real innovation of Pivotal HD runs deeper than its integration of SQL with Hadoop. It positions Greenplum’s analytic engine alongside HDFS in ways that improve Hadoop query performance beyond what simply adding a SQL interface could achieve. Pivotal HD’s Advanced Database Services (HAWQ) provides a high-performance SQL engine that offers greater SQL functionality and performance than analogous SQL interfaces such as Hive, Hadapt and Impala. Pivotal HD also offers virtualization and pluggable storage compatibility. Together, these features make the platform a distinct moment of innovation in the Hadoop space, as illustrated by the following three features:
Advanced Database Services (HAWQ)
Pivotal HD’s Advanced Database Services (HAWQ) brings Greenplum’s Massively Parallel Processing (MPP) capabilities to Hadoop. As a result, HAWQ allows Pivotal HD users to perform complex joins, in-database analytics with MADlib and transactions. Moreover, users can connect virtually any BI tool on the market to obtain advanced reporting and data visualization as required. According to EMC benchmarking data, HAWQ-based SQL queries return results up to 100x faster than Hive.
The Advanced Database Service interfaces with other components of Pivotal HD as follows:
Given the recent proliferation of SQL-Hadoop interfaces throughout the industry, customers and analysts should expect more data about the comparative efficiencies of SQL-Hadoop interfaces to emerge as more and more SQL-trained analysts start using SQL to operate on data saved in HDFS.
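The value of writing joins in SQL is clearer when you see what a join requires in raw MapReduce. The sketch below uses plain Python to simulate a reduce-side join, one common pattern that developers hand-code. Mappers tag each record with its source table, the shuffle groups records by join key, and the reducer pairs them up. The result is equivalent to `SELECT c.name, o.amount FROM customers c JOIN orders o ON c.id = o.customer_id`. The tables and function names are hypothetical, and the code illustrates the general technique only. It is not how HAWQ executes joins internally, and it does not use the Hadoop API.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical sample tables: customers(id, name) and orders(customer_id, amount).
CUSTOMERS = [(1, "Acme"), (2, "Globex")]
ORDERS = [(1, 250.0), (2, 99.5), (1, 40.0)]


def map_customers(rows):
    # Emit (join key, tagged record) so the reducer knows each record's source table.
    for cust_id, name in rows:
        yield cust_id, ("C", name)


def map_orders(rows):
    for cust_id, amount in rows:
        yield cust_id, ("O", amount)


def shuffle(pairs):
    # Group all mapper output by key, as the framework's shuffle phase would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key, values):
    # Inner join: pair every customer record with every order record for this key.
    names = [v for tag, v in values if tag == "C"]
    amounts = [v for tag, v in values if tag == "O"]
    for name in names:
        for amount in amounts:
            yield name, amount


def reduce_side_join(customers, orders):
    groups = shuffle(chain(map_customers(customers), map_orders(orders)))
    return [row for key in sorted(groups) for row in reduce_join(key, groups[key])]


if __name__ == "__main__":
    for row in reduce_side_join(CUSTOMERS, ORDERS):
        print(row)
```

Keys with records on only one side produce no output, which matches inner-join semantics. Outer joins, skewed keys and multi-way joins all require more hand-written logic, and that is the burden a SQL engine removes.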
Hadoop Virtualization Extensions
Hadoop Virtualization Extensions (HVE) enable the provisioning of Hadoop clusters on VMware virtualized platforms in both public cloud and on-premise environments. HVE gives customers greater deployment flexibility and makes it possible to build high-availability infrastructures for accessing Hadoop data.
Pluggable HDFS Storage
Customers can expand their data storage options by using EMC Isilon OneFS Scale-Out NAS Storage in addition to standard Hadoop direct-attached storage. Isilon OneFS offers streamlined loading, backup, replication, snapshots and elastic scalability.
Overall, EMC’s entry into the Hadoop distribution world represents a significant move to grab Hadoop market share from Cloudera, Hortonworks and MapR. Unlike Intel’s recently launched distribution, EMC’s Pivotal HD claims some proprietary and genuinely innovative Hadoop technology in the form of its Advanced Database Services engine and scale-out storage compatibility. Expect EMC to continue to innovate on its core technology platform. It is also likely to follow the lead of companies such as Concurrent in developing tools that make Hadoop more accessible to Java developers as well as SQL users. What remains unclear, at this point, is the extent to which EMC will open-source its technology as it gains market share within the enterprise. For now, however, the Hadoop world has yet another significant player, one with cash reserves aplenty to keep innovating on its platform and disrupt the Hadoop landscape in the process.
On Tuesday, Oracle announced the availability of the Big Data appliance that it introduced at its October conference, Oracle OpenWorld. The appliance runs on Linux. It features Cloudera’s Distribution including Apache Hadoop (CDH), Cloudera Manager for managing the Hadoop distribution, the Oracle NoSQL Database and an open source version of R, the statistical software package. Oracle’s partnership with Cloudera extends beyond Cloudera’s role as the appliance’s Hadoop distributor to include customer support. Oracle plans to deliver tier one customer support, while Cloudera will handle tier two and tier three customer inquiries, including those beyond the domain of Hadoop.
Oracle will run its Big Data appliance on hardware featuring 864 GB of main memory, 216 CPU cores, 648 TB of raw disk storage, 40 Gb/s InfiniBand connectivity and 10 Gb/s Ethernet data center connectivity. Oracle also revealed details of four connectors for its appliance with the following functionality:
• Oracle Loader for Hadoop, which uses MapReduce parallel processing to load massive amounts of data from Hadoop into Oracle Database.
• Oracle Data Integrator Application Adapter for Hadoop, which provides a graphical interface that simplifies the creation of Hadoop MapReduce programs.
• Oracle R Connector for Hadoop, which gives R users streamlined access to the Hadoop Distributed File System (HDFS).
• Oracle Direct Connector for Hadoop Distributed File System (ODCH), which allows Oracle Database to access data stored in HDFS.
Oracle’s announcement of the availability of its Big Data appliance comes as the battle for Big Data market share takes shape in a landscape dominated by the likes of Teradata, Microsoft, IBM, HP, EMC, Informatica, MarkLogic and Karmasphere. Oracle’s selection of Cloudera as its Hadoop distributor indicates that it intends to make a serious move into the world of Big Data. First, the partnership with Cloudera gives Oracle increased access to Cloudera’s universe of customers. Second, the partnership enhances the credibility of Oracle’s Big Data offering, given that Cloudera is the most prominent distributor of Apache Hadoop in the U.S.
In October, Microsoft revealed plans for a Big Data appliance featuring Hadoop for Windows Server and Azure, and Hadoop connectors for SQL Server and SQL Parallel Data Warehouse. Whereas Oracle chose Cloudera for Hadoop distribution, Microsoft partnered with Yahoo spinoff Hortonworks to integrate Hadoop with Windows Server and Windows Azure. In late November, HP provided details of Autonomy IDOL (Integrated Data Operating Layer) 10. The platform can process large-scale structured data sets and also offers a NoSQL interface for loading and analyzing structured and unstructured data. In December, EMC released its Greenplum Unified Analytics Platform (UAP). The UAP combines the ability to load structured data, enterprise-grade Hadoop for analyzing structured and unstructured data and Chorus, a collaboration and productivity software tool. Bolstered by its partnership with Cloudera, Oracle is set to compete squarely with HP’s Autonomy IDOL 10, EMC’s Greenplum Chorus and IBM’s BigInsights. That competition will continue until Microsoft’s appliance officially enters the Big Data dohyō (土俵), or sumo ring, as well.
If 2011 was the year of Cloud Computing, then 2012 will surely be the year of Big Data. Big Data has yet to arrive in the way cloud computing has, but the framework for its widespread deployment as a commodity emerged with style and unmistakable promise. For the first time, Hadoop and NoSQL gained currency not only within the developer community, but also among bloggers and analysts. More importantly, Big Data acquired a certain status and meaning in the technology community. It did so even though few people asked what “big” means in “Big Data,” in a landscape where the threshold for “big” data is constantly being redrawn. Yesterday’s “big” morphed into today’s “small” as consumer personal storage moved from gigabytes to terabytes. Even so, “Big Data” emerged as a term that everyone understood almost instantly. It was as if consumers and enterprises alike had been searching for years for a term to describe the explosion of data. That explosion was evident in web searches, web content, Facebook and Twitter feeds, photographs, log files and miscellaneous structured and unstructured content. Having lacked the vocabulary to describe the data explosion, the world embraced the term Big Data with passion.
Below are some of the highlights of 2011 with respect to Big Data:
•Teradata finalized a deal to acquire Big Data player Aster Data Systems for $263 million.
•Yahoo revealed plans to create Hortonworks, a spin-off dedicated to the commercialization of Apache Hadoop.
•Teradata announced the Teradata Aster MapReduce Platform that combines SQL with MapReduce. The Teradata Aster MapReduce Platform empowers business analysts who know SQL to leverage the power of MapReduce without having to write scripted queries in Java, Python, Perl or C.
•Oracle announced plans to launch a Big Data appliance featuring Apache Hadoop, Oracle NoSQL Database Enterprise Edition and an open source distribution of R. The company’s plan to use a NoSQL database represented an abrupt about-face from Oracle’s earlier position dismissing the significance of NoSQL.
•Microsoft revealed plans for a Big Data appliance featuring Hadoop for Windows Server and Azure, and Hadoop connectors for SQL Server and SQL Parallel Data Warehouse. It also announced a strategic partnership with Yahoo spinoff Hortonworks to integrate Hadoop with Windows Server and Windows Azure. Microsoft’s decision to forgo NoSQL and instead use a Windows-based version of Hadoop for SQL Server 2012 was the key difference between Microsoft’s and Oracle’s Big Data platforms.
•IBM announced the release of its IBM InfoSphere BigInsights application for analyzing “Big Data.” With the SmartCloud release of BigInsights, IBM beat competitors Oracle and Microsoft in the race to deploy an enterprise-grade, cloud-based Big Data analytics platform.
•Christophe Bisciglia, co-founder of Cloudera, the commercial distributor of Apache Hadoop, launched a startup called Odiago. Its Big Data product, WibiData, supports investigative and operational analytics on “consumer internet data” such as website traffic from traditional and mobile computing devices.
•Cloudera announced a partnership with NetApp, the storage and data management vendor. The partnership produced the NetApp Open Solution for Hadoop, a preconfigured Hadoop cluster that combines Cloudera’s Distribution including Apache Hadoop (CDH) and Cloudera Enterprise with NetApp’s RAID architecture.
•Big Data player Karmasphere announced plans to join the Hortonworks Technology Partner Program. The partnership enables Karmasphere to offer its Big Data intelligence product, Karmasphere Analytics, on the Apache Hadoop software infrastructure that undergirds the Hortonworks Data Platform.
•Informatica released what it billed as the world’s first Hadoop parser. Informatica HParser runs on virtually all versions of Apache Hadoop and specializes in transforming unstructured data into a structured format within a Hadoop installation.
•MarkLogic announced support for Hadoop, the Apache open source software framework for analyzing Big Data, with the release of MarkLogic 5.
•HP provided details of Autonomy IDOL (Integrated Data Operating Layer) 10, a Next Generation Information Platform that integrates two of its 2011 acquisitions, Vertica and Autonomy. Autonomy IDOL 10 features Autonomy’s capabilities for processing unstructured data, Vertica’s ability to rapidly process large-scale structured data sets, a NoSQL interface for loading and analyzing structured and unstructured data and solutions dedicated to the Data, Social Media, Risk Management, Cloud and Mobility verticals.
•EMC announced the release of its Greenplum Unified Analytics Platform (UAP). The EMC Greenplum UAP contains three components. The first is the EMC Greenplum platform for the analysis of structured data. The second is enterprise-grade Hadoop for analyzing structured and unstructured data. The third is EMC Greenplum Chorus, a collaboration and productivity software tool that enables social networking among people in an organization who are leveraging Big Data.
The widespread adoption of Hadoop defined the Big Data story of the year. Hadoop featured in nearly every major Big Data announcement, from Oracle to Microsoft to HP and EMC, while NoSQL came in a close second. Going into 2012, one of the key questions for the Big Data space is whether OpenStack can support Hadoop, NoSQL, MapReduce and other Big Data technologies. The other key question is how user-friendly Big Data applications will be for business analysts as well as programmers. EMC’s Greenplum Chorus, for example, democratizes access to its platform via a user interface that promotes collaboration among people across an organization by transforming questions into structured queries. Similarly, the Teradata Aster MapReduce Platform allows business analysts to use its MapReduce technology through SQL. As Hadoop becomes more and more mainstream, the tech startup and data-intensive spaces are likely to see more data analysts trained in Apache Hadoop. Vendors are also likely to step up efforts to make Hadoop more accessible to programmers and non-programmers alike.
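Platforms such as the Teradata Aster MapReduce Platform aim to spare analysts from writing MapReduce code by hand. The following Python sketch gives a rough illustration of what a single SQL aggregate replaces. It simulates the map, shuffle/sort and reduce phases for `SELECT page, COUNT(*) FROM clicks GROUP BY page`. The clickstream data and function names are hypothetical, and the code runs in a single process rather than on a Hadoop cluster.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical clickstream log: one page path per visit.
CLICKS = ["/home", "/pricing", "/home", "/docs", "/home", "/docs"]


def mapper(page):
    # Emit (key, 1) for each record, like a word-count mapper.
    yield page, 1


def reducer(page, counts):
    # Sum the partial counts for one key.
    return page, sum(counts)


def group_by_count(records):
    """Equivalent of: SELECT page, COUNT(*) FROM clicks GROUP BY page ORDER BY page"""
    mapped = [pair for record in records for pair in mapper(record)]
    # Shuffle/sort phase: Hadoop sorts mapper output by key before reducing,
    # so all values for one key arrive at the reducer together.
    mapped.sort(key=itemgetter(0))
    return [
        reducer(page, (count for _, count in group))
        for page, group in groupby(mapped, key=itemgetter(0))
    ]


if __name__ == "__main__":
    print(group_by_count(CLICKS))
```

The sort-then-group step mirrors the MapReduce contract: every reducer sees all values for a key in one contiguous run. A SQL-on-Hadoop layer generates this kind of pipeline from one declarative statement.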