Intel recently elaborated on its Intel Data Platform, a suite of software applications designed to facilitate analytics on big data. The platform will complement the Intel Distribution for Apache Hadoop by providing a wealth of graph analytic and predictive modeling functionality via the Intel Data Platform: Analytics Toolkit that enables data scientists to derive actionable business intelligence from big data sets. In addition to raw analytic and data visualization capabilities, the Intel Data Platform features the ability to process streaming data sets and perform iterative and interactive analytics. The Intel Data Platform’s Analytics Toolkit provides users with algorithms related to graph analytics and machine learning that enable enhanced fraud detection, customer profiling and big data management and processing. For example, China Mobile Guangdong was able to implement online billing to the point where it could add up to 800,000 new records/second or up to 30 terabytes of data per month. Similarly, the platform has been used to help retailers nimbly respond to social media promotions by ensuring shelves are appropriately stocked in response to the spikes in consumer demand that result from promotions on platforms such as Twitter and Facebook, media announcements or seasonal changes including unanticipated weather. The Intel Data Platform exemplifies the proliferation of Big Data analytics solutions that have emerged as more and more enterprises perform experiments of varying intensity with Big Data. The platform will be available in Q2 of 2014 in Enterprise and Premium Editions that differ according to the degree of available customer support.
Infochimps today announced a Big Data platform as a service that integrates with existing enterprise IT infrastructures while adding Big Data management and analytic solutions. The Infochimps platform is based on open source, web-scale technologies in addition to a cloud-based deployment structure. One of the unique features of the Infochimps solution is that it gauges the position of customers with respect to Big Data management and subsequently recommends a path toward effectively operationalizing Big Data in conjunction with customer needs. To help customers understand how to realize their Big Data needs, Infochimps complements its Big Data platform as a service with a suite of consulting services designed to guide customers through the Big Data lifecycle. Jim Kaskade, Director of CSC’s open Big Data solutions, commented on the Infochimps methodology as follows:
We’ve defined distinct phases along the Big Data adoption lifecycle where companies fall. We identify our customers’ current state, and then carefully guide them to organization-wide operationalization of Big Data insights.
Infochimps shares the insight previously articulated by Paul Maritz, CEO of Pivotal, that with the exception of companies such as Google, Facebook and Twitter, few enterprises have come to terms with the project of effectively operationalizing Big Data. In an interview with Raj Dalal of Big Data Insights at Strata 2014, Kaskade claimed that approximately 50% of Big Data initiatives fail due to poorly scoped projects, excessive complexity within the Big Data technology landscape and internal political friction. In response, Infochimps proposes a comprehensive Big Data implementation methodology in addition to its PaaS platform. Details remain scant but we should expect to hear more at Strata and in the coming weeks about the Infochimps methodology for assessing the customer’s current state of Big Data and subsequently designing a programmatic path focused around integrating existing technology stacks with its Big Data PaaS. Infochimps was acquired by CSC in August 2013.
MapR Technologies today announced a partnership with HP Vertica that integrates the HP Vertica Analytics Platform with MapR’s enterprise-grade distribution of Apache Hadoop. As a result of the partnership, users of the HP Vertica Analytics Platform on MapR have the capability to leverage the SQL capabilities of the HP Vertica Analytics Platform against data stored in Hadoop clusters. The HP Vertica Analytics Platform constitutes yet another “SQL-on-Hadoop” solution that competes with the likes of Apache Hive, Concurrent’s Lingual, Cloudera’s Impala, Hadapt and the Hortonworks Stinger initiative. As noted in GigaOm, MapR itself leads Apache Drill, an open source initiative to develop a highly scalable, SQL-based interactive query engine for Apache Hadoop, but clearly made a strategic decision to expand the range of users of its Hadoop distribution by partnering with HP Vertica. Today, MapR also announced the release of the latest version of its Hadoop distribution featuring support for Hadoop 2.2 and YARN. Notably, users running Hadoop 1.x can take advantage of YARN’s resource management abilities to preview the functionality of YARN before upgrading to Hadoop 2.0. HP Vertica Analytics Platform on MapR is currently available in early access mode and will be generally available in March.
Cloudera recently announced the general availability of Apache Spark for Cloudera Enterprise. First developed at UC Berkeley, Apache Spark is a parallel data processing framework that supplements Apache Hadoop by facilitating the development of big data applications related to machine learning, interactive analytics and real-time analytics. Spark allows users to write parallel programs in Java, Scala and Python that operate on Hadoop clusters at speeds up to 100 times faster than MapReduce. Moreover, applications developed in Spark tend to require 2 to 10 times less code than a corresponding MapReduce application. Spark Streaming, an add-on to Spark, enables analytics to be run on streaming datasets such that developers can derive analytic insights within seconds of data ingestion. Cloudera will offer enterprise-grade support for Spark in partnership with Databricks, the primary sponsor of the open source Apache Spark project, via its Data Hub Edition and Cloudera Enterprise Flex Edition. This release features support for Spark 0.9.0 with CDH 4. Support for Cloudera Enterprise 5, with CDH 5 and YARN, will be forthcoming in subsequent releases. Spark contributes to the Cloudera platform as illustrated by the highlighted blocks in orange below:
Image Source: “Apache Spark — Welcome To The CDH Family”
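To give a rough sense of why Spark applications tend to need less code than their MapReduce equivalents, the sketch below imitates the chained functional style of a Spark word count in plain Python. It does not use Spark and runs in a single process. The helpers flat_map and reduce_by_key are hypothetical stand-ins named after Spark's flatMap and reduceByKey operations, and the sample lines are invented for illustration.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain


def flat_map(func, records):
    """Apply func to each record and flatten the results (stand-in for flatMap)."""
    return chain.from_iterable(func(record) for record in records)


def reduce_by_key(func, pairs):
    """Combine all values that share a key using func (stand-in for reduceByKey)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


def word_count(lines):
    """Count words with a split -> (word, 1) -> sum-by-key pipeline."""
    words = flat_map(str.split, lines)             # one record per word
    pairs = ((word.lower(), 1) for word in words)  # emit (word, 1) pairs
    return reduce_by_key(lambda a, b: a + b, pairs)


# Invented sample input, purely for illustration.
sample_lines = ["spark on hadoop", "hadoop and yarn", "Spark Streaming"]
counts = word_count(sample_lines)
```

The whole job is expressed as three chained transformations, whereas a hand-written MapReduce job would typically need separate mapper, reducer and driver classes. On a real cluster the same shape of pipeline would be distributed across nodes by the framework rather than run in one Python process.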
Big data management vendor Zettaset recently announced an enhancement to its Zettaset Orchestrator platform marked by the addition of a Business Intelligence connector for BI analytics applications. The connector extends Zettaset’s security and encryption functionality to BI applications that use Hadoop datasets integrated with the Zettaset Orchestrator infrastructure. As a result, BI applications leveraging data stored in Hadoop clusters can deliver analytics that are protected by Zettaset’s enterprise-grade security and Hadoop management functionality. Zettaset also announced that BI vendor MicroStrategy has certified the connector for use with its analytics platform. The partnership between MicroStrategy and Zettaset means that customers of both vendors can derive actionable business intelligence from Hadoop-based datasets while taking advantage of the high availability, auditing and compliance, role based access control, encryption and policy enforcement specific to the Zettaset Orchestrator platform. Zettaset’s BI connector announcement suggestively illustrates the gravity of enterprise concerns about data security, be it data at rest, data in transit or data in use. As Hadoop adoption accelerates throughout the enterprise, expect Hadoop distribution vendors and Big Data management vendors such as Zettaset to shore up components of their product offerings related to data security, particularly as they relate to interfaces to third party applications and hosting environments.
On Tuesday, Hortonworks announced the general availability of version 2.0 of the Hortonworks Data Platform for Windows. Hortonworks Data Platform 2.0 for Windows is the first distribution of Apache Hadoop 2.0 certified for Windows Server 2008 R2 and Windows Server 2012. The announcement means that YARN (Yet Another Resource Negotiator), a key feature of Hadoop 2.0, is now available to Windows-based development environments. With HDP 2.0, developers in Windows shops can take advantage of YARN’s transformation of Hadoop from an infrastructure for batch processing into one for both batch and real-time data processing. Moreover, HDP 2.0 features NameNode High Availability functionality that automates failovers and ensures the availability of the full HDP stack. Hortonworks collaborated closely with Microsoft in order to ensure the HDP 2.0 release achieved production-grade status within Windows environments. The release of HDP 2.0 marks yet another milestone in the story of the democratization of Apache Hadoop, the Big Data platform that is being rendered increasingly available to wider circles of users by means of initiatives such as Stinger (Hortonworks), Lingual (Concurrent) and Impala (Cloudera) that allow users to access and manipulate data stored in a Hadoop cluster using SQL.
Qubole recently partnered with Google to make its Hadoop as a Service platform available on the Google Compute Engine. As a result of the partnership, GCE customers can directly take advantage of Qubole’s autoscaling and automated cluster provisioning functionality, in addition to its auto-healing ability to provide replacements for failed GCE instances. Qubole represents the first fully elastic engine based on Hadoop to run on the Google Compute Engine platform. Shrikanth Shankar, VP Engineering at Qubole, remarked on the significance of Qubole’s partnership with Google Compute Engine as follows:
Google File System and Google MapReduce inspired the development of Hadoop. Now, we’re coming full circle with Hadoop available on GCE. We believe that this delivers one of the most solid foundations for cloud-based Big Data processing and are pleased that we can contribute to its performance, ease of use and low cost.
Qubole’s partnership with GCE stands to diversify its customer base further by extending its reach to users of the GCE IaaS platform who additionally have Big Data requirements. As a cloud-based Big Data service whose customers include Pinterest, Quora and MediaMath, Qubole independently delivers the autoscaling and cloud-based hosting of Hadoop clusters by means of its next generation Big Data platform. Qubole is currently available on Google Compute Engine in Beta as well as on Amazon Web Services via the AWS Marketplace.
Big Data management vendor Zettaset recently announced the availability of encryption functionality in Zettaset Orchestrator, its Hadoop management platform. Zettaset Orchestrator v5 automates Hadoop installation, enables high availability on Hadoop deployments and streamlines the configuration and operational management of Hadoop clusters. Zettaset adds encryption to its Orchestrator platform by using the 256-bit Advanced Encryption Standard (AES-256) and the KMIP protocol, with minimal impact to the performance of the encrypted Hadoop cluster. The addition of encryption to Zettaset Orchestrator means that customers with compliance concerns related to data security are one step closer toward delivering a data environment that satisfies regulatory demands specific to their industry. Jim Vogt, Zettaset CEO, remarked on the significance of encryption functionality within the Hadoop space as follows:
Encryption is a very specialized capability, and there are few viable options available today for Hadoop users. When it comes to risk management, Zettaset Orchestrator with data-at-rest encryption gives customers the upper hand, supporting compliance mandates such as HIPAA, BSA/AML and PCI-DSS, for example, and provides assurance that their Hadoop cluster data is protected against malicious attacks.
Zettaset Orchestrator also provides role-based access control as well as support for LDAP and Active Directory. Orchestrator’s support of Hadoop encryption in conjunction with its role-based access functionality represents a significant advance for data security within the Hadoop space, particularly because the solution covers data at rest as opposed to data in transit over the wire. Other vendors with Hadoop data encryption solutions include Gazzang and Dataguise. That said, Zettaset’s combination of automation and security on Hadoop renders it a key player in the Hadoop management space. Zettaset Orchestrator integrates with all major open source Hadoop distributions.