Last week, IBM announced an agreement to acquire NoSQL database-as-a-service vendor Cloudant for an undisclosed sum. An active contributor to the Apache CouchDB project, Cloudant delivers a JSON document database-based platform that claims high availability, scalability and elasticity among its attributes. Cloudant customers can take advantage of its JSON-based database as a service to store and mine structured and unstructured data from a variety of sources. Because the JSON document format is so widely used by developers of mobile and web applications, IBM’s acquisition of Cloudant stands to strengthen its positioning with respect to the development of applications for mobile devices in conjunction with the build-out of its OpenStack-based cloud solution for the enterprise. The acquisition of Cloudant will be central to IBM’s MobileFirst solutions as well as its Worklight platform for developing mobile applications. From an industry perspective, the acquisition represents a huge coup for the NoSQL space in general. CouchDB has historically not had the traction of MongoDB, Cassandra and Couchbase, so we should expect brand-name tech companies to make similar plays for the likes of MongoDB in the coming months. Moreover, IBM’s acquisition of Cloudant testifies to the increasing emergence of cloud and big data behemoths with solutions for both hosting infrastructure and database technologies that accommodate enterprise needs for scalability and the ability to store unstructured data. Cloudant CEO Derek Schoettle summarized the significance of Cloudant’s contribution to IBM’s SoftLayer cloud platform as follows:
Cloudant’s decision to join IBM highlights that the next wave of enterprise technology innovation has moved beyond infrastructure and is now happening at the data layer. Our relationship with IBM and SoftLayer has evolved significantly in recent years, with more connected devices generating data at an unprecedented rate. Cloudant’s NoSQL expertise, combined with IBM’s enterprise reliability and resources, adds data layer services to the IBM portfolio that others can’t match.
Schoettle notes that IBM is extending its infrastructure innovations to the “data layer” and as such, follows in the footsteps of Amazon Web Services and EMC/VMware spin-off Pivotal, which similarly deliver a combination of cloud and big data solutions in their platform and product offerings. The notable consequence of this convergence of cloud and big data product offerings is that only large enterprises with the requisite capital and resources can afford to cobble together combined cloud-big data product offerings. As a result, cloud startups and smaller data vendors will need to continue to compete by way of their agility, responsiveness, consultative support and superior technology. In effect, the IBM acquisition of Cloudant signals a Walmart effect in technology, of sorts, whereby large, well capitalized vendors have the ability to create marts of diverse data and analytics products that threaten the viability of cloud, big data and analytics startups in the same way that massive retailers such as Walmart threaten the viability of independent stores or small chains. Oracle’s recent acquisition of BlueKai, a big data management platform geared toward marketing, constitutes another example of the way in which tech giants are continuing to integrate diverse data products into increasingly heterogeneous product portfolios. The question that remains unanswered, however, is whether the emerging Walmart-style technology mart is sufficiently easy to navigate that enterprises opt to partner with one vendor for all of their technology needs, or whether they feel more comfortable shopping from a diverse range of technology vendors in order to avoid vendor lock-in and locate products that respond richly to the specific needs of their industry vertical and customers.
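The appeal of a JSON document store for mobile and web developers comes down to schema flexibility: each record can carry whatever fields the application needs, without a fixed table layout. The sketch below is purely illustrative (the document shapes and the `find` helper are invented for this example, not Cloudant's or CouchDB's actual query API), but it shows the basic idea of filtering heterogeneous JSON documents by field values:

```python
import json

# Two "documents" as a JSON document store such as Cloudant/CouchDB might
# hold them: schema-flexible, so each record can carry different fields.
docs = [
    {"_id": "user:1", "type": "checkin", "device": "ios", "lat": 40.7, "lon": -74.0},
    {"_id": "user:2", "type": "checkin", "device": "android", "comment": "great spot"},
]

def find(docs, **criteria):
    """Return documents whose fields all match the given criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

ios_checkins = find(docs, type="checkin", device="ios")
print(json.dumps(ios_checkins, indent=2))
```

In a real deployment the documents would live server-side and be queried over Cloudant's HTTP interface rather than filtered in application code.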
Categories: Big Data, Cloud Computing, Cloudant, Couchbase, IBM, MongoDB, NoSQL
Tags: big data startups, Cassandra, cloud startups, JSON, JSON document, Walmart
Treasure Data, a cloud-based big data acquisition and analytics vendor, recently elaborated details of its first solution for the digital gaming industry. The solution takes advantage of Treasure Data’s managed solution for Big Data to provide game developers with actionable business intelligence regarding the usage of their products. Customers can define their own rules for data collection and the kinds of user interactions of interest without being constrained to generic templates that govern data acquisition. Moreover, customers can nimbly update the rules for data collection as their analytic interest in user behavior evolves over time. Because setup can typically be completed in less than two weeks, Treasure Data gaming customers can derive rich, analytic insights about game user behavior in less than a month from the date of configuration of their game/application. Subsequent analytics will be delivered in real-time through a combination of dashboards, queries and data visualization technology.
The speed and frequency at which insights about user behavior can be delivered to business and product development stakeholders means that customers can shorten their product development lifecycles as a result of the analytic insights delivered by the Treasure Data platform. Treasure Data’s gaming solution represents a specific use case for its platform for acquiring, storing and analyzing streaming data. Data analysis takes place either via SQL, Treasure Data’s proprietary dashboards or connections to business intelligence applications such as Tableau for data visualization purposes. Importantly, the platform is deployed via a fully managed service whereby customers make no investment in the hardware required for data acquisition and storage but are nevertheless able to export their data at any time for deeper dives and customized analytics. Other use cases for the Treasure Data platform include acquisition and analysis of meteorological, cartographic or telemetry-based data, web-based data, or data from the internet of things.
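The customer-defined collection rules described above can be pictured as named predicates routing a stream of game events into analytic buckets, with the rule set swappable as interests change. This is a hypothetical sketch of that idea only; the event and rule shapes here are invented for illustration and are not Treasure Data's actual API:

```python
# Hypothetical sketch of user-defined data-collection rules: each rule
# decides which game events are captured, and the rule set can be updated
# as analytic interests evolve. Illustrative shapes, not a real API.
events = [
    {"user": "a1", "action": "level_up", "level": 5},
    {"user": "b2", "action": "purchase", "amount": 4.99},
    {"user": "a1", "action": "logout"},
]

rules = {
    "monetization": lambda e: e["action"] == "purchase",
    "progression": lambda e: e["action"] == "level_up",
}

def collect(events, rules):
    """Route each event to every named rule it matches."""
    collected = {name: [] for name in rules}
    for e in events:
        for name, match in rules.items():
            if match(e):
                collected[name].append(e)
    return collected

buckets = collect(events, rules)
print({k: len(v) for k, v in buckets.items()})
```

Updating analytic focus then amounts to editing the `rules` mapping rather than re-instrumenting the game itself, which is the flexibility the managed service promises.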
Intel recently elaborated details of its Intel Data Platform, a suite of software applications designed to facilitate analytics on big data. The platform will complement the Intel Distribution for Apache Hadoop by providing a wealth of graph analytic and predictive modeling functionality via the Intel Data Platform: Analytics Toolkit that enables data scientists to derive actionable business intelligence from big data sets. In addition to raw analytic and data visualization capabilities, the Intel Data Platform features the ability to process streaming data sets and perform iterative and interactive analytics. The Intel Data Platform’s Analytics Toolkit provides users with algorithms related to graph analytics and machine learning that enable enhanced fraud detection, customer profiling and big data management and processing. For example, China Mobile Guangdong used the platform to implement online billing capable of adding up to 800,000 new records per second, or up to 30 terabytes of data per month. Similarly, the platform has been used to help retailers nimbly respond to social media promotions by ensuring shelves are appropriately stocked in response to the spikes in consumer demand that result from promotions on platforms such as Twitter and Facebook, media announcements or seasonal changes including unanticipated weather. The Intel Data Platform exemplifies the proliferation of Big Data analytics solutions that have emerged as more and more enterprises experiment with Big Data at varying levels of intensity. The platform will be available in Q2 of 2014 in Enterprise and Premium Editions that differ according to the degree of available customer support.
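Graph analytics for fraud detection often starts from something as simple as modeling transactions as edges and flagging accounts with anomalously many connections. The toy sketch below illustrates that general technique only; it bears no relation to the actual algorithms in Intel's Analytics Toolkit, and the account names and threshold are invented:

```python
from collections import defaultdict

# Illustrative sketch of a basic graph analytic used in fraud detection:
# build a transaction graph and flag nodes whose connection count
# (degree) exceeds a threshold. Toy data; not Intel's toolkit.
transactions = [
    ("acct1", "acct2"), ("acct1", "acct3"), ("acct1", "acct4"),
    ("acct2", "acct3"), ("acct5", "acct1"),
]

def flag_high_degree(edges, threshold=3):
    """Return nodes touching more than `threshold` edges, sorted by name."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(n for n, d in degree.items() if d > threshold)

print(flag_high_degree(transactions))  # → ['acct1']
```

Production graph engines apply far richer signals (communities, paths, temporal patterns), but the degree heuristic conveys why a graph representation surfaces suspicious hubs that row-by-row inspection misses.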
Infochimps today announced a Big Data platform as a service that integrates with existing enterprise IT infrastructures while adding Big Data management and analytic solutions. The Infochimps platform is based on open source, web-scale technologies and a cloud-based deployment model. One of the unique features of the Infochimps solution is that it gauges where customers stand with respect to Big Data management and subsequently recommends a path toward effectively operationalizing Big Data in line with customer needs. To help customers understand how to realize their Big Data needs, Infochimps complements its Big Data platform as a service with a suite of consulting services designed to guide customers through the Big Data lifecycle. Jim Kaskade, Director of CSC’s open Big Data solutions, commented on the Infochimps methodology as follows:
We’ve defined distinct phases along the Big Data adoption lifecycle where companies fall. We identify our customers’ current state, and then carefully guide them to organization-wide operationalization of Big Data insights.
Infochimps shares the insight previously articulated by Paul Maritz, CEO of Pivotal, that with the exception of companies such as Google, Facebook and Twitter, few enterprises have come to terms with the project of effectively operationalizing Big Data. In an interview with Raj Dalal of Big Data Insights at Strata 2014, Kaskade claimed that approximately 50% of Big Data initiatives fail due to poorly scoped projects, excessive complexity within the Big Data technology landscape and internal political friction. In response, Infochimps proposes a comprehensive Big Data implementation methodology in addition to its PaaS offering. Details remain scant, but we should expect to hear more at Strata and in the coming weeks about the Infochimps methodology for assessing a customer’s current state of Big Data adoption and subsequently designing a programmatic path focused on integrating existing technology stacks with its Big Data PaaS. Infochimps was acquired by CSC in August 2013.
MapR Technologies today announced a partnership with HP Vertica that integrates the HP Vertica Analytics Platform with MapR’s enterprise-grade distribution of Apache Hadoop. As a result of the partnership, users of the HP Vertica Analytics Platform on MapR have the capability to leverage the SQL capabilities of the HP Vertica Analytics Platform against data stored in Hadoop clusters. The HP Vertica Analytics Platform constitutes yet another “SQL-on-Hadoop” solution that competes with the likes of Apache Hive, Concurrent’s Lingual, Cloudera’s Impala, Hadapt and the Hortonworks Stinger initiative. As noted in GigaOm, MapR itself leads Apache Drill, an open source initiative to develop a highly scalable, SQL-based interactive query engine for Apache Hadoop, but clearly made a strategic decision to expand the range of users of its Hadoop distribution by partnering with HP Vertica. Today, MapR also announced the release of the latest version of its Hadoop distribution featuring support for Hadoop 2.2 and YARN. Notably, users running Hadoop 1.x can take advantage of YARN’s resource management abilities to preview the functionality of YARN before upgrading to Hadoop 2.0. HP Vertica Analytics Platform on MapR is currently available in early access mode and will be generally available in March.
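The value proposition of every SQL-on-Hadoop engine named above is the same: analysts query cluster-resident data with ordinary SQL instead of writing MapReduce jobs. In the sketch below, Python's built-in sqlite3 stands in for the engine purely to illustrate the kind of analytic query involved; the table and column names are invented for this example:

```python
import sqlite3

# sqlite3 stands in here for a SQL-on-Hadoop engine (Vertica-on-MapR,
# Hive, Impala, etc.): the point is that analysts express the analysis
# in plain SQL. Invented table/columns, illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user_id TEXT, page TEXT, ms INTEGER)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?)",
    [("u1", "home", 120), ("u1", "cart", 340), ("u2", "home", 95)],
)
# A typical analytic query: average latency per page.
rows = conn.execute(
    "SELECT page, AVG(ms) FROM clicks GROUP BY page ORDER BY page"
).fetchall()
print(rows)  # → [('cart', 340.0), ('home', 107.5)]
```

On an actual cluster the same statement would fan out across Hadoop nodes, which is precisely the heavy lifting these engines compete on.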
Hot on the heels of its $12M Series B funding in December, Trifacta recently announced the general availability of the Trifacta Data Transformation Platform. Based on its innovative Predictive Interaction™ technology, the Trifacta Data Transformation Platform uses visualization and machine learning to streamline and enrich user-level interactions with Big Data such as the type experienced by data scientists and business analysts. Trifacta’s Predictive Interaction technology features three components: (1) visualization of big data that empowers analysts to specify trends, values or analytics of interest; (2) interaction whereby the analyst responds to the data visualizations; and (3) prediction of the data transformations suggested by user interactions, with corresponding visualizations of the data transformation. The platform’s machine learning capability iteratively responds to user behavior to generate analytics of increasing value and interest. As a result, users can swiftly proceed from a raw, unprocessed archive of big data to incisive analytics and visualizations without the pre-processing, data cleansing and data transformation steps that are typically necessary to obtain deeper insights into the data in question. The Trifacta Data Transformation Platform enables business analysts without scripting experience to derive nuanced insights about big data and additionally amplifies analyst productivity by means of its unique visualization and machine learning technology platform. Trifacta customers include Lockheed Martin and Accretive Health, both of which remarked on the way in which the Trifacta Data Transformation Platform accelerates the data analysis lifecycle and streamlines user workflows. Trifacta’s technology is unique in the Big Data industry because of its focus on streamlining and enhancing the experience of the end user of big data analysis.
Given the ubiquity of data visualization in the industry, much of the platform’s ability to differentiate itself will hinge on the sophistication of its predictive modeling and machine learning capabilities.
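The three-part visualize-interact-predict loop described above can be caricatured in a few lines: the user highlights an example value in one messy record, the system infers a rule from that example, and the rule is applied across the whole column. This is a loose, hypothetical sketch of the general idea only; the naive regex inference below is invented for illustration and is nothing like Trifacta's actual machine learning:

```python
import re

# Loose sketch of predictive interaction: the user highlights an example
# substring in one messy value; the system infers a rule (here a naive
# character-class regex) and applies the extraction to the whole column.
column = ["order #1041 (late)", "order #977", "order #2203 (priority)"]

def infer_rule(example_value, highlighted):
    """Infer a crude extraction pattern from the user's highlighted example."""
    if highlighted.isdigit():
        return re.compile(r"\d+")       # generalize digits to "any number"
    return re.compile(re.escape(highlighted))  # otherwise match literally

rule = infer_rule(column[0], "1041")
extracted = [m.group(0) if (m := rule.search(v)) else None for v in column]
print(extracted)  # → ['1041', '977', '2203']
```

The interesting engineering lies in ranking many candidate generalizations of the user's example and visualizing their effects before committing, which is where the platform's predictive modeling would have to differentiate itself.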
Big data management vendor Zettaset recently announced an enhancement to its Zettaset Orchestrator platform marked by the addition of a Business Intelligence connector for BI analytics applications. The connector extends Zettaset’s security and encryption functionality to BI applications that use Hadoop datasets integrated with the Zettaset Orchestrator infrastructure. As a result, BI applications leveraging data stored in Hadoop clusters can deliver analytics that are protected by Zettaset’s enterprise-grade security and Hadoop management functionality. Zettaset also announced that BI vendor MicroStrategy has certified the connector for use with its analytics platform. The partnership between MicroStrategy and Zettaset means that customers of both vendors can derive actionable business intelligence from Hadoop-based datasets while taking advantage of the high availability, auditing and compliance, role-based access control, encryption and policy enforcement specific to the Zettaset Orchestrator platform. Zettaset’s BI connector announcement vividly illustrates the gravity of enterprise concerns about data security, whether for data at rest, data in transit or data in use. As Hadoop adoption accelerates throughout the enterprise, expect Hadoop distribution vendors and Big Data management vendors such as Zettaset to shore up components of their product offerings related to data security, particularly as they relate to interfaces to third party applications and hosting environments.
On Tuesday, Hortonworks announced the general availability of version 2.0 of the Hortonworks Data Platform for Windows. Hortonworks Data Platform 2.0 for Windows is the first distribution of Apache Hadoop 2.0 certified for Windows Server 2008 R2 and Windows Server 2012. Today’s announcement means that YARN (Yet Another Resource Negotiator), a key feature of Hadoop 2.0, is now available to Windows-based development environments. With HDP 2.0, developers in Windows shops can take advantage of YARN’s transformation of Hadoop from an infrastructure for batch processing to one for batch and real-time data processing. Moreover, HDP 2.0 features NameNode High Availability functionality that automates failovers and ensures the availability of the full HDP stack. Hortonworks collaborated closely with Microsoft in order to ensure the HDP 2.0 release achieved production-grade status within Windows environments. The release of HDP 2.0 marks yet another milestone in the story of the democratization of Apache Hadoop, the Big Data platform that is being rendered increasingly available to wider circles of users by means of initiatives such as Stinger (Hortonworks), Lingual (Concurrent) and Impala (Cloudera) that allow users to access and manipulate data stored in a Hadoop cluster using SQL.