Concurrent Releases Pattern To Facilitate Predictive Analytics On Hadoop

Today, Concurrent Inc. announces the release of Pattern, an open source tool designed to enable developers to build machine-learning applications on Hadoop by leveraging the Predictive Model Markup Language (PMML), the standard export format for popular predictive modeling tools such as R, MicroStrategy and SAS. Data scientists can use Pattern to export applications to Hadoop clusters and thereby run them against massive data sets. Pattern simplifies the process of building predictive models that operate on Hadoop clusters and lowers the barrier to the adoption of Apache Hadoop for advanced data mining and modeling use cases.

One use case for Pattern is evaluating the efficacy of competing models for a “predictive marketing intelligence solution,” as described below by Antony Arokiasamy, Senior Software Architect at AgilOne:

Pattern facilitates AgilOne to deploy a variety of advanced machine-learning algorithms for our cloud-based predictive marketing intelligence solution. As a self-service SaaS offering, Pattern allows us to evaluate multiple models and push the clients’ best models into our high performance scoring system. The PMML interface allows our advanced clients to deploy custom models.

Here, Arokiasamy describes how Pattern supports the scoring of multiple predictive models so that the best-performing model can be selected. AgilOne uses Pattern to run multiple predictive models in parallel against large data sets, and its experience additionally illustrates that Pattern operates effectively on a Hadoop cluster deployed in a cloud-based environment.
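
The idea of evaluating several models and promoting the best one can be sketched in a few lines of plain Python. This is only an illustration of the general technique: it does not use Pattern, Cascading or PMML, and the model functions, field names (spend, visits) and holdout records are hypothetical, invented for the example.

```python
# Illustrative sketch only: plain Python, not Pattern's or Cascading's API.
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Model = Callable[[Record], int]
LabeledData = List[Tuple[Record, int]]


def accuracy(model: Model, data: LabeledData) -> float:
    """Fraction of labeled records that the model classifies correctly."""
    correct = sum(1 for features, label in data if model(features) == label)
    return correct / len(data)


def select_best(models: Dict[str, Model], data: LabeledData) -> Tuple[str, float]:
    """Score every candidate on the same data and return the top scorer."""
    scores = {name: accuracy(model, data) for name, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical candidate models: simple threshold rules.
def spend_rule(record: Record) -> int:
    return 1 if record["spend"] > 100 else 0


def visits_rule(record: Record) -> int:
    return 1 if record["visits"] >= 3 else 0


# Hypothetical holdout set: (features, true label).
HOLDOUT: LabeledData = [
    ({"spend": 150.0, "visits": 1.0}, 1),
    ({"spend": 20.0, "visits": 5.0}, 0),
    ({"spend": 120.0, "visits": 4.0}, 1),
    ({"spend": 10.0, "visits": 0.0}, 0),
]
CANDIDATES: Dict[str, Model] = {"spend_rule": spend_rule, "visits_rule": visits_rule}

if __name__ == "__main__":
    print(select_best(CANDIDATES, HOLDOUT))
```

In a real deployment the candidates would presumably be models exported as PMML and scored in parallel on the cluster; the selection step itself stays the same comparison of scores on a common data set.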

Pattern runs on the popular Cascading framework for simplifying the development of data processing applications on Hadoop, which is used by the likes of Twitter, eBay, Etsy and Razorfish. A free, open source application, Pattern constitutes yet another pillar in Concurrent’s array of applications for streamlining the use of Apache Hadoop alongside Cascading and Lingual, the ANSI-standard SQL interface that enables developers to query Hadoop clusters without having to learn MapReduce. The release of Pattern consolidates the positioning of Concurrent as a pioneer in the Big Data management space given its thought leadership in designing applications that facilitate enterprise adoption of Hadoop. Enterprises can now use Concurrent’s Cascading framework to operate on Hadoop clusters using Java APIs, SQL and predictive models written in PMML-compatible analytics applications.

IBM Releases Big Data Software On SmartCloud; Cognos for iPad

On Monday, IBM announced the release of the InfoSphere BigInsights application for analyzing massive volumes of structured and unstructured data on its SmartCloud environment. The SmartCloud release of IBM’s BigInsights application means that IBM beat competitors Oracle and Microsoft in the race to deploy an enterprise-grade, cloud-based Big Data analytics platform. Over the past month, Oracle and Microsoft have revealed plans to release cloud-based Big Data applications that leverage Apache Hadoop, although in the case of both companies, a live release is not scheduled until 2012. BigInsights was previously accessed via the IBM Smart Business Development and Test Cloud environment that served as the testing ground for IBM’s SmartCloud, which was deployed in April 2011.

IBM developed its Big Data analytics platform because organizations across a number of verticals are drowning in the sea of unstructured data such as Facebook and Twitter feeds, internet searches, log files and emails. IBM’s press release quantified the size of the emerging big data space as follows:

Organizations of all sizes are struggling to keep up with the rate and pace of big data and use it in a meaningful way to improve products, services, or the customer experience. Every day, people create the equivalent of 2.5 quintillion bytes of data from sensors, mobile devices, online transactions, and social networks; so much that 90 percent of the world’s data has been generated in the past two years. Every month people send one billion Tweets and post 30 billion messages on Facebook. Meanwhile, more than 1 trillion mobile devices are in use today and mobile commerce is expected to reach $31 billion by 2016.

IBM customers in the banking, insurance and communications verticals are currently using BigInsights to more effectively understand trends from web analytics, social media feeds, text messages and other forms of unstructured data. The availability of BigInsights via IBM’s SmartCloud is likely to accelerate enterprise adoption of the product given enterprise familiarity with the SmartCloud offering and recent publicity about its October 12 upgrade. The deployment of BigInsights on SmartCloud also gives IBM early traction in the Big Data space, where it faces competition from Amazon Web Services’ Elastic MapReduce, EMC, Teradata and HP. Granted, Oracle and Microsoft are set to join the Big Data party soon, but IBM should have at least six months to consolidate its market positioning ahead of its West Coast-based competitors. The enterprise version of BigInsights is priced at 60 cents per cluster per hour whereas the basic version is free.

Key features of the enterprise-level IBM InfoSphere BigInsights include the following:

• Advanced text analytics to mine massive amounts of textual data
• A spreadsheet-like interface called BigSheets that allows users to create and deploy analytics without writing code
• Web-based management console
• Jaql, a query language for querying structured and unstructured data through an interface that resembles SQL

In tandem with the release of BigInsights on the SmartCloud, IBM announced the availability of IBM Cognos Mobile on the iPad and iPhone. iPad users can now leverage Cognos to run analytics on data and obtain access to a suite of visually rich dashboards. The combination of Cognos on the iPad and BigInsights clearly indicates that portability of access to data analytics constitutes a key component of IBM’s big data strategy. The big question now concerns how Oracle and Microsoft will differentiate themselves from BigInsights in their respective, forthcoming Big Data offerings.

Oracle Continues Big Data Push With Endeca Acquisition

On Tuesday, Oracle announced plans to acquire Big Data player Endeca just weeks after unveiling its Big Data appliance featuring Apache Hadoop and an Oracle NoSQL Database. Endeca’s proprietary MDEX technology powers two products: (1) Endeca InFront; and (2) Endeca Latitude. Endeca InFront enables users to understand customer trends and histories by examining online pages viewed, search terms and conversion rates. Endeca Latitude delivers a business intelligence platform for running analytics on structured and unstructured data.

According to Endeca’s website, Endeca Latitude claims the following differentiators from traditional BI solutions:

• No Data Left Behind
Endeca Latitude incorporates a range of structured and unstructured data including unstructured data from web searches and Facebook and Twitter feeds in addition to traditional, relationally structured data.

• Consumer Ease of Use
Whereas traditional BI is based around reports and dashboards, Latitude’s analytics are delivered through interactive, web-based applications that provide a greater range of user drill-down and customization options.

• Agile Delivery
Endeca Latitude claims faster customization of its product to enterprise requirements than its competitors Autonomy and Attivio offer. Moreover, Endeca Latitude allows for iterative refinement of its analytics through its interactive application, which enables enhanced collaboration between technology and business stakeholders.

One of the distinctive features of Endeca’s MDEX analytics engine is its lack of a data schema. Instead of a predefined data model, MDEX leverages a Faceted Data Model in which the schema emerges and morphs in relation to the characteristics of the data. Endeca InFront will be integrated with the Oracle ATG commerce engine to deliver analytics that improve online customer experience and conversion rates. Endeca Latitude will take its place alongside Oracle’s suite of BI tools and its forthcoming Big Data Appliance to analyze massive amounts of structured and unstructured data.
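
To make the idea of a schema that emerges from the data more concrete, here is a minimal Python sketch. It is not Endeca’s MDEX implementation or API; the record fields and the function names derive_facets and refine are hypothetical, chosen only to illustrate the general notion of faceted data, under the assumption that records are simple attribute-value maps.

```python
# Illustrative sketch only: not Endeca's MDEX engine or its API.
from collections import defaultdict


def derive_facets(records):
    """Build the 'schema' from the data: each attribute seen in any record
    becomes a facet, mapped to the set of values observed for it."""
    facets = defaultdict(set)
    for record in records:
        for attribute, value in record.items():
            facets[attribute].add(value)
    return dict(facets)


def refine(records, **selection):
    """Keep only records matching every selected facet value."""
    return [
        record
        for record in records
        if all(record.get(key) == value for key, value in selection.items())
    ]


# Hypothetical records with differing attributes; no schema is declared up front.
RECORDS = [
    {"type": "product", "brand": "Acme", "color": "red"},
    {"type": "tweet", "author": "jdoe", "sentiment": "positive"},
    {"type": "product", "brand": "Acme", "color": "blue", "size": "L"},
]

if __name__ == "__main__":
    print(sorted(derive_facets(RECORDS)))
    print(refine(RECORDS, type="product", color="blue"))
```

Adding a record with a previously unseen attribute simply adds a new facet the next time derive_facets runs, which is the sense in which the schema follows the data rather than preceding it.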

According to a June press release, Endeca “was used to power one of the largest eDiscovery clusters in the world exceeding 20 billion objects for interactive discovery – comparable in size to leading web search indexes of a few years ago.” The company counts IBM, IEEE, Toyota, Ford, Walmart, The Home Depot and the U.S. Department of Justice among its 600 customers. Oracle did not disclose the terms of the acquisition of the Cambridge, Massachusetts-based company, although GigaOm reports that Endeca took in $65 million in venture capital over the course of four capital raises. The acquisition is expected to be completed by the end of 2011.

Battle for Big Data Heats Up As Microsoft and Oracle Announce Hadoop-based Products

The battle for market share in the big data space is officially underway, with passion. At last week’s Professional Association for SQL Server Summit (PASS), Microsoft announced plans to develop a platform for big data processing and analytics based on Hadoop, the open source software framework that operates under an Apache license. Microsoft’s announcement comes roughly ten days after Oracle’s unveiling of its Big Data Appliance that provides enterprise level capabilities to process structured and unstructured data.

Key features of Oracle’s Big Data Appliance include the following:

•Software
–Apache Hadoop
–Oracle NoSQL Database Enterprise Edition
–Oracle Data Integrator Application Adapter for Hadoop
–Oracle Loader for Hadoop
–Open source distribution of R

•Hardware
–Oracle’s Exadata x86 clusters (Oracle Exadata Database Machine, Oracle Exalytics Business Intelligence Machine)

Oracle’s hardware supports the Oracle 11g R2 database alongside Oracle’s Red Hat Enterprise Linux version and virtualization based on the Xen hypervisor. The company’s announcement of its plans to leverage a NoSQL database represented an abrupt about-face from an earlier Oracle position that discredited the significance of NoSQL. In May, Oracle published a whitepaper, “Debunking the NoSQL Hype,” that downplayed the enterprise-level capability of NoSQL deployments.

Microsoft’s forthcoming Big Data platform features the following:

–Hadoop for Windows Server and Azure
–Hadoop connectors for SQL Server and SQL Parallel Data Warehouse
–Hive ODBC drivers for users of Microsoft Business Intelligence applications

Microsoft revealed a strategic partnership with Yahoo spinoff Hortonworks to integrate Hadoop with Windows Server and Windows Azure. Microsoft’s decision not to leverage NoSQL and to use instead a Windows-based version of Hadoop for SQL Server 2012 constitutes the key difference between Microsoft’s and Oracle’s Big Data platforms. The entry of Microsoft and Oracle into the Big Data space suggests that the market is ready to explode as government agencies and private sector companies increasingly find value in unstructured data such as emails, log files, Twitter feeds and text-centered data. IBM and EMC hold the early market share lead but competition is set to intensify, particularly given the recent affirmation handed to NoSQL by tech giant Oracle.

Red Hat Acquires Gluster for $136 Million In Cash

Red Hat, provider of open source enterprise software solutions, announced Tuesday that it had reached a deal to acquire the online storage company Gluster for $136 million. Gluster was founded in 2005 with the objective of leveraging open source software and commodity hardware to provide enterprise customers with public and private cloud-based storage solutions. GlusterFS constitutes the core of Gluster’s offering in the form of a distributed file system that can scale to thousands of client terminals and petabytes of storage. Gluster competes with Sun Microsystems’ open source Lustre file system and IBM’s General Parallel File System. Headquartered in Sunnyvale, CA, the company currently has over 100 enterprise customers including Box.net, the personalized internet radio service Pandora and Deutsche Bank AG.

Red Hat’s acquisition of Gluster means that the commercial Linux distributor enters the $4 billion market for unstructured data storage. More importantly, the deal gives Red Hat the opportunity to define baseline standards for enterprise level management of unstructured data such as emails, log files and documents. Speaking of Red Hat’s motivations for the acquisition, Red Hat CTO Brian Stevens remarked:

The explosion of big data and the new paradigm of cloud computing are converging, forcing IT to re-think storage investments that are cost-effective, manageable and scale for the future. Our customers are looking for software-based storage solutions that manage their file-based data on-premise, in the cloud and bridging between the two.

With unstructured data growth (such as log files, virtual machines, email, audio, video and documents), the 90’s paradigm of forcing everything into expensive, single-system DBMS residing on an internal corporate SAN has become unwieldy and impractical.

Stevens indicates how the proliferation of unstructured data at the enterprise level renders cloud-based solutions increasingly attractive and viable. Whereas Red Hat excels with open source operating systems, virtualization and cloud computing platforms such as CloudForms and OpenShift, its acquisition of Gluster fills a critical infrastructure need involving the management of file-based data both on premise and in the cloud.

Gluster founder and CTO Anand Babu Periasamy commented on the synergies of the acquisition by noting: “We believe this is a perfect combination of technologies, strategies and cultures and is a great development for our customers, employees, investors and community. Gluster started off with a goal to be the Red Hat of storage. Now, we are the storage of Red Hat.” Red Hat acquired Gluster for $136 million in cash. As part of the acquisition, Red Hat will assume ownership of unvested Gluster equity and offer equity retention incentives to Gluster’s employees.