Infochimps today announced a Big Data platform as a service that integrates with existing enterprise IT infrastructures while adding Big Data management and analytic solutions. The Infochimps platform is built on open source, web-scale technologies and is deployed in the cloud. A distinctive feature of the Infochimps solution is that it assesses where customers stand with respect to Big Data management and then recommends a path toward operationalizing Big Data in line with their needs. To help customers work out how to meet those needs, Infochimps complements its Big Data platform as a service with a suite of consulting services designed to guide customers through the Big Data lifecycle. Jim Kaskade, Director of CSC’s open Big Data solutions, commented on the Infochimps methodology as follows:
“We’ve defined distinct phases along the Big Data adoption lifecycle where companies fall. We identify our customers’ current state, and then carefully guide them to organization-wide operationalization of Big Data insights.”
Infochimps shares the insight previously articulated by Paul Maritz, CEO of Pivotal, that with the exception of companies such as Google, Facebook and Twitter, few enterprises have worked out how to operationalize Big Data effectively. In an interview with Raj Dalal of Big Data Insights at Strata 2014, Kaskade claimed that approximately 50% of Big Data initiatives fail due to poorly scoped projects, excessive complexity within the Big Data technology landscape and internal political friction. In response, Infochimps proposes a comprehensive Big Data implementation methodology in addition to its PaaS. Details remain scant, but we should expect to hear more at Strata and in the coming weeks about the Infochimps methodology for assessing a customer’s current state of Big Data adoption and then designing a programmatic path that integrates existing technology stacks with its Big Data PaaS. Infochimps was acquired by CSC in August 2013.
This week, Infochimps revealed details of its Infochimps Platform for deriving business value from Big Data sets. The Infochimps Platform complements the company’s data marketplace, which offers over 200 data sets from companies such as Twitter, Foursquare and OkCupid to enterprises seeking to embed these data sets into their own software applications. The platform can be hosted on the Amazon Web Services cloud or within a customer’s private cloud. At its heart is Ironfan, a technology that allows customers to configure and operationalize a Big Data stack quickly. Users can leverage Ironfan to determine which databases and analytic applications to select in order to optimize their Big Data analyses.
An open-source technology developed by Infochimps, Ironfan allows business users to seamlessly scale and re-architect their Big Data processing infrastructure as their Big Data stack evolves. Ironfan is the technology that enabled Infochimps to develop a “special social influence score” called Trstrank based on 20 million tweets per day. Manually configuring the server infrastructure for Trstrank on Amazon Web Services would have taken Infochimps four weeks. Configuring the same technology stack using Ironfan took only two hours.
Key features of the Infochimps Platform include:
• Apache Flume to manage delivery of data sets from point A to point B
• Support for databases such as HBase, Cassandra, Elasticsearch, MongoDB and MySQL
• Elastic Hadoop that allows users to access only as many Hadoop resources as required
• Analytics that leverage Pig, Wukong (Infochimps) and other Hadoop-compatible analytics software frameworks
Infochimps will continue to offer its data marketplace alongside its turnkey Big Data platform to promote its mission of democratizing access to Big Data. The platform illustrates the growing convergence of Big Data and cloud platforms. IBM, Microsoft, Oracle and Karmasphere join Infochimps in promoting cloud-based Big Data platforms. The Infochimps Platform for Big Data is also the most recent example of a Platform as a Service offering running on an Infrastructure as a Service public cloud.
A schematic of the Infochimps Platform architecture can be found below: