This week, Infochimps revealed details of its Infochimps Platform for deriving business value from Big Data sets. The platform complements the company's data marketplace, which offers over 200 data sets from companies such as Twitter, Foursquare and OkCupid to enterprises seeking to embed that data into their own software applications. The Infochimps Platform can be hosted on the Amazon Web Services cloud or within a customer’s private cloud. At its heart is Ironfan, a technology that allows customers to configure and operationalize a Big Data stack quickly. Users can leverage Ironfan to determine which databases and analytic applications will best serve their Big Data analyses.
An open-source technology developed by Infochimps, Ironfan allows business users to seamlessly scale and re-architect their Big Data processing infrastructure as their Big Data stack evolves. Ironfan is the technology that enabled Infochimps to develop a “special social influence score” called Trstrank based on 20 million tweets per day. Manually configuring the server infrastructure for Trstrank on Amazon Web Services would have taken Infochimps four weeks, whereas configuring the same technology stack with Ironfan took only two hours.
Key features of the Infochimps Platform include:
• Apache Flume to manage delivery of data sets from point A to point B
• Support for databases such as HBase, Cassandra, Elasticsearch, MongoDB and MySQL
• Elastic Hadoop that allows users to access only as many Hadoop resources as required
• Analytics that leverage Pig, Wukong (Infochimps) and other Hadoop-compatible analytics software frameworks
Infochimps will continue to offer its data marketplace alongside its turnkey Big Data platform to promote its mission of democratizing access to Big Data. The platform illustrates the growing convergence of Big Data and cloud platforms: IBM, Microsoft, Oracle and Karmasphere join Infochimps in promoting cloud-based Big Data platforms. The Infochimps Platform for Big Data is also the most recent example of a Platform as a Service offering running on an Infrastructure as a Service public cloud.
A schematic of the Infochimps Platform architecture can be found below: