10gen Raises $42 Million For MongoDB, Open Source NoSQL Database

10gen, creator of the NoSQL database product MongoDB, today announced the closing of a Series E funding round totaling $42 million. The round was led by New Enterprise Associates with participation from existing investors Sequoia Capital, Flybridge Capital Partners and Union Square Ventures. The capital will be used to enhance product development as well as to support “its rapidly growing community and user base worldwide.” Today’s announcement brings 10gen’s total funding to $73 million over the course of five rounds.

10gen’s funding raise means that the battle for NoSQL market share is likely to heat up as MongoDB attempts to consolidate its position as the leading distributor of a NoSQL database product. 10gen claims that “MongoDB is the dominant NoSQL database, with top enterprises in Telecommunications, Financial Services, Media, Government and Technology standardizing on MongoDB.” Moreover, the company boasts growth of 50% every quarter for the last five quarters. Meanwhile, the 10gen team has grown 400% since January 2011 with a majority of its 130 employees still housed within the technology and product development departments.
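To put the quoted growth rate in perspective, the following minimal sketch works out what 50% growth per quarter compounds to over five quarters. It assumes the growth compounds quarter over quarter; the source does not say which metric is growing, and the function name `compound_multiple` is purely illustrative.

```python
def compound_multiple(rate_per_period: float, periods: int) -> float:
    """Return the overall growth multiple after compounding a fixed rate."""
    return (1.0 + rate_per_period) ** periods


# Assumption: "50% every quarter for the last five quarters" compounds.
multiple = compound_multiple(0.50, 5)
print(f"{multiple:.2f}x")  # prints 7.59x
```

Under that assumption, five quarters of 50% growth amount to roughly a 7.6-fold increase overall.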

10gen’s CEO Dwight Merriman remarked that one of the company’s goals was to disrupt the database landscape with MongoDB:

“We want to change the database market, to make MongoDB the best way for companies to build new applications. Our goal is to give tech teams not only a database that scales to any big data level required but also helps developers be productive and more nimble. That has been the vision of the MongoDB open source community and we want to continue to help make that happen.”

As Merriman points out, scalability is one of MongoDB’s key selling points; ironically, GigaOm reports that scalability is also one of the areas 10gen intends to improve with its most recent capital raise. Enterprise customers of 10gen that use MongoDB include Craigslist, Disney, Foursquare and The New York Times. Craigslist uses MongoDB to archive billions of records.

MongoDB is an open source NoSQL database product with commercial support and licensing options. The free version of the product is available under the GNU Affero General Public License. MongoDB competes with the likes of Amazon Web Services’ DynamoDB, Couchbase, Redis, Riak and Neo4j in addition to DataStax’s commercialized variant of Apache Cassandra. 10gen’s recent capital raise may strongly position the company for an IPO or for acquisition by HP, IBM or Dell, all of which could well be interested in a robust NoSQL database.

Infochimps Platform Represents PaaS For Big Data

This week, Infochimps revealed details of its Infochimps Platform for deriving business value from Big Data sets. The Infochimps Platform complements its data marketplace of over 200 data sets from companies such as Twitter, Foursquare and OkCupid, which serves enterprises seeking to embed these data sets into their own software applications. Hosted on the Amazon Web Services cloud or within a customer’s private cloud, the Infochimps Platform has at its heart Ironfan, a technology that allows customers to configure and operationalize a Big Data stack quickly. Users can leverage Ironfan to determine which databases and analytic applications to select in order to optimize their Big Data analyses.

An open-source technology developed by Infochimps, Ironfan allows business users to seamlessly scale and re-architect their Big Data processing infrastructure as their Big Data stack evolves. Ironfan constitutes the technology that enabled Infochimps to develop a “special social influence score” called Trstrank based on 20 million tweets per day. Manually configuring the server infrastructure for Trstrank on Amazon Web Services would have taken Infochimps four weeks. Configuring the technology stack for Trstrank using Ironfan, however, took only two hours.

Key features of the Infochimps Platform include:

• Apache Flume to manage delivery of data sets from point A to point B
• Support for databases such as HBase, Cassandra, Elasticsearch, MongoDB and MySQL
• Elastic Hadoop that allows users to access only as many Hadoop resources as required
• Analytics that leverage Pig, Wukong (Infochimps) and other Hadoop-compatible analytics software frameworks

Infochimps will continue to offer its data marketplace alongside its turnkey Big Data platform to promote its mission of democratizing access to Big Data. The platform aptly illustrates the growing trend of the convergence of Big Data and cloud platforms. IBM, Microsoft, Oracle and Karmasphere join Infochimps in promoting cloud-based Big Data platforms. The Infochimps Platform for Big Data also represents the most recent example of a Platform as a Service infrastructure running on an Infrastructure as a Service public cloud.

A schematic of the Infochimps Platform architecture can be found below:

Pentaho Open-Sources Pentaho Kettle 4.3 Big Data Analytics

Pentaho open-sourced its Pentaho Kettle Big Data analytic tools to the Apache Software Foundation under an Apache 2.0 license on Tuesday. Pentaho’s decision to open-source Pentaho Kettle is intended to accelerate market adoption of Big Data technologies such as Hadoop and NoSQL, according to Matt Casters, founder and chief architect of the Pentaho Kettle Project. Because the Apache Software Foundation stewards the open-source development of Hadoop and several NoSQL databases, developers now have one organizational infrastructure through which to access Big Data software frameworks and analytic tools such as Pentaho Kettle. Users of Pentaho Kettle 4.3 can leverage Big Data stored using Hadoop HDFS, Hadoop MapReduce, Hadapt, HBase, Hive, HPCC Systems, Cassandra and MongoDB.

Pentaho Kettle 4.3 supports commercial and non-commercial distributions of Hadoop, including “Amazon Elastic MapReduce, Apache Hadoop, Cloudera’s Distribution including Apache Hadoop (CDH), Cloudera Enterprise, EMC Greenplum HD, HortonWorks Data Platform powered by Apache Hadoop, and MapR’s M3 Free and M5 Edition.”

Pentaho Kettle offers users the ability to:

• Import, extract, transform and report on data from a variety of Big Data technologies
• Leverage Pentaho Kettle’s visualization interface to orchestrate jobs such as Hadoop MapReduce jobs, Pentaho MapReduce jobs, Pig scripts, Hive queries and HBase queries
• Make use of its integration with Big Data technologies to take advantage of native functionality specific to the relevant Big Data software framework
• Quickly harness the power of the Pentaho Business Analytics full suite of reporting and analytics tools

Pentaho’s decision to open-source Pentaho Kettle 4.3 was widely applauded by the Big Data community. Executives from 10gen, Cloudera and Hadapt hailed the open-sourcing of Pentaho Kettle 4.3, noting its exceptional analytic capabilities and its potential to increase market adoption of Big Data technologies. The open-sourcing of Pentaho Kettle also makes Big Data technologies more widely accessible to the developer community, including users who lack sophisticated skills in coding Java MapReduce jobs and Pig scripts.

Nine Startups Compete For IBM Global Entrepreneur Of The Year Award For Solutions Benefiting Cities

IBM recently announced nine startups as finalists in the IBM Global Entrepreneur of the Year competition. The competition features startups that tackle problems faced by cities and urban environments. Each of the nine finalists placed first in regional competitions spanning the nine cities of Barcelona, Tel Aviv, Bangalore, Rio de Janeiro, New York City, Shanghai, London, Austin and Istanbul. Finalists will meet with IBM and venture capital firms in San Francisco from January 31 to February 2 in order to compete for the title of IBM Global Entrepreneur of the Year.

The nine finalists are:

BitCarrier: (winner, SmartCamp Barcelona)
Bitcarrier provides traffic management solutions that leverage data from Bluetooth and public WiFi mobile devices.

C-B4 Context Based 4Casting: (winner, SmartCamp Tel Aviv)
C-B4 specializes in pattern recognition and predictive “Big Data” analytics that provide customers with actionable insights for making strategic business decisions.

ConnectM: (winner, SmartCamp Bangalore)
ConnectM delivers business intelligence based on domain-specific analytics in the machine-to-machine (M2M) space.

IDXP: (winner, SmartCamp Rio de Janeiro)
IDXP analyzes data about consumer behavior in the brick and mortar retail space.

Localytics: (winner, SmartCamp New York City)
Localytics provides analytics that enable developers of applications for smartphones and tablets to improve the usage of their mobile apps.

Palmap: (winner, SmartCamp Shanghai)
Palmap’s technology maps indoor spaces that are not presently covered by GIS data to help businesses and consumers navigate spaces such as airports and shopping malls more effectively.

Profitero: (winner, SmartCamp London)
Profitero provides analytics on competitor pricing, product inventory and strategy in order to help organizations develop more effective pricing strategies.

SecureWaters: (winner, SmartCamp Austin)
SecureWaters sells a product that detects toxic products in drinking water sources at concentrations far below those resulting from accidental contamination or a terrorist attack.

SkinScan: (winner, SmartCamp Istanbul)
SkinScan provides an Apple iPhone integrated product for tracking and analyzing skin moles in order to proactively guard against skin cancer.

Business intelligence on Big Data is the clear theme for over half of the finalists. Only one finalist each addresses problems specific to the healthcare (SkinScan) and counter-terrorism (SecureWaters) verticals. The IBM Global Entrepreneur of the Year for 2010 was Streetline, which received $15 million in venture capital for its analytic solutions that empower cities, garages, airports and universities to manage parking more effectively and increase parking-related revenue.

IBM Releases Netezza Customer Intelligence Appliance For Big Data Analysis For Retailers

IBM announced the availability of its Netezza Customer Intelligence Appliance today. The appliance enables retailers to analyze petabytes of data about customer interactions with their products across multiple segments and points of interaction. Retailers can use the Netezza Customer Intelligence Appliance to obtain a 360-degree picture of customer behavior spanning the internet, mobile and purchases in brick-and-mortar stores. Based on the premise that 70 percent of a customer’s initial interactions with a product take place online, IBM partnered with Aginity to develop actionable analytics that help retailers understand and predict customer behavior by aggregating data from multiple channels. Leslie Weber, CIO of Bass Pro Shops, said that the appliance had enabled the retailer to analyze data from “retail stores, boat dealerships, Internet and catalog sales, wholesale and hospitality,” thereby enabling it to “deliver more targeted promotions, circulars and catalogs to create a better shopping experience.” The Netezza Customer Intelligence Appliance is a SQL-based Big Data product designed to deliver analytics on structured data. IBM is still in the process of integrating Netezza with BigInsights, its Hadoop-based appliance for analyzing structured and unstructured data.

Amazon DynamoDB Offers Big Data Cloud Processing With Managed Services

This week, Amazon Web Services announced the availability of Amazon DynamoDB, a fully managed cloud-based database service for Big Data processing. The announcement represents yet another move by Amazon Web Services to consolidate enterprise market share by providing an offering that can store massive amounts of data with fast, predictable performance and low latency. Amazon DynamoDB is a NoSQL database built for customers that do not require complex querying capabilities such as indexes, transactions, or joins. DynamoDB constitutes a greatly enhanced version of Amazon SimpleDB. One of Amazon SimpleDB’s principal limitations is its 10 GB limit on data within containers known as domains. Moreover, Amazon SimpleDB suffers from performance issues because it indexes every attribute of each object within a domain and because it takes its commitment to eventual consistency to an extreme. Amazon DynamoDB builds upon the company’s prior experience with SimpleDB and Dynamo, Amazon’s internal key-value store and a precursor to many NoSQL databases, by offering the following features:

• Managed services

Amazon DynamoDB managed services take care of processes such as provisioning servers, configuring a cluster, and dealing with scaling, partition and replication issues.

• No upper bound on data

Customers can store as much data as they would like. Data will be spread out across multiple servers spanning multiple Availability Zones.

• Speed

The solid state drives on which Amazon DynamoDB is built help optimize performance and ensure low latencies. Applications running in the EC2 environment should expect to see latencies in the “single-digit millisecond range for a 1KB object.” Performance also benefits from a design in which not all attributes are indexed.

• Flexible schemas and data models

Data need not adopt a particular schema and can have multiple attributes, including attributes that themselves have multiple values.

• Integration with Amazon Elastic MapReduce (Amazon EMR)

Because DynamoDB is integrated with the Hadoop-based Amazon Elastic MapReduce service, customers can analyze data in DynamoDB and store the results in S3, leaving the original dataset in DynamoDB intact.

• Low cost

Pricing starts at $1 per GB per month.
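The flexible data model and key-based access described in the features above can be illustrated with a small in-memory sketch. This is not the DynamoDB API; the `TinyTable` class, its methods and the sample attribute names are hypothetical and exist only to show schemaless items, multi-valued attributes and lookup by primary key alone.

```python
from typing import Any, Dict, Optional


class TinyTable:
    """Hypothetical in-memory table; items are retrievable by primary key only."""

    def __init__(self, key_name: str) -> None:
        self.key_name = key_name
        # The only index kept is primary key -> item.
        self._items: Dict[Any, Dict[str, Any]] = {}

    def put(self, item: Dict[str, Any]) -> None:
        # No schema is enforced beyond requiring the primary key attribute.
        if self.key_name not in item:
            raise ValueError(f"item is missing primary key {self.key_name!r}")
        self._items[item[self.key_name]] = dict(item)

    def get(self, key: Any) -> Optional[Dict[str, Any]]:
        # Direct lookup by key; other attributes are not indexed.
        return self._items.get(key)


table = TinyTable(key_name="user_id")
# Items in the same table may carry entirely different attributes.
table.put({"user_id": "u1", "name": "Ada"})
# An attribute may itself hold multiple values (modeled here as a set).
table.put({"user_id": "u2", "tags": {"retail", "mobile"}, "visits": 3})
```

Calling `table.get("u2")` returns the second item with its set-valued `tags` attribute, while a key that was never stored returns `None`. Keeping a single index on the primary key is the trade-off the article describes: writes stay cheap, at the cost of not being able to query on arbitrary attributes.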

With this set of features, Amazon DynamoDB makes a dramatic entrance to a Big Data party that features Oracle, HP, Teradata, Splunk and others. The product underscores Amazon Web Services’ strategic investment in becoming a one-stop service for cloud and Big Data processing. Moreover, the managed services component of Amazon DynamoDB represents a clear change of pace for Amazon Web Services because it recognizes the value of managed services for technology deployments at the enterprise level. Amazon DynamoDB’s managed services offering is expected to appeal to enterprises that would rather invest technical resources in innovation and software development than in the operational maintenance of a complex IT ecosystem. Assuming that AWS can quantify the degree to which DynamoDB’s managed services offering ends up being responsible for sales, expect to see more managed service offerings from Amazon Web Services in both the cloud computing and Big Data verticals. Going forward, the technology community should also expect partnerships between Amazon Web Services and business intelligence vendors that mimic the deal between Jaspersoft and Red Hat’s OpenShift, given that Amazon Web Services appears intent on retaining customers within its ecosystem for all of their cloud hosting, Big Data and business intelligence analytics needs.