NoSQL vendor Couchbase today announced the finalization of $25 million in Series D funding in a round led by Adams Street Partners with additional participation from existing investors Accel Partners, Mayfield Fund, North Bridge Venture Partners, and Ignition Partners. The funding will be used to support strategic product initiatives and the expansion of the company’s sales and marketing team. With regard to international growth, Couchbase plans to open new offices in Brazil, Argentina, India and China and to grow its existing operations in North America, Europe, Japan, Korea and Israel. The capital raise comes soon after the release of Couchbase 2.0 and 2013 sales growth on the order of 400%, including the closure of deals with several prized enterprise customers, according to the company’s press release. Couchbase is the company behind the Couchbase Open Source Project and its flagship product Couchbase Server, a distributed NoSQL document-oriented database used by the likes of AOL, LinkedIn, Orbitz, Salesforce.com and Zynga. The capital raise and Couchbase’s impressive growth underscore the industry’s increasing acceptance of NoSQL as the proliferation of semi-structured data renders non-relational databases increasingly critical to the Big Data revolution.
On Wednesday, the Apache Software Foundation announced the release of Cassandra version 1.2, the high-performance, highly scalable distributed NoSQL database for Big Data. Cassandra is capable of managing thousands of data requests per second and is used by organizations such as Adobe, Cisco, Constant Contact, Digg, Disney, eBay, Netflix, Rackspace and Twitter.
Key components of the latest release include the following:
• Virtual nodes and clustering across virtual nodes
• Node-to-node communication
• Version 3 of the Cassandra Query Language (CQL) to simplify the modeling of applications, enable more powerful mapping and facilitate superior database design
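As a rough illustration of the simplified application modeling CQL3 enables, a compound primary key lets developers express how rows are partitioned and clustered using familiar SQL-like syntax. The table and column names below are hypothetical, chosen only to sketch the idea:

```sql
-- Hypothetical CQL3 schema: one partition per country,
-- with rows clustered by user_id inside each partition.
CREATE TABLE users_by_country (
    country text,
    user_id uuid,
    name    text,
    PRIMARY KEY (country, user_id)
);

-- Querying by partition key reads like ordinary SQL.
SELECT name FROM users_by_country WHERE country = 'US';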
Jonathan Ellis, Vice President of Apache Cassandra, reflected on the significance of the Cassandra 1.2 release as follows:
We are pleased to announce Cassandra 1.2. By improving support for dense clusters —powering multiple terabytes per node— as well as simplifying application modeling, and improving data cell storage/design/representation, systems are able to effortlessly scale petabytes of data.
Here, Ellis notes that one of the key upgrades in Cassandra 1.2 is enhanced support for dense clusters featuring multiple terabytes per node. Combined with the platform’s streamlined application modeling and improved data cell storage and representation, this support allows systems to scale to petabytes of data.
Cassandra users expressed particular enthusiasm for the virtual node and atomic batch components of the new release. Software developer Kelly Sommers elaborated on the significance of Cassandra 1.2’s improved handling of virtual nodes as follows:
In Cassandra v1.2 the introduction of vnodes will simplify managing clusters while improving performance when adding and rebuilding nodes. v1.2 also includes many new features, performance improvements and further heap reduction to alleviate the burden on the JVM garbage collector.
Virtual nodes improve performance, notes Sommers. Meanwhile, reducing the burden on the JVM garbage collector enables further performance gains: a recent Twitter engineering blog post, although not about Cassandra, described how optimizing the JVM garbage collector significantly reduced CPU time for Twitter.com.
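The intuition behind vnodes can be sketched in a few lines of code: instead of claiming one token on the hash ring, each physical node claims many small token ranges, so when a node joins or leaves, load shifts in thin slices spread across the whole cluster rather than in one large chunk. The following is an illustrative sketch of consistent hashing with virtual nodes, not Cassandra’s actual implementation:

```python
import bisect
import hashlib

def token(value: str) -> int:
    """Map a string onto a 128-bit hash ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes_per_node=256):
    """Each physical node claims many virtual tokens on the ring."""
    return sorted(
        (token(f"{node}-vnode-{i}"), node)
        for node in nodes
        for i in range(vnodes_per_node)
    )

def owner(ring, key: str):
    """A key belongs to the first vnode clockwise from its token."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_right(tokens, token(key)) % len(ring)
    return ring[i][1]

ring = build_ring(["node-a", "node-b", "node-c"])
print(owner(ring, "user:42"))  # one of the three nodes
```

Because each node’s vnodes are scattered around the ring, adding a fourth node takes a small share of keys from every existing node, which is exactly the smoother rebalancing behavior Sommers describes.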
Improved performance, increased scalability and simplified application development represent the three recurring themes from user experiences of the Cassandra 1.2 release. Cassandra is known for its ability to handle massive amounts of real-time operational data, whereas Hadoop is famed for its ability to process large volumes of data in batch. The latest release means that Big Data just got even bigger by virtue of Cassandra 1.2’s performance enhancements and its application modeling and database design simplifications.
10gen, creators of the NoSQL database product MongoDB, today announced the finalization of Series E funding totaling $42 million. The capital raise was led by New Enterprise Associates with participation from existing investors Sequoia Capital, Flybridge Capital Partners and Union Square Ventures. The capital will be used to enhance product development as well as to support “its rapidly growing community and user base worldwide.” Today’s announcement brings 10gen’s total funding to $73 million over the course of five rounds.
10gen’s capital raise means that the battle for NoSQL market share is likely to heat up as MongoDB attempts to consolidate its position as the leading NoSQL database product. 10gen claims that “MongoDB is the dominant NoSQL database, with top enterprises in Telecommunications, Financial Services, Media, Government and Technology standardizing on MongoDB.” Moreover, the company boasts 50% growth in each of the last five quarters. Meanwhile, the 10gen team has grown 400% since January 2011, with a majority of its 130 employees working in the technology and product development departments.
10gen’s CEO Dwight Merriman remarked that one of the company’s goals was to disrupt the database landscape with MongoDB:
“We want to change the database market, to make MongoDB the best way for companies to build new applications. Our goal is to give tech teams not only a database that scales to any big data level required but also helps developers be productive and more nimble. That has been the vision of the MongoDB open source community and we want to continue to help make that happen.”
As Merriman points out, scalability represents one of 10gen’s key selling points although, ironically, scalability is also one of the attributes MongoDB intends to improve with its most recent capital raise, as reported by GigaOm. Enterprise customers of 10gen that use MongoDB include Craigslist, Disney, Foursquare and The New York Times. Craigslist uses MongoDB to archive records that number in the billions.
MongoDB is an open source NoSQL database product with commercial support and licensing options. The free version of the product is available under the GNU Affero General Public License. MongoDB competes with the likes of Amazon Web Services’s DynamoDB, Couchbase, Redis, Riak and Neo4j, in addition to DataStax’s commercialized variant of Apache Cassandra. 10gen’s recent capital raise may strongly position the company for acquisition by HP, IBM or Dell, all of which could well be interested in a robust NoSQL database, or otherwise for an IPO.
This week, Amazon Web Services announced the availability of Amazon DynamoDB, a fully managed cloud-based database service for Big Data processing. The announcement represents yet another move by Amazon Web Services to consolidate enterprise market share by providing an offering that can store massive amounts of data with ultra-fast, predictable performance and low latency. Amazon DynamoDB is a NoSQL database built for customers that do not require complex querying capabilities such as indexes, transactions, or joins. DynamoDB constitutes a greatly enhanced successor to Amazon SimpleDB. One of Amazon SimpleDB’s principal limitations is its 10 GB ceiling on data within containers known as domains. Moreover, Amazon SimpleDB suffered from performance issues because it indexed every attribute of an object within a domain, and because its eventual-consistency model was taken to an extreme. Amazon DynamoDB builds upon the company’s prior experience with SimpleDB and Dynamo, an internal precursor of the NoSQL movement, by offering the following features:
• Managed services
Amazon DynamoDB managed services take care of processes such as provisioning servers, configuring a cluster, and dealing with scaling, partition and replication issues.
• No upper bound on data
Customers can store as much data as they would like. Data will be spread out across multiple servers spanning multiple Availability Zones.
• Fast, predictable performance
The solid state drives on which Amazon DynamoDB is built help optimize performance and ensure low latencies. Applications running in the EC2 environment should expect latencies in the “single-digit millisecond range for a 1KB object.” Performance is further optimized by a design in which not all attributes are indexed.
• Flexible schemas and data models
Data need not adopt a particular schema and can have multiple attributes, including attributes that themselves have multiple values.
• Integration with Amazon Elastic MapReduce (Amazon EMR)
Because DynamoDB is integrated with the Hadoop-based Amazon Elastic MapReduce technology, customers can analyze data in DynamoDB and store the results in S3, thereby preserving the original dataset in DynamoDB.
• Low cost
Pricing starts at $1 per GB per month.
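The flexible data model described above can be pictured with plain dictionaries: items in the same table need not share a schema, and an attribute may hold multiple values. The table and attribute names below are hypothetical, and the naive scan function merely stands in for calls to the actual DynamoDB API:

```python
# Two items in the same hypothetical "products" table: no shared
# schema is required, and "tags" holds multiple values per item.
products = [
    {"id": "1", "title": "Camera", "tags": {"photo", "electronics"},
     "megapixels": 24},
    {"id": "2", "title": "Novel", "tags": {"fiction"},
     "pages": 320},  # no "megapixels" attribute; no fixed schema
]

def scan(table, attr, value):
    """Naive stand-in for a table scan filtering on one multi-valued attribute."""
    return [item for item in table if value in item.get(attr, set())]

print([item["id"] for item in scan(products, "tags", "fiction")])  # ['2']
```

In the real service, items are written to and read from tables via the DynamoDB API rather than held in a Python list; the point here is only the schema flexibility and multi-valued attributes.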
With this set of features, Amazon DynamoDB represents a dramatic entrant to the Big Data party that features Oracle, HP, Teradata, Splunk and others. The product underscores Amazon Web Services’s strategic investment in becoming a one-stop shop for cloud and Big Data processing. Moreover, the managed services component of Amazon DynamoDB represents a clear change of pace for Amazon’s cloud computing arm, recognizing the value of managed services for enterprise technology deployments. The offering is expected to appeal to enterprises that would rather invest technical resources in innovation and software development than in the operational maintenance of a complex IT ecosystem. Assuming that AWS can quantify the degree to which DynamoDB’s managed services drive sales, expect to see more managed service offerings from Amazon Web Services in both the cloud computing and Big Data verticals. Going forward, the technology community should also expect partnerships between Amazon Web Services and business intelligence vendors, in the mold of the deal between Jaspersoft and Red Hat’s OpenShift, given that Amazon Web Services appears intent on retaining customers within its ecosystem for all of their cloud hosting, Big Data and business intelligence analytics needs.
If 2011 was the year of Cloud Computing, then 2012 will surely be the year of Big Data. Big Data has yet to arrive in the way cloud computing has, but the framework for its widespread deployment as a commodity emerged with style and unmistakable promise. For the first time, Hadoop and NoSQL gained currency not only within the developer community, but also amongst bloggers and analysts. More importantly, Big Data acquired a status and meaning of its own in the technology community, even though few stopped to ask what “big” means when the line between big and small data is constantly being redrawn. Even as yesterday’s “big” morphed into today’s “small” with consumer personal storage transitioning from gigabytes to terabytes, “Big Data” emerged as a term that everyone almost instantly understood. It was as if consumers and enterprises alike had been searching for years for a term to describe the explosion of data evinced by web searches, web content, Facebook and Twitter feeds, photographs, log files and miscellaneous structured and unstructured content. Having lacked the vocabulary for the data explosion, the world suddenly embraced the term Big Data with passion.
Below are some of the highlights of 2011 with respect to big data:
• Teradata finalized a deal to acquire Big Data player Aster Data Systems for $263 million.
• Yahoo revealed plans to create Hortonworks, a spin-off dedicated to the commercialization of Apache Hadoop.
• Teradata announced the Teradata Aster MapReduce Platform, which combines SQL with MapReduce. The platform empowers business analysts who know SQL to leverage the power of MapReduce without having to write queries in Java, Python, Perl or C.
• Oracle announced plans to launch a Big Data appliance featuring Apache Hadoop, Oracle NoSQL Database Enterprise Edition and an open source distribution of R. The announcement represented an abrupt about-face from an earlier Oracle position that discredited the significance of NoSQL.
• Microsoft revealed plans for a Big Data appliance featuring Hadoop for Windows Server and Azure, plus Hadoop connectors for SQL Server and SQL Parallel Data Warehouse. Microsoft also revealed a strategic partnership with Yahoo spin-off Hortonworks to integrate Hadoop with Windows Server and Windows Azure. Microsoft’s decision to use a Windows-based version of Hadoop for SQL Server 2012 rather than a NoSQL database constituted the key difference between the Microsoft and Oracle Big Data platforms.
• IBM announced the release of its InfoSphere BigInsights application for analyzing Big Data. The SmartCloud release of BigInsights means that IBM beat competitors Oracle and Microsoft in the race to deploy an enterprise-grade, cloud-based Big Data analytics platform.
• Christophe Bisciglia, a co-founder of Cloudera, the commercial distributor of Apache Hadoop, launched a startup called Odiago featuring a Big Data product named WibiData. WibiData manages investigative and operational analytics on “consumer internet data” such as website traffic from traditional and mobile computing devices.
• Cloudera announced a partnership with NetApp, the storage and data management vendor. The partnership produced the NetApp Open Solution for Hadoop, a preconfigured Hadoop cluster that combines Cloudera’s Distribution including Apache Hadoop (CDH) and Cloudera Enterprise with NetApp’s RAID architecture.
• Big Data player Karmasphere joined the Hortonworks Technology Partner Program. The partnership enables Karmasphere to offer its Big Data intelligence product, Karmasphere Analytics, on the Apache Hadoop software infrastructure that undergirds the Hortonworks Data Platform.
• Informatica released what it billed as the world’s first Hadoop parser. Informatica HParser operates on virtually all distributions of Apache Hadoop and specializes in transforming unstructured data into a structured format within a Hadoop installation.
• MarkLogic announced support for Hadoop, the Apache open source software framework for analyzing Big Data, with the release of MarkLogic 5.
• HP provided details of Autonomy IDOL (Intelligent Data Operating Layer) 10, a next-generation information platform that integrates two of HP’s 2011 acquisitions, Vertica and Autonomy. Autonomy IDOL 10 features Autonomy’s capabilities for processing unstructured data, Vertica’s ability to rapidly process large-scale structured data sets, a NoSQL interface for loading and analyzing structured and unstructured data, and solutions dedicated to the Data, Social Media, Risk Management, Cloud and Mobility verticals.
• EMC announced the release of its Greenplum Unified Analytics Platform (UAP). The EMC Greenplum UAP contains the EMC Greenplum platform for the analysis of structured data, enterprise-grade Hadoop for analyzing structured and unstructured data, and EMC Greenplum Chorus, a collaboration and productivity tool that enables social networking amongst the constituents of an organization who are leveraging Big Data.
The widespread adoption of Hadoop punctuated the Big Data story of the year. Hadoop featured in almost every major Big Data announcement, from Oracle to Microsoft to HP and EMC, while NoSQL came in a close second. Going into 2012, one of the key questions for the Big Data space concerns the ability of OpenStack to support Hadoop, NoSQL, MapReduce and other Big Data technologies. The other key question hinges on the user-friendliness of Big Data applications for business analysts in addition to programmers. EMC’s Greenplum Chorus, for example, democratizes access to its platform via a user interface that promotes collaboration amongst multiple constituents in an organization by transforming questions into structured queries. Similarly, the Teradata Aster MapReduce Platform allows business analysts to tap its MapReduce technology using SQL. As Hadoop becomes increasingly mainstream, the tech startup and data-intensive spaces are likely to see more data analysts trained in Apache Hadoop, alongside vendor efforts to render Hadoop more accessible to programmers and non-programmers alike.
Cloudant named Derek Schoettle as its new CEO. Prior to Cloudant, Schoettle was VP of Sales at Vertica, the analytics database vendor acquired by HP. Cloudant provides a data management platform for the analysis of multi-petabyte “Big Data” sets. Its BigCouch data management platform leverages the open source NoSQL Apache CouchDB technology, either as a “database as a service” on a public cloud such as Amazon EC2 or Rackspace, or as a licensed offering for a private cloud. Cloudant made headlines in October when it reached a deal with agriculture company Monsanto to target genetic pathways that result in increased yield and stress tolerance in corn, soy and other crops. Cloudant’s Big Data platform will house and run analytics on Monsanto’s growing body of data in order to accelerate genomic sequencing analysis of crops. CouchDB, Cloudant’s underlying technology, is a NoSQL data storage platform commercially distributed by Couchbase in addition to Cloudant. Late last week, Cloudant announced that it had raised $2.1 million in equity funding, according to a filing with the Securities and Exchange Commission.