On Wednesday, the Apache Software Foundation announced the release of Cassandra version 1.2, its high-performance, highly scalable distributed NoSQL database for Big Data. Cassandra is capable of managing thousands of data requests per second and is used by organizations such as Adobe, Cisco, Constant Contact, Digg, Disney, eBay, Netflix, Rackspace and Twitter.
Key components of the latest release include the following:
• Virtual nodes and clustering across virtual nodes
• Node-to-node communication
• Version 3 of the Cassandra Query Language (CQL) to simplify the modeling of applications, enable more powerful mapping and facilitate superior database design
Jonathan Ellis, Vice President of Apache Cassandra, reflected on the significance of the Cassandra 1.2 release as follows:
We are pleased to announce Cassandra 1.2. By improving support for dense clusters —powering multiple terabytes per node— as well as simplifying application modeling, and improving data cell storage/design/representation, systems are able to effortlessly scale petabytes of data.
Here, Ellis singles out enhanced support for dense clusters, with several terabytes per node, as one of the key upgrades in the release. According to Ellis, that support, combined with simpler application modeling and improved data cell storage and representation, lets systems scale to petabytes of data.
Cassandra users expressed particular enthusiasm for the virtual node and atomic batch components of the new release. Software developer Kelly Sommers elaborated on the significance of Cassandra 1.2’s introduction of virtual nodes as follows:
In Cassandra v1.2 the introduction of vnodes will simplify managing clusters while improving performance when adding and rebuilding nodes. v1.2 also includes many new features, performance improvements and further heap reduction to alleviate the burden on the JVM garbage collector.
Virtual nodes improve performance when nodes are added or rebuilt, Sommers notes. Reducing the burden on the JVM garbage collector can bring further gains: a recent Twitter blog post, which makes no direct reference to Cassandra, described how garbage collector optimization significantly reduced CPU time for Twitter.com.
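To illustrate why virtual nodes can help when a node joins a cluster, here is a minimal Python sketch of a token ring. It is not Cassandra's implementation: the `Ring` class, the `donors_when_adding` helper, the use of MD5 to place tokens and the node names are all assumptions made for illustration. The sketch only counts how many existing nodes hand data to a newcomer when each node owns one token versus many.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is an illustrative choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring in which each physical node owns `vnodes` tokens."""

    def __init__(self, vnodes: int) -> None:
        self.vnodes = vnodes
        self.tokens = []   # sorted token positions
        self.owners = {}   # token position -> node name

    def add_node(self, name: str) -> None:
        # Hypothetical token assignment: derive tokens from the node name
        # so that the example is deterministic.
        for i in range(self.vnodes):
            t = token(f"{name}#{i}")
            bisect.insort(self.tokens, t)
            self.owners[t] = name

    def owner(self, key: str) -> str:
        # A key belongs to the first token at or after its position,
        # wrapping around at the end of the ring.
        idx = bisect.bisect_left(self.tokens, token(key)) % len(self.tokens)
        return self.owners[self.tokens[idx]]


def donors_when_adding(vnodes: int, existing: int = 4, keys: int = 20000) -> set:
    """Return the existing nodes that give up keys to a newly added node."""
    ring = Ring(vnodes)
    for n in range(existing):
        ring.add_node(f"node{n}")
    sample = [f"key{k}" for k in range(keys)]
    before = {k: ring.owner(k) for k in sample}
    ring.add_node("new-node")
    return {before[k] for k in sample if ring.owner(k) == "new-node"}


if __name__ == "__main__":
    print("1 token per node:", sorted(donors_when_adding(1)))
    print("64 tokens per node:", sorted(donors_when_adding(64)))
```

With a single token per node, the newcomer takes over part of one neighbor's range, so at most one existing node supplies its data. With many tokens per node, the newcomer's ranges are scattered around the ring, so the transfer is spread across several existing nodes.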
Improved performance, increased scalability and simplified application development are the three recurring themes in users' accounts of the Cassandra 1.2 release. Cassandra is known for its ability to handle massive amounts of real-time operational data, whereas Hadoop is known for processing large volumes of data in batches. With its performance enhancements and simpler application modeling and database design, Cassandra 1.2 means that Big Data just got even bigger.