Amazon DynamoDB Offers Big Data Cloud Processing With Managed Services

This week, Amazon Web Services announced the availability of Amazon DynamoDB, a fully managed, cloud-based database service for Big Data processing. The announcement represents yet another move by Amazon Web Services to consolidate enterprise market share with an offering that can store massive amounts of data at fast, predictable rates of performance and with low latency. Amazon DynamoDB is a NoSQL database built for customers that do not require complex querying capabilities such as indexes, transactions, or joins. DynamoDB constitutes a greatly enhanced version of Amazon SimpleDB. One of Amazon SimpleDB’s principal limitations is its 10 GB limit on data within containers known as domains. Moreover, Amazon SimpleDB suffers from performance issues because it indexes all of the attributes for an object within a domain and because it takes its commitment to eventual consistency to an extreme. Amazon DynamoDB builds upon the company’s prior experience with SimpleDB and with Dynamo, an early forerunner of today’s NoSQL databases, by offering the following features:

• Managed services

Amazon DynamoDB managed services take care of processes such as provisioning servers, configuring a cluster, and dealing with scaling, partition and replication issues.

• No upper bound on data

Customers can store as much data as they would like. Data will be spread out across multiple servers spanning multiple Availability Zones.

• Speed

The solid state drives on which Amazon DynamoDB is built help optimize performance and ensure low latencies. Applications running in the EC2 environment should expect to see latencies in the “single-digit millisecond range for a 1KB object.” Performance also benefits from a design in which not every attribute is indexed.

• Flexible schemas and data models

Data need not adopt a particular schema and can have multiple attributes, including attributes that themselves have multiple values.

• Integration with Amazon Elastic MapReduce (Amazon EMR)

Because DynamoDB is integrated with the Hadoop-based, Amazon Elastic MapReduce technology, customers can analyze data in DynamoDB and store the results in S3, thereby preserving the original dataset in DynamoDB.

• Low cost

Pricing starts at $1 per GB per month.
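
Two of the features above, flexible schemas and per-GB pricing, can be illustrated with a minimal Python sketch. It does not call any DynamoDB API: plain dictionaries stand in for items, and the item contents, the function names and the flat $1 per GB storage rate are assumptions made for illustration only. The estimate covers storage alone and ignores any other charges.

```python
# Hypothetical items in a single table. Plain dicts stand in for DynamoDB
# items; nothing here talks to the real service.
items = [
    # A multi-valued attribute is modeled as a Python set.
    {"id": "user-1", "name": "Ada", "emails": {"ada@example.com", "ada@work.example"}},
    # This item carries a different set of attributes than the one above.
    {"id": "user-2", "name": "Grace", "country": "US"},
    {"id": "order-9", "total_cents": 1999, "tags": {"gift", "express"}},
]


def attribute_names(items):
    """Return the union of attribute names found across all items."""
    names = set()
    for item in items:
        names.update(item)  # iterating a dict yields its keys
    return names


def monthly_storage_cost(gigabytes, price_per_gb=1.00):
    """Estimate monthly storage cost in dollars.

    Assumes a flat per-GB rate (the article's starting price of $1 per GB
    per month); real bills may include other components.
    """
    if gigabytes < 0:
        raise ValueError("gigabytes must be non-negative")
    return gigabytes * price_per_gb


if __name__ == "__main__":
    print(sorted(attribute_names(items)))
    print(monthly_storage_cost(250))
```

Because no schema is declared up front, each item simply carries whatever attributes it needs, and the storage estimate scales linearly with the amount of data held.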

With this set of features, Amazon DynamoDB represents a dramatic entrant to the Big Data party that features Oracle, HP, Teradata, Splunk and others. The product underscores Amazon Web Services’ strategic investment in becoming a one-stop service for cloud and Big Data processing. Moreover, the managed services component of Amazon DynamoDB represents a clear change of pace for Amazon Web Services, one that recognizes the value of managed services for enterprise technology deployments. Amazon DynamoDB’s managed services offering is expected to appeal to enterprises that would rather invest technical resources in innovation and software development than in the operational maintenance of a complex IT ecosystem. Assuming that AWS can quantify the degree to which DynamoDB’s managed services offering drives sales, expect to see more managed service offerings from Amazon Web Services in both the cloud computing and Big Data verticals. Going forward, the technology community should also expect partnerships between Amazon Web Services and business intelligence vendors that mimic the deal between Jaspersoft and Red Hat’s OpenShift, given how Amazon Web Services appears intent on retaining customers within its ecosystem for all of their cloud hosting, Big Data and business intelligence analytics needs.

Big Data 2011: The Year in Review

If 2011 was the year of Cloud Computing, then 2012 will surely be the year of Big Data. Big Data has yet to arrive in the way cloud computing has, but the framework for its widespread deployment as a commodity emerged with style and unmistakable promise. For the first time, Hadoop and NoSQL gained currency not only within the developer community, but also amongst bloggers and analysts. More importantly, Big Data garnered for itself a certain status and meaning in the technology community even though few people asked about the meaning of big in “Big Data” in a landscape where the circle around the meaning of “big” with respect to “data” is constantly being redrawn. Even though yesterday’s “big” in Big Data morphed into today’s “small” as consumer personal storage transitions from gigabytes to terabytes, the term “Big Data” emerged as a term that everyone almost instantly understood. It was as if consumers and enterprises alike had been searching for years for a long lost term to describe the explosion of data as evinced by web searches, web content, Facebook and Twitter feeds, photographs, log files and miscellaneous structured and unstructured content. Having been speechless, lacking the vocabulary to find the term for the data explosion, the world suddenly embraced the term Big Data with passion.

Below are some of the highlights of 2011 with respect to big data:

March
• Teradata finalized a deal to acquire Big Data player Aster Data Systems for $263 million.

July
• Yahoo revealed plans to create Hortonworks, a spin-off dedicated to the commercialization of Apache Hadoop.

September
• Teradata announced the Teradata Aster MapReduce Platform that combines SQL with MapReduce. The Teradata Aster MapReduce Platform empowers business analysts who know SQL to leverage the power of MapReduce without having to write scripted queries in Java, Python, Perl or C.

October
• Oracle announced plans to launch a Big Data appliance featuring Apache Hadoop, Oracle NoSQL Database Enterprise Edition and an open source distribution of R. The company’s announcement of its plans to leverage a NoSQL database represented an abrupt about-face from an earlier Oracle position that discredited the significance of NoSQL.
• Microsoft revealed plans for a Big Data appliance featuring Hadoop for Windows Server and Azure, and Hadoop connectors for SQL Server and SQL Parallel Data Warehouse. Microsoft also revealed a strategic partnership with Yahoo spin-off Hortonworks to integrate Hadoop with Windows Server and Windows Azure. Microsoft’s decision not to leverage NoSQL and to use instead a Windows-based version of Hadoop for SQL Server 2012 constituted the key difference between Microsoft’s and Oracle’s Big Data platforms.
• IBM announced the release of its IBM InfoSphere BigInsights application for analyzing “Big Data.” The SmartCloud release of IBM’s BigInsights application means that IBM beat competitors Oracle and Microsoft in the race to deploy an enterprise-grade, cloud-based Big Data analytics platform.

November
• Christophe Bisciglia, a co-founder of Cloudera, a commercial distributor of Apache Hadoop, launched a startup called Odiago that features a Big Data product named WibiData. WibiData manages investigative and operational analytics on “consumer internet data” such as website traffic on traditional and mobile computing devices.
• Cloudera announced a partnership with NetApp, the storage and data management vendor. The partnership produced the NetApp Open Solution for Hadoop, a preconfigured Hadoop cluster that combines Cloudera’s Distribution including Apache Hadoop (CDH) and Cloudera Enterprise with NetApp’s RAID architecture.
• Big Data player Karmasphere announced plans to join the Hortonworks Technology Partner Program. The partnership enables Karmasphere to offer its Big Data intelligence product Karmasphere Analytics on the Apache Hadoop software infrastructure that undergirds the Hortonworks Data Platform.
• Informatica released the world’s first Hadoop parser. Informatica HParser operates on virtually all versions of Apache Hadoop and specializes in transforming unstructured data into a structured format within a Hadoop installation.
• With the release of MarkLogic 5, MarkLogic announced support for Hadoop, the Apache open source software framework for analyzing Big Data.
• HP provided details of Autonomy IDOL (Integrated Data Operating Layer) 10, a Next Generation Information Platform that integrates two of its 2011 acquisitions, Vertica and Autonomy. Autonomy IDOL 10 features Autonomy’s capabilities for processing unstructured data, Vertica’s ability to rapidly process large-scale structured data sets, a NoSQL interface for loading and analyzing structured and unstructured data, and solutions dedicated to the Data, Social Media, Risk Management, Cloud and Mobility verticals.

December
• EMC announced the release of its Greenplum Unified Analytics Platform (UAP). The EMC Greenplum UAP contains the EMC Greenplum platform for the analysis of structured data, enterprise-grade Hadoop for analyzing structured and unstructured data, and EMC Greenplum Chorus, a collaboration and productivity software tool that enables social networking amongst constituents in an organization who are leveraging Big Data.

The widespread adoption of Hadoop was the defining Big Data story of the year. Hadoop featured in almost every Big Data announcement, from Oracle to Microsoft to HP and EMC, while NoSQL came in a close second. Going into 2012, one of the key questions for the Big Data space concerns the ability of OpenStack to support Hadoop, NoSQL, MapReduce and other Big Data technologies. The other key question for Big Data hinges on the user friendliness of Big Data applications for business analysts in addition to programmers. EMC’s Greenplum Chorus, for example, democratizes access to its platform via a user interface that promotes collaboration amongst multiple constituents in an organization by transforming questions into structured queries. Similarly, the Teradata Aster MapReduce Platform allows business analysts to make use of its MapReduce technology by using SQL. That said, as Hadoop becomes more and more mainstream, the tech startup and data-intensive spaces are likely to witness a greater number of data analysts trained in Apache Hadoop, in conjunction with efforts by vendors to render Hadoop more accessible to programmers and non-programmers alike.

Cloudant Names Derek Schoettle As CEO

Cloudant named Derek Schoettle as its new CEO. Prior to Cloudant, Schoettle was VP of Sales at the HP acquisition Vertica. Cloudant provides a data management platform for the analysis of multi-petabyte “Big Data” sets. Its BigCouch data management platform leverages the open source, NoSQL Apache CouchDB technology and is offered either as a “database as a service” through a public cloud such as Amazon EC2 or Rackspace, or as a licensed offering for a private cloud. Cloudant made headlines in October when it reached a deal with agriculture company Monsanto to target genetic pathways that result in increased yield and tolerance of stress in corn, soy and other crops. Under the deal, Cloudant’s Big Data platform is to house and run analytics on Monsanto’s growing body of data in order to accelerate genomic sequencing analysis of crops. CouchDB, Cloudant’s underlying technology, is a NoSQL data storage platform commercially distributed by Couchbase in addition to Cloudant. Late last week, Cloudant announced that it raised $2.1 million in equity funding, as disclosed in a filing with the Securities and Exchange Commission.