Big Data

Q&A With Dave McCrory, CTO of Basho Technologies, Regarding Riak, Riak CS and the NoSQL Landscape

Cloud Computing Today recently had the privilege of speaking with Dave McCrory, CTO of Basho Technologies, about the NoSQL space and Basho’s competitive differentiation within the NoSQL landscape. McCrory elaborated on Basho’s Riak “open source, distributed database” by noting its high availability, scalability and ability to handle any type of data as follows:

Cloud Computing Today: How do you envision the NoSQL space? What are your high level impressions of the competitive landscape amongst NoSQL vendors?

Dave McCrory (Basho Technologies): The NoSQL industry has many players for various use cases, but overall it is still young, especially from the enterprise point of view. I've been involved in big data for quite some time, and as data continues to grow, the NoSQL industry will grow with it. As early adopters give way to the early majority, we are positioned to cross that chasm. Looking at how people want to build applications and handle data, I expect that, as an industry, in the next few years nearly half of enterprises will embrace NoSQL technologies to deal with problems that traditional databases cannot. Other NoSQL providers like MongoDB have an amazing presence in the market because they have made it easy for developers to give their products a try; at the same time, from my view of the market, they are limited in the applications they can support. With so many companies offering NoSQL solutions for specific use cases, and such high demand for data management, I can only see the industry continuing to expand and thrive.

Cloud Computing Today: Where do you see Basho within the larger NoSQL space at present?

Dave McCrory (Basho Technologies): We're looking to provide the strongest key value solution and object store we can; that's our priority right now. Although we at Basho are still a fairly young company, I think our technology speaks for itself. Since starting at Basho in the spring, I've been able to work with the outstanding Basho engineers, and I'm amazed by what they have accomplished. Riak and Riak CS offer simplified administrative features and a key/value system that enable anyone with command line experience to build a cluster in less than 15 minutes. I believe that Riak's simplicity and usability are what separate it from other offerings in the NoSQL space.

Alongside that usability, our differentiation is expressed in terms of high availability, fault tolerance and the ability to scale well beyond many of our competitors.

Cloud Computing Today: What are the key differentiators of Riak? What does Basho have planned for Riak in subsequent releases in the near future?

Dave McCrory (Basho Technologies): Riak's key differentiators are its high availability, massive scale and support for a variety of data types. Since Riak stores data as binary, it can handle any type of data, unlike many other solutions. Its top features include operational ease at large scale, always-on availability, and the ability to add and remove nodes easily and quickly as needed.
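
For readers unfamiliar with key/value access, here is a minimal sketch of storing and fetching an object through Riak's HTTP interface. The node address, bucket and key names are illustrative assumptions rather than details from the interview.

```python
import requests

RIAK = "http://127.0.0.1:8098"          # assumed local Riak node, default HTTP port
bucket, key = "user_profiles", "user42"  # illustrative bucket and key names

# Store a value: Riak treats the payload as opaque bytes, so JSON, images,
# or any other binary content can be written the same way.
resp = requests.put(
    f"{RIAK}/buckets/{bucket}/keys/{key}",
    data=b'{"name": "Ada", "plan": "enterprise"}',
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()

# Fetch it back by key.
resp = requests.get(f"{RIAK}/buckets/{bucket}/keys/{key}")
print(resp.status_code, resp.json())
```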

We are unique in that we have built object storage on our foundation and offer both key value and object store from the same platform. We have a thriving community, but our go-to-market is very focused on the enterprise. That has resulted in almost 200 enterprise customers, including a third of the Fortune 50.

We have a lot planned for Basho and Riak in the coming months. We recently launched Riak CS 1.5, which offers additional Amazon S3 compatibility, performance improvements in garbage collection, and new, simplified administrative features. We are releasing Riak 2.0 in the fall, which will provide enhanced search capability, expanded data types and more customer control over consistency, and we are hosting the annual RICON conference in Las Vegas in October, so you'll be hearing a lot from Basho for the rest of the year!
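
Because Riak CS exposes an S3-compatible API, standard S3 tooling can generally point at it. Below is a minimal sketch using boto3; the endpoint URL, credentials, bucket and object names are placeholders for a local installation, not details from the interview.

```python
import boto3

# Point a standard S3 client at a Riak CS endpoint instead of AWS.
# Endpoint and credentials below are placeholders for a local install.
s3 = boto3.client(
    "s3",
    endpoint_url="http://127.0.0.1:8080",
    aws_access_key_id="RIAK_CS_ACCESS_KEY",
    aws_secret_access_key="RIAK_CS_SECRET_KEY",
)

s3.create_bucket(Bucket="backups")
s3.put_object(Bucket="backups", Key="2014-08-01.tar.gz", Body=b"example object payload")
print(s3.get_object(Bucket="backups", Key="2014-08-01.tar.gz")["Body"].read())
```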

Categories: Basho Technologies, Big Data, NoSQL

Google’s Mesa Data Warehouse Takes Real Time Big Data Management To Another Level

Google recently announced development of Mesa, a data warehousing platform designed to collect data for its internet advertising business. Mesa delivers a distributed data warehouse that can manage petabytes of data while delivering high availability, scalability and fault tolerance. Mesa is designed to update millions of rows per second, process billions of queries and retrieve trillions of rows per day to support Google’s gargantuan data needs for its flagship search and advertising business. Google elaborated on the company’s business need for a new data warehousing platform by commenting on its evolving data management needs as follows:

Google runs an extensive advertising platform across multiple channels that serves billions of advertisements (or ads) every day to users all over the globe. Detailed information associated with each served ad, such as the targeting criteria, number of impressions and clicks, etc. are recorded and processed in real time…Advertisers gain fine-grained insights into their advertising campaign performance by interacting with a sophisticated front-end service that issues online and on-demand queries to the underlying data store…The scale and business critical nature of this data result in unique technical and operational challenges for processing, storing and querying.

Google's advertising platform depends upon real-time data that records updates about advertising impressions and clicks in the larger context of analytics about current and potential advertising campaigns. As such, the data model must accommodate atomic updates to advertising components that cascade throughout the entire data repository; consistency and correctness of data across datacenters and over time; continuous updates; low-latency query performance; scalability to petabytes of data; and data transformation functionality that accommodates changes to data schemas. Mesa utilizes Google products as follows:

Mesa leverages common Google infrastructure and services, such as Colossus, BigTable and MapReduce. To achieve storage scalability and availability, data is horizontally partitioned and replicated. Updates may be applied at granularity of a single table or across many tables. To achieve consistent and repeatable updates, the underlying data is multi-versioned. To achieve update scalability, data updates are batched, assigned a new version number and periodically incorporated into Mesa. To achieve update consistency across multiple data centers, Mesa uses a distributed synchronization protocol based on Paxos.

While Mesa takes advantage of technologies from Colossus, BigTable, MapReduce and Paxos, it delivers a degree of “atomicity” and consistency that those counterparts lack. In addition, Mesa features “a novel version management system that batches updates to achieve acceptable latencies and high throughput for updates.” All told, Mesa constitutes a disruptive innovation in the Big Data space, combining atomicity, consistency, high throughput, low latency and scalability on the order of trillions of rows in a “petascale data warehouse.” While speculation proliferates about whether Google might append Mesa to its Google Compute Engine offering or otherwise open-source it, the key point is that Mesa represents a qualitative shift in the ability of a Big Data platform to process petabytes of data in real-time flux. Whereas the cloud space is accustomed to seeing Amazon Web Services usher in one breathtaking innovation after another, Mesa underscores Google's continuing leadership in the Big Data space. Expect to hear more details about Mesa at the Conference on Very Large Data Bases next month in Hangzhou, China.
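
To make the batching and multi-versioning described above more concrete, here is a toy Python sketch of the idea, not Google's implementation: each update batch receives a new version number, and a query as of version n aggregates every batch at or below n, so reads are consistent and repeatable. The table and key names are invented for illustration.

```python
from collections import defaultdict

class ToyMesaTable:
    """Toy sketch of Mesa-style versioned, batched updates (not Google's code).

    Each update batch receives a new, monotonically increasing version number;
    a query as of version n aggregates every batch at or below n, so readers
    always see a consistent, repeatable snapshot.
    """

    def __init__(self):
        self.batches = []  # list of (version, {key: delta}) pairs

    def apply_batch(self, rows):
        version = len(self.batches) + 1
        self.batches.append((version, dict(rows)))
        return version

    def query(self, as_of_version):
        totals = defaultdict(int)
        for version, rows in self.batches:
            if version <= as_of_version:
                for key, delta in rows.items():
                    totals[key] += delta  # SUM-aggregated value column
        return dict(totals)

table = ToyMesaTable()
v1 = table.apply_batch({("2014-08-01", "campaign_a"): 120})
v2 = table.apply_batch({("2014-08-01", "campaign_a"): 35,
                        ("2014-08-01", "campaign_b"): 50})
print(table.query(v1))  # snapshot before the second batch was applied
print(table.query(v2))  # both batches visible, clicks summed per key
```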

Categories: Big Data, Google

Metanautix Emerges From Stealth With $7M In Series A Funding For Streamlined Big Data Processing And Analytics

Metanautix emerged from stealth today by announcing the finalization of $7M in Series A funding in a round led by Sequoia Capital. Additional investors include the Stanford University endowment fund and Shiva Shivakumar, former VP of Engineering at Google. Metanautix delivers a Big Data analytics platform composed of a SQL interface for querying Hadoop data in conjunction with data discovery functionality that empowers analysts to more easily navigate massive amounts of structured and unstructured data. The platform focuses on simplifying the data pipeline between data acquisition and the production of analytics. As such, Metanautix removes the necessity of combining disparate data sources and thereby delivers the benefits of distributed computing alongside the simplicity of SQL. Users of Metanautix can perform analytics in parallel on structured and unstructured data by taking advantage of an interface that allows them to understand the topography of the data they are navigating. Founded by veterans of Google and Facebook, Metanautix enters the Big Data analytics space by allowing users to run analytics on multiple streams of Big Data, as CEO Theo Vassilakis notes below:

The modern enterprise operates on a plethora of data sources. There is great value in using all of these data sources and in providing superior access to ask questions of any data. We’ve made it fast and simple for anyone in an organization to work with any number of data sources at any scale and at a speed that enables rapid business decisions.

Vassilakis notes the ability of Metanautix to manage “any number” of datasets “at any scale” toward the larger end of delivering actionable business intelligence from disparate data sources. Given Vassilakis’s background at Google working on Dremel and the experience of Metanautix’s CTO Apostolos Lerios with processing frameworks for billions of photographic images during his tenure at Facebook, the industry can expect Metanautix to deliver a truly multivalent processing and analytics engine capable of managing heterogeneous data sources of all kinds. Expect more details about the platform to emerge in forthcoming months but, based on the experience of its founders, the Big Data space should brace for the entry of a disruptive Big Data analytics and processing engine that can deliver analytics on massive datasets by means of a radically streamlined operational process. That said, Metanautix will need to find its niche quickly in order to outshine competitors such as Pivotal and Infochimps, the former of which recently announced a collaboration with Hortonworks to enhance Apache Ambari.

Categories: Big Data, Hadoop, Metanautix, Venture Capital

Pivotal And Hortonworks Collaborate To Advance Apache Ambari For Hadoop Management

Pivotal and Hortonworks will collaborate to accelerate development of Apache Ambari, the open source framework for provisioning, managing and monitoring Hadoop clusters. Pivotal will dedicate engineers toward advancing the “installation, configuration and management capabilities” of Apache Ambari as part of the larger project of contributing to software that promotes adoption of Apache Hadoop. In a blog post, Pivotal’s Jamie Buckley elaborated on the value of Apache Ambari to the Hadoop ecosystem as follows:

Apache Hadoop projects are central to our efforts to drive the most value for the enterprise. An open source, extensible and vendor neutral application to manage services in a standardized way benefits the entire ecosystem. It increases customer agility and reduces operational costs and can ultimately help drive Hadoop adoption.

Here, Buckley remarks on the way in which Ambari enhances the process of deploying and managing Hadoop by reducing costs and increasing the flexibility of customer choices regarding the operationalization of Hadoop. Meanwhile, Shaun Connolly, VP Strategy at Hortonworks, commented on the significance of Pivotal’s contribution to the Apache Ambari project as follows:

Pivotal has a strong record of contribution to open source and has proven their commitment with projects such as Cloud Foundry, Spring, Redis and more. Collaborating with Hortonworks and others in the Apache Hadoop ecosystem to further invest in Apache Ambari as the standard management tool for Hadoop will be quite powerful. Pivotal’s track record in open source overall and the breadth of skills they bring will go a long way towards helping enterprises be successful, faster, with Hadoop.

Connolly highlights Pivotal's historical commitment to open source projects such as Cloud Foundry and its track record of success in helping enterprises effectively utilize Apache Hadoop. Hortonworks stands to gain from Pivotal's extraordinary engineering talent and reputation for swiftly releasing production-grade code for Big Data management and analytics applications. Meanwhile, Pivotal benefits from enriching an open source project that both vendors regard as a “standard” management tool for the Apache Hadoop ecosystem. The real winner, however, is Hortonworks, which not only claims the backing of Pivotal for Ambari, the open source project incubated by some of its engineers, but also reaps the benefits of dedicated engineering staff from Pivotal that will almost certainly accelerate the rate of Ambari's development. The only qualification here is that Pivotal's collaboration with Hortonworks is likely to ensure the optimization of Ambari for the Pivotal HD and Hortonworks distributions, with the ancillary consequence that Ambari may be less suited to other Hadoop distributions such as those from Cloudera and MapR. Regardless, the collaboration between Hortonworks and Pivotal promises to serve as a huge coup for the Big Data industry at large, both by expediting development of Apache Ambari and by constituting a model for collaboration between competitors in the Big Data space that will ultimately enhance Hadoop adoption and effective utilization.
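
To illustrate the kind of standardized, vendor-neutral management interface at stake, here is a minimal sketch against Ambari's REST API. The host, default admin/admin credentials and cluster name are assumptions for a test installation, not details from either vendor's announcement.

```python
import requests

# Assumed: an Ambari server on its default port 8080 with default credentials;
# the cluster name below is illustrative.
AMBARI = "http://ambari-host:8080/api/v1"
AUTH = ("admin", "admin")
HEADERS = {"X-Requested-By": "ambari"}  # header Ambari requires for write calls

# List the clusters Ambari manages.
clusters = requests.get(f"{AMBARI}/clusters", auth=AUTH, headers=HEADERS).json()
print([c["Clusters"]["cluster_name"] for c in clusters["items"]])

# List the Hadoop services (HDFS, YARN, etc.) in one cluster.
services = requests.get(f"{AMBARI}/clusters/mycluster/services",
                        auth=AUTH, headers=HEADERS).json()
print([s["ServiceInfo"]["service_name"] for s in services["items"]])
```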

Categories: Big Data, Hadoop, Pivotal

Databricks Closes $33M In Series B Funding And Launches Databricks Cloud Powered By Apache Spark

Databricks, the company founded by the team that developed Apache Spark, recently announced the finalization of $33M in Series B funding in a round led by New Enterprise Associates with participation from existing investor Andreessen Horowitz. The company also revealed plans for commercializing Apache Spark by means of the newly launched Databricks Cloud, which simplifies the pipeline for storing data, performing ETL and then running analytics and data visualizations on cloud-based Big Data. Powered by Apache Spark, the Databricks Cloud leverages Spark's array of capabilities for operating on Big Data, such as its support for streaming data, graph processing, SQL on Hadoop and machine learning. The platform aims to deliver a streamlined pipeline for ingesting, analyzing and visualizing Hadoop-based data in a way that dispels the need to stitch together a combination of heterogeneous technologies. Databricks will initially offer the Databricks Cloud on Amazon Web Services but plans to expand its availability to other clouds in subsequent months.
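
As a rough illustration of the capabilities the Databricks Cloud builds on, here is a minimal PySpark sketch that exercises both the core RDD API and Spark SQL on the same engine; the input path, table and column names are illustrative assumptions.

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext, Row

sc = SparkContext(appName="databricks_cloud_sketch")
sqlContext = SQLContext(sc)

# Core RDD API: count words in a (possibly HDFS-resident) text file.
counts = (sc.textFile("hdfs:///logs/access.log")   # assumed input path
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))
print(counts.take(5))

# Spark SQL on the same engine: turn the RDD into a table and query it with SQL.
df = sqlContext.createDataFrame(counts.map(lambda kv: Row(word=kv[0], n=kv[1])))
df.registerTempTable("word_counts")
top = sqlContext.sql("SELECT word, n FROM word_counts ORDER BY n DESC LIMIT 5")
print(top.collect())
```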

Categories: Big Data, Databricks, Hadoop

Actian Announces “Right To Deploy” Pricing Model Marked By Freedom From Vendor Lock-In For Big Data Analytics

Big Data analytics vendor Actian today announced the availability of customer-friendly pricing options that make it easier for customers to take advantage of its analytics platform for Apache Hadoop. Actian's latest pricing options feature “capacity-based and subscription models” in addition to a Right to Deploy option that confers an expanded range of flexibility regarding deployment of the Actian Analytics Platform. The Actian Analytics Platform delivers actionable business intelligence and advanced data visualization for Hadoop-based data by taking advantage of the platform's proprietary predictive analytics algorithms and low latency. Moreover, the platform's Hadoop SQL Edition provides a SQL-compliant Hadoop analytics platform that allows users to perform data discovery, data profiling and analytics via SQL rather than MapReduce. As of today's announcement, Actian's Right to Deploy option allows customers unlimited usage of the platform for a period of one, two or three years, in addition to the right to use whatever has been deployed, forever. The Right to Deploy choice represents a particularly attractive option for customers that anticipate significant business expansion and the corresponding need for enhanced infrastructure and application scalability. Moreover, the Right to Deploy option frees customers from vendor lock-in by empowering them to continue using their deployments whether they stay with Actian or choose another vendor for their Hadoop analytics needs. Actian's simplified platform pricing offers some of the greatest flexibility for Big Data analytics in the industry, in a red hot space marked by an increasing number of vendors large and small. That said, few vendors have streamlined and simplified the process of operationalizing Big Data analytics in a way that lays out programmatic approaches, varying with the specific use case in mind, to obtaining meaningful analytics on Hadoop. Expect increasing competition in the Hadoop analytics space to drive more and more vendors to differentiate themselves from the pack, although the main task for the industry at large consists of delivering a turnkey solution for Big Data analytics featuring machine learning-based, best-practice recommendations for extracting meaningful analytics from massive, ever-increasing amounts of data.
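
To illustrate the SQL-versus-MapReduce contrast in generic terms (this is not Actian's API), the sketch below expresses the same aggregation first as the declarative SQL an analyst would submit to a SQL-on-Hadoop engine and then as hand-written map and reduce phases; the table and column names are invented for illustration.

```python
from collections import defaultdict

# The declarative form an analyst would submit to a SQL-on-Hadoop engine.
SQL = """
SELECT region, SUM(amount) AS total_sales
FROM sales
GROUP BY region
"""

# The equivalent logic spelled out as explicit map and reduce phases.
def map_phase(records):
    for rec in records:                      # emit (key, value) pairs
        yield rec["region"], rec["amount"]

def reduce_phase(pairs):
    totals = defaultdict(float)
    for region, amount in pairs:             # sum values per key
        totals[region] += amount
    return dict(totals)

sales = [{"region": "EMEA", "amount": 120.0},
         {"region": "APAC", "amount": 75.0},
         {"region": "EMEA", "amount": 30.0}]
print(reduce_phase(map_phase(sales)))        # {'EMEA': 150.0, 'APAC': 75.0}
```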

Categories: Actian, Big Data, Hadoop

Trifacta Partners With Hortonworks To Certify Trifacta Data Transformation Platform On Hortonworks Data Platform

Trifacta today announced that its Trifacta Data Transformation Platform has been certified for use with Hortonworks Data Platform 2.1 (HDP) by means of the Hortonworks Certified Technology Program. The certification ensures the compatibility of the Trifacta Data Transformation Platform with the latest Hortonworks Data Platform and thereby positions Trifacta's technology to integrate with enterprise-grade deployments of the Hortonworks Hadoop distribution. Today's announcement further validates the Trifacta Data Transformation Platform as a technology that facilitates the derivation of actionable business intelligence from Hadoop by making it easier for analysts to visualize and engage with Hadoop-based data, aided by machine learning-based suggestions regarding data transformations and analytics. Trifacta's partnership with Hortonworks builds upon recent news of its $25M Series C raise and the finalization of an analogous collaboration with Hadoop vendor Cloudera. In March, Trifacta announced a partnership with Cloudera that ensures the compatibility of Trifacta's Data Transformation Platform with the Cloudera Hadoop ecosystem.

Now that Trifacta has inked deals to certify its Data Transformation Platform with the two Hadoop market share leaders, Cloudera and Hortonworks, the Big Data space should expect enterprise deployments of its platform to accelerate as Trifacta solidifies its branding as the de facto platform for the transformation, cleansing and guided exploration of Hadoop-based data. The platform's value proposition consists in reducing time to insight for actionable business intelligence derived from Hadoop-based data, enhancing analyst productivity and iteratively delivering more nuanced guidance regarding data transformations of interest by means of its machine learning-based technology. Expect Trifacta to continue expanding its range of strategic partnerships in the forthcoming months as it leverages its recent funding to position itself at the forefront of enterprise technologies for the effective operationalization of Big Data.

Categories: Big Data, Cloudera, Hadoop, Hortonworks, Trifacta
