On June 5, Big Data analytics vendor DataTorrent announced that it will open source the DataTorrent RTS platform under an Apache 2.0 license as Project Apex. In addition, DataTorrent released DataTorrent RTS 3, marked by advanced data visualization and graphical user interface-based data analysis capabilities. DataTorrent RTS 3 is available in a free Community Edition as well as a Standard Edition and an Enterprise Edition. Project Apex constitutes a highly significant moment in the history of Big Data analytics platforms because it represents the first open source platform for streaming and batch big data alike. By open sourcing Project Apex, DataTorrent positions itself within the same open source landscape as Apache Spark and Apache Storm, and stands to differentiate itself by way of its production-grade ability to operate on Hadoop as well as its rich analytics GUI and data visualization platform, available within its paid editions. Moreover, the decision to open source the DataTorrent core engine promises to render the platform accessible to more developers and organizations that are interested in its in-memory performance and scalability alongside the ease of its visual interface for deriving actionable business intelligence from massive amounts of real-time streaming and batch Big Data.
DataTorrent today announces the finalization of $15M in Series B funding. The funding round is led by Singtel Innov8, with additional participation from GE Ventures and Series A investors August Capital, AME Cloud Ventures and Morado Venture Partners. DataTorrent’s platform provides an infrastructure for processing, storing and running analytics on streaming big data sets. The platform can ingest and analyze massive amounts of data by using over 75 connectors as well as 400 Java operators that allow data scientists to perform advanced analytics on multiple datasets in parallel. DataTorrent differentiates itself architecturally by performing in-memory processing that runs directly on Hadoop without the overhead that results from scheduling batches of Hadoop data for processing. The platform boasts massive scalability at sub-second latency while maintaining the capability to process batch and streaming datasets alike. Use cases for DataTorrent include Internet of Things analytics as well as web analytics that push the limits of the platform’s ability to scale and ingest massive amounts of data. Today’s capital raise brings the total funding raised by DataTorrent to $23.8M. Building on its recent distinction as a Gartner Cool Vendor, DataTorrent stands to consolidate its early traction in the heavily contested Big Data analytics space with today’s infusion of capital and the guidance brought to its team by Innov8 Managing Director Jeff Karras, who joins DataTorrent’s board of directors as a result of the finalization of the Series B funding round.
Cloud Computing Today recently spoke to John Fanelli, DataTorrent’s VP of Marketing, about Big Data, real-time analytics on Hadoop, DataTorrent RTS 2.0 and the challenges specific to performing analytics on streaming Big Data sets. Fanelli commented on the market reception of DataTorrent’s flagship product DataTorrent RTS 2.0 and the mainstream adoption of Big Data technologies.
1. Cloud Computing Today: Tell us about the market landscape for real-time analytics on streaming Big Data and describe DataTorrent’s positioning within that landscape. How do you see the market for real-time analytics evolving?
John Fanelli (DataTorrent): Data is being generated today in not only unprecedented volume and variety, but also velocity. Human-created data is being surpassed by automatically generated data (sensor data, mobile devices and transaction data, for example) at a very rapid pace. The term we use for this is fast big data. Fast big data can provide companies with valuable business insight, but only if they act on it immediately. If they don’t, the business value declines as the data ages.
As a result of this business opportunity, streaming analytics is rapidly becoming the norm as enterprises rush to deliver differentiated offerings to generate revenue or create automated operational efficiencies to save cost. But it’s not just fast big data alone; it’s big data in general. Organizations have plenty of big data already in their Enterprise Data Warehouse (EDW) that is used to enrich and provide greater context to fast big data. Some examples of data that drives business decisions include customer information, location and purchase history.
DataTorrent is leading the way in meeting customer requirements in this market by providing extremely scalable ingestion of data from many sources at different rates (“data in motion” and “data at rest”), combined with fault-tolerant, high-performing analytics and flexible Java-based action and alerting, all delivered in an easy to use and operate product offering, DataTorrent RTS.
The market will continue to evolve toward making analytics easier to use across the enterprise (think non-IT users), cloud-based deployments and even pre-built blueprints for “enterprise configurable” applications.
2. Cloud Computing Today: How would you describe the reception of DataTorrent RTS 2.0? What do customers like most about the product?
John Fanelli (DataTorrent): Customer feedback on DataTorrent RTS 2.0 has been phenomenal. There are many aspects of the product that are getting rave reviews. I have to call out that developers have reacted very positively to the Hadoop Distributed Hash Table (HDHT) feature, as it provides them with a distributed, fault-tolerant “application scratchpad” that doesn’t require any external technology or databases. Of course, the marquee features that have the data scientist community abuzz are Project DaVinci (visual streaming application builder) and Project Michelangelo (visual data dashboard). Both enable quick experimentation over real-time data and will emerge from Private Beta over the coming months.
3. Cloud Computing Today: How would you describe the differentiation of DataTorrent RTS from Apache Spark and Apache Storm?
John Fanelli (DataTorrent): DataTorrent provides a complete enterprise-grade solution, not just an event-streaming platform. DataTorrent RTS includes an enterprise-grade platform, a broad set of pre-built operators, and visual development and visualization tools. Enterprises are looking for what DataTorrent calls a SHARPS platform. SHARPS is an acronym for Scalability, High Availability, Performance and Security. In each of the SHARPS categories, DataTorrent RTS is superior.
4. Cloud Computing Today: What challenges do you foresee for Big Data achieving mainstream adoption in 2015?
John Fanelli (DataTorrent): Fast big data is gaining momentum! Every day I speak with customers and prospects about their fast big data, the use-case requirements and the projected business impact. The biggest challenge they share with me is that they are looking to move faster than they are able to, due to existing projects and the technical skills on their team. DataTorrent RTS’ ease of use and operator libraries support almost any input/output source/sink and provide pre-built analytics modules to address those challenges.
DataTorrent recently announced the availability of DataTorrent Real-Time Streaming (RTS) 2.0, which builds on its June release of the 1.0 version by providing enhanced capabilities to run real-time analytics on streaming Big Data sets. DataTorrent RTS 2.0 boasts the ability to ingest data from “any source, any scale and any location” by means of over 75 connectors that allow the platform to ingest varieties of structured and unstructured data. In addition, this release delivers over 450 Java operators that allow data scientists to perform queries and advanced analytics on Big Data sets, including predictive analytics, statistical analysis and pattern recognition. In a phone interview with John Fanelli, DataTorrent’s VP of Marketing, Cloud Computing Today learned that the company has begun work on a Private Beta of a product, codenamed Project DaVinci, to streamline the design of applications via a visual interface that allows data scientists to graphically select data sources, analytic operators and their inter-relationship as depicted below:
As the graphic illustrates, DataTorrent Project DaVinci (Private Beta) delivers a unique visual interface for the design of applications that leverage Hadoop-based datasets. Data scientists can take advantage of DataTorrent’s 450+ Java operators and the platform’s advanced analytics functionality to create and debug applications that utilize distributed datasets and streaming Big Data. Meanwhile, DataTorrent RTS 2.0 also boasts the ability to store massive amounts of data in an “HDFS based distributed hash table” that facilitates rapid lookups of data for analytic purposes. With version 2.0, DataTorrent continues to disrupt the real-time Big Data analytics space by delivering a platform capable of ingesting data at any scale and running real-time analytics in the broader context of a seductive visual interface for creating Big Data analytics applications. DataTorrent competes in the hotly contested real-time Big Data analytics space alongside technologies such as Apache Spark, but delivers a range of functionality that goes beyond that of Spark Streaming, as illustrated by its application design, advanced analytics and flexible data ingestion capabilities.
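The application model described here, in which data sources feed a chain of analytic operators, can be pictured with a small sketch. The Python below is only an illustration under stated assumptions: it does not use DataTorrent's Java operator library or any DataTorrent API, and the names `parse_operator`, `filter_operator`, `count_by_page` and `run_pipeline` are hypothetical. It simply shows operators as composable stages that each consume one stream of events and emit another.

```python
from collections import Counter

# Hypothetical operators: each takes an iterator of events and yields events.
# None of these names come from DataTorrent; they are illustrative only.

def parse_operator(lines):
    """Turn raw 'user,page' lines into event dictionaries."""
    for line in lines:
        user, _, page = line.partition(",")
        yield {"user": user.strip(), "page": page.strip()}

def filter_operator(events):
    """Drop events that carry no page."""
    for event in events:
        if event["page"]:
            yield event

def count_by_page(events):
    """Emit a running snapshot of page counts after every event."""
    counts = Counter()
    for event in events:
        counts[event["page"]] += 1
        yield dict(counts)

def run_pipeline(source, operators):
    """Wire the source through each operator in order (a linear graph)."""
    stream = iter(source)
    for operator in operators:
        stream = operator(stream)
    return stream

source = ["alice,/home", "bob,/cart", "carol,", "dave,/home"]
snapshots = list(run_pipeline(source, [parse_operator, filter_operator, count_by_page]))
print(snapshots[-1])
```

Because each stage is a generator, events flow through the whole chain one at a time rather than being collected first, which is the general idea behind composing a streaming application from reusable operators.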
DataTorrent recently announced the general availability of DataTorrent Real-Time Streaming, a platform that delivers real-time analytic capabilities on Apache Hadoop that allow users to obtain actionable business intelligence from streams of Hadoop data. DataTorrent Real-Time Streaming boasts the ability to run analytics on streams of Hadoop data at volumes of over 1 billion events per second by using in-memory processing with low to zero latency. Whereas comparable technologies such as Spark Streaming from Apache Spark split a stream of Hadoop data into segments and perform in-memory processing on each segment, DataTorrent Real-Time Streaming operates directly on Hadoop containers without scheduling batches of Hadoop streams for processing. By avoiding the scheduling overhead associated with processing “mini-batches” of Hadoop data, DataTorrent claims operational efficiencies that allow it to process more Hadoop events with sub-second latency than competing products.
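The latency argument behind the “mini-batch” comparison can be made concrete with a toy model. The sketch below is an assumption-laden illustration, not a benchmark and not the API of DataTorrent or Spark Streaming: the function names, the 1,000 ms batch interval and the 5 ms per-event cost are all hypothetical values chosen for the example. It compares how long each event waits when it must sit until the end of its batch window versus when a single worker handles it on arrival.

```python
def micro_batch_latencies(arrival_times_ms, batch_interval_ms):
    """Latency when an event is only processed at the end of its batch window.

    Windows are [k * interval, (k + 1) * interval); processing time itself is
    ignored, so this is a lower bound on the delay added by batching.
    """
    latencies = []
    for t in arrival_times_ms:
        window_end = (t // batch_interval_ms + 1) * batch_interval_ms
        latencies.append(window_end - t)
    return latencies

def per_event_latencies(arrival_times_ms, processing_cost_ms):
    """Latency when a single worker processes each event as it arrives.

    An event waits only if the worker is still busy with an earlier event.
    Arrival times must be sorted in ascending order.
    """
    latencies = []
    worker_free_at = 0
    for t in arrival_times_ms:
        start = max(t, worker_free_at)
        worker_free_at = start + processing_cost_ms
        latencies.append(worker_free_at - t)
    return latencies

arrivals = [0, 100, 250, 900, 1000, 1400]  # hypothetical arrival times in ms
print(micro_batch_latencies(arrivals, 1000))  # [1000, 900, 750, 100, 1000, 600]
print(per_event_latencies(arrivals, 5))       # [5, 5, 5, 5, 5, 5]
```

In this simplified model the batching delay is bounded by the batch interval, while the per-event delay depends only on processing cost and queueing. Real systems add scheduling, network and fault-tolerance costs on both sides, so the numbers here say nothing about the actual performance of either product.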
Phu Hoang, co-founder and CEO, DataTorrent, remarked on the innovation enabled by DataTorrent Real-Time Streaming as follows:
Hadoop has made big data analytics a reality; however, the true value of big data is unlocked when it can be acted upon in real-time. DataTorrent Real-Time Streaming is designed specifically to address this need for the enterprise. Through the advances provided by Hadoop 2.0, we are proud to raise the bar on real-time analytics to offer the industry’s first true real-time data ingestion and analysis platform at scale.
Designed specifically for Hadoop 2.0 and the enhancements enabled by YARN, DataTorrent RTS also boasts the ability to perform complex, high-performance computation on streaming Hadoop data with high availability. Certified to work with Hadoop distributions from Cloudera, Hortonworks and MapR, DataTorrent RTS is a commercial product that competes in the increasingly hot space of real-time analytics on streaming Big Data alongside the likes of Apache Storm, Apache Spark and Amazon Kinesis. Questions of performance aside, one of the keys to DataTorrent’s success will be its ease of implementation and ability to simplify and streamline the derivation of meaningful analytics from streaming Hadoop data. To date, the Santa Clara-based company has raised $8M in Series A funding in a round led by August Capital.