DataRPM Closes $5.1M In Series A Funding For Natural Language Search Big Data Analytics Platform

DataRPM today announced the finalization of $5.1M in Series A funding in a round led by InterWest Partners. DataRPM specializes in a next-generation business intelligence platform that leverages machine learning and artificial intelligence to deliver actionable business intelligence through a natural language-based search engine, allowing customers to dispense with complex, time-consuming data modeling and query production. DataRPM stores customer data within a “distributed computational search index” that enables its platform to apply its natural language query interface to heterogeneous data sources without modeling the data into intricate taxonomic relationships or master data management frameworks. Because the distributed computational search index lets customers run queries against different data sources without constructing data schemas that organize the constituent data fields and their relationships, it promises to accelerate the speed with which customers can derive insights from their data. Not only does the platform deliver a natural language interface, but it also visualizes the results of the requisite Google-like searches as illustrated below:

In an interview with Cloud Computing Today, DataRPM CEO Sundeep Sanghavi noted that its natural language search functionality is based on proprietary graph technology analogous to Apache Giraph and Neo4j. The platform operates on data in relational and non-relational formats, although it currently does not support unstructured data. Available via both cloud-based and on-premise deployments, DataRPM promises to disrupt Big Data analytics and contemporary business intelligence platforms by dispensing with the need for complex, time-consuming and expensive data modeling, and by empowering business stakeholders with neither SQL nor scripting skills to analyze data. Today’s funding raise is intended to accelerate the company’s go-to-market strategy and correspondingly support product development in conjunction with the platform’s reception by current and future customers.

DataRPM belongs to the rapidly growing space of products that expedite Big Data analytics on Hadoop clusters, as exemplified by the constellation of SQL-like interfaces for querying Hadoop-based data. That said, its natural language query interface represents a genuine innovation in a space dominated by products that render Hadoop accessible to SQL developers and analysts, as opposed to data-savvy stakeholders with Google-like querying expertise. Moreover, DataRPM’s natural language search capabilities push the envelope of “next generation business intelligence” even further than contemporaries such as Jaspersoft, Talend and Pentaho, which thus far have focused largely on the transition within the enterprise from reporting to analytics and data discovery. Expect to hear more about DataRPM as the battle to streamline and simplify the derivation of actionable business intelligence from Big Data takes shape within a vendor landscape marked by the proliferation of analytic interfaces for petabyte-scale relational and non-relational databases.

Neo Technology Announces Release of Neo4j version 2.0 Graph Database Platform; Notes Use of Neo4j By Zephyr Health

Neo Technology today announced the general availability of version 2.0 of its graph database technology platform, Neo4j. The Neo4j graph database platform enables users to find connections between and amongst data points in high velocity and variety datasets “where the relationships between constituent data points are so numerous and dynamic that they cannot easily be captured within a manageable schema or relational database structure.” Graph databases contain “nodes” or “vertices” and “edges” that indicate relationships between the different vertices/nodes. Neo4j 2.0 features the addition of three notable features: (1) labels are now part of the data model and allow data scientists and developers to tag and index data for the purpose of more effectively understanding relationships between datasets; (2) enhancements to Cypher, the declarative query language used for the development of Neo4j graph applications; and (3) an interactive browser and query environment with a visual interface for data discovery.
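The label mechanism can be pictured with a small, hypothetical in-memory sketch in plain Python (illustrative only, not the Neo4j API): nodes carry labels that act as tags, and a label index makes lookups cheap rather than requiring a scan of every node, which is roughly the facility Neo4j 2.0's labels provide natively.

```python
from collections import defaultdict

class TinyGraph:
    """Toy labeled property graph -- an illustration, not the Neo4j API.

    Nodes carry a label and properties; a label index supports fast
    lookup by tag, roughly the role labels play in Neo4j 2.0.
    """
    def __init__(self):
        self.nodes = {}                  # node_id -> properties dict
        self.labels = defaultdict(set)   # label -> set of node_ids (the "index")
        self.edges = []                  # (source, relationship_type, target)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = props
        self.labels[label].add(node_id)  # tag and index in one step

    def add_edge(self, source, rel_type, target):
        self.edges.append((source, rel_type, target))

    def by_label(self, label):
        """Label lookup is an index hit, not a scan of all nodes."""
        return self.labels[label]

    def neighbors(self, node_id, rel_type):
        return [t for s, r, t in self.edges if s == node_id and r == rel_type]

g = TinyGraph()
g.add_node("alice", "Person", name="Alice")
g.add_node("neo4j", "Database", name="Neo4j")
g.add_edge("alice", "USES", "neo4j")

print(g.by_label("Person"))           # {'alice'}
print(g.neighbors("alice", "USES"))   # ['neo4j']
```

In Neo4j itself the same tag-and-query idea is expressed declaratively in Cypher, where a pattern can match nodes by label instead of enumerating them.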

Today, Neo Technology also announced that Zephyr Health is using Neo4j to power its cloud-based analytics platform:

The Zephyr analytics platform allows pharmaceutical makers, medical device manufacturers, and other health care customers to discover unique connections across their data that can advance their R&D, clinical trials, and marketing. For instance, Zephyr’s engine helps pharmaceutical companies find the right doctors for a clinical trial by linking private and public data — such as specialty, geography, and clinical trial history.

Zephyr Health chose the Neo4j platform as the basis for its big data analytics environment because of its need to make connections between disparate data sets in real time, as well as the highly dynamic nature of its datasets about hospitals and physicians. According to Neo Technology’s press release, Neo4j has effectively scaled in conjunction with the exponential growth of Zephyr’s datasets and delivered a solution that allows Zephyr’s business users to “be their own data scientists” by way of its data discovery and interactive browser functionality.

Zephyr Health’s adoption of Neo4j represents just one data point on a larger canvas of enterprise adoption of Neo4j as illustrated below:

The verticals from left to right illustrate Neo4j’s adoption in industries over and beyond those that traditionally use graph databases, such as social media, online data and transportation. The larger point here is that graph database technology (whether via Apache Giraph, Neo4j or otherwise) has arrived within the enterprise as a means of managing richly associative, dynamic, multivalent datasets, enabling connections and the inference of probabilistic relationships between nodes within the graph in ways that exceed the analytic capabilities of relational databases. The industry should expect use cases such as Zephyr Health’s elaboration on its use of Neo4j to proliferate as users of graph database technologies become increasingly comfortable explaining their business value and significance.

Iterative Computation Between Vertices In Pregel and Apache Giraph

As a follow-up to our post on Facebook’s use of Apache Giraph, I wanted to return to Pregel, the graph-processing technology on which Giraph was based. Alongside MapReduce, Pregel is used by Google to mine relationships between richly associative data sets in which the data points have multivalent, highly dynamic relationships that morph, proliferate, aggregate, disperse, emerge and vanish with a velocity that renders any schema-based data model untenable. In a well-known blog post, Grzegorz Czajkowski of Google’s Systems Infrastructure Team elaborated on the importance of graph theory and Pregel’s structure as follows:

Despite differences in structure and origin, many graphs out there have two things in common: each of them keeps growing in size, and there is a seemingly endless number of facts and details people would like to know about each one. Take, for example, geographic locations. A relatively simple analysis of a standard map (a graph!) can provide the shortest route between two cities. But progressively more sophisticated analysis could be applied to richer information such as speed limits, expected traffic jams, roadworks and even weather conditions. In addition to the shortest route, measured as sheer distance, you could learn about the most scenic route, or the most fuel-efficient one, or the one which has the most rest areas. All these options, and more, can all be extracted from the graph and made useful — provided you have the right tools and inputs. The web graph is similar. The web contains billions of documents, and that number increases daily. To help you find what you need from that vast amount of information, Google extracts more than 200 signals from the web graph, ranging from the language of a webpage to the number and quality of other pages pointing to it.

In order to achieve that, we have created scalable infrastructure, named Pregel, to mine a wide range of graphs. In Pregel, programs are expressed as a sequence of iterations. In each iteration, a vertex can, independently of other vertices, receive messages sent to it in the previous iteration, send messages to other vertices, modify its own and its outgoing edges’ states, and mutate the graph’s topology (experts in parallel processing will recognize that the Bulk Synchronous Parallel Model inspired Pregel).

The key point worth noting here is that Pregel computation is marked by a “sequence of iterations” whereby the relationships between vertices are iteratively refined and recalibrated with each computation. In other words, Pregel computation begins with an input step, followed by a series of supersteps that successively lead to the algorithm’s termination and, finally, an output. During each of the supersteps, the vertices send messages to and receive messages from other vertices in parallel. The algorithm terminates when the vertices collectively stop transmitting messages to each other or, to put things in another lexicon, vote to halt. As Malewicz, Czajkowski et al. note in their paper on Pregel, “The algorithm as a whole terminates when all vertices are simultaneously inactive and there are no messages in transit.” Like Pregel, Apache Giraph uses a computation structure whereby computation proceeds iteratively until the relationships between vertices in a graph stabilize.
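The superstep and vote-to-halt cycle can be sketched in a few lines of plain Python. The sketch below runs the classic maximum-value propagation example: each vertex starts with its own value, forwards any newly learned maximum along its out-edges, and votes to halt when it learns nothing new; the whole computation terminates once every vertex is inactive and no messages remain in transit. Names and structure are illustrative assumptions, not Pregel's or Giraph's actual API.

```python
def pregel_max(graph, values):
    """Pregel-style max-value propagation (illustrative sketch).

    graph:  vertex -> list of out-neighbors
    values: vertex -> initial value
    Each superstep, a vertex absorbs its inbox; if its value grew (or it is
    superstep 0), it sends its value out, otherwise it votes to halt.
    A halted vertex is reawakened only by an incoming message.
    """
    values = dict(values)
    inbox = {v: [] for v in graph}
    active = set(graph)          # every vertex starts active
    superstep = 0
    # Terminate when all vertices are inactive and no messages are in transit.
    while active or any(inbox.values()):
        outbox = {v: [] for v in graph}
        for v in graph:
            msgs = inbox[v]
            if not msgs and v not in active:
                continue         # halted and no mail: stays asleep
            changed = False
            if msgs and max(msgs) > values[v]:
                values[v] = max(msgs)
                changed = True
            if superstep == 0 or changed:
                for n in graph[v]:
                    outbox[n].append(values[v])  # message for next superstep
                active.add(v)
            else:
                active.discard(v)                # vote to halt
        inbox = outbox
        superstep += 1
    return values

# Three vertices in a cycle: the global maximum (6) reaches every vertex.
result = pregel_max({"a": ["b"], "b": ["c"], "c": ["a"]},
                    {"a": 3, "b": 6, "c": 2})
print(result)  # {'a': 6, 'b': 6, 'c': 6}
```

Note how the two termination conditions from the quoted paper appear directly in the loop guard: the computation runs only while some vertex is still active or some message is still in transit.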