Neo4j 3.0 Enhances Developer Productivity While Delivering Massive Scalability And Deployment Flexibility

This week, Neo4j announced the release of Neo4j 3.0, a watershed release focused on helping developers build graph-based applications faster and more effectively. Notably, the release introduces Bolt, a binary protocol that delivers higher throughput and lower latency for access to the graph database. In addition, Neo4j 3.0 introduces language drivers for Java, .NET, JavaScript and Python that communicate with the Neo4j database over the Bolt protocol. Using the Bolt-based drivers, developers can write code against Neo4j in Java, .NET, JavaScript and Python in a style that stays close to each language’s native syntax, which lets them build applications in the languages with which they are deeply familiar. Neo4j 3.0 also inaugurates Java Stored Procedures, which enable developers to store and execute complex assemblages of code on the Neo4j database. Stored procedures can be written in any JVM language, and applications can call them over the Bolt binary protocol. The combination of Bolt, language drivers and stored procedures means that Neo4j developers now have an enhanced range of options for creating graph-based applications at scale. This release also announces the availability of Neo4j Sync, a cloud service for the Neo4j Browser that synchronizes and stores developer settings and scripts, so that developers keep access to their scripts as they move from one database or platform to another, as shown below:

[Image: Neo4j Browser with Neo4j Sync]

Neo4j’s browser sync also gives developers streamlined access to their library of Cypher queries. Moreover, Neo4j 3.0 delivers the ability to deploy graphs to any cloud environment, container or on-premises deployment. With the release of Neo4j’s “redesigned data store” architecture, developers can build applications that scale while preserving performance. Overall, the release delivers significant developer-oriented functionality that makes it easier to build, deploy and manage graph-based applications at scale. In particular, Bolt, the language drivers for Java, .NET, JavaScript and Python, and Java Stored Procedures, in conjunction with Neo4j Sync, give developers an enriched set of tools for rapid development on the Neo4j platform and allow them to re-use their scripts and settings where possible in a scale-out, high-performance environment.


Neo Technology Announces Neo4j 2.3 Marked By Ability To Manage Intelligent Applications At Scale As Neo Technology Partners With IBM And Open Sources Cypher

Graph database leader Neo Technology today announced the availability of Neo4j 2.3, a partnership with IBM, and the open sourcing of Cypher, its query language for graphs. Neo4j 2.3 features enhanced abilities to create massive graphs for rapidly scaling, intelligent applications that automatically apply business rules to real-time updates to data from disparate sources. The latest release supports scaling out intelligent rules that enrich the data relationships among application-specific entities. Its improved ability to manage applications at scale rests on enhanced query development capabilities, improved Cypher performance and a more intelligent query planner. In addition to intelligent management of rapidly scaling applications, this release delivers expanded schema and metadata functionality that allows customers to manage and analyze data more effectively. Neo4j 2.3 also features an integration with Spring Data, a slew of improvements to the Cypher query language and support for Docker.

In conjunction with the release of Neo4j 2.3, Neo4j also announced a partnership with IBM to make Neo4j available on IBM POWER8. The partnership features the deployment of Neo4j on a massive in-memory platform that can support use cases such as internet of things data, supply chain or fraud-related analytics, and updates to billions of data points ingested in real time from sources spanning the globe. As noted in the press release, “IBM Power Systems can provide up to 56 terabytes of extended memory space with CAPI flash architecture on a single machine,” thereby making possible graphs of a magnitude and scale not seen to date. IBM POWER8 allows customers not only to create massive graphs and graphical relationships between data, but also to act upon the insights delivered by those graphs in near real time, thereby minimizing the lag between the development of actionable business intelligence and the execution of proactive responses to data-driven events and insights. In yet another announcement, Neo4j will be open sourcing Cypher, its query language for graphs, as openCypher, a project that stands to revolutionize graph analytics in much the same way that SQL did for relational databases several decades ago. openCypher boasts an impressive roster of initial supporters that includes Oracle, Databricks, Tableau, GraphAware, GrapheneDB, Graph Story and Information Analysis Incorporated (IAI). Ion Stoica, CEO of Databricks, remarked on the open sourcing of Cypher as follows:

Graph processing is becoming an indispensable part of the modern big data stack. Neo4j’s Cypher query language has greatly accelerated graph database adoption. We look forward to bringing Cypher’s graph pattern matching capabilities into the Spark stack, making graph querying more accessible to the masses.

As Stoica notes, Databricks plans to integrate Cypher’s functionality into the Spark stack as part of the larger project of creating an integrated set of big data tools and applications. Databricks’ interest in integrating Cypher into the Spark portfolio underscores the value of the query language developed by Neo4j and illustrates the significance of Neo4j’s graphing technology more generally for contemporary big data analytics. As such, the release of Neo4j 2.3, its partnership with IBM and the open sourcing of its query language Cypher mark a milestone in Neo4j’s evolution as it emphatically asserts its centrality to the big data revolution and demonstrates enhanced abilities to manage massive graphs and the automation that allows their applications to scale. The screenshot below illustrates Neo4j 2.3’s user interface for understanding graph-based data:

Neo Technology Raises $20M In Series C Funding For Its Neo4j Graph Database Technology

Neo Technology today announced the finalization of $20M in Series C funding. The round was led by Creandum with additional participation from Dawn Capital. Existing investors Fidelity Growth Partners Europe, Sunstone Capital and Conor Venture Partners all participated in the round. The funding will be used to expand sales operations, enhance product development and build the open source community supporting the Neo4j platform and its attendant partner ecosystem. The funding comes hot on the heels of a year of explosive growth for Neo Technology and its vendor-led open source graph database, Neo4j. Neo Technology’s CEO and co-founder Emil Eifrem remarked on the company’s growth as follows:

There are two strong forces propelling our growth: one is the overall market’s increasing adoption of graph databases in the enterprise. The other is proven market validation of Neo4j to support mission-critical operational applications across a wide range of industries and functions.

Eifrem notes that Neo Technology’s growth has been fueled by increasing enterprise adoption of graph databases in conjunction with Neo4j’s consistent demonstration of its ability to support a variety of production-grade environments. In a phone interview with Cloud Computing Today, Eifrem further remarked that one of the challenges for Neo Technology is developing an incisive sales outreach strategy, given that almost every enterprise could benefit from the adoption of graphing technologies. Eifrem elaborated that Neo Technology has chosen to prioritize its sales outreach efforts by focusing on use cases that include data-driven recommendations (in e-commerce and social networking, for example), master data management, identity and access management, graph-based search, network and IT operations, the internet of things and pricing, while nevertheless remaining open to other client requests and interests. Since the launch of Neo4j 2.0 last January, Neo4j has experienced over 500,000 downloads and boasts thousands of enterprise-grade deployments featuring organizations such as Walmart, eBay, EarthLink, CenturyLink, Pitney Bowes and Cisco. Based on its impressive record in 2014 and the explosive proliferation of use cases for graphing technology, 2015 could well represent an inflection point for Neo Technology as it consolidates its leadership in the graph database space by using its additional funding to gain more market traction while continuing to educate the industry on the value proposition of adopting Neo4j.

Neo Technology Announces Release Of Neo4j 2.1 With Enhanced ETL Functionality

This week, Neo Technology announced the general availability of Neo4j 2.1, the graph database that powers graph technology for companies such as eBay, Walmart, HP and National Geographic. Featuring pre-built ETL technology that facilitates the transformation of SQL or relationally structured data into the Neo4j graph database platform, version 2.1 makes it even easier for enterprises both to transition from RDBMS systems to graph technologies and to augment existing Neo4j deployments. Version 2.1 features advanced functionality for mapping structured data from CSV files into Neo4j, with speed increases of up to a factor of 100. Emil Eifrem, CEO of Neo Technology, remarked on the innovation specific to Neo4j 2.1 as follows:

Neo4j 2.1 represents a major step forward in lowering the bar to graph database adoption for organizations who have massive amounts of data in their relational databases…While Neo4j is already renowned for its ease, scalability, and speed, the new built-in ETL capabilities enable the same ease and speed when moving data from an RDBMS into a graph. This will make it easier than ever for organizations to unlock the hidden value of their data, by leveraging the connections.

Neo4j competes with the likes of Titan, OrientDB, VelocityGraph, Apache Giraph and an increasing number of proprietary graph databases built by startups intent on preserving their intellectual property as part of their product development strategy. This week’s release consolidates Neo4j’s position as the industry’s most popular graph database technology by making it easier to bring SQL-based data into its platform, thereby streamlining the production of graph databases from incoming batches and streams of relational data. Forrester Research estimates that at least 25% of enterprises will have adopted a graph database by 2017.
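
To make the idea of mapping relational data into a graph more concrete, here is a minimal Python sketch. It is not Neo4j’s import tooling: the CSV columns, the `Customer` and `Product` labels, the `BOUGHT` relationship type and the `rows_to_graph` helper are all hypothetical names chosen for illustration. It only shows the general pattern in which distinct entities in the rows become nodes and each row becomes a relationship.

```python
import csv
import io

# Hypothetical relational export: one row per order line (a join table).
ORDERS_CSV = """customer_id,customer_name,product_id,product_name
1,Alice,10,Laptop
1,Alice,11,Mouse
2,Bob,10,Laptop
"""


def rows_to_graph(csv_text):
    """Map relational rows to nodes and relationships.

    Each distinct customer and product becomes one node; each row
    becomes a BOUGHT relationship from the customer to the product.
    """
    nodes = {}          # (label, id) -> properties
    relationships = []  # (start_key, relationship_type, end_key)
    for row in csv.DictReader(io.StringIO(csv_text)):
        customer = ("Customer", row["customer_id"])
        product = ("Product", row["product_id"])
        # setdefault keeps the first occurrence, so repeated rows
        # for the same entity do not create duplicate nodes.
        nodes.setdefault(customer, {"name": row["customer_name"]})
        nodes.setdefault(product, {"name": row["product_name"]})
        relationships.append((customer, "BOUGHT", product))
    return nodes, relationships


nodes, relationships = rows_to_graph(ORDERS_CSV)
```

In this toy data set the three rows collapse into four nodes (two customers, two products) connected by three relationships, which is the deduplication step that a relational-to-graph import has to perform in some form.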

Neo4j Adopted By Retail Giants eBay and Walmart For Real-Time, E-commerce Analytics

Neo Technology recently announced that retail giants such as eBay and Walmart are using the graph database Neo4j in production-grade applications that improve their operations and marketing analytics. In a recently published case study, Neo Technology revealed how Shutl, an e-commerce technology platform acquired by eBay, leverages Neo4j to expedite delivery to the point where customers can enjoy same-day delivery in select cases. Shutl is the technology platform that undergirds eBay Now, a service that delivers products in 1-2 hours from local stores by means of relationships between couriers and stores. eBay decided to make the transition from MySQL to Neo4j because:

Its previous MySQL solution was too slow and complex to maintain, and the queries used to calculate the best route additionally took too long. The eBay development team knew that a graph database could be added to the existing SOA and services structure to solve the performance and scalability challenges. The team turned to Neo4j as the best possible solution on the market.

According to Volker Pacher, Senior Developer at eBay, eBay found that Neo4j enabled dramatic improvements in its computational and querying ability:

We found Neo4j to be literally thousands of times faster than our prior MySQL solution, with queries that require 10-100 times less code. Today, Neo4j provides eBay with functionality that was previously impossible.

eBay’s current e-commerce technology platform leverages Ruby, Sinatra, MongoDB, and Neo4j. Importantly, queries “remain localized to their respective portions on the graph” in order to ensure scalability and performance. Walmart, meanwhile, uses Neo4j to understand the online habits of its shoppers in order to deliver more relevant real-time product recommendations. Neo4j’s adoption by eBay and Walmart illustrates how graph databases are disrupting the nature of real-time analytics, a trend further underscored by Pivotal HD 2.0’s integration of GraphLab into its offerings, and the use of graphing technologies by startups such as Aorato.

Iterative Computation Between Vertices In Pregel and Apache Giraph

As a follow-up to our post on Facebook’s use of Apache Giraph, I wanted to return to Pregel, the graphing technology on which Giraph was based. Alongside MapReduce, Pregel is used by Google to mine relationships between richly associative data sets in which the data points have multi-valent, highly dynamic relationships that morph, proliferate, aggregate, disperse, emerge and vanish with a velocity that renders any schema-based data model untenable. In a well-known blog post, Grzegorz Czajkowski of Google’s Systems Infrastructure Team elaborated on the importance of graph theory and Pregel’s structure as follows:

Despite differences in structure and origin, many graphs out there have two things in common: each of them keeps growing in size, and there is a seemingly endless number of facts and details people would like to know about each one. Take, for example, geographic locations. A relatively simple analysis of a standard map (a graph!) can provide the shortest route between two cities. But progressively more sophisticated analysis could be applied to richer information such as speed limits, expected traffic jams, roadworks and even weather conditions. In addition to the shortest route, measured as sheer distance, you could learn about the most scenic route, or the most fuel-efficient one, or the one which has the most rest areas. All these options, and more, can all be extracted from the graph and made useful — provided you have the right tools and inputs. The web graph is similar. The web contains billions of documents, and that number increases daily. To help you find what you need from that vast amount of information, Google extracts more than 200 signals from the web graph, ranging from the language of a webpage to the number and quality of other pages pointing to it.

In order to achieve that, we have created scalable infrastructure, named Pregel, to mine a wide range of graphs. In Pregel, programs are expressed as a sequence of iterations. In each iteration, a vertex can, independently of other vertices, receive messages sent to it in the previous iteration, send messages to other vertices, modify its own and its outgoing edges’ states, and mutate the graph’s topology (experts in parallel processing will recognize that the Bulk Synchronous Parallel Model inspired Pregel).

The key point worth noting here is that Pregel computation is marked by a “sequence of iterations” whereby the relationship between vertices is iteratively refined and recalibrated with each computation. In other words, Pregel computation begins with an input step, followed by a series of supersteps that successively lead to the algorithm’s termination and, finally, an output. During each superstep, the vertices send messages to, and receive messages from, other vertices in parallel. The algorithm terminates when the vertices collectively stop transmitting messages to each other, or, to put things in another lexicon, vote to halt. As Malewicz, Czajkowski and their co-authors note in a paper on Pregel, “The algorithm as a whole terminates when all vertices are simultaneously inactive and there are no messages in transit.” Like Pregel, Apache Giraph uses a computation structure whereby computation proceeds iteratively until the relationships between vertices in a graph stabilize.
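
The superstep structure described above can be sketched in a few lines of Python. This is a single-process toy, not Pregel’s or Giraph’s API: the `run_supersteps` function, the example graph and the choice of “propagate the maximum value” as the vertex program are assumptions made for illustration. Each vertex reads the messages sent to it in the previous superstep, optionally updates its value and sends messages, then votes to halt; an incoming message wakes it up again.

```python
def run_supersteps(values, edges, max_supersteps=50):
    """Propagate the maximum value through a graph, superstep by superstep.

    values: vertex -> initial value
    edges:  vertex -> list of out-neighbours
    Returns the final values and the number of supersteps executed.
    """
    values = dict(values)
    active = set(values)                  # every vertex starts active
    inbox = {v: [] for v in values}       # messages from the previous superstep
    superstep = 0
    # Terminate when all vertices are inactive and no messages are in transit.
    while (active or any(inbox.values())) and superstep < max_supersteps:
        outbox = {v: [] for v in values}
        for v in values:
            messages = inbox[v]
            if v not in active and not messages:
                continue                  # halted and nothing received
            new_value = max([values[v]] + messages)
            if superstep == 0 or new_value > values[v]:
                values[v] = new_value
                for neighbour in edges.get(v, []):
                    outbox[neighbour].append(new_value)
            active.discard(v)             # vote to halt until a message arrives
        inbox = outbox                    # messages are delivered next superstep
        superstep += 1
    return values, superstep


# Hypothetical four-vertex chain a - b - c - d with edges in both directions.
edges = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
final_values, supersteps = run_supersteps({"a": 3, "b": 6, "c": 2, "d": 1}, edges)
```

On this chain every vertex ends up holding the value 6, and the loop stops once a superstep produces no new messages, which mirrors the termination condition quoted from the Pregel paper. A real system would run the per-vertex work in parallel across machines; the sequential loop here only preserves the rule that messages sent in one superstep are visible in the next.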

Facebook Leverages Enhanced Apache Giraph To Create 1 Trillion Edge Social Graph

This week, Facebook revealed details of the technology used to power its recently released Graph search functionality. Graph databases are used to analyze relationships between associative data such as social networks, transportation data, search-related content recommendations, fraud detection, molecular biology and, more generally, any data set where the relationships between constituent data points are so numerous and dynamic that they cannot easily be captured within a manageable schema or relational database structure. Graph databases contain “nodes” or “vertices” and “edges” that indicate relationships between the different vertices/nodes.

Facebook intensified their review of graphing technologies in the summer of 2012 and selected Apache Giraph over Apache Hive and GraphLab. Facebook selected Giraph because it interfaces directly with its own version of the Hadoop Distributed File System (HDFS), allows usage of its MapReduce Corona infrastructure, supports a variety of graph application use cases, and features functionality such as master computation and composable computation. Compared to Apache Hive and GraphLab, Apache Giraph was faster. After choosing a graphing platform, the Facebook team modified the Apache Giraph code and subsequently shared their changes with the Apache Giraph open source community.

One of the use cases that Facebook used to select Apache Giraph was a “label propagation” exercise, in which an algorithm probabilistically infers values for data fields that are blank or unintelligible; Facebook compared Giraph’s performance on this task against Apache Hive and GraphLab. Many Facebook users, for example, may elect to leave their hometown or employer blank, but graphing algorithms can probabilistically assign values for the blank fields by analyzing data about a user’s friends, family, likes and online behavior. By empowering data scientists to construct more complete profiles of users, graph technology enables enhanced personalization of data such as a user’s news feed and advertising content. Facebook performed the “label propagation” comparison of Giraph, Hive and GraphLab on a relatively small scale, on the order of 25 million edges.
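
As a rough illustration of the idea, the following Python sketch fills in missing hometowns by majority vote among a user’s friends. It is a deliberately simplified variant and not Facebook’s implementation: the `propagate_labels` function, the sample users and the alphabetical tie-breaking rule are all assumptions, and a production algorithm would typically track probabilities rather than a single winning label.

```python
from collections import Counter


def propagate_labels(friends, known, max_rounds=10):
    """Infer missing labels from the labels of neighbouring users.

    friends: user -> list of friends
    known:   user -> label, for users who filled the field in
    Returns a dict with known labels plus inferred ones.
    """
    labels = dict(known)
    for _ in range(max_rounds):
        updates = {}
        for user, neighbours in friends.items():
            if user in known:
                continue  # never overwrite what the user stated
            votes = Counter(labels[n] for n in neighbours if n in labels)
            if votes:
                # Sorting first makes ties resolve alphabetically.
                best = max(sorted(votes), key=lambda label: votes[label])
                if labels.get(user) != best:
                    updates[user] = best
        if not updates:
            break  # labels have stabilized
        labels.update(updates)
    return labels


# Hypothetical friendship graph; ann and eve left their hometown blank.
friends = {
    "ann": ["bob", "cat", "dan"],
    "bob": ["ann"],
    "cat": ["ann"],
    "dan": ["ann", "eve"],
    "eve": ["dan"],
}
known = {"bob": "Austin", "cat": "Austin", "dan": "Boston"}
inferred = propagate_labels(friends, known)
```

Here `ann` is assigned Austin because two of her three friends list it, while `eve` inherits Boston from her only friend. The `max_rounds` cap guards against label assignments that oscillate instead of settling.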

Key attributes of Apache Giraph and its usage by Facebook include the following:

• Apache Giraph is based on Google’s Pregel and Leslie Valiant’s bulk synchronous parallel computing model
• Apache Giraph is written in Java and runs as a MapReduce job
• Facebook chose the production use cases of “label propagation, variants of page rank, and k-means clustering” in order to drive their modification of Apache Giraph code
• Facebook created a 1 trillion edge social graph using 200 commodity machines, in less than four minutes
• Facebook’s creation of 1 trillion edges is roughly two orders of magnitude greater than Twitter’s graph of 1.5 billion edges and AltaVista’s 6.5 billion edges
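
The scale comparison in the last bullet can be checked with a quick calculation. The sketch below uses only the edge counts quoted above; the `orders_of_magnitude` helper is a hypothetical name for a base-10 logarithm of the ratio.

```python
import math

# Edge counts as quoted in the list above.
facebook_edges = 1e12
twitter_edges = 1.5e9
altavista_edges = 6.5e9


def orders_of_magnitude(larger, smaller):
    """Return how many powers of ten separate two quantities."""
    return math.log10(larger / smaller)


vs_twitter = orders_of_magnitude(facebook_edges, twitter_edges)
vs_altavista = orders_of_magnitude(facebook_edges, altavista_edges)
```

The ratios work out to about 667x for the Twitter graph and about 154x for the AltaVista graph, so the gap is between two and three orders of magnitude, consistent with the rough figure in the bullet.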

Facebook performed a number of tweaks on the Apache Giraph code, including modification of the input model for data in Giraph, streamlined reading of Hive data, multithreading application code, memory optimization and the use of Netty instead of ZooKeeper to enhance scalability. Facebook’s Social Graph was launched in January, although its platform is not yet nearly as powerful as end users might hope. Open Graph, meanwhile, is used by Facebook developers to correlate real-world actions with objects in their database, such as User X is viewing soccer match Y on network Z. The latest version of Giraph’s code is now available as version 1.0.0 of the project. Avery Ching’s blog post this week, which elaborates on Facebook’s contributions to Giraph, represents one of the first attempts to bring the challenges of creating and managing a trillion edge social graph into the mainstream. In response, the industry should expect analogous disclosures about graphing technology from the likes of Google, Twitter and others in subsequent months.