CrateDB 2.0 Adds Clustering Upgrades and SQL Enhancements to Its Database Solution for IoT and Machine Data

On May 16, the company behind CrateDB announced the availability of CrateDB 2.0, an open source SQL database that specializes in IoT and machine data. The innovation of CrateDB consists in using SQL, rather than the NoSQL databases commonly used in the industry for related use cases, to aggregate and perform real-time analytics on IoT and machine data. CrateDB’s ability to ingest high-velocity streams of data and to query rapidly changing datasets, with impressive levels of scalability and latency, allows developers to pair their familiarity with SQL with a solution designed specifically for the needs of IoT and machine data applications. CrateDB 2.0 features clustering upgrades that deliver improved query performance by means of faster aggregations and new index structures. In addition, CrateDB 2.0 contains a bevy of SQL enhancements that give developers a greater range of options regarding joins, sub-selects and the renaming and re-indexing of tables. The Enterprise Edition of CrateDB 2.0 offers performance monitoring and enhanced security, as well as the ability for end users to create user-defined functions. CrateDB 2.0’s clustering upgrades, SQL enhancements and enterprise-grade security and performance monitoring mark a new milestone in the platform’s evolution, one that testifies to its readiness to take on enterprise-grade workloads that include sensor data, GPS data and the industrial internet more generally. Coming after the platform’s general availability in December 2016, the release of open source and enterprise-grade versions of CrateDB underscores the early traction the platform has received, with over 1.3 million downloads and 50 customers using it in production.
With the IoT and machine data space gearing up for a rampant proliferation of devices and corresponding datasets in forthcoming years, expect CrateDB to continue building on its recent momentum, particularly as organizations look for scalable databases that allow them to leverage widely available SQL skillsets.

The graphic below illustrates the platform’s Enterprise Edition user interface for monitoring clusters, which gives users real-time visibility into cluster performance with respect to the ingestion and transformation of IoT and machine data:



CrateDB Combines SQL-based Queries And Extreme Scalability For Machine Data Analytics

The company behind CrateDB today announced the general availability of CrateDB, an open source SQL database platform that specializes in storing and analyzing machine data and related applications. CrateDB features a distributed SQL query engine that empowers users to run complex queries in real-time without the diminution of performance specific to “first generation SQL databases”, as noted in a press release. The platform also boasts columnar field caches and enhanced versatility with respect to SQL-based queries on machine data. For example, CrateDB delivers the capability to create outer joins as well as run queries on structured and unstructured data, perform time series analysis and leverage advanced database search functionality. In addition, CrateDB features extreme scalability marked by automated sharding and data redistribution that optimize data performance and availability in line with the volume of data stored within the platform. Importantly, CrateDB allows organizations to take advantage of SQL-oriented skills and tools to expedite its integration and adoption. As such, the platform represents a SQL-based alternative to NoSQL machine data solutions such as Splunk and Cassandra, one that empowers organizations to collect and analyze massive volumes of machine data in real-time in conjunction with the platform’s enhanced querying versatility and scalability. Available under an Apache 2.0 license, CrateDB marks the emergence of another key player in the machine data analytics space that promises to disrupt the landscape of machine data analytics platforms, particularly given the nexus of its advanced SQL-based querying functionality and extreme scalability.
Organizations with resources versed primarily in SQL will lean toward CrateDB given the richness of its distributed SQL querying engine and ability to query data in real-time without resorting to an ancillary data warehousing option to append to their machine data analytics infrastructure.

Q&A With DBS-H Regarding Its Continuous Big Data Integration Platform For SQL To NoSQL

Cloud Computing Today recently had the privilege of speaking with Amos Shaltiel, CEO and co-founder, and Michael Elkin, COO and co-founder, of DBS-H, an Israel-based company that specializes in continuous Big Data integration between relational and NoSQL-based data. Topics discussed included the core capabilities of its big data integration platform, typical customer use cases and the role of data enrichment.

Cloud Computing Today: What are the core capabilities of your continuous big data integration platform for integrating SQL data with NoSQL? Is the integration unidirectional or bidirectional? What NoSQL platforms do you support?

DBS-H: DBS-H develops innovative solutions for continuous data integration between SQL and NoSQL databases. We believe that companies are going to adopt a hybrid model where relational databases such as Oracle, SQL Server, DB2 or MySQL will continue to serve customers alongside new NoSQL engines. The success of Big Data adoption will ultimately rise and fall on how easily information can be accessed by key players in organizations.

The DBS-H solution releases data bottlenecks associated with integrating Big Data with existing SQL data sources, making sure that everyone has access to the data they are looking for transparently and without the need to change existing systems.

Our vision is to make the data integration process simple, intuitive and fully transparent to the customer, without the need to hire highly skilled personnel for expensive maintenance of integration platforms.

Core capabilities of the DBS-H Big Data integration platform are:

1. Continuous data integration between SQL and NoSQL databases. Continuous integration represents a key factor of successful Big Data integration.
2. NoSQL data modeling and linkage to the existing relational model. We call it a “playground” where customers can:
a. Link a relational data model to a non-relational structure.
b. Create a new data design for the NoSQL database.
c. Explore “Auto Link”, where the engine automatically generates two options for a NoSQL data model based on the existing SQL ERD design.
3. Data enrichment – a capability that allows additional information to be added to each block of data, significantly enriching that data on the target.

Currently, we focus on unidirectional integration and avoid some of the conflict resolution scenarios specific to bidirectional continuous data integration. The unidirectional path is from SQL to NoSQL and in the near future we will add the opposite direction of NoSQL to SQL integration. Today, we support Oracle and MongoDB databases and plan to add support for additional database engines such as SQL Server, DB2, MySQL, Couchbase, Cassandra and full integration with Hadoop. We aspire to be the default solution of choice when customers think about data integration across major industry data sources.
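
To make the idea of linking a relational model to a non-relational structure more concrete, here is a minimal Python sketch. It is not DBS-H’s implementation: the `customers` and `orders` tables, the `build_customer_documents` function and the document layout are all hypothetical, and an in-memory SQLite database stands in for the SQL source. It simply folds the rows of a one-to-many relationship into one nested document per customer, the kind of shape a document store such as MongoDB would hold.

```python
import sqlite3


def build_customer_documents(conn):
    """Fold rows from two related tables into one nested document per customer."""
    documents = {}
    rows = conn.execute(
        "SELECT c.id, c.name, o.id, o.total "
        "FROM customers c LEFT JOIN orders o ON o.customer_id = c.id "
        "ORDER BY c.id, o.id"
    )
    for customer_id, name, order_id, total in rows:
        # Create the parent document the first time a customer is seen.
        doc = documents.setdefault(
            customer_id, {"_id": customer_id, "name": name, "orders": []}
        )
        # A LEFT JOIN yields NULL order columns for customers without orders.
        if order_id is not None:
            doc["orders"].append({"order_id": order_id, "total": total})
    return list(documents.values())


# Hypothetical sample data standing in for an existing relational source.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0);
    """
)
docs = build_customer_documents(conn)
```

A continuous integration product would additionally have to capture changes on the SQL side as they happen and apply them to the target; this sketch covers only the one-off mapping between the two models.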

Cloud Computing Today: What are the most typical use cases for continuous data integration from SQL to NoSQL?

DBS-H: NoSQL engines offer high performance at relatively low cost, along with a flexible schema model.

Typical use cases of continuous data integration from SQL to NoSQL are driven principally from major NoSQL use cases, such as:

  1. Customer 360° view – creating and maintaining a unified view of a customer from multiple operational systems. This provides the ability to deliver a consistent customer experience regardless of the channel, capitalize on upsell or cross-sell opportunities and deliver better customer service. NoSQL engines provide the response times required in customer service, as well as scalability and a flexible data model. The DBS-H solution is an enabler for a “Customer 360° view” business case by performing transparent and continuous integration from existing SQL-based data sources.
  2. User profile management – applications that manage user preferences, authentication and even financial transactions. NoSQL provides high performance and a flexible schema model for user preferences; however, financial transactions will usually be managed by a SQL system. With DBS-H continuous data integration, financial transaction data becomes transparently available inside NoSQL engines.
  3. Catalog management – applications that manage catalogs of products, financial assets, employee or customer data. Modern catalogs often contain user-generated data from social networks. NoSQL engines provide excellent flexible-schema capabilities that allow the schema to be changed on the fly. Catalogs usually aggregate data from different organizational data sources such as online systems, CRM or ERP. The DBS-H solution enables transparent and continuous data integration from multiple existing SQL-related data sources into a new centralized, NoSQL-based catalog system.

Cloud Computing Today: Do you perform any data enrichment of SQL-data in the process of its integration with NoSQL? If so, what kind of data enrichment does your platform deliver? In the event that customers prefer to leave their data in its original state, without enrichment, can they opt out of the data enrichment process?

DBS-H: The DBS-H solution contains data enrichment capabilities during the data integration process. The main idea of “data enrichment” in our case is to provide a simple way for the customer to add logical information that enriches original data by:

  1. Adding data source identification information, such as: where and when this data has been generated and by whom. This can be used by auditing for example.
  2. Classifying data based on the source. This information can be very useful when customers want to control data access based on different roles and groups inside the organization.
  3. Assessing data reliability as low, medium or high. This enrichment is useful for analytic platforms that can make different decisions based on source reliability level.

Customers can create enrichment metrics that can be added to every block of information that goes through the DBS-H integration pipeline. If no enrichment is required then the customer can opt out of the enrichment step.
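
The three kinds of enrichment described above can be sketched roughly as follows. This is an illustration in Python, not the DBS-H pipeline: the function names `enrich_record` and `integrate`, the `_enrichment` field and the sample values are all assumptions made for the example. Each record passing through receives source identification, a classification and a reliability level, and passing no enrichment settings models the opt-out.

```python
from datetime import datetime, timezone

RELIABILITY_LEVELS = ("low", "medium", "high")


def enrich_record(record, source, generated_by, classification,
                  reliability, generated_at=None):
    """Return a copy of the record with hypothetical enrichment metadata attached."""
    if reliability not in RELIABILITY_LEVELS:
        raise ValueError(f"reliability must be one of {RELIABILITY_LEVELS}")
    timestamp = generated_at or datetime.now(timezone.utc)
    enriched = dict(record)  # leave the original record untouched
    enriched["_enrichment"] = {
        "source": source,                  # where the data was generated
        "generated_by": generated_by,      # by whom
        "generated_at": timestamp.isoformat(),  # when
        "classification": classification,  # e.g. for role-based access control
        "reliability": reliability,        # low, medium or high
    }
    return enriched


def integrate(records, enrichment=None):
    """Pass records through; enrich them only if enrichment settings are given."""
    if enrichment is None:
        # Opt-out: records reach the target in their original state.
        return [dict(r) for r in records]
    return [enrich_record(r, **enrichment) for r in records]
```

For example, `integrate(rows, {"source": "oracle-crm", "generated_by": "billing-app", "classification": "finance", "reliability": "high"})` would tag every row, while `integrate(rows)` leaves them as they are.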

Couchbase Announces N1QL, SQL-based Query Language For JSON-based NoSQL

This week, Couchbase announced the availability of N1QL (pronounced “nickel”), a “breakthrough query language” that delivers the capabilities of SQL alongside the Couchbase NoSQL database platform. Developers can use N1QL to perform queries on data stored and aggregated within the Couchbase NoSQL platform to facilitate the development of data-driven applications that leverage the data modeling and massive scalability of Couchbase’s JSON-based NoSQL platform. Given the ability of NoSQL to respond to the contemporary need to store massive amounts of data that defy classification into rigidly defined schemas, N1QL gives developers enhanced flexibility regarding the querying of semi-structured and unstructured data. Moreover, N1QL enables organizations to take advantage of the highly mature skillsets of SQL-trained developers in addition to the venerable ecosystem of SQL-compliant tools and products. N1QL conforms to a specification developed by UCSD for a SQL-compliant language that can perform queries on semi-structured data. As such, N1QL stands poised to accelerate NoSQL adoption by empowering developers to bring the familiarity of JOIN and NEST operators to JSON documents. N1QL takes its place within an emerging landscape of SQL-compliant platforms for NoSQL that affirm the enduring supremacy of SQL’s querying ability as well as the criticality of developing sophisticated querying functionality for JSON-based NoSQL data stores. Leading technology vendors such as Informatica, Metanautix and Tableau have partnered with Couchbase to develop connectors that take advantage of N1QL’s unique querying functionality. Meanwhile, N1QL represents a key component of what’s new in Couchbase Server 4.0.

MongoDB Reveals Details Of Connector To SQL-Compliant Business Intelligence And Data Visualization Platforms

MongoDB today announced details of a technology that connects MongoDB to business intelligence and data visualization platforms such as Tableau, Business Objects, Cognos and Microsoft Excel. By rendering data stored in MongoDB compatible with SQL-compliant data analysis tools, the connector allows developers to leverage the rich querying ability of SQL to derive actionable business intelligence from MongoDB-based data. MongoDB customers can now use the connector directly to transform data from MongoDB’s nested JSON format into the tabular format required by SQL-compliant tools, whereas previously, organizations interested in obtaining business intelligence on MongoDB-based data typically resorted to third-party analytics and visualization platforms such as Jaspersoft, Pentaho and Informatica. Because the connector gives them a richer, deeper connection between data aggregated in MongoDB and platforms such as Tableau and Business Objects, customers no longer need to consider moving MongoDB-based data into a relational database prior to performing advanced analytical queries.
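
The transformation from a nested JSON document to a tabular row can be illustrated with a short Python sketch. It does not reproduce how MongoDB’s connector works internally; the `flatten` and `to_table` helpers, the dotted column names and the sample documents are assumptions chosen for the example. Nested objects are flattened into dotted column names, fields missing from a document become empty cells, and arrays are left untouched for simplicity.

```python
def flatten(document, prefix=""):
    """Flatten nested dictionaries into a single-level dict with dotted keys."""
    row = {}
    for key, value in document.items():
        column = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-documents, carrying the path as a prefix.
            row.update(flatten(value, column))
        else:
            # Scalars (and, in this sketch, arrays) are stored as-is.
            row[column] = value
    return row


def to_table(documents):
    """Return (columns, rows) with one row per document and None for gaps."""
    flat = [flatten(doc) for doc in documents]
    columns = sorted({column for row in flat for column in row})
    rows = [[row.get(column) for column in columns] for row in flat]
    return columns, rows


# Hypothetical documents with differing shapes, as a flexible schema allows.
sample = [
    {"_id": 1, "name": "Ada", "address": {"city": "London", "zip": "N1"}},
    {"_id": 2, "name": "Grace"},
]
columns, rows = to_table(sample)
```

The second document has no `address`, so its `address.city` and `address.zip` cells are `None`, which is the tabular analogue of a SQL NULL.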

At this year’s MongoDB World conference, Tableau and MongoDB leveraged data from the U.S. Federal Aviation Administration to illustrate the likelihood that conference attendees would return home on time. The release of the connector is symptomatic of a broader, industry-wide trend toward deeper integration between NoSQL and SQL as evinced, for example, by the recent integration between Couchbase and Metanautix. Given the contemporary interest in real-time analytics on streaming Big Data, the obvious question raised by the tightened integration between MongoDB and SQL-compliant platforms concerns the degree to which BI platforms such as Tableau will be able to perform real-time queries on streaming data aggregated in MongoDB. Meanwhile, the release of the MongoDB connector illustrates the enduring popularity of SQL as a framework for querying heterogeneous datasets as exemplified by the way in which the convergence of SQL and NoSQL stands to complement the robust ecosystem of SQL on Hadoop platforms such as Lingual, Apache Hive, Pivotal HAWQ and Cloudera Impala.

DataRPM Closes $5.1M In Series A Funding For Natural Language Search Big Data Analytics Platform

DataRPM today announced the finalization of $5.1M in Series A funding in a round led by InterWest Partners. DataRPM specializes in a next generation business intelligence platform that leverages machine learning and artificial intelligence to facilitate the delivery of actionable business intelligence by means of a natural language-based search engine that allows customers to dispense with complex, time consuming data modeling and query production. DataRPM stores customer data within a “distributed computational search index” that enables its platform to apply its natural language query interface to heterogeneous data sources without modeling the data into intricate taxonomic relationships or master data management frameworks. Because DataRPM’s distributed computational search index empowers customers to run queries against different data sources without constructing data schemas that organize the constituent data fields and their relationships, it promises to accelerate the speed with which customers can derive insights from their data. Not only does the platform deliver a natural language interface, but it also performs data visualization of the requisite Google-like searches as illustrated below:

In an interview with Cloud Computing Today, DataRPM CEO Sundeep Sanghavi noted that its natural language search functionality is based on proprietary graphing technology analogous to Apache Giraph and Neo4j. The platform operates on data in relational and non-relational formats, although it currently does not support unstructured data. Available via both a cloud-based and on-premise deployment solution, DataRPM promises to disrupt Big Data analytics and contemporary business intelligence platforms by dispensing with the need for complex, time consuming and expensive data modeling as well as empowering business stakeholders with neither SQL nor scripting skills to analyze data. Today’s funding raise is intended to accelerate the company’s go-to-market strategy and correspondingly support product development in conjunction with the platform’s reception by current and future customers.

DataRPM belongs to the rapidly growing space of products that expedite Big Data analytics on Hadoop clusters as exemplified by the constellation of SQL-like interfaces for querying Hadoop-based data. That said, its natural language query interface represents a genuine innovation in a space dominated by products that render Hadoop accessible to SQL developers and analysts, as opposed to data savvy stakeholders with Google-like querying expertise. Moreover, DataRPM’s natural language search capabilities push the envelope of “next generation business intelligence” even further than contemporaries such as Jaspersoft, Talend and Pentaho, which thus far have focused largely on the transition within the enterprise from reporting to analytics and data discovery. Expect to hear more about DataRPM as the battle to streamline and simplify the derivation of actionable business intelligence from Big Data takes shape within a vendor landscape marked by the proliferation of analytic interfaces for petabyte-scale relational and non-relational databases.