Talend, the open source data integration company, announced on Monday the release of version 5.2 of its Open Studio data integration and data management platform. The release's most prominent enhancements are Hadoop big data profiling and support for well-known NoSQL products, alongside a range of other usability, productivity and performance improvements.
Hadoop Big Data Profiling
Talend’s big data profiling functionality is intended to help users “discover and understand data in Hadoop clusters” with a view to:
• Identifying data quality issues such as corrupt, incomplete, duplicate or inconsistent data
• Analyzing data in Hive without extracting it from the Hadoop cluster
• Cleansing, enriching, de-duplicating and creating crosswalks across data sets within the Hadoop cluster itself
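To make the first of these goals concrete, the short Python sketch below counts incomplete and duplicate records in a small in-memory sample. It is an illustration of what a profiling pass checks for, not Talend's implementation or API: the function name profile_records, the field names and the sample rows are all hypothetical, and a real profiling job would run against data inside the cluster rather than a Python list.

```python
from collections import Counter


def profile_records(records, required_fields):
    """Count incomplete and duplicate records in a list of dicts.

    Hypothetical helper for illustration only; not part of any Talend API.
    """
    incomplete = 0
    seen = Counter()
    for record in records:
        values = [record.get(field) for field in required_fields]
        # A record is incomplete if any required field is missing or empty.
        if any(value in (None, "") for value in values):
            incomplete += 1
        # Normalize case and whitespace so trivially inconsistent copies
        # of the same record are counted as duplicates.
        key = tuple("" if value is None else str(value).strip().lower()
                    for value in values)
        seen[key] += 1
    # Every occurrence beyond the first of a given key is a duplicate.
    duplicates = sum(count - 1 for count in seen.values())
    return {"total": len(records),
            "incomplete": incomplete,
            "duplicates": duplicates}


# Made-up sample rows: one near-duplicate pair and two incomplete records.
sample = [
    {"id": "1", "email": "ann@example.com"},
    {"id": "1", "email": "Ann@Example.com "},
    {"id": "2", "email": ""},
    {"id": "3", "email": None},
]
report = profile_records(sample, ["id", "email"])
print(report)
```

The sketch normalizes values before comparing them, which is one simple way to catch inconsistent copies of the same record; which normalization rules are appropriate depends entirely on the data set.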
Talend version 5.2 also adds support for NoSQL databases in the form of an initial set of connectors to Cassandra and MongoDB. The product natively supports Apache Hadoop and integrates with the Hadoop Distributed File System (HDFS), HCatalog, Hive, Oozie, Pig and Sqoop. The NoSQL connectors complement the more than 450 connectors to other products and platforms that are already built into the Talend Open Studio architecture.
Fabrice Bonan, co-founder and Chief Technical Officer of Talend, elaborated on the significance of the new Talend release by noting:
Talend version 5.2 delivers on our vision of simplifying the development, integration and management of big data so that businesses can focus on using that data to make faster and more informed decisions. We provide the most powerful and versatile open source, big data solution to help organizations load, extract and improve disparate data while leveraging the massively parallel processing power of big data technologies including Apache Hadoop and leading NoSQL databases.
According to Bonan, Talend’s 5.2 release delivers on the company's mission of streamlining big data management while providing solutions to “load, extract and improve disparate data” in conjunction with the “massively parallel processing power” of Hadoop and NoSQL. The underlying vision, as in most big data initiatives, is to help organizations make “faster and more informed decisions.”
Talend’s enhancements point to an industry-wide embrace of more sophisticated Hadoop data discovery and cleansing functionality, which lets data scientists perform more nuanced manipulations of data within a Hadoop cluster without extracting it. Additionally, virtually all big data integration platforms will need to support NoSQL databases such as Cassandra and MongoDB, given NoSQL’s rapid uptake by enterprise customers in both cloud and traditional data center environments.
At a product level, however, Talend’s big data profiling innovations in version 5.2 are geared more toward data scientists than toward the business analysts or business stakeholders who will consume the resulting analytical insights. This release focuses on architectural and data processing enhancements while leaving business-focused upgrades, such as enhanced data visualization capabilities and dashboards, to a forthcoming version.