Category Archives: Talend

Guest Blog Post: “Hybrid Integration for Today’s Digitally Driven Businesses” By Ashwin Viswanath, Director of Cloud Product Marketing, Talend

The following guest blog post was authored by Ashwin Viswanath, Director of Cloud Product Marketing, Talend.

While moving to the cloud is becoming a common business decision, many organizations are still trying to figure out the best path to meet their business needs and IT infrastructure goals. The subsequent discussions often result in a hybrid approach where organizations tap the value of both public and private clouds for different services.

Hybrid integration, as it has evolved over the years, plays an important role in building the complex value-chain ecosystems that support numerous digital business initiatives.

The definition of hybrid integration has evolved as well, reflecting the changing nature of these initiatives. It now spans scenarios ranging from combining SaaS apps and B2B data sources from partners and suppliers, presented as a mashup in a portal, to integrating big data and back-end systems to support mobile applications. As these and other initiatives continue to proliferate, the definition of hybrid integration will keep changing too.

Time for an Upgrade

The rapid evolution of existing initiatives and the accelerating pace at which new use cases are created are a clear indication that it’s time to upgrade your hybrid integration capabilities.

For example, imagine that you work in a highly data-driven line of business (LOB) such as a Customer Success Department. Access to up-to-the-minute information about your customers is critical to your operation. You can no longer wait for the traditional 24-hour IT cycle to obtain operational reports that tell you whether your customers need help, detail their latest purchases, and identify which customers are critical to contact.

These days, if customers are having bad experiences with your product or service, they will take advantage of other options and churn almost immediately. This is particularly true in today’s subscription-based online environments, which allow a customer to turn off the service and go elsewhere with a few keystrokes. You need to be able to predict if and when such an event may occur so that you can take corrective action before the customer says “I’ve had enough.”

As a member of our hypothetical Customer Success Department, you also need tools that can analyze a range of data points, such as those gathered from customer support tickets and those indicating how often customers responded to marketing offers. You may even want to read the tweets they have posted about you and your company, or be notified immediately if a steady, big-ticket customer drops off your radar.

It’s readily apparent that collecting data from all these different sources, aggregating it, and deploying the right business rules, filters, and triggers to alert the Customer Success management team represents a lot of work for IT. And the Customer Success Department is only one use case; marketing, finance, and supply chain teams, among others, can all benefit from being notified when predetermined business parameters are not being met. These alerts have to be issued before a critical threshold is reached so the LOB can take corrective action in a timely and decisive manner.
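The pattern described above (collect events from several sources, aggregate them per customer, then apply rules that fire a warning before a critical level is reached) can be sketched in a few lines. This is a minimal plain-Python illustration under stated assumptions, not a Talend feature or API; the `Rule`, `aggregate`, and `evaluate` names, the metric names, the threshold values, and the customer records are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Rule:
    """A business rule on one aggregated metric (metric names and levels are hypothetical)."""
    metric: str
    warn_at: float   # early-warning level, deliberately set below the critical level
    critical: float  # level at which the situation is already critical

def aggregate(events):
    """Sum (customer, metric, value) events collected from several sources into per-customer totals."""
    totals = defaultdict(dict)
    for customer, metric, value in events:
        totals[customer][metric] = totals[customer].get(metric, 0) + value
    return dict(totals)

def evaluate(rules, totals):
    """Return (customer, metric, value, severity) for every rule that has fired."""
    alerts = []
    for customer, metrics in totals.items():
        for rule in rules:
            value = metrics.get(rule.metric)
            if value is None:
                continue  # this metric was not collected for this customer
            if value >= rule.critical:
                alerts.append((customer, rule.metric, value, "critical"))
            elif value >= rule.warn_at:
                alerts.append((customer, rule.metric, value, "warning"))
    return alerts

# Hypothetical events, e.g. from a ticketing system and a social media feed.
events = [
    ("acme", "open_tickets", 4),
    ("acme", "open_tickets", 5),
    ("globex", "negative_mentions", 4),
    ("globex", "open_tickets", 1),
    ("initech", "negative_mentions", 6),
]
rules = [Rule("open_tickets", warn_at=7, critical=10),
         Rule("negative_mentions", warn_at=3, critical=5)]

for alert in evaluate(rules, aggregate(events)):
    print(alert)
# ('acme', 'open_tickets', 9, 'warning')
# ('globex', 'negative_mentions', 4, 'warning')
# ('initech', 'negative_mentions', 6, 'critical')
```

The thresholds are kept as data rather than code, so they can be tuned without touching the evaluation logic.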

Deploying these hybrid integration services in an increasingly data-driven world, characterized by a 24/7 competitive environment, will probably require an upgrade of your data integration service across all layers of the cloud stack. This kind of intensive rework goes beyond the infrastructure and platform layers, and beyond a few tweaks to your software-as-a-service (SaaS) applications.

Achieving the kind of insights described in the Customer Success Department example requires a solid cloud-based infrastructure built to support your most demanding Big Data needs, including the ability to handle great quantities of streaming data.

The data technologies you select, whether NoSQL databases or another major real-time big data technology, must be specifically tailored to handle the large volumes of real-time streaming data that today’s competitive environment requires. You also need a secure cloud-based integration platform that allows your users to connect their applications with one another and transfer data between them. On such a platform, browser-based graphical development tools are complemented by ready-made integration actions, flow templates, components, and connectors that make data integration easy.

Five Phases of Hybrid Integration

In a two-part blog series (part I, part II), I outlined the five phases of hybrid integration, which let you easily assess which phase you’re in and how you can move to a more advanced one. The phases follow a trajectory from older, more mature scenarios to more recent (and potentially disruptive) ones. In brief, the five phases are:

  • Phase 1: Replicating SaaS apps to on-premises databases – Companies in this initial stage either need business-critical information from their SaaS apps, or are sending SaaS data to a staging database so that it can be picked up by other on-premises apps.
  • Phase 2: Integrating SaaS Apps directly with on-premises apps – Each LOB has its preferred SaaS app: Sales uses Salesforce, Marketing has Marketo, HR prefers Workday, and Finance has NetSuite. These SaaS apps need to connect to an on-premises back-office ERP system such as SAP R/3 or Oracle EBS. Implementing a pure-play cloud ERP platform is still very much a work in progress, as the blog explains.
  • Phase 3: Hybrid Data Warehousing with the Cloud – As the volume and variety of data grows, you need a strategy to move your data from an on-premises data warehouse to newer, more advanced Big Data resources in the cloud. This phase involves choosing the right Big Data technologies and getting underway by, at a minimum, creating a data lake in the cloud with services such as AWS S3 or Azure Blob storage.
  • Phase 4: Real-time analytics on streaming data – In this phase, you implement systems that provide the tools to work with real-time streaming data. To realize the full benefit of real-time analytics, you need the support of a hybrid integration infrastructure. This infrastructure will vary depending on your use case, for example whether you need software that analyzes weblogs, clickstream data, sensor data, database logs, or social media sentiment.
  • Phase 5: Machine learning for optimized app experience – In the not-too-distant future, every experience will be delivered as an app on a mobile device, including enterprise as well as consumer apps. The hybrid integration infrastructure will be architected to discover patterns buried deep within the data using machine learning, so that applications can be more responsive to user needs. Advanced algorithms will extract value from immense and disparate data sources in ways that go beyond the capabilities of human analysis. For developers, machine learning will bring business-critical analytics into applications for everything from improving the customer experience and providing product recommendations to presenting hyper-personalized content.
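Real-time analytics of the kind described in Phase 4 usually operates on windows of recent events rather than on complete tables. The sketch below shows one common building block, a sliding time window, in plain Python. It assumes timestamps arrive in order from a single stream; a real deployment would run this kind of logic inside a stream-processing engine, and the class name, window length, and sample timestamps are hypothetical.

```python
from collections import deque

class SlidingWindowCounter:
    """Counts events in the time window (now - window, now].

    Assumes timestamps (in seconds) arrive in non-decreasing order,
    as they would from a single ordered stream.
    """

    def __init__(self, window: float):
        self.window = window
        self.timestamps = deque()

    def add(self, ts: float) -> int:
        """Record one event at time `ts` and return the count inside the window."""
        self.timestamps.append(ts)
        # Evict events that have fallen out of the window.
        while self.timestamps and self.timestamps[0] <= ts - self.window:
            self.timestamps.popleft()
        return len(self.timestamps)

# Hypothetical clickstream: page-view timestamps for one user, 60-second window.
counter = SlidingWindowCounter(window=60)
print([counter.add(t) for t in [0, 10, 30, 65, 70, 200]])
# [1, 2, 3, 3, 3, 1]
```

A deque keeps both appending new events and evicting expired ones at constant cost per event, which matters when the stream is large.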

Eventually, if current trends continue, everything will move into the cloud, creating more opportunities for the convergence of application and data integration and redefining yet again what is meant by hybrid integration.

The importance of this transformation was underscored by a recent IDC report. IDC expects that cloud IT infrastructure spending will grow at a compound annual growth rate (CAGR) of 15.1 percent and will reach $53.1 billion by 2019, accounting for 46 percent of total spending on enterprise IT infrastructure. At the same time, spending on non-cloud IT infrastructure will decline at a CAGR of 1.7 percent. Spending on public cloud IT infrastructure will grow faster than spending on private cloud IT infrastructure, at a 16.3 percent vs. 13.2 percent CAGR. In 2019, IDC expects service providers will spend $33.6 billion on IT infrastructure for delivering public cloud services, while spending on private cloud IT infrastructure will reach $19.4 billion[1].
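The IDC figures above can be cross-checked with the standard CAGR formula and a little arithmetic. This worked check uses only the numbers quoted in the paragraph; the five-year multiplier and the implied total are derived here for illustration and are not figures reported by IDC (the report's base year is not stated above).

```latex
% CAGR: a value V_0 growing at rate r for n years reaches V_n
V_n = V_0\,(1 + r)^n \quad\Longleftrightarrow\quad r = \left(\frac{V_n}{V_0}\right)^{1/n} - 1

% A 15.1 percent CAGR sustained for five years roughly doubles spending
(1 + 0.151)^5 \approx 2.02

% Public plus private cloud spending in 2019 reproduces the headline total (to rounding)
33.6 + 19.4 = 53.0 \approx 53.1 \quad (\text{billion USD})

% Implied total enterprise IT infrastructure spending in 2019, if 53.1 is 46 percent of it
\frac{53.1}{0.46} \approx 115.4 \quad (\text{billion USD})
```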

Benefits of Hybrid Integration

With advanced hybrid integration in place and cloud integration tools at hand, companies are able to carry out “do it yourself” data integration projects. Users no longer need to constantly go back to the IT department to modify business rules and parameters; they have a turnkey solution that allows them to deal quickly and easily with changing business requirements. End users are also far more familiar with those business requirements, and know which technical parameters need to change to provide the answers they are looking for.

In addition, the new integration workflows developed by the user community can be reused by other business units, greatly expanding the contextual information shared between LOBs. IT is freed up to build platforms for future innovation, researching Big Data algorithms and leveraging the cloud infrastructure to get the most out of them, rather than constantly patching old information workflows.

In short, hybrid integration in the cloud offers enormous advantages for companies seeking the best of private and public clouds for scalability, price, control and flexibility.

[1] “Worldwide Quarterly Cloud IT Infrastructure Tracker,” IDC, October 2015.

Talend Releases Version 5.2 With Hadoop Big Data Profiling And NoSQL Integration

Talend, the open source data integration company, announced the release of version 5.2 of its Open Studio data integration and data management platform on Monday. The release features prominent enhancements such as Hadoop big data profiling and support for well-known NoSQL products, in addition to a bevy of other usability, productivity and performance improvements.

Hadoop Big Data Profiling

Talend’s big data profiling functionality provides the ability to “discover and understand data in Hadoop clusters” with a view to:

  • Identifying data quality issues such as corrupt, incomplete, duplicate or inconsistent data
  • Analyzing data in Hive clusters without extracting it from the Hadoop cluster
  • Cleansing, enriching, de-duplicating and creating crosswalks across data sets within the Hadoop cluster itself
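To make the profiling checks listed above concrete, here is what three of them (incomplete, duplicate, and inconsistent records) and de-duplication look for, shown on a small in-memory sample. Talend performs this work inside the Hadoop cluster; this plain-Python sketch does not reflect Talend’s implementation or API, and the function names, field names, and sample records are hypothetical.

```python
from collections import Counter, defaultdict

def _signature(record):
    """Hashable, order-independent form of a dict record (values must be hashable)."""
    return tuple(sorted(record.items()))

def profile(rows, key, required):
    """Report incomplete, duplicate, and inconsistent records in a list of dicts.

    key: field expected to identify one real-world entity (e.g. a customer ID).
    required: fields that must be present and non-empty.
    """
    incomplete = [i for i, r in enumerate(rows)
                  if any(r.get(f) in (None, "") for f in required)]
    # Exact duplicates: every extra copy of an identical record counts once.
    counts = Counter(_signature(r) for r in rows)
    duplicates = sum(c - 1 for c in counts.values())
    # Inconsistent: one key value associated with more than one distinct record.
    versions = defaultdict(set)
    for r in rows:
        if r.get(key) is not None:
            versions[r[key]].add(_signature(r))
    inconsistent = sorted(k for k, v in versions.items() if len(v) > 1)
    return {"incomplete": incomplete, "duplicates": duplicates,
            "inconsistent_keys": inconsistent}

def deduplicate(rows):
    """Drop exact duplicate records, keeping first occurrences in their original order."""
    seen, out = set(), []
    for r in rows:
        sig = _signature(r)
        if sig not in seen:
            seen.add(sig)
            out.append(r)
    return out

rows = [
    {"id": "c1", "email": "a@example.com", "country": "US"},
    {"id": "c1", "email": "a@example.com", "country": "US"},       # exact duplicate
    {"id": "c2", "email": "", "country": "FR"},                    # incomplete: empty email
    {"id": "c3", "email": "c@example.com", "country": "DE"},
    {"id": "c3", "email": "c@example.com", "country": "Germany"},  # inconsistent country
]
print(profile(rows, key="id", required=["id", "email"]))
# {'incomplete': [2], 'duplicates': 1, 'inconsistent_keys': ['c3']}
print(len(deduplicate(rows)))
# 4
```

The sketch only reports inconsistencies rather than fixing them: deciding whether `DE` or `Germany` is correct requires a standardization rule, which is a separate cleansing step.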

NoSQL Integration

Talend version 5.2 adds support for NoSQL databases in the form of an initial set of connectors for Cassandra and MongoDB. The product natively supports Apache Hadoop and integrates with the Hadoop Distributed File System (HDFS), HCatalog, Hive, Oozie, Pig and Sqoop. The NoSQL connectors complement the more than 450 connectors to other products and platforms already built into the Talend Open Studio architecture.

Fabrice Bonan, co-founder and Chief Technical Officer of Talend, elaborated on the significance of the new Talend release by noting:

Talend version 5.2 delivers on our vision of simplifying the development, integration and management of big data so that businesses can focus on using that data to make faster and more informed decisions. We provide the most powerful and versatile open source, big data solution to help organizations load, extract and improve disparate data while leveraging the massively parallel processing power of big data technologies including Apache Hadoop and leading NoSQL databases.

According to Bonan, Talend’s 5.2 release delivers on its mission of streamlining big data management while providing solutions to “load, extract and improve disparate data” in conjunction with the “massively parallel processing power” of Hadoop and NoSQL. The underlying vision, as in most Big Data initiatives, is to help organizations make “faster and more informed decisions.”

Talend’s enhancements point to an industry-wide embrace of more sophisticated Hadoop data discovery and cleansing functionality that lets data scientists perform more nuanced manipulations of data within a Hadoop cluster, without extracting it. Additionally, virtually all big data integration platforms will need to support NoSQL databases such as Cassandra and MongoDB, given NoSQL’s rapid uptake by enterprise customers in both cloud and traditional data center deployments.

At a product level, however, Talend’s version 5.2 innovations in big data profiling are geared more toward data scientists than toward the business analysts or business stakeholders who will be consuming the analytical insights. This release focuses on architectural and data processing enhancements, leaving business-focused upgrades such as enhanced data visualization capabilities and dashboards to a forthcoming version.