Informatica Big Data Edition Comes Pre-Installed On Cloudera QuickStart VM And Hortonworks Sandbox

Earlier this month, Informatica announced 60-day free trials of Informatica Big Data Edition for the Cloudera QuickStart VM and the Hortonworks Sandbox. The trial means that Informatica Big Data Edition comes pre-installed in the sandbox environments of two of the leading Hadoop distributions in the Big Data marketplace today. Developers using the Cloudera QuickStart VM and the Hortonworks Sandbox now have streamlined access to Informatica’s well-known data cleansing, data integration, master data management and data visualization tools. The code-free, graphical user interface-based Informatica Big Data Edition allows customers to create ETL and data integration workflows and to take advantage of hundreds of pre-built parsers, transformations, connectors and data quality rules for Hadoop data processing and analytics. The Informatica Big Data platform specializes in Hadoop profiling, parsing, cleansing, loading, enrichment, transformation, integration, analysis and visualization, and reportedly improves developer productivity five-fold through its automation and visual interface built on the Vibe virtual data machine.

Although Informatica Big Data Edition supports the MapR and Pivotal Hadoop distributions, the free 60-day trial is currently available only for Cloudera and Hortonworks. Informatica’s success in seeding its Big Data Edition with Cloudera and Hortonworks increases the likelihood that developers will explore, and subsequently adopt, the platform as a means of discovering and manipulating Big Data sets. As such, Informatica’s Big Data Edition competes with products like Trifacta that similarly facilitate the manipulation, cleansing and visualization of Big Data by means of a code-free user interface that increases analyst productivity and accelerates the derivation of actionable business intelligence. On the one hand, the recent proliferation of Big Data products that allow users to explore Big Data without learning the intricacies of MapReduce democratizes access to Hadoop-based datasets. On the other hand, it remains to be seen whether graphical user interface-driven Big Data discovery and manipulation platforms can enable the granular identification of data anomalies, exceptions and eccentricities that might otherwise be obscured by large-scale trend analysis.

Zettaset Partners With Informatica For Big Data Integration And Processing

Zettaset recently announced that it will embed Informatica PowerCenter Big Data Edition into its Zettaset Orchestrator platform by way of an OEM partnership agreement. Under the terms of the agreement, Zettaset’s Orchestrator platform will leverage Informatica PowerCenter’s data integration functionality to optimize Big Data integration and processing in conjunction with Zettaset’s Hadoop management, security and streamlined deployment functionality. Powered by the Informatica Vibe virtual data machine, PowerCenter Big Data Edition specializes in Big Data integration by enabling customers to access, integrate and manage massive amounts of data. Informatica Vibe’s “map once, deploy anywhere” technology allows users to define the business logic for data mapping independently of the technology platform on which the data is deployed. Once users have defined the business rules for mapping source data, the data can be deployed in cloud hosting environments or traditional on-premise data centers without recoding. Moreover, PowerCenter’s “visual no-code development environment” ensures that developers can manipulate or manage data within Hadoop clusters without having to learn Hadoop. Zettaset’s partnership with Informatica complements PowerCenter’s offering by providing a platform that automates and simplifies Hadoop management and adds enterprise-grade security. Zettaset and Informatica plan to extend their partnership by embedding Informatica’s data quality, profiling and cleansing tools into the Zettaset Orchestrator platform.
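
To make the “map once, deploy anywhere” idea concrete, the following is a minimal, hypothetical Java sketch of the underlying pattern: the mapping logic is declared once and handed unchanged to different execution targets. All class and method names here are invented for illustration and do not correspond to Informatica’s Vibe or PowerCenter APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of the "map once, deploy anywhere" pattern:
// mapping logic is declared once, independent of where it executes.
public class MapOnceDeployAnywhere {

    // A mapping rule transforms one source record into one target record.
    interface MappingRule extends Function<Map<String, String>, Map<String, String>> {}

    // An execution target applies the same rules in a different environment.
    interface ExecutionTarget {
        void run(List<MappingRule> rules, List<Map<String, String>> sourceRecords);
    }

    // Example targets; a real platform would push the same rules down to a
    // Hadoop cluster, a cloud service, or an on-premise engine.
    static class LocalTarget implements ExecutionTarget {
        public void run(List<MappingRule> rules, List<Map<String, String>> records) {
            records.forEach(r -> rules.forEach(rule -> System.out.println("local:   " + rule.apply(r))));
        }
    }

    static class ClusterTarget implements ExecutionTarget {
        public void run(List<MappingRule> rules, List<Map<String, String>> records) {
            records.forEach(r -> rules.forEach(rule -> System.out.println("cluster: " + rule.apply(r))));
        }
    }

    public static void main(String[] args) {
        // Business logic defined once: normalize a customer name field.
        MappingRule upperCaseName = rec -> Map.of("NAME", rec.getOrDefault("name", "").toUpperCase());
        List<MappingRule> rules = List.of(upperCaseName);
        List<Map<String, String>> source = List.of(Map.of("name", "ada lovelace"));

        // The same rules run unchanged against either deployment target.
        new LocalTarget().run(rules, source);
        new ClusterTarget().run(rules, source);
    }
}
```

The point of the pattern, as Informatica describes it, is that a new deployment target can be added without touching the mapping rules themselves.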

Informatica Launches PaaS For Data Integration In Latest Release Of Informatica Cloud

This week, Informatica revealed Informatica Cloud Spring 2012, the latest release of its data integration software Informatica Cloud. Informatica Cloud Spring 2012 features the Informatica Cloud Developer Edition, a platform as a service for data integration that empowers developers to create connectors between on-premise or cloud-based applications and the Informatica Cloud. Whereas Informatica Cloud ships with pre-built connectors to select databases and applications, the Informatica Cloud Developer Edition promises to extend Informatica Cloud’s universe of connectivity by giving developers the tools to build connections between their enterprise data repositories and the Informatica Cloud data integration platform.

Connectors currently available in the Informatica Marketplace cover applications such as Facebook, LinkedIn, Salesforce, Twitter and Zuora. The Informatica Cloud Developer Edition gives developers the platform to build additional connectors, as outlined in Informatica’s press release below:

Informatica Cloud Developer Edition enables SIs and ISVs to build, customize and deliver native connectivity to any cloud or on-premise business and social applications that have published Web Services APIs. With the new Cloud Connector Toolkit, developers have access to a Java-based API to quickly create high-performance connectors that run as sources or targets within Informatica Cloud.

As long as the relevant application targeted for connection to the Informatica Cloud has a “published Web Services API”, developers can leverage a Java-based API provided by the Informatica Cloud Developer Edition to “create high-performance connectors that run as sources or targets within Informatica Cloud.” Informatica Cloud Spring 2012 also features Cloud Integration Templates that provide developers with pre-built templates for integrating data between and across select data repositories. The templates can be embedded within applications and then uploaded to the Informatica Cloud or published in the Informatica Marketplace.
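
To give a rough sense of what building such a connector entails, here is a small, purely illustrative Java sketch of a “source” connector that pulls records from a web services endpoint over HTTP. The class and the endpoint URL are hypothetical and do not reflect the actual Cloud Connector Toolkit interfaces, which are not detailed in the press release.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Purely illustrative "source" connector: it reads rows from a web services
// API so an integration platform could treat the endpoint as a data source.
// The shape of this class is invented; it is not the Cloud Connector Toolkit API.
public class ExampleRestSourceConnector {

    private final String endpoint;

    public ExampleRestSourceConnector(String endpoint) {
        this.endpoint = endpoint;
    }

    // Fetch the raw response line by line; a real connector would map the
    // payload into the platform's record and type model.
    public List<String> read() throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("GET");
        List<String> records = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                records.add(line);
            }
        }
        return records;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; substitute any published web services API.
        List<String> rows = new ExampleRestSourceConnector("https://example.com/api/records").read();
        rows.forEach(System.out::println);
    }
}
```

A target connector would invert the flow and write records back to the API; per the press release, either kind of connector then runs as a source or target within Informatica Cloud.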

Connections that developers build between applications using the Informatica Cloud Developer Edition can also be sold in the Informatica Marketplace to like-minded enterprises seeking similar integration tools. Importantly, the Informatica Cloud Developer Edition illustrates the emerging popularity of PaaS solutions. By providing developers with a tool for creating customized data integration connectors between data sets and applications, Informatica Cloud promises to capitalize on a market appetite for platforms that empower enterprises to customize data integration to their own specific business needs. Because those same cloud connectors can be resold, Informatica’s PaaS promises to create an ever-expanding marketplace of reusable tools that extend the data integration capabilities of its base product, Informatica Cloud.

Big Data 2011: The Year in Review

If 2011 was the year of Cloud Computing, then 2012 will surely be the year of Big Data. Big Data has yet to arrive in the way cloud computing has, but the framework for its widespread deployment as a commodity emerged with style and unmistakable promise. For the first time, Hadoop and NoSQL gained currency not only within the developer community, but also amongst bloggers and analysts. More importantly, Big Data acquired a certain status and meaning in the technology community even though few people stopped to ask what “big” means in “Big Data,” in a landscape where the line between big and small data is constantly being redrawn. Even as yesterday’s “big” morphed into today’s “small,” with consumer personal storage transitioning from gigabytes to terabytes, “Big Data” emerged as a term that almost everyone instantly understood. It was as if consumers and enterprises alike had been searching for years for a phrase to describe the explosion of data evinced by web searches, web content, Facebook and Twitter feeds, photographs, log files and miscellaneous structured and unstructured content. Having lacked the vocabulary for the data explosion, the world suddenly embraced the term Big Data with passion.

Below are some of the highlights of 2011 with respect to Big Data:

March
• Teradata finalized a deal to acquire Big Data player Aster Data Systems for $263 million.

July
• Yahoo revealed plans to create Hortonworks, a spin-off dedicated to the commercialization of Apache Hadoop.

September
• Teradata announced the Teradata Aster MapReduce Platform, which combines SQL with MapReduce. The platform empowers business analysts who know SQL to leverage the power of MapReduce without having to write queries in Java, Python, Perl or C.

October
• Oracle announced plans to launch a Big Data appliance featuring Apache Hadoop, Oracle NoSQL Database Enterprise Edition and an open source distribution of R. The company’s plan to leverage a NoSQL database represented an abrupt about-face from an earlier Oracle position that discredited the significance of NoSQL.
• Microsoft revealed plans for a Big Data appliance featuring Hadoop for Windows Server and Azure, and Hadoop connectors for SQL Server and SQL Parallel Data Warehouse. Microsoft also revealed a strategic partnership with Yahoo spinoff Hortonworks to integrate Hadoop with Windows Server and Windows Azure. Microsoft’s decision to use a Windows-based version of Hadoop for SQL Server 2012 rather than NoSQL constituted the key difference between the Microsoft and Oracle Big Data platforms.
• IBM announced the release of its InfoSphere BigInsights application for analyzing Big Data. The SmartCloud release of BigInsights means that IBM beat competitors Oracle and Microsoft in the race to deploy an enterprise-grade, cloud-based Big Data analytics platform.

November
• Christophe Bisciglia, a co-founder of Cloudera, the commercial distributor of Apache Hadoop, launched a startup called Odiago that features a Big Data product named WibiData. WibiData manages investigative and operational analytics on “consumer internet data” such as website traffic from traditional and mobile computing devices.
• Cloudera announced a partnership with NetApp, the storage and data management vendor. The partnership produced the NetApp Open Solution for Hadoop, a preconfigured Hadoop cluster that combines Cloudera’s distribution of Apache Hadoop (CDH) and Cloudera Enterprise with NetApp’s RAID architecture.
• Big Data player Karmasphere announced plans to join the Hortonworks Technology Partner Program. The partnership enables Karmasphere to offer its Big Data intelligence product Karmasphere Analytics on the Apache Hadoop software infrastructure that undergirds the Hortonworks Data Platform.
• Informatica released the world’s first Hadoop parser. Informatica HParser operates on virtually all versions of Apache Hadoop and specializes in transforming unstructured data into a structured format within a Hadoop installation.
• MarkLogic announced support for Hadoop, the Apache open source software framework for analyzing Big Data, with the release of MarkLogic 5.
• HP provided details of Autonomy IDOL (Integrated Data Operating Layer) 10, a Next Generation Information Platform that integrates two of its 2011 acquisitions, Vertica and Autonomy. Autonomy IDOL 10 features Autonomy’s capabilities for processing unstructured data, Vertica’s ability to rapidly process large-scale structured data sets, a NoSQL interface for loading and analyzing structured and unstructured data, and solutions dedicated to the Data, Social Media, Risk Management, Cloud and Mobility verticals.

December
• EMC announced the release of its Greenplum Unified Analytics Platform (UAP). The EMC Greenplum UAP contains the EMC Greenplum platform for the analysis of structured data, enterprise-grade Hadoop for analyzing structured and unstructured data, and EMC Greenplum Chorus, a collaboration and productivity software tool that enables social networking amongst constituents in an organization who are leveraging Big Data.

The widespread adoption of Hadoop punctuated the Big Data story of the year. Hadoop featured in almost every major Big Data announcement, from Oracle to Microsoft to HP and EMC, while NoSQL came in a close second. Going into 2012, one of the key questions for the Big Data space concerns the ability of OpenStack to support Hadoop, NoSQL, MapReduce and other Big Data technologies. The other key question hinges on the user-friendliness of Big Data applications for business analysts in addition to programmers. EMC’s Greenplum Chorus, for example, democratizes access to its platform via a user interface that promotes collaboration amongst multiple constituents in an organization by transforming questions into structured queries. Similarly, the Teradata Aster MapReduce Platform allows business analysts to make use of its MapReduce technology through SQL. That said, as Hadoop becomes more mainstream, the tech startup and data-intensive spaces are likely to witness a greater number of data analysts trained in Apache Hadoop, alongside efforts by vendors to render Hadoop more accessible to programmers and non-programmers alike.

Informatica Releases World’s First Hadoop Parser

Informatica released the world’s first Hadoop parser on Wednesday in a move that boldly signaled its entry into the hotly contested Big Data analytics space. Informatica HParser operates on virtually all versions of Apache Hadoop and specializes in transforming unstructured data into a structured format within a Hadoop installation. HParser enables the transformation of textual data, Facebook and Twitter feeds, web logs, emails, log files and digital interactive media into a structured or semi-structured format that allows businesses to more effectively mine the data for actionable business intelligence.

Key features of HParser include the following:

• A visual, integrated development environment (IDE) that streamlines development via a graphical interface.
• Support for a wide range of data formats including XML, JSON, HL7, HIPAA, ASN.1 and market data.
• Ability to parse proprietary machine generated log files.
• Use of the parallelism of MapReduce to optimize parsing performance across massive structured and unstructured data sets (sketched generically below).
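
HParser itself is configured through its visual IDE rather than hand-written code, but the way MapReduce parallelism helps with parsing can be sketched generically. The following is a minimal Hadoop mapper, not HParser code, that turns raw web-log lines into tab-separated structured records; the assumed log layout and field names are invented for illustration.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Generic log-parsing mapper (not HParser code): each map task parses its own
// split of the raw log in parallel and emits tab-separated structured records.
public class LogParseMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Assumed log layout: "<timestamp> <client-ip> <url> <status>"
        String[] fields = line.toString().trim().split("\\s+");
        if (fields.length < 4) {
            return; // skip malformed lines rather than failing the whole job
        }
        String clientIp = fields[1];
        String structured = String.join("\t", fields[0], fields[2], fields[3]);
        // Key by client IP so a downstream reducer can aggregate per visitor.
        context.write(new Text(clientIp), new Text(structured));
    }
}
```

Because Hadoop assigns each input split to its own map task, the parsing work scales out with the cluster, which is the property the feature list above refers to.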

Informatica’s HParser is available in both a free and a commercial edition. The free, community edition can parse log files, Omniture Web analytics data, XML and JSON. The commercial edition additionally supports HL7, HIPAA, SWIFT, X12, NACHA, ASN.1, Bloomberg, PDF, XLS and Microsoft Word formats. Informatica’s HParser builds upon the company’s June 2011 deployment of Informatica 9.1 for Big Data, which featured “connectivity to big transaction data from traditional transaction databases, such as Oracle and IBM DB2, to the latest optimized for purpose analytic databases, such as EMC Greenplum, Teradata, Teradata Aster Data, HP Vertica and IBM Netezza,” in addition to Hadoop.