Informatica released the world’s first Hadoop parser on Wednesday in a move that boldly signaled its entry into the hotly contested Big Data analytics space. Informatica HParser operates on virtually all versions of Apache Hadoop and specializes in transforming unstructured data into a structured format within a Hadoop installation. HParser transforms textual data, Facebook and Twitter feeds, web logs, emails, log files and digital interactive media into a structured or semi-structured schema, allowing businesses to mine the data more effectively for actionable business intelligence.
Key features of HParser include the following:
• A visual, integrated development environment (IDE) that streamlines development via a graphical interface.
• Support for a wide range of data formats including XML, JSON, HL7, HIPAA, ASN.1 and market data.
• Ability to parse proprietary machine-generated log files.
• Use of the parallelism of MapReduce to optimize parsing performance across massive structured and unstructured data sets.
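To make the idea of parsing log files in a map step concrete, here is a minimal Python sketch. It does not use HParser or any Informatica API; the log pattern, the function names `parse_line` and `map_lines`, and the sample data are all assumptions chosen for illustration. Each line is parsed independently of the others, which is the property that lets a framework such as MapReduce spread the work across many nodes.

```python
import json
import re

# Hypothetical pattern for an Apache-style access log line (an assumption
# for this sketch, not a format definition taken from HParser).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Map step: turn one raw log line into a structured record, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" size means no body was returned; store it as 0.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def map_lines(lines):
    """Parse each line independently and drop the ones that do not match."""
    return [rec for rec in map(parse_line, lines) if rec is not None]


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [01/Jan/2011:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 5120',
        "not a log line",
    ]
    for rec in map_lines(sample):
        print(json.dumps(rec, sort_keys=True))
```

Because `parse_line` keeps no state between calls, the same function could in principle be run by many mappers over different slices of a large log without coordination.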
Informatica’s HParser is available in both a free and a commercial edition. The free community edition can parse log files, Omniture Web analytics data, XML and JSON. The commercial edition additionally supports HL7, HIPAA, SWIFT, X12, NACHA, ASN.1, Bloomberg, PDF, XLS and Microsoft Word formats.
Informatica’s HParser builds upon the company’s June 2011 release of Informatica 9.1 for Big Data, which featured “connectivity to big transaction data from traditional transaction databases, such as Oracle and IBM DB2, to the latest optimized for purpose analytic databases, such as EMC Greenplum, Teradata, Teradata Aster Data, HP Vertica and IBM Netezza,” in addition to Hadoop.