Hadoop

Teradata Acquires Hadoop Data Archival Specialist RainStor

On December 17, Teradata announced that it had finalized its acquisition of RainStor, a big data archiving company that specializes in archival solutions for Hadoop. The deal gives Teradata ownership of RainStor’s technology for compressing and freezing Hadoop datastores for archival purposes, enabling companies to compress and store Hadoop data as Hadoop-based datasets proliferate throughout the enterprise amid the larger transition to data-driven operational and strategic analytics. The acquisition represents Teradata’s fourth major purchase this year, following Revelytix, Hadapt and Think Big Analytics. Terms of the acquisition were not disclosed, although most of RainStor’s employees will remain in their pre-acquisition locations in San Francisco and Gloucester. The acquisition strengthens Teradata’s Hadoop portfolio by augmenting its ability to provide customers with enterprise-wide data archival capabilities.

Categories: Hadoop, Teradata

Hortonworks Files For IPO

As reported by Arik Hesseldahl in Recode, Apache Hadoop vendor Hortonworks has filed for an IPO, the first from a major Hadoop vendor. In fiscal year 2013, Hortonworks reported a loss of $36.6M on $11M in revenue. For the first nine months of 2014, Hortonworks increased its revenue to $33.3M but posted a loss of $86.7M. The decision to go public comes after two major capital raises in 2014: HP invested $50M in Hortonworks in July, following the $100M Hortonworks raised in March. Given the similarly large sums raised by competitors Cloudera and MapR, the Big Data landscape should expect IPOs from those vendors in the near future as well. More detailed analysis of Hortonworks’s prospects for a successful IPO will emerge in the coming weeks, ahead of a launch in either late 2014 or early 2015. Hortonworks, which was spun out of Yahoo, its principal investor, in 2011, plans to raise up to $100M by means of its IPO.

Categories: Hadoop, Hortonworks

DataTorrent Enhances Platform For Real-Time Analytics On Streaming Big Data

DataTorrent recently announced the availability of DataTorrent Real-Time Streaming (RTS) 2.0, which builds on June’s 1.0 release by providing enhanced capabilities for running real-time analytics on streaming Big Data sets. DataTorrent RTS 2.0 boasts the ability to ingest data from “any source, any scale and any location” by means of over 75 connectors that allow the platform to ingest varieties of structured and unstructured data. In addition, this release delivers over 450 Java operators that allow data scientists to perform queries and advanced analytics on Big Data sets, including predictive analytics, statistical analysis and pattern recognition. In a phone interview with John Fanelli, DataTorrent’s VP of Marketing, Cloud Computing Today learned that the company has begun work on a Private Beta of a product, codenamed Project DaVinci, that streamlines the design of applications via a visual interface allowing data scientists to graphically select data sources, analytic operators and their inter-relationships, as depicted below:

As the graphic illustrates, DataTorrent Project DaVinci (Private Beta) delivers a unique visual interface for the design of applications that leverage Hadoop-based datasets. Data scientists can take advantage of DataTorrent’s 450+ Java operators and the platform’s advanced analytics functionality to create and debug applications that utilize distributed datasets and streaming Big Data. Meanwhile, DataTorrent RTS 2.0 also boasts the ability to store massive amounts of data in an “HDFS based distributed hash table” that facilitates rapid lookups of data for analytic purposes. With version 2.0, DataTorrent continues to disrupt the real-time Big Data analytics space by delivering a platform capable of ingesting data at any scale and running real-time analytics, all wrapped in a seductive visual interface for creating Big Data analytics applications. DataTorrent competes in the hotly contested real-time Big Data analytics space alongside technologies such as Apache Spark, but delivers a range of functionality that surpasses Spark Streaming, as illustrated by its application design, advanced analytics and flexible data ingestion capabilities.
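To make the operator model concrete, here is a minimal sketch of a custom Java operator written against DataTorrent’s publicly documented operator API (the com.datatorrent packages; exact package paths may vary by version). The error-filtering logic and class name are illustrative inventions, not one of the 450 operators that ship with the platform:

import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.common.util.BaseOperator;

// Illustrative streaming operator: consumes raw log lines and re-emits
// only those that look like errors, so downstream operators (analytics,
// HDFS writers) receive a pre-filtered stream.
public class ErrorFilterOperator extends BaseOperator {

  // Output port to which downstream operators in the DAG subscribe.
  public final transient DefaultOutputPort<String> errors =
      new DefaultOutputPort<String>();

  // Input port; process() is invoked once per tuple in the stream.
  public final transient DefaultInputPort<String> lines =
      new DefaultInputPort<String>() {
        @Override
        public void process(String line) {
          if (line.contains("ERROR")) {
            errors.emit(line);
          }
        }
      };
}

Operators of this kind are wired together into a directed acyclic graph, which is precisely the structure Project DaVinci exposes visually: sources, operators and their inter-relationships become nodes and edges on a canvas.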

Categories: Big Data, DataTorrent, Hadoop

Informatica Big Data Edition Comes Pre-Installed On Cloudera QuickStart VM And Hortonworks Sandbox

Earlier this month, Informatica announced 60-day free trials of Informatica Big Data Edition for the Cloudera QuickStart VM and the Hortonworks Sandbox. The trial means that Informatica Big Data Edition comes pre-installed in the sandbox environments of two of the leading Hadoop distributions in the Big Data marketplace today. Developers using the Cloudera QuickStart VM and Hortonworks Sandbox now have streamlined access to Informatica’s renowned big data cleansing, data integration, master data management and data visualization tools. The code-free, graphical user interface-based Informatica Big Data Edition allows customers to create ETL and data integration workflows and to take advantage of hundreds of pre-installed parsers, transformations, connectors and data quality rules for Hadoop data processing and analytics. The Informatica Big Data platform specializes in Hadoop profiling, parsing, cleansing, loading, enrichment, transformation, integration, analysis and visualization, and reportedly improves developer productivity five-fold by means of its automation and a visual interface built on the Vibe virtual data machine.

Although Informatica Big Data Edition supports the MapR and Pivotal Hadoop distributions, the free 60-day trial is currently available only for Cloudera and Hortonworks. Informatica’s success in seeding its Big Data Edition with Cloudera and Hortonworks increases the likelihood that developers will explore, and subsequently adopt, the platform as a means of discovering and manipulating Big Data sets. As such, Informatica’s Big Data Edition competes with products like Trifacta that similarly facilitate the manipulation, cleansing and visualization of Big Data by means of a code-free user interface that increases analyst productivity and accelerates the derivation of actionable business intelligence. On one hand, the recent proliferation of Big Data products that allow users to explore Big Data without learning the intricacies of MapReduce democratizes access to Hadoop-based datasets. On the other, it remains to be seen whether graphical user interface-driven Big Data discovery and manipulation platforms can enable the granular identification of data anomalies, exceptions and eccentricities that may otherwise be obscured by large-scale trend analysis.

Categories: Big Data, Hadoop, Informatica

Teradata Acquires Hadoop Consulting And Strategy Services Firm Think Big Analytics

Teradata continued its spending spree on Wednesday by acquiring Think Big Analytics, a Mountain View, CA-based Hadoop consulting firm whose practice will supplement Teradata’s own. Think Big Analytics, which has roughly 100 employees, specializes in agile SDLC methodologies for Hadoop consulting engagements that typically last between one and three months. According to Teradata Vice President of Product and Services Marketing Chris Twogood, Teradata has “now worked on enough projects that it’s been able to build reusable assets,” as reported in PCWorld. Think Big Analytics will retain its branding, and its management team will remain at the company’s Mountain View office. The acquisition comes roughly two months after Teradata’s purchase of Revelytix and Hadapt: Revelytix provides a management framework for metadata on Hadoop, whereas Hadapt’s technology empowers SQL developers to manipulate and analyze Hadoop-based data. Teradata’s third Big Data acquisition in less than two months comes at a moment when the Big Data space is exploding with vendors that differentially tackle the problems of data discovery, exploration, analysis and visualization for Hadoop-based data. The question now is whether the industry will see early market consolidation, with startups snapped up by larger vendors, or whether the innovation those startups provide can survive a land grab initiated by larger, well-capitalized companies seeking to round out their Big Data portfolios. Terms of Teradata’s acquisition of Think Big Analytics were not disclosed.

Categories: Big Data, Hadoop, Teradata

Trifacta’s Deepened Integration With Tableau Streamlines Visualization Of Hadoop Data

Trifacta recently announced a deeper integration of its Data Transformation Platform with Tableau, the leader in data visualization and business intelligence, as a key feature of the Trifacta Data Transformation Platform 1.5 release. The new version allows customers to export data to Tableau Data Extract format or register it with Hadoop’s HCatalog, facilitating the movement of Hadoop-based data from Trifacta into Tableau. Trifacta’s Chief Strategy Officer Joe Hellerstein remarked on the significance of the deeper integration with Tableau as follows:

Tableau creates huge opportunities for effectively analyzing data, but working with big data poses specific challenges. The most significant barriers come from structuring, distilling and automating the transfer of data from Hadoop. Our integration removes these barriers in a way that complements self-service data analysis. Now, Trifacta and Tableau users can move directly from big data in Hadoop to powerful, interactive visualizations.

Trifacta’s ability to output data in Tableau Data Extract format means that its customers can more seamlessly integrate Trifacta data with Tableau and reap the benefits of Tableau’s renowned data visualization capabilities. The Trifacta Data Transformation Platform specializes in enhancing analyst productivity on Big Data sets by delivering a machine learning-based user interface that allows analysts to explore, transform, cleanse, visualize and manipulate massive data sets. Moreover, Trifacta’s predictive interaction technology iteratively learns from analyst behavior and offers users guided suggestions about productive paths for data discovery and exploration. The deepened integration means that data transformed in Trifacta now has a streamlined segue into Tableau, while the deepened partnership between the two vendors positions Tableau to consolidate its position as the de facto business intelligence platform for Hadoop-based data.
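For readers unfamiliar with the Tableau side of the hand-off, Tableau publishes a Data Extract API (with Java bindings) for writing .tde files programmatically, which gives a flavor of what “exporting to Tableau Data Extract format” entails. The sketch below, with an invented file name and columns, illustrates the shape of that API; it is not Trifacta’s actual export code:

import com.tableausoftware.TableauException;
import com.tableausoftware.common.Type;
import com.tableausoftware.extract.Extract;
import com.tableausoftware.extract.Row;
import com.tableausoftware.extract.Table;
import com.tableausoftware.extract.TableDefinition;

public class ExtractWriter {
  public static void main(String[] args) throws TableauException {
    // Define the extract's schema: one string column, one numeric column.
    TableDefinition schema = new TableDefinition();
    schema.addColumn("product", Type.UNICODE_STRING);
    schema.addColumn("revenue", Type.DOUBLE);

    // Create the .tde file and its table.
    Extract extract = new Extract("sales.tde");
    Table table = extract.addTable("Extract", schema);

    // Insert one row; a real exporter would loop over transformed records.
    Row row = new Row(schema);
    row.setString(0, "widget");
    row.setDouble(1, 1250.0);
    table.insert(row);

    extract.close();
  }
}

Once a file like sales.tde exists, Tableau opens it directly, which is what makes the export path a natural bridge between a transformation tool and a visualization tool.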

Categories: Big Data, Hadoop, Trifacta

Metanautix Emerges From Stealth With $7M In Series A Funding For Streamlined Big Data Processing And Analytics

Metanautix emerged from stealth today by announcing the finalization of $7M in Series A funding in a round led by Sequoia Capital, with additional investment from the Stanford University endowment fund and Shiva Shivakumar, former VP of Engineering at Google. Metanautix delivers a Big Data analytics platform composed of a SQL interface for querying Hadoop data in conjunction with data discovery functionality that empowers analysts to more easily navigate massive amounts of structured and unstructured data. The platform focuses on simplifying the data pipeline between data acquisition and the production of analytics: Metanautix removes the necessity of combining disparate data sources into a single store and thereby delivers the benefits of distributed computing alongside the simplicity of SQL. Users can perform analytics in parallel on structured and unstructured data via an interface that helps them understand the topography of the data they are navigating. Founded by veterans of Google and Facebook, Metanautix enters the big data analytics space by allowing users to run analytics on multiple streams of Big Data, as noted by CEO Theo Vassilakis below:

The modern enterprise operates on a plethora of data sources. There is great value in using all of these data sources and in providing superior access to ask questions of any data. We’ve made it fast and simple for anyone in an organization to work with any number of data sources at any scale and at a speed that enables rapid business decisions.

Vassilakis notes the ability of Metanautix to manage “any number” of data sources “at any scale” toward the larger goal of delivering actionable business intelligence from disparate data. Given Vassilakis’s background at Google working on Dremel, and the experience of Metanautix CTO Apostolos Lerios with processing frameworks for billions of photographic images during his tenure at Facebook, the industry can expect Metanautix to deliver a truly multivalent processing and analytics engine capable of managing heterogeneous data sources of all kinds. More details about the platform should emerge in the coming months, but based on the experience of its founders, the Big Data space should brace for a disruptive analytics and processing engine that can deliver analytics on massive datasets by means of a radically streamlined operational process. That said, Metanautix will need to find its niche quickly in order to outshine competitors such as Pivotal and Infochimps, the former of which recently announced a collaboration with Hortonworks to enhance Apache Ambari.
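Metanautix has not published the details of its query interface, but its pitch, distributed execution behind plain SQL over data that stays where it lives, maps onto a familiar pattern. The sketch below uses standard JDBC to show what that workflow looks like in practice; the driver URL, table names and schema are all hypothetical placeholders, since no such specifics were disclosed:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FederatedQueryDemo {
  public static void main(String[] args) throws Exception {
    // Hypothetical JDBC endpoint; Metanautix's actual driver and URL
    // scheme were not disclosed in the announcement.
    try (Connection conn = DriverManager.getConnection(
             "jdbc:metanautix://analytics.example.com:10000/default");
         Statement stmt = conn.createStatement();
         // Join log data in Hadoop against a structured warehouse table;
         // both table names are invented for illustration.
         ResultSet rs = stmt.executeQuery(
             "SELECT c.region, COUNT(*) AS clicks " +
             "FROM hdfs_clickstream h JOIN customers c ON h.user_id = c.id " +
             "GROUP BY c.region")) {
      while (rs.next()) {
        System.out.println(rs.getString("region") + ": " + rs.getLong("clicks"));
      }
    }
  }
}

The appeal of the approach is that the analyst writes one SQL statement, and the engine decides how to parallelize the scan of the Hadoop-resident data and the join against the structured source.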

Categories: Big Data, Hadoop, Metanautix, Venture Capital
