LinkedIn Open Sources Dr. Elephant To Facilitate Optimization Of Hadoop-based Flows

LinkedIn recently announced the open sourcing of Dr. Elephant, a tool that helps Hadoop users optimize their flows. Dr. Elephant aggregates and analyzes data about Hadoop jobs and suggests how to tune those jobs to make them more efficient. Whereas most Hadoop optimization tools focus on simplifying and streamlining the management of Hadoop clusters, Dr. Elephant focuses on optimizing the flows that run on them. As noted in a LinkedIn blog post, the platform leverages “pluggable, configurable, rule-based heuristics” to provide analytical insight into job performance along with recommendations for performance optimization.

LinkedIn uses Dr. Elephant to enhance developer productivity and to improve the efficiency of its Hadoop clusters by optimizing their constituent flows. The tool delivers an aggregated dashboard of all the jobs that run on a specific cluster, together with drill-down visualizations of flow performance for each job. It specializes in diagnostics at the job level rather than the cluster level, and LinkedIn uses it widely to diagnose and resolve over 80% of flow performance questions.

Open sourced under the Apache License, Version 2.0, Dr. Elephant is compatible with Apache Hadoop and Apache Spark and plays in the same space as Driven, the Big Data application performance management framework pioneered by Concurrent Inc.
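
To make the phrase “pluggable, configurable, rule-based heuristics” more concrete, here is a minimal Python sketch of the general idea. It is not Dr. Elephant's actual code or API: the names `JobMetrics`, `Finding`, `mapper_skew`, `gc_overhead`, `REGISTRY`, and `analyze`, as well as the metrics and thresholds, are all hypothetical and chosen only for illustration. Each heuristic is a small rule that inspects a job's metrics against a configurable threshold and, if the rule fires, returns a tuning suggestion.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class JobMetrics:
    """Hypothetical per-job counters; field names are illustrative only."""
    job_id: str
    mapper_input_bytes: List[int]  # input size seen by each map task
    gc_time_ms: int                # total time spent in garbage collection
    cpu_time_ms: int               # total CPU time consumed by the job


@dataclass
class Finding:
    heuristic: str
    advice: str


# A heuristic is any callable that takes job metrics plus a threshold
# and returns a Finding when its rule fires, or None otherwise.
Heuristic = Callable[[JobMetrics, float], Optional[Finding]]


def mapper_skew(job: JobMetrics, threshold: float) -> Optional[Finding]:
    """Fires when the largest mapper input exceeds threshold x the mean."""
    sizes = job.mapper_input_bytes
    if not sizes:
        return None
    mean = sum(sizes) / len(sizes)
    if mean > 0 and max(sizes) / mean > threshold:
        return Finding("mapper_skew", "Rebalance input splits across mappers.")
    return None


def gc_overhead(job: JobMetrics, threshold: float) -> Optional[Finding]:
    """Fires when the GC-to-CPU time ratio exceeds the threshold."""
    if job.cpu_time_ms > 0 and job.gc_time_ms / job.cpu_time_ms > threshold:
        return Finding("gc_overhead", "Review task heap size and object churn.")
    return None


# "Pluggable": adding a rule means registering one more function here.
REGISTRY: Dict[str, Heuristic] = {
    "mapper_skew": mapper_skew,
    "gc_overhead": gc_overhead,
}


def analyze(job: JobMetrics, config: Dict[str, float]) -> List[Finding]:
    """'Configurable': config selects which rules run and their thresholds."""
    findings = []
    for name, threshold in config.items():
        result = REGISTRY[name](job, threshold)
        if result is not None:
            findings.append(result)
    return findings
```

In this sketch, calling `analyze(job, {"mapper_skew": 2.0, "gc_overhead": 0.1})` runs both rules against one job and returns a `Finding` for each rule that fires. A dashboard like the one described above could then aggregate such findings per job and per flow.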