This week, Cloudera announced the general availability of Cloudera Search, the interactive search engine that lets users perform free-text searches on data stored within the Hadoop Distributed File System (HDFS) and Apache HBase without advanced scripting experience or training. Powered by the open source search engine Apache Solr, Cloudera Search is integrated with Apache ZooKeeper to manage distributed processing, index sharding and high availability. The general availability release follows a three-month public beta that began on June 4 and several months of private beta before that. The platform forms part of Cloudera’s larger project of democratizing access to Big Data and will sit alongside Cloudera Impala, Cloudera’s SQL interface for querying Hadoop clusters. A schematic of Cloudera’s library of tools and platforms for processing Hadoop-based data is given below:
Cloudera Search manages the creation of indexes over Hadoop data with scalability comparable to MapReduce and integrates the resulting indexes into HDFS. It also supports real-time indexing of newly ingested data through an integration with Apache Flume. The platform enables “linearly scalable batch indexing for large data stores within Hadoop on-demand” and its GoLive functionality accommodates “incremental index changes.” Moreover, the Search platform is available by way of Hue, Cloudera’s open source user interface for querying Apache Hadoop data.
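Because Cloudera Search is powered by Apache Solr, a free-text search against an indexed collection can in principle be issued through Solr's standard HTTP search handler. The snippet below is a minimal sketch of how such a request URL might be put together, not Cloudera's documented client. The host name, port, and the `logs` collection are hypothetical placeholders, and the helper `build_solr_query_url` is introduced here purely for illustration; only the `/select` handler and the `q`, `rows` and `wt` parameters come from Solr itself.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, collection, text, rows=10):
    """Build a URL for Solr's standard /select search handler.

    base_url and collection are assumptions for illustration; a real
    deployment would use its own host and collection names.
    """
    params = {
        "q": text,     # the free-text query string
        "rows": rows,  # maximum number of matching documents to return
        "wt": "json",  # ask Solr to format the response as JSON
    }
    return f"{base_url.rstrip('/')}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical host and collection, used only to show the shape of the URL.
url = build_solr_query_url(
    "http://search-host.example:8983", "logs", "disk failure", rows=5
)
print(url)
```

Fetching the resulting URL with any HTTP client would return the matching documents, assuming a collection with that name has already been indexed.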
With this week’s general availability announcement, Cloudera Search becomes a full member of Cloudera’s product line and is supported by CDH 4.3. Overall, the GA of Cloudera Search illustrates the intensity of the battle to bring Hadoop to non-technical enterprise users by means of an interactive search platform whose ease of use parallels that of web search platforms such as Google and Bing. MapR, for example, announced an integration of its Hadoop platform with LucidWorks search for Big Data in February. The industry should expect interactive search platforms for Hadoop to proliferate and grow more sophisticated as Hadoop adoption accelerates across the enterprise.