This week, Christophe Bisciglia, a co-founder of Cloudera, the commercial distributor of Apache Hadoop, launched a startup called Odiago whose Big Data product is named WibiData. Bisciglia launched WibiData with backing from Google Chairman Eric Schmidt, Cloudera CEO Mike Olson, and SV Angel, the Silicon Valley-based angel fund. WibiData supports investigative and operational analytics on “consumer internet data” such as website traffic from traditional and mobile computing devices. It runs on an HBase and Hadoop platform with the following attributes: (1) all data specific to a single user, machine or mobile device is organized within one HBase row; (2) “Produce,” an analytic operator that works on individual rows. Produce maps data from individual rows into interactive user applications, and it also performs analytic operations such as classifying and weighting rows in conjunction with an analytic rules engine; (3) “Gather,” an analytic operator that works on all rows combined.
WibiData’s “Produce” and “Gather” operators work within a single-table database structure whose schema can evolve dynamically over time. Whereas most relational databases hold a single value in each cell, WibiData’s non-relational structure allows an entire table to be stored within a cell. On the other hand, WibiData offers fewer data manipulation language capabilities than SQL for retrieving, updating, inserting and deleting data. Curt Monash provides a terrific technical overview of WibiData on his blog, DBMS2. For more about the company’s founders, see TechCrunch.
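To make the row-per-user model and the two operator types more concrete, here is a minimal, hypothetical Python sketch. It is not WibiData’s API: the in-memory table, the `produce` and `gather` helpers, the column names (`profile`, `page_views`, `segment`) and the “three page views means engaged” rule are all assumptions made for illustration. It shows one row per user, a cell (`page_views`) that holds an entire nested table of events, a Produce-style operator that only ever sees one row at a time, and a Gather-style operator that aggregates across every row.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable, Tuple

Row = Dict[str, Any]

# Hypothetical entity-centric table: the row key is the user ID, and all of that
# user's data lives in the one row. A cell may hold a nested table (list of dicts).
table: Dict[str, Row] = {
    "user-1": {
        "profile": {"platform": "mobile"},
        "page_views": [  # an entire table of events stored in a single cell
            {"url": "/home", "ts": 1},
            {"url": "/cart", "ts": 5},
            {"url": "/checkout", "ts": 9},
        ],
    },
    "user-2": {
        "profile": {"platform": "desktop"},
        "page_views": [{"url": "/home", "ts": 2}],
    },
}


def produce(table: Dict[str, Row], fn: Callable[[Row], Row]) -> None:
    """Per-row operator: fn sees a single row; its output columns are merged back into that row."""
    for row in table.values():
        row.update(fn(row))


def gather(table: Dict[str, Row],
           fn: Callable[[Row], Iterable[Tuple[Any, int]]]) -> Counter:
    """Whole-table operator: fn emits (key, count) pairs per row; pairs are summed across all rows."""
    totals: Counter = Counter()
    for row in table.values():
        for key, count in fn(row):
            totals[key] += count
    return totals


def classify(row: Row) -> Row:
    # Hypothetical rule: three or more page views marks a user as "engaged".
    segment = "engaged" if len(row["page_views"]) >= 3 else "casual"
    return {"segment": segment}


def by_platform_and_segment(row: Row) -> Iterable[Tuple[Any, int]]:
    # Emit one count per user, keyed by (platform, segment).
    yield (row["profile"]["platform"], row["segment"]), 1


produce(table, classify)
print(gather(table, by_platform_and_segment))
```

Because the Produce-style function never reads other rows, work of this kind could in principle be partitioned by row and run in parallel, while the Gather-style step needs a pass over the whole table, much like a MapReduce job.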
SGI and Cloudera today announced a reseller partnership under which SGI will sell pre-configured Hadoop clusters of hardware and software, along with technical support. Under the terms of the agreement, SGI will distribute Cloudera’s Distribution including Apache Hadoop (CDH) with its Rackable servers and provide level 1 technical support, while Cloudera will provide level 2 and level 3 support. SGI claims a history of deploying Hadoop servers dating back to Hadoop’s earliest days and expects to leverage its existing customer relationships in the government and financial sectors. SGI’s VP of Product Marketing, Bill Mannel, noted that “SGI has been successfully deploying Hadoop customer installations of up to 40,000 nodes and individual Hadoop clusters of up to 4,000 nodes for a number of years now.” Those figures, 40,000 nodes per customer installation and 4,000 nodes per cluster, match the upper end of Hadoop deployments at Yahoo! and similar enterprise-scale installations. Mannel elaborated on SGI’s experience with large Hadoop installations: “This benchmark, our growing presence, and our role in the Hadoop ecosystem, reflect our ongoing commitment to pushing the bar on performance and driving relationships that benefit our customers. As they wrestle with bigger and more complex data challenges every day they can trust SGI to deliver complete Hadoop solutions based on years of experience.”
SGI’s Hadoop distribution is expected to target customers that want an enterprise-level installation without dedicating in-house talent to the deployment. Hadoop is a disruptive open source technology that provides a framework for managing massive volumes of structured and unstructured data. It provides the data infrastructure for Facebook, LinkedIn and Twitter, and it has drawn further attention following recent announcements by Oracle and Microsoft that they are entering the Big Data space with Hadoop-based offerings.