Code 42 Software Finalizes $52.5 Million Funding Round Led By Accel Partners Big Data Fund

Code 42 Software, a provider of online backup solutions for consumers and enterprises, finalized a capital raise of $52.5 million from Accel Partners and Split Rock Partners. The investment represents Accel’s first out of its $100 million fund for Big Data companies. Founded in 2001, Code 42 Software has been profitable for the last three years, as reported by GigaOM. The Minneapolis-based company’s products include CrashPlan for consumers, CrashPlan PRO for small businesses and CrashPlan PROe for enterprises. Co-founder and CEO Matthew Dornquast noted that the funding round was unusual given the profitability of the company, which had bootstrapped its financing in its initial years of operation. The funding will be used to grow the business globally and to add features that make it easier for users to search their online backup repositories and manage access privileges to them. Currently, Code 42 manages more than 100 petabytes of data and backs up 250 million new files per day. Accel had been pursuing Code 42 for two years, but the company decided to accept the investment only upon recognizing the potential for enterprise-level business intelligence analytics on files shared between devices and within an ecosystem, as well as the market opportunity to expand globally. Code 42 competes with the likes of Carbonite, Mozy, Iron Mountain, Box.net and Dropbox. The company currently has more than 4,000 enterprise customers including Adobe, Google, Groupon, HP, Intuit, Kraft Foods, LinkedIn, NASA, National Geographic and Salesforce.com.

Fujitsu Reveals Cloud-Based Platform For Big Data From Sensing Technologies

On Monday, Fujitsu revealed a cloud-based platform for Big Data called Data Utilization Platform Services. The platform enables the aggregation, exchange, manipulation and analysis of “massive quantities of sensing data” in a variety of formats. Fujitsu’s Data Utilization Platform Services features the following four components:

• Data Management & Integration Services

The platform provides a mechanism for the collection and categorization of massive volumes of sensor-generated data.

• Communications Control Services

In addition to collecting and categorizing data, Fujitsu’s platform can transmit data to other devices in order to automatically adjust data-driven equipment, such as devices in home, automotive, factory or scientific environments.

• Data Collection and Detection Services

Fujitsu’s platform can apply rules to data derived from sensors to adjust machine behavior in real time by way of an iterative feedback loop. Rule-based decision making on sensing data may involve equipment in navigation, robotics or other industries in which real-time decisions depend on an up-to-date data store. A conceptual sketch of such a feedback loop appears after this list.

• Data Analysis Services

The platform contains a bevy of business intelligence tools that enable the production of actionable analytics to drive operational decisions.
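
Fujitsu has not published programming interfaces for these services, so the following Python sketch is purely conceptual: it illustrates the kind of rule-based sensing feedback loop described above, and every name in it is hypothetical rather than part of Fujitsu’s platform.

```python
# Hypothetical sketch of a rule-based sensing feedback loop: collect
# sensor readings, evaluate rules, and send control commands back to
# data-driven equipment. All names are illustrative, not Fujitsu APIs.

import random
import time


def read_temperature_sensor():
    """Stand-in for a real sensor feed; returns degrees Celsius."""
    return 20.0 + random.uniform(-5.0, 15.0)


def send_control_command(device, command):
    """Stand-in for transmitting an adjustment to equipment."""
    print(f"-> {device}: {command}")


# A simple rule set: (predicate over the reading, target device, command).
RULES = [
    (lambda t: t > 30.0, "hvac-unit-1", "increase_cooling"),
    (lambda t: t < 18.0, "hvac-unit-1", "increase_heating"),
]

# Iterative feedback loop: sense, evaluate rules, actuate, repeat.
for _ in range(5):
    reading = read_temperature_sensor()
    print(f"sensor reading: {reading:.1f} C")
    for predicate, device, command in RULES:
        if predicate(reading):
            send_control_command(device, command)
    time.sleep(1)  # a real deployment would be event-driven or near-real-time
```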

[Schematic: architecture of Fujitsu’s Data Utilization Platform Services]

Fujitsu will also be offering a set of “Data Curation Services” comprising professional services and analytic tools that help customers tackle their Big Data challenges. Fujitsu did not elaborate on the underlying technology for either the cloud-based or Big Data components of its Data Utilization Platform Services, but a report in The Register speculates that “Hadoop, the open source MapReduce data muncher and its related Hadoop Distributed File System” are among the platform’s key technologies. Absent details of its underlying technology, the most notable feature of Fujitsu’s cloud platform for Big Data is its distinct focus on data derived from sensing technologies in fields such as navigation, robotics and meteorology.

Ten Things You Should Know About Splunk And Its $125 Million IPO

Splunk Inc. filed for a $125 million IPO on Friday in what marks the first IPO in the rapidly growing Big Data technology space. Big Data technology refers to software that specializes in the analysis of massive amounts of structured and unstructured data. Splunk’s mission is “to make machine data accessible, usable and valuable to everyone in an organization.” Splunk produces software that analyzes operational machine data about customer transactions, user actions and security risks. The San Francisco-based company provides IT and business stakeholders with analytics that enable them to improve project delivery, cut costs, reduce security threats, demonstrate compliance with security regulations and derive actionable business intelligence insights.

Founded in 2004, Splunk capitalized on the market opportunity for actionable analytics on data derived from increasingly complex and heterogeneous enterprise IT environments featuring corporate data centers and cloud-based, virtualized application environments. Splunk’s software provides its users with a 360-degree view of enterprise operations by running against structured data sets as well as unstructured data that lacks a pre-defined schema. Here are ten things you should know about Splunk and its S-1 filing:

1. Splunk has over 3,300 customers including Bank of America, Zynga, Salesforce.com and Comcast.

2. Splunk’s software can be downloaded and installed within hours and requires neither extensive customization nor professional services for setup. Splunk is currently developing Splunk Storm (Beta), a cloud-based version of its software that features a subset of its functionality.

3. Splunk recorded revenues of $18.2 million, $35.0 million and $66.2 million in fiscal 2009, 2010 and 2011, with losses of $14.8 million, $7.5 million and $3.8 million, respectively. Revenue grew at a rate of 93% for fiscal 2010 and 89% for fiscal 2011.

4. For the first nine months of fiscal 2011 and 2012, Splunk’s revenues were $43.5 million and $77.8 million, with losses of $2 million and $9.7 million, respectively. Revenue grew at a rate of 79% during this time period.

5. Splunkbase and Splunk Answers, Splunk’s online user communities, provide customers with an infrastructure by which to share apps and offer each other insights and support. Splunk believes that enriching these user communities constitutes a key component of its growth strategy.

6. More than 300 apps are available via the Splunkbase website, over 100 of which were developed by third parties. Examples of Splunk apps include Splunk for Enterprise Security, Splunk for PCI Compliance and Splunk for VMware.

7. In fiscal 2011 and the first nine months of fiscal 2012, 21% and 24% of Splunk’s revenues, respectively, derived from international sales. The sizable share of customers outside the U.S. exposes the company to risks specific to international sales transactions, including global economic conditions, longer payment cycles and the additional managerial, legal and accounting costs of international business operations.

8. The IPO filing cited the following analytics vendors as Splunk’s principal competition: (1) Web analytics vendors such as Adobe Systems, Google, IBM and Webtrends; (2) Business intelligence vendors including IBM, Oracle, SAP and EMC; and (3) Big Data technologies such as Hadoop.

9. Godfrey Sullivan has served as Splunk’s CEO since 2008. Prior to Splunk, Sullivan was CEO of Hyperion Solutions Corp., which he helped sell to Oracle for $3.3 billion in 2007.

10. Three of Splunk’s key technologies are schema on the fly, machine data fabric and search engine capability for machine data. Schema on the fly refers to the ability to develop schemas that adjust to queries and relevant data sets instead of forcing data into a pre-defined schema. The result is a more flexible way of tagging data that lends itself to unstructured data sets lacking a well-defined schema. Machine data fabric refers to the ability to access machine data in all its various forms, meaning that no machine data lies beyond the reach of Splunk’s software. As noted in the S-1 filing, Splunk’s “software enables users to process machine data no matter the infrastructure topology, from a single machine to a globally distributed, virtualized IT infrastructure.” Search engine capability means that Splunk offers a range of arithmetic and advanced statistical capabilities for searching and performing business intelligence analysis on machine data.
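
The S-1 does not disclose how Splunk implements these technologies, but the schema-on-the-fly idea, commonly called schema-on-read, is easy to illustrate. In the toy Python sketch below, raw log lines are stored untouched and fields are extracted only at query time; the log format, field names and `query` helper are all hypothetical, not Splunk’s implementation.

```python
# Toy illustration of schema-on-read: raw machine data is stored as-is,
# and fields are extracted only when a query asks for them. This is a
# conceptual sketch, not Splunk's implementation.

import re

RAW_LOGS = [
    "2011-12-28 10:04:01 user=alice action=login status=200",
    "2011-12-28 10:04:09 user=bob action=checkout status=500",
    "2011-12-28 10:05:22 user=alice action=logout status=200",
]


def query(raw_lines, **filters):
    """Extract key=value fields from each line at query time and return
    the field dictionaries matching all requested filter values."""
    results = []
    for line in raw_lines:
        fields = dict(re.findall(r"(\w+)=(\S+)", line))
        if all(fields.get(k) == v for k, v in filters.items()):
            results.append(fields)
    return results


# No schema was defined up front; the "schema" emerges from the query.
print(query(RAW_LOGS, status="500"))  # failed events
print(query(RAW_LOGS, user="alice"))  # one user's activity
```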

Splunk has yet to reveal the number of shares that will be offered as part of its $125 million IPO under the ticker symbol SPLK. Thus far, the company has raised $40 million in venture capital funding from August Capital, JK&B Capital, Ignition Partners and Sevin Rosen Funds. The IPO is led by Morgan Stanley. JPMorgan Chase & Co., Credit Suisse Group AG and Bank of America Corp. are also working with Morgan Stanley on the public offering. Rest assured that Splunk’s IPO will be watched very closely by all vendors in the Big Data space.

Oracle Partners With Cloudera For Newly Available Big Data Appliance

On Tuesday, Oracle announced the availability of the Big Data appliance that it introduced at its Oracle OpenWorld conference in October. The appliance runs on Linux and features Cloudera’s distribution of Apache Hadoop (CDH), Cloudera Manager for managing the Hadoop distribution, Oracle NoSQL Database and an open source distribution of R, the statistical software package. Oracle’s partnership with Cloudera in delivering its Big Data appliance goes beyond the latter’s selection as a Hadoop distributor to include assistance with customer support: Oracle plans to deliver tier-one customer support while Cloudera will assist with tier-two and tier-three customer inquiries, including those beyond the domain of Hadoop.

Oracle will run its Big Data appliance on hardware featuring 864 GB of main memory, 216 CPU cores, 648 TB of raw disk storage, 40 Gb/s InfiniBand connectivity and 10 Gb/s Ethernet data center connectivity. Oracle also revealed details of four connectors to its appliance with the following functionality:

• Oracle Loader for Hadoop, which uses MapReduce parallel processing to load massive amounts of data from Hadoop into Oracle Database.
• Oracle Data Integrator Application Adapter for Hadoop which provides a graphical interface that simplifies the creation of Hadoop MapReduce programs.
• Oracle R Connector for Hadoop, which provides R users with streamlined access to the Hadoop Distributed File System (HDFS); a generic illustration of such access follows this list.
• Oracle Direct Connector for Hadoop Distributed File System (ODCH), which enables Oracle Database to query data stored in HDFS using SQL.
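
Oracle published no code samples alongside the connector announcement, but the kind of HDFS access the connectors streamline can be contrasted with the manual route available in any Hadoop installation. The sketch below simply pipes a file out of HDFS through the standard `hadoop fs -cat` command; the path and the record counting are hypothetical and purely illustrative, and this is generic HDFS access rather than any Oracle connector.

```python
# Generic illustration of pulling data out of HDFS for analysis via the
# standard `hadoop fs -cat` CLI that ships with Hadoop. The path and
# record handling are hypothetical; this is not an Oracle connector.

import subprocess

HDFS_PATH = "/data/events/2011-12-28.tsv"  # hypothetical path

proc = subprocess.Popen(
    ["hadoop", "fs", "-cat", HDFS_PATH],
    stdout=subprocess.PIPE,
    text=True,
)

row_count = 0
for line in proc.stdout:
    row_count += 1  # a real job would parse and aggregate each record

proc.wait()
print(f"{row_count} records read from {HDFS_PATH}")
```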

Oracle’s announcement of the availability of its Big Data appliance comes as the battle for Big Data market share takes shape in a landscape dominated by the likes of Teradata, Microsoft, IBM, HP, EMC, Informatica, MarkLogic and Karmasphere. Oracle’s selection of Cloudera as its Hadoop distributor indicates that it intends to make a serious move into the world of Big Data. For one, the partnership with Cloudera gives Oracle increased access to Cloudera’s universe of customers. Secondly, the partnership enhances the credibility of Oracle’s Big Data offering given that Cloudera represents the most prominent distributor of Apache Hadoop in the U.S.

In October, Microsoft revealed plans for a Big Data appliance featuring Hadoop for Windows Server and Azure, and Hadoop connectors for SQL Server and SQL Parallel Data Warehouse. Whereas Oracle chose Cloudera for Hadoop distribution, Microsoft partnered with Yahoo spinoff Hortonworks to integrate Hadoop with Windows Server and Windows Azure. In late November, HP provided details of Autonomy IDOL (Integrated Data Operating Layer) 10, which features the ability to process large-scale structured data sets in addition to a NoSQL interface for loading and analyzing structured and unstructured data. In December, EMC released its Greenplum Unified Analytics Platform (UAP), marked by the ability to load structured data, enterprise-grade Hadoop for analyzing structured and unstructured data and Chorus, a collaboration and productivity software tool. Bolstered by its partnership with Cloudera, Oracle is set to compete squarely with HP’s Autonomy IDOL 10, EMC’s Greenplum Chorus and IBM’s BigInsights until Microsoft’s appliance officially steps into the Big Data dohyō (土俵), the sumo ring, as well.

Big Data 2011: The Year in Review

If 2011 was the year of Cloud Computing, then 2012 will surely be the year of Big Data. Big Data has yet to arrive in the way cloud computing has, but the framework for its widespread deployment as a commodity emerged with style and unmistakable promise. For the first time, Hadoop and NoSQL gained currency not only within the developer community, but also amongst bloggers and analysts. More importantly, Big Data acquired a recognized status and meaning in the technology community, even though few people paused to ask what “big” means in “Big Data” in a landscape where the line around that meaning is constantly being redrawn. Even as yesterday’s “big” morphed into today’s “small”, with consumer personal storage transitioning from gigabytes to terabytes, “Big Data” emerged as a term that everyone almost instantly understood. It was as if consumers and enterprises alike had been searching for years for a term to describe the explosion of data evinced by web searches, web content, Facebook and Twitter feeds, photographs, log files and miscellaneous structured and unstructured content. Having lacked the vocabulary for the data explosion, the world suddenly embraced the term Big Data with passion.

Below are some of the highlights of 2011 with respect to Big Data:

March
• Teradata finalized a deal to acquire Big Data player Aster Data Systems for $263 million.

July
• Yahoo revealed plans to create Hortonworks, a spin-off dedicated to the commercialization of Apache Hadoop.

September
• Teradata announced the Teradata Aster MapReduce Platform, which combines SQL with MapReduce and empowers business analysts who know SQL to leverage the power of MapReduce without having to write queries in Java, Python, Perl or C.

October
• Oracle announced plans to launch a Big Data appliance featuring Apache Hadoop, Oracle NoSQL Database Enterprise Edition and an open source distribution of R. The announcement represented an abrupt about-face from an earlier Oracle position that discredited the significance of NoSQL.
• Microsoft revealed plans for a Big Data appliance featuring Hadoop for Windows Server and Azure, and Hadoop connectors for SQL Server and SQL Parallel Data Warehouse, along with a strategic partnership with Yahoo spinoff Hortonworks to integrate Hadoop with Windows Server and Windows Azure. Microsoft’s decision to forgo NoSQL in favor of a Windows-based version of Hadoop for SQL Server 2012 constituted the key difference between the Microsoft and Oracle Big Data platforms.
• IBM announced the release of its InfoSphere BigInsights application for analyzing Big Data. The SmartCloud release of BigInsights means that IBM beat competitors Oracle and Microsoft in the race to deploy an enterprise-grade, cloud-based Big Data analytics platform.

November
• Christophe Bisciglia, a co-founder of Cloudera, the commercial distributor of Apache Hadoop, launched a startup called Odiago whose Big Data product, WibiData, manages investigative and operational analytics on “consumer internet data” such as website traffic from traditional and mobile computing devices.
• Cloudera announced a partnership with NetApp, the storage and data management vendor, marked by the release of the NetApp Open Solution for Hadoop, a preconfigured Hadoop cluster that combines Cloudera’s distribution of Apache Hadoop (CDH) and Cloudera Enterprise with NetApp’s RAID architecture.
• Big Data player Karmasphere announced plans to join the Hortonworks Technology Partner Program. The partnership enables Karmasphere to offer its Big Data intelligence product Karmasphere Analytics on the Apache Hadoop infrastructure that undergirds the Hortonworks Data Platform.
• Informatica released the world’s first Hadoop parser. Informatica HParser operates on virtually all versions of Apache Hadoop and specializes in transforming unstructured data into a structured format within a Hadoop installation.
• With the release of MarkLogic 5, MarkLogic announced support for Hadoop, the Apache open source software framework for analyzing Big Data.
• HP provided details of Autonomy IDOL (Integrated Data Operating Layer) 10, a Next Generation Information Platform that integrates two of its 2011 acquisitions, Vertica and Autonomy. Autonomy IDOL 10 features Autonomy’s capabilities for processing unstructured data, Vertica’s ability to rapidly process large-scale structured data sets, a NoSQL interface for loading and analyzing structured and unstructured data and solutions dedicated to the Data, Social Media, Risk Management, Cloud and Mobility verticals.

December
• EMC announced the release of its Greenplum Unified Analytics Platform (UAP). The EMC Greenplum UAP contains the EMC Greenplum platform for the analysis of structured data, enterprise-grade Hadoop for analyzing structured and unstructured data and EMC Greenplum Chorus, a collaboration and productivity software tool that enables social networking amongst constituents in an organization who are leveraging Big Data.

The widespread adoption of Hadoop punctuated the year’s Big Data story. Hadoop featured in almost every Big Data story of the year, from Oracle to Microsoft to HP and EMC, while NoSQL came in a close second. Going into 2012, one of the key questions for the Big Data space concerns the ability of OpenStack to support Hadoop, NoSQL, MapReduce and other Big Data technologies. The other key question hinges on the user friendliness of Big Data applications for business analysts in addition to programmers. EMC’s Greenplum Chorus, for example, democratizes access to its platform via a user interface that promotes collaboration amongst multiple constituents in an organization by transforming questions into structured queries. Similarly, the Teradata Aster MapReduce Platform allows business analysts to make use of its MapReduce technology by writing SQL. As Hadoop becomes increasingly mainstream, the tech startup and data-intensive spaces are likely to see a greater number of data analysts trained in Apache Hadoop, alongside efforts by vendors to render Hadoop more accessible to programmers and non-programmers alike.
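
Given how often Hadoop and MapReduce recur in the year’s news, a concrete illustration may help readers new to the model. The sketch below is the canonical word-count example written in the style of Hadoop Streaming, which lets mappers and reducers be ordinary scripts that read stdin and write stdout; here the shuffle-and-sort phase is simulated locally so the snippet runs on its own.

```python
# Classic Hadoop Streaming-style word count: the mapper emits
# "word<TAB>1" pairs and the reducer sums the counts per word. Hadoop
# sorts mapper output by key before the reduce phase, which the local
# test below simulates with sorted().

from itertools import groupby


def mapper(lines):
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"


def reducer(sorted_pairs):
    # Pairs arrive sorted by word, so consecutive grouping works.
    for word, group in groupby(sorted_pairs, key=lambda p: p.split("\t")[0]):
        total = sum(int(p.split("\t")[1]) for p in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Local simulation of map -> shuffle/sort -> reduce.
    docs = ["big data big hadoop", "hadoop data"]
    for result in reducer(sorted(mapper(docs))):
        print(result)  # big 2, data 2, hadoop 2
```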

Big Data Goes Social With EMC’s Greenplum Unified Analytics Platform

EMC announced the release of its Greenplum Unified Analytics Platform (UAP) on Thursday. The Greenplum Unified Analytics Platform, a unified platform for processing structured and unstructured data, represents EMC’s latest move to consolidate its position in the Big Data space and compete squarely with the Big Data offerings recently unveiled by Oracle, Microsoft and HP. EMC’s announcement comes scarcely two weeks after HP disclosed the integration of its Autonomy and Vertica offerings within a unified Next Generation Information Platform called Autonomy IDOL 10 that specializes in the processing of structured and unstructured data. EMC’s Unified Analytics Platform features integration with Hadoop, the software framework for analyzing massive amounts of structured and unstructured data.

The EMC Greenplum UAP contains the following three components:

• The EMC Greenplum platform for the analysis of structured data.
• Enterprise-grade Hadoop for analyzing structured and unstructured data.
• EMC Greenplum Chorus, a collaboration and productivity software tool that enables social networking amongst constituents in an organization that are leveraging Big Data.

EMC Greenplum Chorus recognizes that Big Data scientists and analysts may be geographically dispersed across different enterprise locations, even as they need to collaborate to deliver enterprise-wide analysis that integrates structured and unstructured data from different data sets. GigaOM reports that data exploration represents one of the most significant features of Chorus because it provides a Facebook-like user interface that enables data scientists to “launch a sandbox environment and start analyzing the data with just a few clicks.” According to EMC’s press release, Chorus facilitates collaboration amongst Big Data teams as follows:

EMC Greenplum Chorus opens data science teams up to an entirely new way to collaborate across dispersed geographies and with very large data sets. Through the Chorus interface, users get ready access to tools, data and supporting resources that enable enterprise-wide Big Data productivity. Frictionless and rapid collaboration across data science teams helps to ensure useful insights get back to the business in time to take the right actions, thus increasing agility and innovation.

Like IBM’s artificial intelligence supercomputer Watson, Chorus provides an interface for translating human questions into queries that run against petabytes of data. Chorus also allows users to share results from and refine approaches to data analysis. The social networking component of EMC’s Unified Analytics Platform ensures that diverse constituents can examine Big Data and iteratively refine their approach to data analysis as a collective. Chorus, the collaborative platform of UAP, profoundly differentiates EMC’s Big Data offering from competing products from HP, Oracle, Microsoft, Cloudera and Odiago.
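
EMC has not detailed how Chorus performs this translation, so the deliberately naive Python sketch below should be read only as an illustration of the general idea of mapping constrained human questions onto structured queries; the patterns, table names and columns are entirely hypothetical.

```python
# Deliberately naive illustration of translating a constrained human
# question into a structured query. The patterns and table names are
# hypothetical; real systems use far more sophisticated semantic parsing.

import re

TEMPLATES = [
    (re.compile(r"how many (\w+) in (\w+)", re.I),
     "SELECT COUNT(*) FROM {0} WHERE region = '{1}';"),
    (re.compile(r"average (\w+) per (\w+)", re.I),
     "SELECT {1}, AVG({0}) FROM events GROUP BY {1};"),
]


def question_to_sql(question):
    for pattern, template in TEMPLATES:
        match = pattern.search(question)
        if match:
            return template.format(*match.groups())
    return None  # no template matched; ask the user to rephrase


print(question_to_sql("How many customers in Europe?"))
# SELECT COUNT(*) FROM customers WHERE region = 'Europe';
print(question_to_sql("Average revenue per quarter"))
# SELECT quarter, AVG(revenue) FROM events GROUP BY quarter;
```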

EMC’s Unified Analytics Platform represents the convergence of the hottest trends in technology today: cloud computing, Big Data, virtualization and social networking. The question now is whether the marriage of social networking and Big Data represents a passing fad or an innovation that forever changes the landscape of products in the Big Data space.