William Blair Reports Rackspace Stands Poised To Close Wal-Mart Big Data Deal

Research firm William Blair reported that Rackspace is likely to win Big Data analytics business from Wal-Mart. According to William Blair analyst Jim Breen, Wal-Mart is hiring OpenStack technical resources and outsourcing cloud-related services to Rackspace. Wal-Mart is reportedly in the process of consolidating its EMC- and IBM-based retail data platforms into one aggregated Big Data platform. Breen wrote that Wal-Mart’s ten online portals currently use segregated data silos and that the larger corporate vision is to combine these discrete platforms into one massive data repository that enables richer insights into consumer behavior and operations. Breen underscored the significance of Rackspace’s collaboration with Wal-Mart by noting:

From a broad perspective, we believe Rackspace’s ability to gain traction with Wal-Mart for big data reflects early success of the OpenStack platform and foreshadows new market opportunities.

What is surprising about Breen’s report is that, while Rackspace is recognized as an OpenStack founder and visionary, the San Antonio-based company is less well known as a key player in the Big Data space. Rackspace may plan to leverage OpenStack object and block storage to aggregate web-based data as the infrastructure for Wal-Mart’s Big Data platform, but details of the analytic and querying tools it intends to use for the collaboration have yet to emerge. In any case, shares of Rackspace rose 2% in early trading on Wednesday after the William Blair report. Rackspace shares are up nearly 60% for the year.

Trifacta Closes $4.3 Million In Series A Funding; Seeks To Make Big Data Insights More Accessible

This Thursday, Trifacta came out of stealth mode by announcing $4.3 million in Series A funding led by Accel Partners, with additional participation from X/Seed Capital, Data Collective and angel investors Dave Goldberg, Venky Harinarayan and Anand Rajaraman. Trifacta’s mission is to “radically enhance productivity for data analysis” by delivering a solution tailored to the analysts responsible for gleaning business significance from data. Based on the premise that the cost of skilled data analysts continues to rise while the costs of storage and computation steadily fall, Trifacta intends to enhance the ability of analysts to manipulate, mine and derive insights from massive amounts of structured data more effectively. In an interview with VentureBeat, Trifacta’s co-founder and CEO Joe Hellerstein elaborated on the company’s mission as follows:

There is a lot of talk about engines and algorithms for unlocking value in data. But real value comes from the people who drive the analysis. The question is how you get data into the form where people can get some value out of it.

Similarly, Ping Li, head of Accel’s Big Data fund elaborated on his fund’s interest in Trifacta by noting:

The world doesn’t need another Hadoop or SQL company. The biggest problem with big data is around the ability to get information out of it. That gap is huge, and it’s not going to be solved anytime soon. This is really the soft underbelly of big data right now.

Hellerstein and Ping Li both point to the importance of facilitating access to business insights from Big Data, in contrast to merely delivering an enterprise-grade storage solution. Trifacta was founded as a result of collaborations between computer scientists at UC Berkeley and Stanford University. The company’s leadership team features co-founder Joe Hellerstein, former Professor of Computer Science at UC Berkeley; co-founder and Chief Experience Officer Jeffrey Heer; and CTO Sean Kandel, whose Ph.D. dissertation research at Stanford University examined interactive tools for manipulating data. Heer is also an Assistant Professor of Computer Science at Stanford University, where he leads the Stanford Visualization Group. Specific details of the company’s solutions remain under wraps at present, though Trifacta’s website reports that the company is preparing its solutions for public release while gearing up for a round of aggressive hiring.

RainStor Finalizes $12 Million In Series C Funding

This week, RainStor announced the finalization of $12 million in Series C funding from Credit Suisse and Rogers Venture Partners, with additional participation from existing investors Doughty Hanson Technology Ventures, Storm Ventures and The Dow Chemical Company. RainStor plans to use the funding to enhance product development and expand its sales and marketing team. RainStor offers two “editions” of a Big Data product that enables enterprises to more effectively store and conduct analytics on massive amounts of structured and unstructured data: Big Data Retention and Big Data Analytics On Hadoop. Big Data Retention allows enterprises to effectively store and access massive amounts of historical data that is used less frequently than mission-critical data stores. RainStor’s Big Data Analytics On Hadoop empowers enterprises to perform analytics on petabytes of structured data.

RainStor’s Big Data solutions can be flexibly deployed across a number of IT infrastructures including SAN, NAS, CAS and cloud-based platforms. Moreover, its Big Data products allow for queries using SQL, popular BI products and MapReduce when running on the Hadoop Distributed File System. RainStor’s Big Data platform leverages patented compression technology to store and retrieve massive amounts of structured data at low cost. Peter Norley, Managing Director at Credit Suisse, remarked on the importance of RainStor’s Big Data platform to the financial services industry as follows:

Driven by compliance regulations, banks and financial institutions are now required to retain and analyze petabytes of data. Compounded by rapid growth, current needs exceed the capacity of existing database and data warehouse environments. RainStor has built a unique combination of database capabilities that have proven essential for financial institutions in order to sustain growth levels in the most cost effective way, while meeting regulatory needs.

Here, Norley elaborates on how RainStor’s offering enables financial institutions to comply with regulations that dictate the preservation of massive amounts of data. Those regulations confront enterprises with storage needs that exceed the capacities of conventional database and data warehouse environments, and consequently call for a Big Data offering such as RainStor’s. Used by over 100 enterprises for Big Data management and analytics, RainStor’s Big Data platform stands poised to build on its unique branding as a nimble, cost-effective, customer-centric Big Data platform with “the highest level of compression on the market” in addition to advanced querying capabilities.
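RainStor’s patented compression technology is not publicly documented, but the general idea behind shrinking archives of repetitive structured records can be sketched with a simple dictionary encoding. The following Python sketch is purely illustrative (the function names and the trade-record example are hypothetical, not RainStor’s actual algorithm): repeated column values are stored once in a lookup table, with each row reduced to a small integer code.

```python
# Illustrative sketch (not RainStor's patented algorithm): dictionary-encoding
# a repetitive column, the basic idea behind compressing structured archives.

def dictionary_encode(values):
    """Replace repeated values with integer codes plus a small lookup table."""
    table = {}
    codes = []
    for v in values:
        if v not in table:
            table[v] = len(table)
        codes.append(table[v])
    return table, codes

def dictionary_decode(table, codes):
    """Reverse the encoding to recover the original column losslessly."""
    reverse = {code: value for value, code in table.items()}
    return [reverse[c] for c in codes]

# A large archive of trade records shares only a handful of distinct
# counterparty names, so the column compresses dramatically.
column = ["GOLDMAN", "CREDIT_SUISSE", "GOLDMAN", "UBS", "GOLDMAN"] * 1_000
table, codes = dictionary_encode(column)

# Only 3 distinct strings are stored once; each row costs one small integer.
assert len(table) == 3
assert dictionary_decode(table, codes) == column
```

Because the encoding is lossless, archived records remain fully queryable, which is consistent with RainStor’s claim of combining heavy compression with SQL and BI access.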

Etsy, Airbnb And The Climate Corporation Use Concurrent’s Cascading Big Data Application For Hadoop Programming

Concurrent Inc. recently announced that enterprise customers such as Airbnb, Etsy and The Climate Corporation are using Concurrent’s Big Data framework Cascading in combination with Amazon Elastic MapReduce to manage Big Data processing in Hadoop. Cascading is a Big Data processing framework that allows developers to use an API to construct data processing and analytic operations on Apache Hadoop clusters without resorting to higher-level tools such as Pig and Hive. In comparison to Pig and Hive, Cascading enables programmers to write Hadoop-related code with comparable granularity and superior job orchestration and management capabilities. A Java framework, Cascading can be used within both a private data center environment and a cloud-based development ecosystem. Airbnb uses Cascading to “determine factors driving room bookings as well as user drop-off” whereas Etsy’s Cascading deployment “powers all A/B analysis, a variety of analytics and dashboards, behavioral inputs to our search index.”

Cascading’s use across a number of industry verticals for Apache Hadoop programming and analytics points to a quiet revolution in the Big Data world, marked by the increasing currency of programming frameworks that simplify and streamline the construction of data processing tasks within a Hadoop cluster. Speaking of the milestone constituted by Cascading’s usage by customers such as Etsy and Airbnb, Concurrent CEO Chris Wensel noted that Cascading “has been battle tested in rigorous production environments for many years. Developers rely on Cascading and the growing ecosystem of community sponsored projects to build complex data intensive applications that drive their business.” Expect more and more enterprises to leverage Cascading to simplify Hadoop programming within both cloud environments and traditional data center infrastructures as the demand for Big Data analytics intensifies in both scope and business urgency.

Mojix Uses Hadoop-based Big Data Analytics For RFID

Los Angeles-based Mojix has recently revolutionized the RFID space by applying signal processing technology from deep space communications to the commercial RFID industry. Mojix’s STAR 3000 technology can pick up signals from RFID tags as much as 600 feet away from a sensing device, a twentyfold improvement over existing systems. Its unprecedented ability to track merchandise at long distances gives customers a correspondingly unique ability to track the location and trajectory of their products. Moreover, the Mojix STAR 3000 boasts a “cloud-based hosted server, enabling users to deploy a system at a lower cost by eliminating the need to acquire software,” in addition to cloud and virtualization options that empower customers to track their assets at multiple locations within the supply chain.

Prior to founding Mojix, the company’s CEO, Dr. Ramin Sadr, led a team of NASA scientists working on problems in deep space communications involving the creation of NASA receivers and the capture of telemetry data from the Galileo spacecraft mission. After leaving NASA, Dr. Sadr extended his work in wireless communications to RFID and founded Mojix to deliver RFID solutions to customers in the automotive, distribution, manufacturing, oil & gas and retail verticals. In addition to tracking the location of customer merchandise, Mojix’s Big Data analytics unlock operational and strategic insights about the distribution and supply chain trajectories of its customers’ assets.

Mojix’s deployment of Big Data technology represents a use case distinct from the common one of mining massive amounts of structured and unstructured web-related data: Mojix’s platform tracks the movement, in real time, of millions of pieces of merchandise across supply chains in different verticals. CEO Dr. Ramin Sadr elaborated on the company’s use of Big Data in an interview with Cloud Computing Today as follows:

“Big Data powered by Mojix’s wide area RFID reader network provides an unparalleled level of performance, enterprise wide, in terms of visibility, traceability, storage and streaming capacity, advanced visualization and data mining. Mojix’s approach is centered around a Big Data computational platform, running within the Hadoop framework, to scale as a customer’s demand grows over time by rolling out heterogeneous sensor networks, ranging from passive RFID and sensor networks to GPS and smartphones, driving the next evolution of the ‘Internet-of-Things’.”

Dr. Sadr reveals that Mojix embraced a Hadoop-based “Big Data computational platform” that enables “advanced visualization and data mining.” Each Hadoop cluster is designed to scale as the volume of merchandise multiplies, and the number and type of sensors attached to each unit increases. Resulting analytics provide enterprises with a high degree of visibility into the location and behavior of their products that can easily be amplified as more data, of different varieties, is collected and funneled into Mojix’s Hadoop-based big data platform. Moreover, customers have the option of transferring their RFID data to their own data warehouse to run internal analytics as desired.
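Mojix has not disclosed the internals of its Hadoop jobs, but the kind of analytics described here maps naturally onto the MapReduce pattern. The Python sketch below is a hypothetical illustration (the event fields and function names are invented for exposition): a mapper emits a key-value pair per RFID read event, and a reducer aggregates sightings per location, just as a Hadoop cluster would per key at far larger scale.

```python
# Hypothetical sketch of a MapReduce-style aggregation over RFID read
# events: count tag sightings per supply chain location.

from collections import defaultdict

def map_phase(read_events):
    """Mapper: emit (location, 1) for each RFID read event."""
    for event in read_events:
        yield event["location"], 1

def reduce_phase(pairs):
    """Reducer: sum the counts for each location key."""
    totals = defaultdict(int)
    for location, count in pairs:
        totals[location] += count
    return dict(totals)

events = [
    {"tag": "EPC-001", "location": "dock-3"},
    {"tag": "EPC-002", "location": "dock-3"},
    {"tag": "EPC-001", "location": "aisle-7"},
]
counts = reduce_phase(map_phase(events))
assert counts == {"dock-3": 2, "aisle-7": 1}
```

Because the mapper and reducer are independent per event and per key, adding more sensors or merchandise simply adds more input splits and reduce keys, which is what lets such a job scale with a growing Hadoop cluster.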

Mojix’s Big Data technology platform and analytics position it on the cusp of the Big Data revolution in terms of delivering strategic insights of the highest quality to customers. The innovation of the Mojix STAR 3000 lies not only in its ability to detect merchandise at longer distances than is common in RFID, but also in its capability to mine the data it collects and produce meaningful operational analytics. Expect to hear more details about Mojix’s use of Hadoop and its data analytics platform in the coming months as its innovative hardware consolidates its footprint in the RFID industry.

Cascading 2.0 Streamlines Hadoop-based Big Data Analysis And Development

This week, Concurrent Inc. announced the release of Cascading 2.0, an application framework that streamlines the process of creating Hadoop applications for Java developers. An open source alternative to programming raw MapReduce, the product provides an API and framework for constructing complex data processing tasks within a Hadoop cluster. Cascading features an abstraction wherein data captured from raw data sources is channeled into “pipes” that execute data analysis jobs and processes. Together, data sources, “pipes” and data sinks are referred to as a “flow.” Flows can converge into a “cascade” that can be managed and scheduled using Cascading 2.0’s scheduling system. Cascading 2.0 also allows developers to test applications locally by running them in memory against smaller data sets.
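The source-pipe-sink abstraction can be sketched in a few lines. Cascading itself is a Java API; the Python classes below are hypothetical stand-ins, meant only to illustrate how a flow composes a data source, a chain of pipes, and a sink.

```python
# Illustrative Python sketch of the pipe/flow abstraction (Cascading itself
# is a Java API; these class names are invented for exposition).

class Pipe:
    """A processing step applied to each record flowing through it."""
    def __init__(self, fn):
        self.fn = fn

    def process(self, records):
        return (self.fn(r) for r in records)

class Flow:
    """Connects a data source through a chain of pipes into a sink."""
    def __init__(self, source, pipes, sink):
        self.source, self.pipes, self.sink = source, pipes, sink

    def complete(self):
        records = iter(self.source)
        for pipe in self.pipes:
            records = pipe.process(records)
        self.sink.extend(records)

# Parse raw lines, then transform the parsed fields.
source = ["widget,2", "gadget,5"]
sink = []
parse = Pipe(lambda line: line.split(","))
shape = Pipe(lambda fields: [fields[0].upper(), int(fields[1])])
Flow(source, [parse, shape], sink).complete()
assert sink == [["WIDGET", 2], ["GADGET", 5]]
```

In Cascading proper, the source and sink would be taps over HDFS and the pipes would compile down to MapReduce jobs; the in-memory version above mirrors the local testing mode the release describes.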

The Cascading 2.0 API also empowers developers to:

• Model and explore structured and unstructured data
• Transfer applications from development to production environments
• Use the familiar Java language to develop applications and processes within a Hadoop cluster without learning MapReduce

Cascading is licensed under version 2.0 of the Apache Software License. Concurrent’s CEO Chris Wensel elaborated on Cascading 2.0’s value proposition for organizations building Hadoop-based applications as follows:

Building applications on Hadoop, despite its growing adoption in the enterprise, is notoriously difficult. We are driving the future of application development and management on Hadoop, by allowing enterprises to quickly extract meaningful information from large amounts of distributed data and better understand the business implications. We make it easy for developers to build powerful data processing applications for Hadoop, without requiring months spent learning about the intricacies of MapReduce.

As Wensel suggests, Cascading stands poised to play a pivotal role in the big data revolution by transforming the way in which developers create and manage Hadoop-based applications. With enterprises such as Etsy, Razorfish, Trulia and Twitter using Cascading for data analysis and discovery, Cascading has garnered an early foothold in the market for software that streamlines development. Expect enterprises to deploy software such as Cascading 2.0 as developers and data scientists gravitate toward simplified ways of managing data processing in a Hadoop cluster.