Big Data

Treasure Data Partners With Yahoo Japan To Promote Its Cloud-based Big Data Processing And Analytics Platform

Today, Treasure Data announces a partnership with Yahoo! JAPAN under which Yahoo! JAPAN will resell the Treasure Data platform to customers interested in leveraging its Big Data capture, processing and analytics capabilities. Branded Yahoo! JAPAN Big Data Insight, the collaboration between Treasure Data and Yahoo! JAPAN will allow organizations to store and run analytics on massive amounts of real-time data without managing the underlying hardware infrastructure or mastering the intricacies of MapReduce. The Treasure Data platform sits squarely at the intersection of cloud computing and Big Data, given that customers can take advantage of Treasure Data’s cloud for storing Big Data as illustrated below:

The graphic above illustrates the Treasure Data platform’s ability to collect, store and run real-time analytics on massive amounts of cloud-based data. Worth noting is that although the platform specializes in Big Data processing and analytics, data is not stored in HDFS, the Hadoop Distributed File System. Instead, the Treasure Data platform stores data in Plazma, its “own distributed columnar storage system,” which boasts attributes such as scalability, efficiency, elasticity and a schema-less architecture. Plazma’s columnar structure means that queries can scan only the relevant columns rather than the entire dataset, thereby enabling faster queries, more effective use of the platform’s schema-less data model and superior performance all around. Plazma works by transforming row-based JSON data into a columnar format that optimizes storage and the processing of analytical queries. The resulting analytical platform targets use cases such as event data from web and mobile applications in addition to data from the Internet of Things, such as appliances and wearable devices. Today’s announcement represents a huge coup for Treasure Data because of the co-branding of its technology alongside Yahoo! JAPAN, one of the industry’s experts in the storage, processing and analysis of Big Data. Moreover, the collaboration promises to strengthen Treasure Data’s market presence in Japan and potentially pave the way for broader expansion into Asia and the Pacific Rim.
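As a rough illustration of the row-to-columnar transformation described above, the sketch below pivots a handful of row-oriented JSON events into a column-oriented layout. It is purely illustrative and makes no assumptions about Plazma’s actual on-disk format or APIs.

```python
import json
from collections import defaultdict

# Row-oriented JSON events, as an application might emit them.
rows = [
    '{"user_id": 1, "event": "click", "ts": 1409500000}',
    '{"user_id": 2, "event": "view", "ts": 1409500007}',
    '{"user_id": 1, "event": "purchase", "ts": 1409500042}',
]

def to_columnar(json_rows):
    """Pivot row-based JSON records into a column-oriented layout.

    Missing keys are padded with None, one simple way a schema-less
    data model can tolerate heterogeneous records.
    """
    records = [json.loads(r) for r in json_rows]
    keys = {k for rec in records for k in rec}
    columns = defaultdict(list)
    for rec in records:
        for k in keys:
            columns[k].append(rec.get(k))
    return dict(columns)

cols = to_columnar(rows)
# An analytical query over a single column only touches that column,
# e.g. counting events by type without reading user_id or ts:
print(cols["event"])          # ['click', 'view', 'purchase']
print(len(cols["user_id"]))   # 3
```

The benefit of the columnar layout is that a query touching one field reads only that field’s column, which is the property the Treasure Data platform exploits for faster analytical queries.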

Categories: Big Data, Treasure Data, Yahoo

Informatica Big Data Edition Comes Pre-Installed On Cloudera QuickStart VM And Hortonworks Sandbox

Earlier this month, Informatica announced 60-day free trials of Informatica Big Data Edition for the Cloudera QuickStart VM and the Hortonworks Sandbox. The trial means that Informatica Big Data Edition comes pre-installed in the sandbox environments of two of the leading Hadoop distributions in the Big Data marketplace today. Developers using the Cloudera QuickStart VM and the Hortonworks Sandbox now have streamlined access to Informatica’s renowned big data cleansing, data integration, master data management and data visualization tools. The code-free, graphical user interface-based Informatica Big Data Edition allows customers to create ETL and data integration workflows and to take advantage of hundreds of pre-installed parsers, transformations, connectors and data quality rules for Hadoop data processing and analytics. The Informatica Big Data platform specializes in Hadoop profiling, parsing, cleansing, loading, enrichment, transformation, integration, analysis and visualization, and reportedly improves developer productivity five-fold by means of the automation and visual interface built on its Vibe virtual data machine.

Although the Informatica Big Data Edition supports the MapR and Pivotal Hadoop distributions, the free 60-day trial is currently available only for Cloudera and Hortonworks. Informatica’s success in seeding its Big Data Edition with Cloudera and Hortonworks increases the likelihood that developers will explore and subsequently adopt the platform as a means of discovering and manipulating Big Data sets. As such, Informatica’s Big Data Edition competes with products like Trifacta that similarly facilitate the manipulation, cleansing and visualization of Big Data by means of a code-free user interface intended to increase analyst productivity and accelerate the derivation of actionable business intelligence. On one hand, the recent proliferation of Big Data products that allow users to explore Big Data without learning the intricacies of MapReduce democratizes access to Hadoop-based datasets. On the other hand, it remains to be seen whether graphical user interface-driven Big Data discovery and manipulation platforms can enable the granular identification of data anomalies, exceptions and eccentricities that may otherwise be obscured by large-scale trend analysis.

Categories: Big Data, Hadoop, Informatica

Base Enhances Sales Productivity Platform With Real-Time Analytics And Rich Data Visualization

Base, the CRM platform that leverages real-time data and analytics, recently announced the release of a bevy of new features and functionality that bring real-time, Big Data analytics to cloud-based sales productivity management. Base’s proprietary technology aggregates data from sources such as phone calls, in-person meetings, social network-based prospects and news feeds and delivers real-time notifications to sales professionals. As a result, sales teams can minimize manual entry of sales-related data and instead take advantage of the analytic and data visualization capabilities of the Base platform. The Base platform testifies to a qualitative shift within the CRM space marked by the delivery of enhanced automation to sales operations workflows resulting from the conjunction of real-time data, predictive analytics and data visualization. Uzi Shmilovici, CEO of Base, remarked on the positioning of Base within the larger CRM landscape as follows:

Base picks up where other CRMs have left off. Until now, legacy cloud Sales and CRM products like Salesforce have been accepted as ‘the norm’ by the enterprise market. However, recent advancements in big data, mobility and real-time computing reveal a need for a new generation of intelligent sales software that offers flexibility, visibility, and real-time functionality. If you’re using outdated technology that cannot adapt to the advanced needs of modern day sales teams, your competition will crush you.

Here, Shmilovici comments on the way in which big data, real-time analytics and the proliferation of mobile devices have precipitated the creation of a new class of sales applications that outstrip the functionality of “legacy cloud Sales and CRM products like Salesforce.” In a phone interview with Cloud Computing Today, Shmilovici elaborated on the ability of the Base platform to aggregate disparate data sources into rich, multivalent profiles of sales prospects that augment the ability of sales teams to convert leads into qualified sales. Base’s ability to enhance sales operations by means of data-driven analytics is illustrated by the screenshot below:

The graphic above illustrates the platform’s ability to track sales conversions at the level of individual sales professionals as well as sales managers or owners within a team. VPs of Sales can customize analytics regarding the progress of their teams to enable enhanced talent and performance management, in addition to gaining greater visibility into where the market poses its stiffest challenges. More importantly, Base delivers a veritable library of customized analytics that illustrates a prominent use case for the convergence of cloud computing, real-time analytics and Big Data technologies. As such, the success of the platform will depend on its ability to continue enhancing its algorithms and analytics while concurrently enriching the user experience that remains integral to the daily work of sales teams.

Categories: Big Data, Miscellaneous

Teradata Acquires Hadoop Consulting And Strategy Services Firm Think Big Analytics

Teradata continued its spending spree on Wednesday by acquiring Think Big Analytics, a Mountain View, CA-based Hadoop consulting firm. The acquisition will supplement Teradata’s own consulting practice. Think Big Analytics, which has roughly 100 employees, specializes in agile SDLC methodologies for Hadoop consulting engagements that typically last between one and three months. According to Teradata Vice President of Product and Services Marketing Chris Twogood, Teradata has “now worked on enough projects that it’s been able to build reusable assets,” as reported in PCWorld. Think Big Analytics will retain its branding, and its management team will remain at the company’s Mountain View office. Teradata’s acquisition of Think Big Analytics comes roughly two months after its purchase of Revelytix and Hadapt. Revelytix provides a management framework for metadata on Hadoop, whereas Hadapt’s technology empowers SQL developers to manipulate and analyze Hadoop-based data. Teradata’s third Big Data acquisition in roughly two months comes at a moment when the Big Data space is exploding with a proliferation of vendors that differentially tackle the problems of data discovery, exploration, analysis and visualization for Hadoop-based data. The question now is whether the industry will experience early market consolidation, as evinced by startups being snapped up by larger vendors, or whether the innovation that startups provide will survive a land grab in the Big Data space initiated by larger, well capitalized companies seeking to complement their portfolios with newly minted Big Data products and technologies. Terms of Teradata’s acquisition of Think Big Analytics were not disclosed.

Categories: Big Data, Hadoop, Teradata

Trifacta’s Deepened Integration With Tableau Streamlines Visualization Of Hadoop Data

Trifacta recently announced deeper integration of its Data Transformation Platform with Tableau, the leader in data visualization and business intelligence, as a key feature of the Trifacta Data Transformation Platform 1.5 release. The new release allows customers to export Trifacta data to the Tableau Data Extract format or register it with Hadoop’s HCatalog, thereby streamlining the movement of Hadoop-based data from Trifacta into Tableau. Trifacta’s Chief Strategy Officer Joe Hellerstein remarked on the significance of the deeper integration with Tableau as follows:

Tableau creates huge opportunities for effectively analyzing data, but working with big data poses specific challenges. The most significant barriers come from structuring, distilling and automating the transfer of data from Hadoop. Our integration removes these barriers in a way that complements self-service data analysis. Now, Trifacta and Tableau users can move directly from big data in Hadoop to powerful, interactive visualizations.

Trifacta’s ability to output data to the Tableau Data Extract format means that its customers can more seamlessly integrate Trifacta data with Tableau and reap the benefits of Tableau’s renowned data visualization capabilities. The Trifacta Data Transformation Platform specializes in enhancing analyst productivity on Big Data sets by delivering a machine learning-based user interface that allows analysts to explore, transform, cleanse, visualize and manipulate massive data sets. Moreover, Trifacta’s predictive interaction technology iteratively learns from analyst behavior and offers users guided suggestions about productive paths for data discovery and exploration. The deepened integration with Tableau means that data transformed in Trifacta now has a streamlined segue into the Tableau platform, while the deepened partnership between the two vendors positions Tableau to consolidate its standing as the de facto business intelligence platform for Hadoop-based data.

Categories: Big Data, Hadoop, Trifacta

Q&A With Dave McCrory, CTO of Basho Technologies, Regarding Riak, Riak CS and the NoSQL Landscape

Cloud Computing Today recently had the privilege of speaking with Dave McCrory, CTO of Basho Technologies, about the NoSQL space and Basho’s competitive differentiation within the NoSQL landscape. McCrory elaborated on Basho’s Riak “open source, distributed database” by noting its high availability, scalability and ability to handle any type of data as follows:

Cloud Computing Today: How do you envision the NoSQL space? What are your high level impressions of the competitive landscape amongst NoSQL vendors?

Dave McCrory (Basho Technologies): The NoSQL industry has many players for various use cases, but overall it is still young, especially from the enterprise point of view. I’ve been involved in big data for quite some time, and as data continues to grow, the NoSQL industry will grow with it. As early adopters begin to give way to the early majority, we are positioned to cross that chasm. Looking at how people want to build applications and work with data, we will see, as an industry, nearly half of enterprises embrace NoSQL technologies in the next few years to deal with the problems that traditional databases cannot handle. Other NoSQL providers like MongoDB have an amazing presence in the market because they have made it easy for developers to give the technology a try. At the same time, from my view of the market, it is limited in the applications for which it can be used. With so many companies offering NoSQL solutions for specific use cases and the high demand for data management, I can only see the industry continuing to expand and thrive.

Cloud Computing Today: Where do you see Basho within the larger NoSQL space at present?

Dave McCrory (Basho Technologies): We’re looking to provide the strongest key value solution and object store we can – that’s our priority right now. Although we at Basho are still a fairly young company, I think our technology speaks for itself. Since starting at Basho in the spring, I’ve been able to work with the outstanding Basho engineers and I’m amazed by what they have accomplished. Riak and Riak CS use simplified administrative features and a key/value system which enable anyone with command line experience to build a cluster in less than 15 minutes. I believe that Riak’s simplicity and usability are what separates it from other companies in the NoSQL space.

Some of that usability is our differentiation expressed in terms of high availability, fault tolerance and the ability to scale well beyond many of our competitors.

Cloud Computing Today: What are the key differentiators of Riak? What does Basho have planned for Riak in subsequent releases in the near future?

Dave McCrory (Basho Technologies): Riak’s key differentiators are its ability to offer high availability, massive scale and a variety of data types. Since Riak stores data as binary it is able to handle any type of data, unlike other solutions. Its top features include operational ease at large scales, always-on availability, and the ability to add and remove nodes easily and quickly as needed.

We are unique in that we have built object storage on our foundation and offer both key value and object store from the same platform. We have a thriving community, but our go-to-market is very focused on the enterprise. That has resulted in almost 200 enterprise customers, including a third of the Fortune 50.

We have a lot planned for Basho and Riak in the coming months. We recently launched Riak CS 1.5 which offers additional Amazon S3 compatibility, performance improvement in garbage collection processes, and new, simplified administrative features. We are releasing Riak 2.0 in the fall which will provide enhanced search capability, expanded data types and more customer control over consistency, and we are hosting the annual RICON conference in Las Vegas in October, so you’ll be hearing a lot from Basho the rest of the year!
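For readers unfamiliar with Riak’s key/value model, the minimal sketch below stores and fetches a JSON document using the official Python client (the riak package). It assumes a local node listening on the default protocol buffers port and reflects the client’s basic documented usage; it is illustrative only.

```python
import riak

# Connect to a local Riak node (assumes the default protocol buffers port, 8087).
client = riak.RiakClient(protocol='pbc', pb_port=8087)

# Buckets group keys; values can be JSON, plain text or arbitrary binary data.
bucket = client.bucket('accounts')

# Store a JSON document under the key 'alice'.
record = bucket.new('alice', data={'name': 'Alice', 'plan': 'enterprise'})
record.store()

# Fetch it back by key.
fetched = bucket.get('alice')
print(fetched.data)  # {'name': 'Alice', 'plan': 'enterprise'}
```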

Categories: Basho Technologies, Big Data, NoSQL

Google’s Mesa Data Warehouse Takes Real Time Big Data Management To Another Level

Google recently announced development of Mesa, a data warehousing platform designed to collect data for its internet advertising business. Mesa delivers a distributed data warehouse that can manage petabytes of data while delivering high availability, scalability and fault tolerance. Mesa is designed to update millions of rows per second, process billions of queries and retrieve trillions of rows per day to support Google’s gargantuan data needs for its flagship search and advertising business. Google elaborated on the company’s business need for a new data warehousing platform by commenting on its evolving data management needs as follows:

Google runs an extensive advertising platform across multiple channels that serves billions of advertisements (or ads) every day to users all over the globe. Detailed information associated with each served ad, such as the targeting criteria, number of impressions and clicks, etc. are recorded and processed in real time…Advertisers gain fine-grained insights into their advertising campaign performance by interacting with a sophisticated front-end service that issues online and on-demand queries to the underlying data store…The scale and business critical nature of this data result in unique technical and operational challenges for processing, storing and querying.

Google’s advertising platform depends upon real-time data that records updates about advertising impressions and clicks in the larger context of analytics about current and potential advertising campaigns. As such, the data model must accommodate atomic updates to advertising components that cascade throughout the entire data repository; consistency and correctness of data across datacenters and over time; continuous updates; low latency query performance; scalability to petabytes of data; and data transformation functionality that accommodates changes to data schemas. Mesa builds on existing Google infrastructure and services as follows:

Mesa leverages common Google infrastructure and services, such as Colossus, BigTable and MapReduce. To achieve storage scalability and availability, data is horizontally partitioned and replicated. Updates may be applied at granularity of a single table or across many tables. To achieve consistent and repeatable updates, the underlying data is multi-versioned. To achieve update scalability, data updates are batched, assigned a new version number and periodically incorporated into Mesa. To achieve update consistency across multiple data centers, Mesa uses a distributed synchronization protocol based on Paxos.
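As a loose illustration of the multi-versioned, batched-update idea described in the excerpt above, the toy sketch below assigns each committed batch of updates a monotonically increasing version number and answers queries as of a given version. It is purely illustrative and bears no relation to Mesa’s actual implementation.

```python
class ToyVersionedStore:
    """Toy model of batched, multi-versioned updates (illustrative only).

    Updates arrive in batches; each committed batch receives a new version
    number, and queries read a consistent snapshot as of some version.
    """

    def __init__(self):
        self.versions = []  # list of (version, {key: delta}) batches

    def apply_batch(self, deltas):
        """Atomically commit a batch of key -> delta updates under a new version."""
        version = len(self.versions) + 1
        self.versions.append((version, dict(deltas)))
        return version

    def query(self, key, as_of=None):
        """Aggregate all deltas for a key up to and including a version."""
        total = 0
        for version, batch in self.versions:
            if as_of is not None and version > as_of:
                break
            total += batch.get(key, 0)
        return total

store = ToyVersionedStore()
v1 = store.apply_batch({'ad_42_clicks': 3})
v2 = store.apply_batch({'ad_42_clicks': 5, 'ad_7_clicks': 1})
print(store.query('ad_42_clicks', as_of=v1))  # 3
print(store.query('ad_42_clicks'))            # 8
```

Batching updates under version numbers is what lets readers see a consistent snapshot while writers continue to commit new batches, which is the property the paper highlights for achieving both high update throughput and repeatable queries.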

While Mesa takes advantage of technologies from Colossus, BigTable, MapReduce and Paxos, it delivers a degree of “atomicity” and consistency that its counterparts lack. In addition, Mesa features “a novel version management system that batches updates to achieve acceptable latencies and high throughput for updates.” All told, Mesa constitutes a disruptive innovation in the Big Data space that combines atomicity, consistency, high throughput, low latency and scalability on the order of trillions of rows into a “petascale data warehouse.” While speculation proliferates about whether Google will append Mesa to its Google Compute Engine offering or otherwise open-source it, the key point worth noting is that Mesa represents a qualitative shift in the ability of a Big Data platform to process petabytes of data that experience real-time flux. Whereas the cloud space is accustomed to seeing Amazon Web Services usher in one breathtaking innovation after another, Mesa underscores Google’s continuing leadership in the Big Data space. Expect to hear more details about Mesa at the Conference on Very Large Data Bases next month in Hangzhou, China.

Categories: Big Data, Google
