Hadoop

MapR Announces Support For All Five Components Of Apache Spark In Its Hadoop Distribution

On Thursday, MapR Technologies announced that it will add Apache Spark to its Hadoop distribution by means of a partnership with Databricks, the principal steward behind Apache Spark. Apache Spark facilitates the development of big data applications that specialize in interactive analytics, real-time analytics, machine learning and stream processing. In contrast to MapReduce, Apache Spark provides a greater range of data operators such as “mappers, reducers, joins, group-bys, and filters” that permit the modeling of more complex data flows than are available via map and reduce operations alone. Moreover, because Spark stores the results of data operators in memory, it enables low-latency computations and increased efficiency on iterative calculations that operate on in-memory computational results. Spark is additionally known for its ability to automate the parallelization of jobs and tasks in ways that optimize performance and correspondingly relieve developers of the responsibility of sequencing the execution of jobs. Apache Spark can improve application performance by a factor of 5 to 100, while its programming abstraction framework, which is based on distributed, immutable collections of data known as Resilient Distributed Datasets, reduces the amount of code required by 80%. MapR will support all five components of the Spark stack, namely Shark, Spark Streaming, MLlib, GraphX and SparkR. These five components illustrate Spark’s versatility insofar as they support applications that interface with streaming datasets, machine learning, graph-based workloads, R and SQL. MapR’s decision to support the entire Spark stack diverges from that of its competitor Cloudera, which does not support Shark, the SQL-on-Hadoop component of Apache Spark that competes with Cloudera’s Impala product, as reported in GigaOM.
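As a rough, plain-Python sketch (not the Spark API), the appeal of the richer operator set can be pictured as ordinary filter, join and group-by steps chained together, rather than everything being forced through map and reduce; all names and data below are illustrative:

```python
# Conceptual sketch of filter / join / group-by data operators
# (plain Python, not Spark); names and data are illustrative only.
from itertools import groupby
from operator import itemgetter

orders = [("alice", 30), ("bob", 20), ("alice", 15)]
users = {"alice": "US", "bob": "UK"}

# filter: keep orders at or above a threshold
large = [o for o in orders if o[1] >= 20]

# join: attach each user's country to the order
joined = [(name, amount, users[name]) for name, amount in large]

# group-by: total order amount per country (groupby needs sorted input)
joined.sort(key=itemgetter(2))
totals = {country: sum(amount for _, amount, _ in grp)
          for country, grp in groupby(joined, key=itemgetter(2))}
print(totals)  # {'UK': 20, 'US': 30}
```

In Spark these steps would be transformations on Resilient Distributed Datasets, partitioned and parallelized across the cluster automatically, with intermediate results held in memory rather than written back to disk between stages.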
All told, Thursday’s announcement represents a small but significant attempt by MapR to reclaim the relevance of its Hadoop distribution in the wake of Cloudera’s $900M funding announcement and the $100M in funding recently secured by Hortonworks. That said, we should expect MapR to follow suit with a similar capital raise soon, even though its CMO Jack Norris claims that “with 500 paid customers the company is profitable and able to continue being successful from its current position.”

Categories: Big Data, Hadoop, MapR

Pivotal Releases Pivotal Big Data Suite With Pricing Per Core And Annual Subscription

Not to be outdone by the slew of product and price announcements from Google, Amazon Web Services and Microsoft over the past week, EMC-VMware spinoff Pivotal announced a new product offering branded the Pivotal Big Data Suite on Wednesday. The platform delivers Pivotal Greenplum Database, Pivotal GemFire, Pivotal SQLFire, Pivotal GemFire XD and Pivotal HAWQ, in addition to unlimited use of Pivotal’s Hadoop distribution, Pivotal HD. Because the Pivotal Big Data Suite is priced on the basis of an annual subscription for all software and services, in addition to per-core pricing for computing resources, customers need not fear additional fees for software licensing or customer support over and above the subscription price. Moreover, customers essentially gain access to a commercial-grade Hadoop distribution for free as part of the subscription price. Pivotal compares the Big Data Suite to a “Swiss army knife for Big Data” that enables customers to “use whatever tool is right for your problem, for the same price.” Customers have access to products such as Greenplum’s massively parallel processing (MPP) architecture-based data warehouse, GemFire XD’s in-memory distributed Big Data store for real-time analytics with a low-latency SQL interface and HAWQ’s SQL-querying ability for Hadoop. Taken together, the Pivotal Big Data Suite edges towards the realization of Pivotal One, an integrated solution that performs Big Data management and analytics for ecosystems of applications, real-time data feeds and devices that can serve the data needs of the internet of things, amongst other use cases. More importantly, the Pivotal Big Data Suite represents the most systematic attempt yet to productize Big Data solutions in the industry at large, even if it is composed of an assemblage of heterogeneous products under one roof.
The combination of access to a commercial-grade Hadoop distribution (Pivotal HD), a data warehouse designed to store petabytes of data (Pivotal Greenplum) and closed-loop, real-time analytics (Pivotal GemFire XD) within a unified product offering, available via an annual subscription and per-core pricing, constitutes an offer not easy to refuse for anyone seriously interested in exploring the capabilities of Big Data. The bottom line is that Pivotal continues to push the envelope with respect to Big Data technologies, although it now stands to face the challenge posed by cash-flush Cloudera, which recently finalized $900M in funding and a strategic and financial partnership with Intel.
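The pricing model itself is simple to picture: one flat per-core rate covers the whole suite, with no separate licensing or support line items. A toy sketch (all rates below are hypothetical; Pivotal has not published list prices here):

```python
# Toy illustration of per-core annual-subscription pricing.
# The rate is hypothetical, not a published Pivotal price.
def annual_cost(cores: int, rate_per_core: float) -> float:
    """One per-core rate covers all suite software and support;
    no additional licensing or support fees apply."""
    return cores * rate_per_core

# e.g. a 64-core cluster at a hypothetical $1,000 per core per year
print(annual_cost(64, 1000.0))  # 64000.0
```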

Categories: Big Data, Hadoop, Pivotal

Intel Invests $740M In Cloudera As Part Of Broader Strategic Big Data Initiative

In a stunning move that is likely to shape the Big Data space for years, Intel recently decided to partner with Cloudera to support its Hadoop distribution rather than continue enhancing Intel’s own Hadoop distribution. Cloudera will optimize its Hadoop distribution (CDH) for Intel’s hardware technology and Intel, conversely, will promote CDH as the Hadoop distribution of choice for enterprise Big Data analytics and the internet of things. Meanwhile, Intel will contribute insights from its own Hadoop distribution to CDH, and the resulting integration will be made available as part of Cloudera’s open source Hadoop initiatives. The partnership also features an equity investment by Intel of between $740M and $760M, which translates into an 18% ownership stake in Cloudera. The $740M invested by Intel brings Cloudera’s recent funding raises to roughly $900M, subsequent to its $160M raise in mid-March. Intel will join Cloudera’s board of directors and become “Cloudera’s largest strategic shareholder.” According to its press release, Intel’s investment in Cloudera represents its “single largest data center technology investment in its history.” Intel’s strong presence in countries such as India and China, where Cloudera has thus far failed to gain traction, means that the partnership stands to dramatically expand Cloudera’s global market share. More importantly, however, Intel’s deep integration with the technologies in almost every datacenter worldwide renders it a formidable ally for Cloudera in fulfilling its aspiration of becoming the leading Hadoop distribution in the world, in ways that promise to transform computing hardware as well as the Hadoop distributions that integrate with Intel’s Xeon technology.

Categories: Big Data, Cloudera, Hadoop, Intel, Venture Capital

Zettaset Extends Hadoop Encryption To Data-In-Motion

Building upon its November announcement regarding Zettaset Orchestrator’s support for the encryption of Hadoop data at rest, Zettaset today announced the Orchestrator platform’s support for the encryption of data in motion. The addition of data-in-motion encryption to the Zettaset platform enables encryption of connections between nodes within a Hadoop cluster, all interfaces to the Orchestrator management console, connectors to business intelligence platforms and, more generally, all communication links. Zettaset Orchestrator’s support for data-in-motion encryption positions the platform to provide encryption for cloud-based Hadoop deployments on platforms such as Amazon Web Services Elastic MapReduce (EMR), or Hadoop-as-a-Service solutions offered by vendors such as Qubole and Xplenty.

Zettaset delivers an enterprise-grade Big Data management platform that specializes in security, high availability and performance as illustrated by the graphic below:

The platform supports high availability by means of automated failover services. Moreover, Zettaset Orchestrator offers activity monitoring for compliance and auditing purposes, role-based access control for HiveServer2 and HDFS, and integration with Active Directory and LDAP, as revealed by CEO Jim Vogt in an interview with Cloud Computing Today. Compatible with all major Hadoop distributions, Zettaset aims to deliver encryption as part of a broader security package that also features identity management and access control in ways that facilitate compliance with regulatory frameworks such as HIPAA and PCI. Today’s announcement of the platform’s support for data-in-motion encryption positions the Mountain View-based company to compete in the hotly contested cloud encryption space. Unlike cloud encryption vendors such as CipherCloud and Vaultive, however, Zettaset combines commitments to high availability and integrated product security in a way that renders it unique within the Hadoop management and security space. As more and more enterprises tackle the challenges of operationalizing Big Data, expect Zettaset’s data-at-rest and data-in-motion encryption functionality to deepen its early traction within the healthcare and financial services verticals as customers increasingly seek a turnkey Big Data management platform that manages Hadoop encryption, access, compliance reporting and availability.
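In generic terms (this is standard-library TLS, not Zettaset’s actual API), encrypting data in motion means wrapping each inter-node connection in a TLS session before any bytes leave the host; hostnames and ports below are illustrative:

```python
# Generic sketch of data-in-motion encryption: wrap a TCP socket
# in TLS so all traffic on the link is encrypted in transit.
# This uses Python's standard ssl module, not Zettaset's API.
import socket
import ssl

# Context that verifies the peer's certificate and hostname,
# and refuses legacy protocol versions.
ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

def open_encrypted_channel(host: str, port: int) -> ssl.SSLSocket:
    """Return a socket whose traffic to `host` is TLS-encrypted."""
    raw = socket.create_connection((host, port))
    return ctx.wrap_socket(raw, server_hostname=host)

# e.g. secured = open_encrypted_channel("namenode.example.com", 9000)
```

A management layer like Orchestrator would additionally handle certificate distribution and rotation across the cluster, which is typically the operationally hard part.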

Categories: Big Data, Hadoop, Zettaset

Pivotal HD 2.0 Features Support For Apache Hadoop 2.2 And General Availability Of GemFire XD

This week, EMC and VMware spinoff Pivotal announced the availability of Pivotal HD 2.0, a commercial distribution of Apache Hadoop that now features support for Apache Hadoop 2.2. Moreover, Pivotal also revealed the general availability of Pivotal GemFire XD, a SQL-compliant, in-memory database designed for real-time analytics on Big Data. In its initial release, Pivotal GemFire XD represents an in-memory distributed data store that “provides a low-latency SQL interface to in-memory table data, while seamlessly integrating data that is persisted in HDFS.” Because GemFire XD brings the power of real-time analytics to Hadoop, it empowers mobile providers to run complex algorithms on incoming calls in order to route each call appropriately, or geospatial navigation systems to alter suggested routes based on incoming data about traffic and weather conditions. Like Apache Spark, a parallel data processing framework that facilitates real-time analytics on Hadoop, GemFire XD enables real-time Big Data analytics, but it is explicitly designed for data environments with high demands for scalability and availability. Michael Cucchi, Pivotal’s senior director of product marketing, commented on Pivotal’s interest in Spark and GemFire XD in an interview with InformationWeek as follows:

We’re excited about Spark and will support it, but it’s generally used for [data] ingest or caching. GemFire XD is an ANSI-compliant SQL database with high-availability features, and it can run over wide-area networks, so you can have an instance in Europe and another in North America with replication.

Built on the vFabric SQLFire product, which belongs to the category of NewSQL databases noted for high performance and scalability, GemFire XD adds features such as HDFS persistence and off-heap memory storage for table data. In addition to GemFire XD, Pivotal HD 2.0 also features an integration with GraphLab for graph analytics as well as enhancements to HAWQ such as support for MADlib, R, Python, Java and Parquet. Overall, Pivotal HD 2.0 represents a notable advancement over Pivotal HD 1.1 that brings the power of YARN, real-time analytics via GemFire XD and graph technology to Hadoop and Big Data processing and analytics. With Pivotal HD 2.0 released less than six months after the November 1, 2013 release of Pivotal HD 1.1, Pivotal promises to innovate in the Big Data space at the same dizzying rate at which Amazon Web Services innovates with regard to cloud computing technologies and platforms. Expect to hear more about the conjunction of real-time analytics and graph technologies on Hadoop via Pivotal HD 2.0 as customer use cases proliferate and circulate throughout the Big Data space.

Categories: Big Data, Hadoop, Pivotal

Intel Provides Preview Of Intel Data Platform And Analytics Toolkit For Big Data

Intel recently elaborated on details of its Intel Data Platform, a suite of software applications designed to facilitate analytics on big data. The platform will complement the Intel Distribution for Apache Hadoop by providing a wealth of graph analytics and predictive modeling functionality via the Intel Data Platform: Analytics Toolkit, which enables data scientists to derive actionable business intelligence from big data sets. In addition to raw analytic and data visualization capabilities, the Intel Data Platform features the ability to process streaming data sets and perform iterative and interactive analytics. The Analytics Toolkit provides users with algorithms related to graph analytics and machine learning that enable enhanced fraud detection, customer profiling and big data management and processing. For example, China Mobile Guangdong used the platform to implement an online billing system capable of ingesting up to 800,000 new records per second, or up to 30 terabytes of data per month. Similarly, the platform has been used to help retailers nimbly respond to social media promotions by ensuring shelves are appropriately stocked in response to the spikes in consumer demand that result from promotions on platforms such as Twitter and Facebook, media announcements or seasonal changes, including unanticipated weather. The Intel Data Platform exemplifies the proliferation of Big Data analytics solutions that have emerged as more and more enterprises experiment with Big Data at varying levels of intensity. The platform will be available in Q2 of 2014 in Enterprise and Premium Editions that differ according to the degree of available customer support.

Categories: Big Data, Hadoop, Intel

Infochimps Reveals Big Data Adoption Lifecycle Methodology Alongside Its Big Data PaaS

Infochimps today announced a Big Data platform as a service that integrates with existing enterprise IT infrastructures while adding Big Data management and analytic solutions. The Infochimps platform is based on open source, web-scale technologies in addition to a cloud-based deployment structure. One of the unique features of the Infochimps solution is that it gauges the position of customers with respect to Big Data management and subsequently recommends a path toward effectively operationalizing Big Data in conjunction with customer needs. To help customers understand how to realize their Big Data needs, Infochimps complements its Big Data platform as a service with a suite of consulting services designed to guide customers through the Big Data lifecycle. Jim Kaskade, Director of CSC’s open Big Data solutions, commented on the Infochimps methodology as follows:

We’ve defined distinct phases along the Big Data adoption lifecycle where companies fall. We identify our customers’ current state, and then carefully guide them to organization-wide operationalization of Big Data insights.

Infochimps shares the insight previously articulated by Pivotal CEO Paul Maritz that, with the exception of companies such as Google, Facebook and Twitter, few enterprises have come to terms with the project of effectively operationalizing Big Data. In an interview with Raj Dalal of Big Data Insights at Strata 2014, Kaskade claimed that approximately 50% of Big Data initiatives fail due to poorly scoped projects, excessive complexity within the Big Data technology landscape and internal political friction. In response, Infochimps proposes a comprehensive Big Data implementation methodology in addition to its PaaS platform. Details remain scant, but we should expect to hear more at Strata and in the coming weeks about the Infochimps methodology for assessing a customer’s current state of Big Data adoption and subsequently designing a programmatic path focused on integrating existing technology stacks with its Big Data PaaS. Infochimps was acquired by CSC in August 2013.

Categories: Big Data, Hadoop, Infochimps, Platform as a Service
