Microsoft To Expand Options Related To Geography Of Online Data Storage Given NSA-related Privacy Concerns
Given recent concerns about the security of online data stemming from allegations of NSA spying, Microsoft is putting a stake in the ground by announcing that it will take additional measures to let customers store cloud-based data in data centers located in countries of their choosing. Microsoft’s general counsel, Brad Smith, noted that “people should have the ability to know whether their data…are being subject to laws in some other country and should have the ability to make an informed choice of where their data resides.” As noted in The New York Times, Microsoft’s practice of letting customers choose where their data is stored for select applications such as Office 365, Dynamics CRM Online and Windows Azure predates Snowden’s revelations about NSA spying. But in an interview with The Financial Times, Smith indicated that the company would expand the range of options available to customers for keeping data within specific national and regional boundaries. Details of Microsoft’s plans remain scant, so we should expect to hear more from the Redmond tech behemoth in the days to come. For now, however, Microsoft’s decision to support data storage options outside the NSA’s purview raises a constellation of legal and philosophical questions about the rights of data owners to store data selectively across national borders, questions analogous, for example, to the debate about the legality of foreign-held Swiss bank accounts that are immune to certain fiduciary regulations.
At last week’s O’Reilly Open Source Convention in Portland, Oregon, Microsoft announced plans to support an open-source version of Java on both its Windows Azure IaaS and PaaS platforms. Microsoft will offer Java Standard Edition (Java SE) and will work with Azul Systems to “build, certify and distribute a compliant OpenJDK-based distribution meeting the Java SE specification for use with Windows Server environments on Azure.” Azul will collaborate with Microsoft Open Technologies, a wholly-owned Microsoft subsidiary, to develop the new OpenJDK-based distribution in an effort that will focus largely on compliance, standards and specifications, an understandable emphasis given that Sun Microsystems once sued Microsoft for developing a non-compliant version of Java. Sunnyvale, CA-based Azul Systems is an experienced provider of Java runtimes to enterprises and specializes in improving the performance, scalability, latency, response times and consistency of enterprise Java workloads. Azul will license the OpenJDK distribution on Azure under the GNU General Public License (GPL) version 2 and certify it for compliance with Java SE.
Microsoft’s support of Java on its Azure platform comes in the wake of a partnership between Microsoft and Oracle, announced in June, under which Oracle software such as Java will be certified and supported by Oracle to run on the Azure platform and on Microsoft’s Hyper-V virtualization technology.
This was the week in which Microsoft announced the general availability of Windows Azure Infrastructure as a Service. More than a simple declaration of production-grade availability, the announcement was Microsoft’s strongest statement to date of its intent to compete head to head with Amazon Web Services in the IaaS space. In a blog post, Microsoft’s Bill Hilf accurately assessed enterprise readiness for cloud adoption by noting that customers are not interested in replacing traditional data centers with cloud-based environments. Instead, customers typically want to supplement existing data infrastructure with IaaS and PaaS deployments that run alongside private clouds and traditional data center ecosystems. In other words, hybridity is the name of the game in enterprise cloud adoption at present, and Hilf’s argument is that no one is better suited to recognize and respond to that hybridity than Microsoft. In conjunction with the general availability of its Azure IaaS platform, Microsoft pledged to “match Amazon Web Services prices for commodity services such as compute, storage and bandwidth” alongside “monthly SLAs that are among the industry’s highest.”
Microsoft also announced new, larger Virtual Machine sizes of 28 GB/4 cores and 56 GB/8 cores, in addition to a gallery of new Virtual Machine image templates including Windows Server 2012, Windows Server 2008 R2, SQL Server, BizTalk Server and SharePoint Server, as well as VM templates for applications that run on the Ubuntu, CentOS and SUSE Linux distributions. Overall, the announcement represents an incisive and undisguised assault on Amazon Web Services’ dominance of the IaaS space, one that is all the more threatening given Microsoft’s ability to match AWS in price, functionality and service. The key question now is the degree to which OpenStack and Google Compute Engine (GCE) will emerge as major players within the IaaS space. OpenStack has already emerged as a major IaaS player, but it remains to be seen which OpenStack distribution will win out at the enterprise level. Nevertheless, analysts should expect a tangible reconfiguration of IaaS market share by the end of 2013, with a more significant transformation roughly a year after Google Compute Engine, released in beta in June 2012, reaches general availability.
From December 28 to December 31, 2012, Microsoft’s Windows Azure platform experienced an outage in its South Central US region that came hard on the heels of the Amazon Web Services Christmas Eve outage that became famous for incapacitating Netflix. Microsoft first reported the outage on its Windows Azure Service Dashboard at 3:16 PM UTC on December 28, noting that a networking issue was “partially affecting the availability of Storage service in the South Central US subregion.” Hours later, Microsoft noted that the outage was also preventing the dashboard from displaying service status for all other regions, even though service itself was unaffected outside the South Central US region.
The first substantial elaboration on the cause of the outage came six hours after its initial disclosure, at 9:16 PM UTC on December 28:
The repair steps are taking longer because it involves recovery of some faulty nodes on the impacted cluster. We expect this activity to take a few more hours. Further updates will be published after the recovery is complete. We apologize for any inconvenience this causes our customers. Note: The availability is unaffected for all other services and sub-regions. We are currently unable to display the status of the individual services and sub-regions due to the above mentioned issue.
Here, Microsoft identifies “faulty nodes on the impacted cluster” as the source of the problem and expects repair to take a few more hours. But 9 hours after this update, or roughly 15 hours after the initial announcement, the Azure team announced that the problems affecting recovery of the faulty nodes were “likely to take a significant amount of time.” In the meantime, the impact on the creation of new VM jobs and on Service Management operations had been addressed, but full recovery of the cluster would take longer.
On December 30, 9:00 PM UTC, the Azure team reported:
The repair steps are still underway to restore full availability of Storage service in the South Central US sub-region. Windows Azure provides asynchronous geo replication of Blob & Table data between data centers, but does not currently support geo-replication for Queue data or failover per Storage account. If a failover were to occur, it would impact all accounts on the affected Storage cluster, resulting in loss of Queue data and some recent Blob & Table data. To prevent this risk to customer data and applications, we are focusing on bringing the affected stamp back to full recovery in a healthy state. We continue to work to resolve this issue at the earliest and our next update will be before 6PM PST on 12/30/2012. Please contact Windows Azure Technical support for assistance. We apologize for any inconvenience this causes our customers.
With this update, impacted customers finally learned why recovery was taking so long: Windows Azure Storage did not support geo-replication of Queue data, nor did it support failover for an individual Storage account. Because Blob and Table data were replicated to the secondary data center asynchronously, failing over the affected cluster would have impacted every account on that cluster and caused the loss of Queue data as well as any “recent Blob & Table data” not yet replicated. Rather than accept that loss, the Azure team chose to bring the faulty nodes on the affected storage stamp back to health, which prolonged the outage. Geo-replication, recall, refers to the practice of maintaining replicas of customer data in locations that are hundreds of miles apart in order to better protect customers against data center outages. The gaps in Azure Storage’s geo-replication and failover support thus turned the December 2012 outage into a multi-day event.
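The trade-off Microsoft describes, failing over and losing data versus waiting for the primary cluster to recover, can be illustrated with a toy model. The Python sketch below is hypothetical: the class and method names are invented for illustration and do not correspond to any Azure API. It assumes that record writes are acknowledged by the primary before being copied to a secondary, that queue messages are never replicated, and that failover happens for a whole cluster at once, which is how the December 30 update describes the situation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A toy storage cluster holding key/value records and queue messages."""
    records: dict = field(default_factory=dict)
    queue: list = field(default_factory=list)


class AsyncGeoReplicatedStore:
    """Hypothetical model, not Azure's implementation: writes are acknowledged
    by the primary at once and shipped to a distant secondary later, while
    queue messages are never replicated."""

    def __init__(self) -> None:
        self.primary = Cluster()
        self.secondary = Cluster()
        self.pending: deque = deque()  # acknowledged writes not yet replicated

    def put(self, key: str, value: str) -> None:
        self.primary.records[key] = value  # acknowledged immediately
        self.pending.append((key, value))

    def enqueue(self, message: str) -> None:
        self.primary.queue.append(message)  # lives on the primary only

    def replicate(self, max_items: int) -> None:
        """Ship up to max_items pending writes to the secondary, oldest first."""
        for _ in range(min(max_items, len(self.pending))):
            key, value = self.pending.popleft()
            self.secondary.records[key] = value

    def failover(self) -> tuple:
        """Promote the secondary for the whole cluster; return what was lost."""
        lost_records = [key for key, _ in self.pending]
        lost_messages = list(self.primary.queue)
        self.primary, self.secondary = self.secondary, Cluster()
        self.pending.clear()
        return lost_records, lost_messages


store = AsyncGeoReplicatedStore()
for key in ("a", "b", "c"):
    store.put(key, f"blob-{key}")
store.enqueue("job-1")
store.replicate(max_items=2)          # "c" is still in flight
lost_records, lost_messages = store.failover()
print(lost_records, lost_messages)    # ['c'] ['job-1']
```

In this model, waiting for the primary to recover keeps every acknowledged write, whereas failing over immediately trades that durability for availability: anything still in flight, plus every queue message, is gone.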
The problem was finally, fully resolved at 10:16 AM UTC, December 31, 2012:
Storage is fully functional in the South Central US sub-region All customers should have access to their data. We apologize for any inconvenience this caused our customers.
Notable about the Windows Azure outage was its relative lack of media coverage compared with the Amazon Web Services outage, even though the AWS outage lasted roughly 24 hours while the Azure outage, measured from the first dashboard report to the final resolution notice, lasted roughly 67 hours. Granted, the AWS outage affected Netflix, one of the IaaS industry’s most prominent customers alongside Zynga, but the contrast in coverage also illustrates the market dominance of Amazon Web Services: its outages affect measurably more customers and end users than those of other IaaS platforms. Another factor behind the disparity in coverage is AWS’s trademark painstaking post-mortem analysis of outages, which Microsoft and all other vendors would do well to match in depth and specificity going forward.
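The Azure outage intervals can be checked against the dashboard timestamps quoted above. This short Python sketch simply subtracts the UTC times of the first report (3:16 PM, December 28), the first root-cause update (9:16 PM, December 28) and the resolution notice (10:16 AM, December 31); the variable and function names are illustrative only.

```python
from datetime import datetime, timezone

# UTC timestamps taken from the Windows Azure Service Dashboard updates.
first_report = datetime(2012, 12, 28, 15, 16, tzinfo=timezone.utc)
root_cause_update = datetime(2012, 12, 28, 21, 16, tzinfo=timezone.utc)
resolved = datetime(2012, 12, 31, 10, 16, tzinfo=timezone.utc)


def hours_between(start: datetime, end: datetime) -> float:
    """Return the elapsed time between two datetimes, in hours."""
    return (end - start).total_seconds() / 3600


print(hours_between(first_report, root_cause_update))  # 6.0
print(hours_between(first_report, resolved))           # 67.0
```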
OpenStack decided to remove the code that supports Hyper-V despite a statement from Microsoft pledging a “commitment” to work with OpenStack to resolve the issues with the unmaintained and broken code. The code was meant to let OpenStack users deploy OpenStack on cloud infrastructure that uses the Hyper-V hypervisor. It will be removed with Essex, the next OpenStack release, scheduled for the second quarter of 2012. The OpenStack change log justifies the removal of the Hyper-V support code as follows:
HyperV has been unmaintained for several releases now. The unit tests are superficial, we have no way to test it, noone has stepped forward to maintain it, and for a very long time, we’ve not had any reports that it works. Furthermore, many improvements have been made across other hypervisor drivers that have not been done in the HyperV driver, so even if it worked, it would only expose a subset of the functionality that the other drivers do.
The change log indicates that “even if it worked,” the Hyper-V driver would expose only a subset of the functionality offered by other hypervisor drivers, because improvements made to those drivers had not been carried over to it. The removal of the code is unlikely to affect any production deployments: Joshua McKenty, CEO of Piston Cloud Computing and member of OpenStack’s Project Policy Board, commented: “I don’t know of any production deployment of it. I don’t know of any active development deployment of it.”
The more salient issue highlighted by the decision to remove Hyper-V is the lack of OpenStack adoption among companies with Windows-based cloud infrastructures. As James Staten, Senior Analyst at Forrester Research, noted, however, this is unsurprising: a company building a cloud on an open source framework is not particularly likely to hold an enterprise license for Windows-based cloud software. Nevertheless, the removal of Hyper-V illustrates that companies with Windows-based cloud infrastructures have yet to experiment with adding OpenStack to their IT environments as an additional cloud operating system, even for pilot or research purposes.
OpenStack is the largest collaboration on open source cloud computing in the world. The organization currently features the support of 149 companies and over 2,300 individuals, with user groups in Australia, Austin (TX), Boston, China, Egypt, France, Indonesia, Japan, New York, San Francisco, Seattle, South Korea and Russia, in addition to a Spanish-language user group.
According to an IDG News article by Nancy Gohring, Microsoft is committed to supporting Hyper-V in OpenStack. Microsoft reportedly released a statement saying that it is “committed to working with the community to resolve the current issues with Hyper-V and OpenStack.” Microsoft had pledged support for Hyper-V in OpenStack deployments in October 2010 through a partnership with Cloud.com, but had since failed to support development of the OpenStack code for Hyper-V. Microsoft announced its commitment after OpenStack developer Thierry Carrez suggested removing Hyper-V support from the forthcoming Essex release of OpenStack because the code was “broken and unmaintained.” Joshua McKenty, CEO of Piston Cloud Computing, Technical Architect of NASA’s Nebula Cloud Computing Platform and member of OpenStack’s Project Policy Board, remarked that he knows of no “production deployment” or “active development deployment” of Hyper-V in the OpenStack community. Microsoft has yet to elaborate further on its position regarding Hyper-V support in OpenStack. The Essex release of OpenStack is scheduled for the second quarter of 2012.
On Tuesday, Oracle announced the availability of the Big Data appliance that it introduced to the world at its October conference, Oracle OpenWorld. The appliance runs on Linux and features Cloudera’s Distribution including Apache Hadoop (CDH), Cloudera Manager for managing the Hadoop distribution, Oracle NoSQL Database and an open source distribution of R, the statistical software package. Oracle’s partnership with Cloudera goes beyond selecting CDH as the appliance’s Hadoop distribution to include customer support: Oracle plans to deliver tier one support, while Cloudera will handle tier two and tier three customer inquiries, including those beyond the domain of Hadoop.
Oracle’s Big Data appliance will run on hardware featuring 864 GB of main memory, 216 CPU cores, 648 TB of raw disk storage, 40 Gb/s InfiniBand connectivity and 10 Gb/s Ethernet data center connectivity. Oracle also revealed details of four connectors for the appliance with the following functionality:
• Oracle Loader for Hadoop, which loads massive amounts of data into the appliance using MapReduce parallel processing.
• Oracle Data Integrator Application Adapter for Hadoop, which provides a graphical interface that simplifies the creation of Hadoop MapReduce programs.
• Oracle R Connector for Hadoop, which gives R users streamlined access to the Hadoop Distributed File System (HDFS).
• Oracle Direct Connector for Hadoop Distributed File System (ODCH), which lets Oracle’s SQL database access data stored in HDFS.
Oracle’s announcement of the availability of its Big Data appliance comes as the battle for Big Data market share takes shape in a landscape dominated by the likes of Teradata, Microsoft, IBM, HP, EMC, Informatica, MarkLogic and Karmasphere. Oracle’s selection of Cloudera as its Hadoop distributor indicates that it intends to make a serious move into the world of Big Data. First, the partnership with Cloudera gives Oracle increased access to Cloudera’s universe of customers. Second, the partnership enhances the credibility of Oracle’s Big Data offering, given that Cloudera is the most prominent distributor of Apache Hadoop in the U.S.
In October, Microsoft revealed plans for a Big Data appliance featuring Hadoop for Windows Server and Azure, and Hadoop connectors for SQL Server and SQL Server Parallel Data Warehouse. Whereas Oracle chose Cloudera for Hadoop distribution, Microsoft partnered with Yahoo spinoff Hortonworks to integrate Hadoop with Windows Server and Windows Azure. In late November, HP provided details of Autonomy IDOL (Integrated Data Operating Layer) 10, which features the ability to process large-scale structured data sets in addition to a NoSQL interface for loading and analyzing structured and unstructured data. In December, EMC released its Greenplum Unified Analytics Platform (UAP), which combines a platform for analyzing structured data, enterprise-grade Hadoop for analyzing structured and unstructured data, and Chorus, a collaboration and productivity software tool. Bolstered by its partnership with Cloudera, Oracle is set to compete squarely with HP’s Autonomy IDOL 10, EMC’s Greenplum UAP and IBM’s BigInsights until Microsoft’s appliance officially enters the Big Data dohyō (土俵), or sumo ring, as well.
If 2011 was the year of cloud computing, then 2012 will surely be the year of Big Data. Big Data has yet to arrive in the way cloud computing has, but in 2011 the framework for its widespread deployment as a commodity emerged with style and unmistakable promise. For the first time, Hadoop and NoSQL gained currency not only within the developer community but also among bloggers and analysts. More importantly, Big Data acquired a certain status and meaning in the technology community, even though few people asked what “big” means in “Big Data” in a landscape where the boundary of “big” is constantly being redrawn. Even as yesterday’s “big” becomes today’s “small,” with consumer personal storage moving from gigabytes to terabytes, “Big Data” emerged as a term that almost everyone instantly understood. It was as if consumers and enterprises alike had spent years searching for a term to describe the explosion of data evinced by web searches, web content, Facebook and Twitter feeds, photographs, log files and miscellaneous structured and unstructured content. Having lacked the vocabulary to name the data explosion, the world suddenly embraced the term Big Data with passion.
Below are some of the highlights of 2011 with respect to Big Data:
• Teradata finalized a deal to acquire Big Data player Aster Data Systems for $263 million.
• Yahoo revealed plans to create Hortonworks, a spin-off dedicated to the commercialization of Apache Hadoop.
• Teradata announced the Teradata Aster MapReduce Platform, which combines SQL with MapReduce. The platform enables business analysts who know SQL to leverage the power of MapReduce without having to write MapReduce programs in Java, Python, Perl or C.
• Oracle announced plans to launch a Big Data appliance featuring Apache Hadoop, Oracle NoSQL Database Enterprise Edition and an open source distribution of R. The company’s plan to leverage a NoSQL database represented an abrupt about-face from an earlier Oracle position that discounted the significance of NoSQL.
• Microsoft revealed plans for a Big Data appliance featuring Hadoop for Windows Server and Azure, and Hadoop connectors for SQL Server and SQL Server Parallel Data Warehouse. Microsoft also announced a strategic partnership with Yahoo spinoff Hortonworks to integrate Hadoop with Windows Server and Windows Azure. Microsoft’s decision not to leverage NoSQL, and to use a Windows-based version of Hadoop for SQL Server 2012 instead, constituted the key difference between Microsoft’s and Oracle’s Big Data platforms.
• IBM announced the release of its IBM InfoSphere BigInsights application for analyzing “Big Data.” The SmartCloud release of BigInsights meant that IBM beat competitors Oracle and Microsoft in the race to deploy an enterprise-grade, cloud-based Big Data analytics platform.
• Christophe Bisciglia, a founder of Cloudera, the commercial distributor of Apache Hadoop, launched a startup called Odiago whose Big Data product, WibiData, manages investigative and operational analytics on “consumer internet data” such as website traffic from traditional and mobile computing devices.
• Cloudera announced a partnership with storage and data management vendor NetApp that produced the NetApp Open Solution for Hadoop, a preconfigured Hadoop cluster that combines Cloudera’s Distribution including Apache Hadoop (CDH) and Cloudera Enterprise with NetApp’s RAID architecture.
• Big Data player Karmasphere announced plans to join the Hortonworks Technology Partner Program. The partnership enables Karmasphere to offer its Big Data intelligence product, Karmasphere Analytics, on the Apache Hadoop software infrastructure that undergirds the Hortonworks Data Platform.
• Informatica released what it billed as the world’s first Hadoop parser. Informatica HParser runs on virtually all versions of Apache Hadoop and specializes in transforming unstructured data into a structured format within a Hadoop installation.
• MarkLogic announced support for Hadoop, the Apache open source software framework for analyzing Big Data, with the release of MarkLogic 5.
• HP provided details of Autonomy IDOL (Integrated Data Operating Layer) 10, a Next Generation Information Platform that integrates two of its 2011 acquisitions, Vertica and Autonomy. Autonomy IDOL 10 features Autonomy’s capabilities for processing unstructured data, Vertica’s ability to rapidly process large-scale structured data sets, a NoSQL interface for loading and analyzing structured and unstructured data, and solutions dedicated to the Data, Social Media, Risk Management, Cloud and Mobility verticals.
• EMC announced the release of its Greenplum Unified Analytics Platform (UAP). The EMC Greenplum UAP contains the EMC Greenplum platform for the analysis of structured data, enterprise-grade Hadoop for analyzing structured and unstructured data, and EMC Greenplum Chorus, a collaboration and productivity software tool that enables social networking among the people in an organization who are leveraging Big Data.
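Several products mentioned in this post, such as Oracle Loader for Hadoop and the Teradata Aster MapReduce Platform, revolve around MapReduce: a map step emits key/value pairs, the framework groups them by key, and a reduce step aggregates each group. The single-process Python sketch below is a conceptual illustration only, not Hadoop’s API or Teradata’s SQL-MapReduce syntax; the "user page" log format and the function names are assumptions. It computes page-view counts, the same result a SQL analyst would get from `SELECT page, COUNT(*) FROM page_views GROUP BY page`, which is the kind of query that SQL-on-MapReduce platforms let analysts write instead of hand-coded MapReduce programs.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record: str):
    """Emit (page, 1) for one log line of the assumed form 'user page'."""
    _user, page = record.split()
    yield page, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list):
    """Aggregate all values for one key; here, a simple count."""
    return key, sum(values)


def run_mapreduce(records):
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


logs = ["alice /home", "bob /pricing", "carol /home", "alice /docs"]
print(run_mapreduce(logs))  # {'/home': 2, '/pricing': 1, '/docs': 1}
```

In a real Hadoop cluster the map and reduce functions run in parallel across many nodes and the shuffle moves data over the network; the sketch keeps only the logical structure.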
The widespread adoption of Hadoop defined the Big Data story of the year. Hadoop featured in almost every Big Data story of the year, from Oracle to Microsoft to HP and EMC, while NoSQL came in a close second. Going into 2012, one key question for the Big Data space concerns the ability of OpenStack to support Hadoop, NoSQL, MapReduce and other Big Data technologies. The other key question is how user-friendly Big Data applications will be for business analysts as well as programmers. EMC’s Greenplum Chorus, for example, democratizes access to its platform through a user interface that promotes collaboration among multiple constituents in an organization by transforming questions into structured queries. Similarly, the Teradata Aster MapReduce Platform lets business analysts use its MapReduce technology through SQL. That said, as Hadoop becomes more mainstream, the tech startup and data-intensive spaces are likely to see more data analysts trained in Apache Hadoop, alongside vendor efforts to make Hadoop more accessible to programmers and non-programmers alike.