Given recent concerns about the security of online data related to allegations of NSA spying, Microsoft is planting a stake in the ground: it says it will take additional measures to let customers store cloud-based data in data centers located in nations of their choosing. Microsoft’s general counsel, Brad Smith, noted that “people should have the ability to know whether their data…are being subject to laws in some other country and should have the ability to make an informed choice of where their data resides.” As noted in The New York Times, Microsoft’s option of letting customers choose where to store their data for select applications such as Office 365, Dynamics CRM Online and Windows Azure predates Snowden’s revelations about NSA spying. But in an interview with The Financial Times, Smith indicated the company would expand the range of options available to customers with respect to data storage as it relates to national and regional boundaries. Details of Microsoft’s plans remain scant, so we should expect to hear more from the Redmond tech behemoth in the days to come. For now, however, Microsoft’s decision to support options for data storage outside of the NSA’s purview raises a constellation of legal and philosophical questions about the rights of data owners to selectively store data transnationally. The debate is analogous to the one about the legality of foreign-held Swiss bank accounts that are immune to certain fiduciary regulations.
Microsoft To Expand Options Related To Geography Of Online Data Storage Given NSA-related Privacy Concerns
Microsoft announced plans to support an open-source version of Java on both its Windows Azure IaaS and PaaS platforms at last week’s O’Reilly Open Source Convention in Portland, Oregon. Microsoft will offer the Java Standard Edition (Java SE) and will work with Azul Systems to “build, certify and distribute a compliant OpenJDK-based distribution meeting the Java SE specification for use with Windows Server environments on Azure.” Azul will collaborate with Microsoft’s wholly-owned subsidiary Microsoft Open Technologies to develop the new OpenJDK build, in an effort that will focus largely on compliance, standards and specifications given Microsoft’s experience of being sued by Sun Microsystems for developing a non-compliant version of Java. Sunnyvale, CA-based Azul Systems is an experienced provider of Java runtimes that specializes in optimizing enterprise usage of Java by improving performance, scalability, latency, response times and consistency. Azul will license the OpenJDK build on Azure under the GNU General Public License (GPL) version 2 and certify it for compliance with Java SE.
Microsoft’s support of Java on its Azure platform comes in the wake of a partnership announced in June whereby Oracle software such as Java will be certified and supported by Oracle to run on the Azure platform and Microsoft’s Hyper-V virtualization technology.
This was the week in which Microsoft announced the general availability of Windows Azure Infrastructure as a Service. More than a simple declaration of production-grade availability, Microsoft’s announcement about its IaaS platform delivered the strongest statement to date of its intent to compete head to head with Amazon Web Services in the IaaS space. In a blog post, Microsoft’s Bill Hilf accurately assessed enterprise readiness with respect to cloud adoption by noting that customers are not interested in replacing traditional data centers with cloud-based environments. Customers typically want to supplement existing data infrastructures with IaaS and PaaS installations alongside private cloud environments and traditional data center ecosystems. In other words, hybridity is the name of the game with respect to enterprise cloud adoption at present, and Hilf’s argument is that no one is better suited to recognize and respond to that hybridity than Microsoft. In conjunction with the general availability of its Azure IaaS platform, Microsoft pledges a commitment to “match Amazon Web Services prices for commodity services such as compute, storage and bandwidth” alongside “monthly SLAs that are among the industry’s highest.”
Microsoft also announced new, larger Virtual Machine sizes of 28 GB/4 cores and 56 GB/8 cores, in addition to a gallery of new Virtual Machine image templates that includes Windows Server 2012, Windows Server 2008 R2, SQL Server, BizTalk Server and SharePoint Server as well as VM templates for applications that run on the Ubuntu, CentOS, and SUSE Linux distributions. Overall, the announcement represents an incisive and undisguised assault on the market dominance of Amazon Web Services within the IaaS space, one that is all the more threatening given Microsoft’s ability to match AWS in price, functionality and service. The key question now is the degree to which OpenStack and Google Compute Engine (GCE) will emerge as major players within the IaaS space. OpenStack has already emerged as a major IaaS player, but it remains to be seen which distribution will take the cake at the enterprise level. Nevertheless, analysts should expect a tangible reconfiguration of IaaS market share by the end of 2013, with a more significant transformation in place roughly a year after the general availability release of Google Compute Engine, which was released in Beta in June 2012.
From December 28 to December 31, Microsoft’s Windows Azure platform experienced an outage in its South Central US Region that arrived hard on the heels of the Amazon Web Services Christmas Eve outage that became famous for incapacitating Netflix. The outage was first reported by Microsoft at 3:16 PM UTC on December 28 on its Windows Azure Service Dashboard, with the news that a networking issue was “partially affecting the availability of Storage service in the South Central US subregion.” Hours later, Microsoft noted that the outage was affecting the ability to display the status of service for all other regions, even though service itself was unaffected outside the South Central US Region.
The first substantial elaboration on the cause of the outage came six hours after the initial disclosure, at 9:16 PM UTC on December 28:
The repair steps are taking longer because it involves recovery of some faulty nodes on the impacted cluster. We expect this activity to take a few more hours. Further updates will be published after the recovery is complete. We apologize for any inconvenience this causes our customers. Note: The availability is unaffected for all other services and sub-regions. We are currently unable to display the status of the individual services and sub-regions due to the above mentioned issue.
Here, Microsoft specifies that the root cause of the problem consisted of “faulty nodes on the impacted cluster,” and that repair would be complete within a few hours. But 9 hours after this statement, or 15 hours after the initial announcement, the Azure team announced that the problems affecting the recovery of the faulty nodes were “likely to take a significant amount of time.” The impact on the creation of new VM jobs and Service Management operations had been addressed in the meantime, but the full and complete recovery of the cluster would take more time.
On December 30, 9:00 PM UTC, the Azure team reported:
The repair steps are still underway to restore full availability of Storage service in the South Central US sub-region. Windows Azure provides asynchronous geo replication of Blob & Table data between data centers, but does not currently support geo-replication for Queue data or failover per Storage account. If a failover were to occur, it would impact all accounts on the affected Storage cluster, resulting in loss of Queue data and some recent Blob & Table data. To prevent this risk to customer data and applications, we are focusing on bringing the affected stamp back to full recovery in a healthy state. We continue to work to resolve this issue at the earliest and our next update will be before 6PM PST on 12/30/2012. Please contact Windows Azure Technical support for assistance. We apologize for any inconvenience this causes our customers.
With this announcement, impacted customers finally learn why recovery took so long: the Azure platform does not currently support geo-replication for Queue data, nor failover at the level of an individual Storage account. A failover would therefore have affected every account on the affected Storage cluster and resulted in the loss of Queue data as well as “some recent Blob & Table data,” so Microsoft chose instead to repair the faulty nodes on the affected cluster in place. Geo-replication, recall, refers to the practice of maintaining replicas of customer data in locations that are hundreds of miles apart in order to more effectively protect customers against data center outages. Azure Storage’s lack of support for geo-replication of Queue data and for per-account failover, however, led to the prolongation of the December 2012 outage.
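To make the trade-off concrete, here is a minimal toy sketch in Python of asynchronous replication. It is not Azure's implementation; the `StorageCluster` class, its method names and the sample data are all hypothetical, chosen only to illustrate why failing over to a secondary that lags behind the primary would lose recent writes and any data that is never replicated.

```python
from dataclasses import dataclass, field


@dataclass
class StorageCluster:
    """Toy model (hypothetical) of a primary cluster with asynchronous geo-replication."""
    blobs: list = field(default_factory=list)  # blob writes acknowledged by the primary
    queue: list = field(default_factory=list)  # queue messages; never replicated in this model
    replicated: int = 0                        # how many blob writes have reached the secondary

    def write_blob(self, item):
        # The primary acknowledges the write immediately; replication happens later.
        self.blobs.append(item)

    def enqueue(self, message):
        self.queue.append(message)

    def replicate(self, batch):
        # Copy up to `batch` pending blob writes to the secondary.
        self.replicated = min(len(self.blobs), self.replicated + batch)

    def failover(self):
        # Report what would survive and what would be lost if the secondary took over now.
        survived = self.blobs[:self.replicated]
        lost_blobs = self.blobs[self.replicated:]
        lost_queue = list(self.queue)
        return survived, lost_blobs, lost_queue


cluster = StorageCluster()
for i in range(5):
    cluster.write_blob(f"blob-{i}")
cluster.enqueue("job-1")
cluster.replicate(batch=3)  # the secondary has only caught up on the first three writes

survived, lost_blobs, lost_queue = cluster.failover()
print("survived:", survived)
print("lost blobs:", lost_blobs)
print("lost queue messages:", lost_queue)
```

In this sketch the two most recent blob writes and the single queue message would be lost on failover, which mirrors the risk Microsoft cites as its reason for repairing the affected stamp rather than failing over.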
The problem was finally, fully resolved at 10:16 AM UTC, December 31, 2012:
Storage is fully functional in the South Central US sub-region. All customers should have access to their data. We apologize for any inconvenience this caused our customers.
Notable about the Microsoft Azure outage was its relative lack of media coverage in comparison to the Amazon Web Services outage, which lasted roughly 24 hours, compared with roughly 67 hours for the Azure outage as measured from the first report to the final resolution quoted above. Granted, the Amazon Web Services outage affected Netflix, one of the IaaS industry’s most prominent customers alongside Zynga, but the contrast between the coverage accorded to each of these platforms illustrates the market dominance of Amazon Web Services, whose outages affect measurably more customers and end-users than those of other IaaS platforms. Another factor accounting for the disparity in media coverage between the AWS and Azure outages is AWS’s trademark painstaking post-mortem analysis of outages, which Microsoft and all other vendors would do well to match in depth and specificity going forward.
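The elapsed times in this timeline can be checked with a few lines of Python. This is a minimal sketch that assumes the UTC timestamps quoted from the Windows Azure Service Dashboard in this post are accurate; the variable and function names are illustrative only.

```python
from datetime import datetime, timezone


def utc(year, month, day, hour, minute):
    """Build a timezone-aware UTC timestamp."""
    return datetime(year, month, day, hour, minute, tzinfo=timezone.utc)


def hours_between(start, end):
    """Elapsed time between two timestamps, in hours."""
    return (end - start).total_seconds() / 3600


# Timestamps as quoted in the post (all UTC).
first_report = utc(2012, 12, 28, 15, 16)  # initial dashboard notice
first_detail = utc(2012, 12, 28, 21, 16)  # first elaboration on the cause
resolved = utc(2012, 12, 31, 10, 16)      # storage reported fully functional

print("report -> first detail:", hours_between(first_report, first_detail), "hours")
print("report -> resolution:  ", hours_between(first_report, resolved), "hours")
```

Measured this way, the gap between the first report and the first detailed update is 6 hours, and the gap between the first report and the final resolution is 67 hours.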
OpenStack decided to remove the code that supports Hyper-V despite a statement from Microsoft that pledged a “commitment” to working with OpenStack to resolve the issues with the unmaintained and broken code. The code would have allowed OpenStack users to deploy OpenStack on a cloud infrastructure that leverages the Hyper-V hypervisor. The code will be removed from OpenStack in conjunction with the release of Essex, the next software release, scheduled for the second quarter of 2012. The OpenStack change log justifies the removal of the supporting code for Hyper-V as follows:
HyperV has been unmaintained for several releases now. The unit tests are superficial, we have no way to test it, noone has stepped forward to maintain it, and for a very long time, we’ve not had any reports that it works. Furthermore, many improvements have been made across other hypervisor drivers that have not been done in the HyperV driver, so even if it worked, it would only expose a subset of the functionality that the other drivers do.
The change log indicates that, “even if it worked,” the Hyper-V driver would expose only a subset of the functionality offered by the other hypervisor drivers, which have received many improvements that the Hyper-V driver has not. The removal of the code is unlikely to affect any production deployments. As Joshua McKenty, CEO of Piston Cloud Computing and member of OpenStack’s Project Policy Board, commented: “I don’t know of any production deployment of it. I don’t know of any active development deployment of it.”
The more salient issue highlighted by the decision to remove Hyper-V is the lack of adoption of OpenStack by companies with Windows-based cloud infrastructures. As noted by James Staten, Senior Analyst at Forrester Research, however, the lack of adoption of OpenStack in conjunction with a Windows Enterprise license is unsurprising given that a company building a cloud on an open source framework is not particularly likely to have an enterprise license for Windows-based cloud software. Nevertheless, the removal of Hyper-V illustrates how companies with Windows-based cloud infrastructures have yet to experiment with adding OpenStack to their IT environment as an additional cloud operating system, even for pilot or research purposes.
OpenStack is the largest collaboration on open source cloud computing in the world. The organization currently features the support of 149 companies and over 2300 individuals, with user groups in Australia, Austin (TX), Boston, China, Egypt, France, Indonesia, Japan, New York, San Francisco, Seattle, South Korea and Russia in addition to a Spanish language users group.
According to an IDG news article by Nancy Gohring, Microsoft Corporation is committed to supporting Hyper-V with OpenStack. Microsoft apparently released a statement claiming that it is “committed to working with the community to resolve the current issues with Hyper-V and OpenStack.” Microsoft had pledged support for Hyper-V in OpenStack deployments in October 2010 through a partnership with Cloud.com, but has since failed to support development of the OpenStack code supporting Hyper-V. Microsoft announced its commitment to supporting Hyper-V after OpenStack developer Thierry Carrez suggested removing support for Hyper-V from the forthcoming Essex release of OpenStack because the code was “broken and unmaintained.” Joshua McKenty, CEO of Piston Cloud Computing, Technical Architect of NASA’s Nebula Cloud Computing Platform and member of OpenStack’s Project Policy Board, remarked that he knows of no “production deployment” or “active development deployment” of Hyper-V in the OpenStack community. Microsoft has yet to release a fuller statement of its position on support for Hyper-V in OpenStack. The Essex release of OpenStack is scheduled for the second quarter of 2012.
On Tuesday, Oracle announced the availability of the Big Data appliance that it introduced to the world at its October conference, Oracle OpenWorld. The appliance runs on Linux and features Cloudera’s distribution of Apache Hadoop (CDH), Cloudera Manager for managing the Hadoop distribution, the Oracle NoSQL database as well as an open source version of R, the statistical software package. Oracle’s partnership with Cloudera in delivering its Big Data appliance goes beyond the latter’s selection as a Hadoop distributor to include assistance with customer support. Oracle plans to deliver tier one customer support while Cloudera will provide assistance with tier two and tier three customer inquiries, including those beyond the domain of Hadoop.
Oracle will run its Big Data appliance on hardware featuring 864 GB of main memory, 216 CPU cores, 648 TB of raw disk storage, 40 Gb/s InfiniBand connectivity and 10 Gb/s Ethernet data center connectivity. Oracle also revealed details of four connectors to its appliance with the following functionality:
• Oracle Loader for Hadoop, which loads massive amounts of data into the appliance by using the MapReduce parallel processing technology.
• Oracle Data Integrator Application Adapter for Hadoop, which provides a graphical interface that simplifies the creation of Hadoop MapReduce programs.
• Oracle Connector R, which provides users of R with streamlined access to the Hadoop Distributed File System (HDFS).
• Oracle Direct Connector for Hadoop Distributed File System (ODCH), which supports the integration of Oracle’s SQL database with the Hadoop Distributed File System.
Oracle’s announcement of the availability of its Big Data appliance comes as the battle for Big Data market share takes shape in a landscape dominated by the likes of Teradata, Microsoft, IBM, HP, EMC, Informatica, MarkLogic and Karmasphere. Oracle’s selection of Cloudera as its Hadoop distributor indicates that it intends to make a serious move into the world of Big Data. First, the partnership with Cloudera gives Oracle increased access to Cloudera’s universe of customers. Second, the partnership enhances the credibility of Oracle’s Big Data offering given that Cloudera is the most prominent distributor of Apache Hadoop in the U.S.
In October, Microsoft revealed plans for a Big Data appliance featuring Hadoop for Windows Server and Azure, and Hadoop connectors for SQL Server and SQL Parallel Data Warehouse. Whereas Oracle chose Cloudera for Hadoop distribution, Microsoft partnered with Yahoo spinoff Hortonworks to integrate Hadoop with Windows Server and Windows Azure. In late November, HP provided details of Autonomy IDOL (Integrated Data Operating Layer) 10, which features the ability to process large-scale structured data sets in addition to a NoSQL interface for loading and analyzing structured and unstructured data. In December, EMC released its Greenplum Unified Analytics Platform (UAP), which combines the ability to load structured data, enterprise-grade Hadoop for analyzing structured and unstructured data, and Chorus, a collaboration and productivity software tool. Bolstered by its partnership with Cloudera, Oracle is set to compete squarely with HP’s Autonomy IDOL 10, EMC’s Greenplum Chorus and IBM’s BigInsights until Microsoft’s appliance officially enters the Big Data dohyō (土俵), the sumo ring, as well.