Microsoft apologised on Tuesday for a global disruption involving Azure cloud services such as Microsoft Teams, Office 365, and Dynamics 365.
In a post-incident investigation note on the outage, Microsoft said, “We understand how extremely impactful and inappropriate this is and apologize profoundly.” The company attributed the outage to “authentication flaws” across various Microsoft cloud services. “We are constantly working to develop the Microsoft Azure platform and our systems to better prevent similar events like this,” Microsoft added.
Microsoft pointed to changes it began making after a five-hour outage on Sept. 28, 2020, that impacted Microsoft 365 users. “We reported our intentions to introduce additional security to the Azure AD (Active Directory) service backend SDP (Safe Deployment Process) framework to avoid the class of issues found here in the September incident,” the company said.
The first phase of SDP updates has been completed, according to Microsoft, and the second phase is being implemented in a “very deliberately staged rollout” that will be completed by mid-year. According to Microsoft, “the initial review does show that if it is completely deployed, it will avoid the kind of outage that occurred today, as well as the associated occurrence in September 2020.”
Microsoft said the “majority of services” hit by the global Azure and Teams disruption were back online on Tuesday morning, with the exception of Intune and Microsoft Managed Desktop. A tweet from the Microsoft 365 status account at 6:34 a.m. provided the most recent update on the outage.
The apology followed a global outage on Monday that affected the Teams collaboration app as well as “many” other Azure, Office 365, and Dynamics 365 applications. The problems, which Microsoft announced on Twitter at 3:40 p.m. Eastern Time on Monday, could impact any user “worldwide,” according to the firm.
Despite the outage, some business executives are urging MSPs to shift customers to the cloud more quickly in the wake of the March 2 attack on on-premise Exchange Server by Chinese state-sponsored hackers. The attack hit only on-premise versions of Exchange Server, not Exchange Online or the cloud-based Office 365 email service. As a result of the hack, emails were compromised at 30,000 U.S. organizations and 60,000 organizations worldwide, all of which were running on-premise versions of Exchange.
“MSPs must switch their clients to the cloud sooner, and they must also stabilise their networks with circuit diversity and failover,” according to Tydings. “Microsoft has confirmed that they can have greater security in the cloud than they can with on-premise Exchange.”
Partners must provide secure internet access with SD-WAN, wireless failover through carrier plans with a SIM module, and a cable backup to a primary fibre line, according to Tydings. MSPs could use alternate collaboration platforms such as Zoom or Cisco Webex in the event of a Microsoft Teams outage, he added.
According to Tydings, with the global pandemic leading to more distributed workforces, on-premise Exchange no longer makes sense for customers. “Since the pandemic, the MSPs we work with have been heroes for converting their clients from on-prem to cloud,” he said.
Companies are investing in making software products faster, but not in making cloud services more resilient, according to Ofer Smadari, co-founder and CEO of StackPulse, a Portland, Ore.-based reliability platform that helps teams detect, respond to, and remediate incidents with code-based automation.