Microsoft AI researchers accidentally exposed terabytes of sensitive data

Warnings against including credentials, keys, and tokens when sharing code on publicly accessible repositories shouldn’t be necessary. It should speak for itself that you don’t just hand over the keys to your data. But what if a misconfiguration suddenly makes a supposedly internal storage account accessible to everyone?

That’s how Microsoft managed to leak access to 38 terabytes of data.

Wiz Research found that Microsoft’s AI research team, while publishing a bucket of open-source training data on GitHub, accidentally exposed 38 terabytes of additional private data — including a disk backup of two employees’ workstations. The backups contained sensitive data, including passwords to Microsoft services, secret keys, and over 30,000 internal Microsoft Teams messages from 359 Microsoft employees.

The source of the problem was an Azure feature called Shared Access Signature (SAS) tokens, which allow users to share data from Azure Storage accounts.

A SAS token can be used to restrict:

  • What resources a client can access
  • What operations a client can perform (read, write, list, delete)
  • What network a client can access from (HTTPS, IP address)
  • How long a client has access (start time, end time)
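
Each of these restrictions is encoded as a query parameter in the token itself. As an illustration, the Python sketch below uses only the standard library to decode a SAS query string into the four categories above. The parameter names (`sr`/`srt` for the resources covered, `sp` for permissions, `spr` for allowed protocols, `sip` for the allowed IP range, `st`/`se` for start and expiry time) follow Azure’s documented SAS format; the function name `describe_sas` is hypothetical and the sample token, including its `sig` value, is made up.

```python
from urllib.parse import parse_qs

# Letters used in the sp (signed permissions) field; not an exhaustive list.
PERMISSIONS = {"r": "read", "a": "add", "c": "create", "w": "write",
               "d": "delete", "l": "list"}

def describe_sas(query: str) -> dict:
    """Map the fields of a SAS query string onto the four kinds of restriction."""
    params = {k: v[0] for k, v in parse_qs(query.lstrip("?")).items()}
    return {
        # What resources: sr for a service SAS, srt for an account SAS.
        "resources": params.get("sr") or params.get("srt"),
        # What operations: one letter per permission.
        "operations": [PERMISSIONS.get(p, p) for p in params.get("sp", "")],
        # What network: allowed protocol(s) and an optional IP range.
        "protocol": params.get("spr"),
        "ip_range": params.get("sip"),
        # How long: optional start time and expiry time.
        "start": params.get("st"),
        "expiry": params.get("se"),
    }

token = ("sv=2021-08-06&sr=b&sp=r&spr=https&sip=203.0.113.0-203.0.113.255"
         "&st=2023-06-22T09:00:00Z&se=2023-06-22T10:00:00Z&sig=FAKESIGNATURE")
print(describe_sas(token))
```

A token that omits `sip` or `spr` simply places no limit on that dimension, which is why the corresponding entries come back as `None`.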

Blob storage is a type of cloud storage for unstructured data. A “blob,” short for Binary Large Object, is a mass of data in binary form. An Azure Storage SAS token is essentially a signed string that grants access to Azure Storage services without handing out the storage account key. Appended to a resource’s URI (Uniform Resource Identifier), it grants specific access rights to specified Azure Storage resources, such as a single blob or a whole range of blobs.
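
The protection a SAS offers rests on its signature: for an account or service SAS, the `sig` field is a Base64-encoded HMAC-SHA256 computed over the token’s fields with the storage account key, and Azure Storage recomputes it on each request. The sketch below illustrates that idea with a simplified, hypothetical string-to-sign (Azure’s real format has more fields and varies by SAS type and service version), so it will not produce signatures Azure accepts. The key, resource path, and function names are all made up for the example.

```python
import base64
import hashlib
import hmac

def sign(fields: dict, account_key: bytes) -> str:
    # Hypothetical, simplified string-to-sign: selected field values joined by
    # newlines in a fixed order. Azure's real format differs.
    string_to_sign = "\n".join(fields[k] for k in ("sp", "se", "sr", "resource"))
    digest = hmac.new(account_key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")

def verify(fields: dict, sig: str, account_key: bytes) -> bool:
    # The service recomputes the signature and compares in constant time.
    return hmac.compare_digest(sign(fields, account_key), sig)

key = b"hypothetical-account-key"
fields = {"sp": "r", "se": "2023-06-22T10:00:00Z", "sr": "b",
          "resource": "/blob/exampleaccount/models/model.ckpt"}
sig = sign(fields, key)

print(verify(fields, sig, key))                     # True: nothing checks who presents it
print(verify({**fields, "sp": "rwd"}, sig, key))    # False: permissions can't be altered
print(verify(fields, sig, b"rotated-account-key"))  # False: rotating the key revokes it
```

Two consequences follow from this design. Anyone who obtains the URL can use it until it expires, because the signature is not tied to a user. And the only ways to revoke such a token early are to rotate the key that signed it or, for a SAS tied to a stored access policy, to remove that policy.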

A Microsoft employee shared a URL for a blob store in a public GitHub repository while contributing to open-source AI learning models. This URL included an overly-permissive SAS token for an internal storage account.

The URL allowed access to more than just the open-source models. It was configured to grant permissions on the entire storage account, thus exposing the additional sensitive data by mistake.
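
The difference between a token scoped to a single blob and one scoped to a whole storage account is visible in the token itself. In Azure’s SAS format, an account SAS carries `ss` (signed services) and `srt` (signed resource types) fields, while a service SAS carries `sr`, for example `b` for a single blob or `c` for a container. The hypothetical helper below is a rough sketch, using only Python’s standard library, that classifies a SAS URL by the scope it grants. Both URLs are invented examples.

```python
from urllib.parse import parse_qs, urlsplit

# sr values for a blob-service SAS; not an exhaustive list.
SERVICE_SCOPES = {"b": "single blob", "c": "container", "bs": "blob snapshot",
                  "bv": "blob version", "d": "directory"}

def sas_scope(url: str) -> str:
    """Return a rough description of what a SAS URL grants access to."""
    params = parse_qs(urlsplit(url).query)
    if "ss" in params or "srt" in params:
        # Account SAS: can reach every container and blob in the account.
        return "entire storage account"
    if "sr" in params:
        return SERVICE_SCOPES.get(params["sr"][0], "other service resource")
    return "no SAS fields found"

narrow = ("https://exampleaccount.blob.core.windows.net/models/model.ckpt"
          "?sv=2021-08-06&sr=b&sp=r&se=2023-06-22T10:00:00Z&sig=FAKE")
broad = ("https://exampleaccount.blob.core.windows.net/models/"
         "?sv=2021-08-06&ss=b&srt=sco&sp=racwdl&se=2099-12-31T00:00:00Z&sig=FAKE")
print(sas_scope(narrow))  # single blob
print(sas_scope(broad))   # entire storage account
```

Note that both URLs point at the same `models/` path. The path alone says nothing about the scope; only the token’s fields do.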

But exposing sensitive data is not even the worst that could have happened, Wiz explains.

“An attacker could have injected malicious code into all the AI models in this storage account, and every user who trusts Microsoft’s GitHub repository would’ve been infected by it.”

Wiz shared its findings with Microsoft on June 22, 2023, and Microsoft revoked the SAS token two days later.

Microsoft stated that:

“The information that was exposed consisted of information unique to two former Microsoft employees and these former employees’ workstations. No customer data was exposed, and no other Microsoft services were put at risk because of this issue. Customers do not need to take any additional action to remain secure.”

Microsoft also said that, as a result of Wiz’s research, it has expanded GitHub’s secret scanning service, which monitors all public open-source code changes for plaintext exposure of credentials and other secrets, to include any SAS token that may have overly permissive expirations or privileges.
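
At its core, secret scanning of this kind is pattern matching over committed text. GitHub has not published how its SAS detection works, so the Python sketch below is only a hypothetical illustration of the idea. It finds Azure Storage URLs that carry a `sig` parameter and flags those whose permissions go beyond read/list or whose expiry lies more than a chosen number of days after the scan. The regular expression, the 30-day threshold, the sample commit, and the function name are all assumptions.

```python
import re
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit

# Hypothetical pattern: any HTTPS URL on an Azure Storage endpoint.
STORAGE_URL = re.compile(r"https://[\w.-]+\.core\.windows\.net/[^\s\"'<>]*")

def flag_sas_urls(text: str, now: datetime, max_days: int = 30) -> list:
    """Return (url, reasons) for each SAS URL in text that looks overly permissive."""
    findings = []
    for match in STORAGE_URL.finditer(text):
        url = match.group(0)
        params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
        if "sig" not in params:
            continue  # no signature, so not a SAS URL
        reasons = []
        if set(params.get("sp", "")) - {"r", "l"}:
            reasons.append("grants more than read/list")
        if "se" in params:
            expires_at = datetime.fromisoformat(params["se"].replace("Z", "+00:00"))
            if expires_at.tzinfo is None:
                expires_at = expires_at.replace(tzinfo=timezone.utc)  # SAS times are UTC
            if expires_at - now > timedelta(days=max_days):
                reasons.append(f"expires more than {max_days} days out")
        if reasons:
            findings.append((url, reasons))
    return findings

commit = """
weights = "https://exampleaccount.blob.core.windows.net/models/?ss=b&srt=sco&sp=racwdl&se=2099-12-31T00:00:00Z&sig=FAKE"
readme = "https://exampleaccount.blob.core.windows.net/docs/a.txt?sr=b&sp=r&se=2023-06-22T10:00:00Z&sig=FAKE"
"""
scan_time = datetime(2023, 6, 22, 9, 0, tzinfo=timezone.utc)
for url, reasons in flag_sas_urls(commit, now=scan_time):
    print(url, reasons)
```

Only the first URL is reported. The read-only token that expires an hour after the scan passes both checks.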

Best practices for SAS tokens

So that others can learn from its mistakes, Microsoft shared some tips on working with SAS URLs.

  • Apply the principle of least privilege: Scope SAS URLs to the smallest set of resources required by clients (e.g. a single blob), and limit permissions to only those needed by the application (e.g. read-only, write-only).
  • Use short-lived SAS: Always use a near-term expiration time when creating a SAS, and have clients request new SAS URLs when needed. Azure Storage recommends one hour or less for all SAS URLs.
  • Handle SAS tokens carefully: SAS URLs grant access to your data and should be treated as an application secret. Only expose SAS URLs to clients who need access to a storage account.
  • Have a revocation plan: Associate SAS tokens with a stored access policy for fine-grained revocation of a SAS within a container. Be ready to remove the stored access policy or rotate storage account keys if a SAS or shared key is leaked.
  • Monitor and audit your application: Track how requests to your storage account are authorized by enabling Azure Monitor and Azure Storage Logs. Use a SAS Expiration Policy to detect clients using long-lived SAS URLs.
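
Several of these tips can be checked mechanically before a SAS URL is handed out. The sketch below uses a hypothetical `audit_sas` function and made-up sample values to test a SAS query string against three of the tips. It checks scope (a single blob rather than a container or the whole account), permissions (read-only, assuming the client only needs to read), and lifetime (one hour or less, per the Azure Storage recommendation). It also checks that the token is restricted to HTTPS, one of the restrictions a SAS can carry. Treat it as a pre-flight check under those assumptions, not a substitute for Azure’s own SAS expiration policy or logging.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs

MAX_LIFETIME = timedelta(hours=1)  # Azure Storage's recommended upper bound

def parse_time(value: str) -> datetime:
    # SAS times are UTC in ISO 8601 form, e.g. 2023-06-22T10:00:00Z.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def audit_sas(query: str, issued_at: datetime) -> list:
    """Return the rules a SAS query string violates (empty list if none).

    issued_at must be timezone-aware (UTC) so it can be compared with SAS times.
    """
    p = {k: v[0] for k, v in parse_qs(query.lstrip("?")).items()}
    problems = []
    if "ss" in p or "srt" in p or p.get("sr") != "b":
        problems.append("scope is wider than a single blob")
    if p.get("sp") != "r":
        problems.append("permissions are not read-only")
    if p.get("spr") != "https":
        problems.append("HTTP is allowed")
    if "se" not in p:
        problems.append("no expiry time")
    else:
        start = parse_time(p["st"]) if "st" in p else issued_at
        if parse_time(p["se"]) - start > MAX_LIFETIME:
            problems.append("lifetime exceeds one hour")
    return problems

issued = datetime(2023, 6, 22, 9, 0, tzinfo=timezone.utc)
good = "sr=b&sp=r&spr=https&se=2023-06-22T10:00:00Z&sig=FAKE"
bad = "ss=b&srt=sco&sp=racwdl&se=2099-12-31T00:00:00Z&sig=FAKE"
print(audit_sas(good, issued))  # []
print(audit_sas(bad, issued))
```

The second token, an account SAS with broad permissions and a decades-long expiry, breaks all four rules.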

Wiz advises against the external usage of SAS tokens.

“[SAS] tokens are very hard to track, as Microsoft does not provide a centralized way to manage them within the Azure portal. In addition, these tokens can be configured to last effectively forever, with no upper limit on their expiry time. Therefore, using Account SAS tokens for external sharing is unsafe and should be avoided.”

