Jail for consultant who scraped colossal trove of Alibaba customer data

A billion data points, including the usernames and mobile phone numbers of customers, have been siphoned off Alibaba websites by a web crawler. The information reached us about a week after a court ruling in the case.

The court ruling

A central Chinese court has ruled that an employee of a consultancy firm was guilty of gathering more than a billion data items about Taobao users since 2019. (Taobao is the consumer-to-consumer platform on Alibaba’s sites.) The court imposed jail terms of more than three years, alongside fines totalling 450,000 yuan (approximately $70,000). Apparently the consultancy firm helped merchants on Alibaba’s Taobao online mall, and its employee used his access to the data to serve other clients.

Alibaba

Alibaba is one of the biggest online marketplaces in the world. Originally Alibaba.com started out as a business-to-business (B2B) platform, but with the foundation of Taobao it expanded into the consumer marketplace.

Annual active consumers on Alibaba’s China retail marketplaces reached 811 million for the twelve months that ended March 31, 2021, up from 779 million in the last quarter of 2020.

Like its nearest US equivalent, Amazon, the company also runs cloud services and a payment service (Alipay), and is active in digital media. In 2005 it started a close cooperation with Yahoo!

Alibaba statement

None of the customer data was sold and Alibaba’s users didn’t incur financial losses from the episode, the company said in a statement.

“Taobao devotes substantial resources to combat unauthorized scraping on our platform, as data privacy and security is of utmost importance. We have proactively discovered and addressed this unauthorized scraping. We will continue to work with law enforcement to defend and protect the interests of our users and partners.”

Web scraping

Although some media will call this a data leak or breach, web scraping is a different beast altogether. We did not expect to see the scale of Facebook’s data scrape of 533 million users to be “beaten” anytime soon, but a few months later and here we are. In Facebook’s case the scraping was possible because of a vulnerability that Facebook patched in 2019. In Alibaba’s case the scraping was enabled because the employee of the consultancy firm had full access to a part of the online infrastructure.

And while most types of web scrapers are perfectly fine, for example scrapers that help you find the best price for a product, the question is whether it is OK to scrape websites for personal data. While website users may have given consent to use some of their data for marketing purposes, is it fair to expect that they can anticipate how much information about them is available to potential scrapers, or how that data becomes something entirely different when it’s part of a billion-record data set, or when it’s combined with other information about them that makes their personal life pretty much an open book?

Chinese restrictions

The news about this court ruling comes at a sensitive moment for China, which recently announced that it wants to tighten restrictions on information gathering by internet giants like Alibaba, Tencent, and others. Last March, the Chinese government published new standards for the collection of personal data, specifically defining “necessary” data collection.

Among limitations like stopping app providers from collecting a broad range of data under a bundled consent model, the new data protection rules force Chinese companies to obtain government permission before transferring data outside of the country and grant individuals a right to access personal information held by data processors.

While we applaud these initiatives to protect user data privacy, in China’s case it feels like a matter of “do as I say, not as I do.”