Microsoft and OpenAI were sued on Wednesday by sixteen pseudonymous individuals who claim the companies’ AI products based on ChatGPT collected and divulged their personal information without adequate notice or consent.
The complaint [PDF], filed in federal court in San Francisco, California, alleges the two businesses ignored the legal means of obtaining data for their AI models and chose to gather it without paying for it.
“Despite established protocols for the purchase and use of personal information, Defendants took a different approach: theft,” the complaint says. “They systematically scraped 300 billion words from the internet, ‘books, articles, websites and posts – including personal information obtained without consent.’ OpenAI did so in secret, and without registering as a data broker as it was required to do under applicable law.”
Through their AI products, it's claimed, the two companies "collect, store, track, share, and disclose" the personal information of millions of people, including product details, account information, names, contact details, login credentials, emails, payment information, transaction records, browser data, social media information, chat logs, usage data, analytics, cookies, searches, and other online activity.
The complaint contends Microsoft and OpenAI have embedded into their AI products the personal information of millions of people, reflecting hobbies, religious beliefs, political views, voting records, social and support group membership, sexual orientations and gender identities, work histories, family photos, friends, and other data arising from online interactions.
OpenAI developed a family of text-generating large language models, which includes GPT-2, GPT-4, and ChatGPT; Microsoft not only champions the technology, but has been cramming it into all corners of its empire, from Windows to Azure.
“With respect to personally identifiable information, defendants fail sufficiently to filter it out of the training models, putting millions at risk of having that information disclosed on prompt or otherwise to strangers around the world,” the complaint says, citing The Register‘s March 18, 2021 special report on the subject.
The 157-page complaint is heavy on media and academic citations expressing alarm about AI models and ethics, but light on specific instances of harm.
The complaint indicates the 16 plaintiffs used ChatGPT, as well as other internet services such as Reddit, and expected that their digital interactions would not be incorporated into an AI model.
It remains to be seen how, if at all, plaintiff-created content and metadata has actually been exploited and whether ChatGPT or other models will reproduce that data.
OpenAI has in the past dealt with the reproduction of personal information by filtering it out.
The lawsuit is seeking class-action certification and damages of $3 billion – though that figure is presumably a placeholder. Any actual damages would be determined if the plaintiffs prevail, based on the findings of the court.
The complaint alleges Microsoft and OpenAI have violated America's Electronic Communications Privacy Act by obtaining and using private information, and by unlawfully intercepting communications between users and third-party services via integrations with ChatGPT and similar products.
The sueball further contends the defendants have violated the Computer Fraud and Abuse Act by intercepting interaction data via plugins.
It also alleges violations of the California Invasion of Privacy Act and unfair competition law, the Illinois Biometric Information Privacy Act and consumer fraud and deceptive business practices law, and New York business law, along with various general harms (torts) like negligence and unjust enrichment.
Microsoft and OpenAI declined to comment.
Microsoft, its GitHub subsidiary, and OpenAI were sued last November for allegedly reproducing the code of millions of software developers in violation of licensing requirements through GitHub's Copilot service, which is based on an OpenAI model. That case is ongoing. ®