A California law firm has filed a class-action lawsuit against OpenAI for "stealing" personal data to train ChatGPT.
Clarkson Law Firm, in a complaint filed in the Northern District of California court on Wednesday, alleges ChatGPT and Dall-E "use stolen private information, including personally identifiable information, from hundreds of millions of internet users, including children of all ages, without their informed consent or knowledge." To train its large language model, OpenAI scraped 300 billion words from the internet, including personal information and posts from social media sites like Twitter and Reddit. The law firm claims OpenAI "did so in secret, and without registering as a data broker as it was required to do under applicable law."
OpenAI has been the subject of controversy for how and what data it collects to train and further develop ChatGPT. Until recently, there was no explicit way for users to opt out of letting OpenAI use their conversations and personal information to feed the model. Italy initially banned ChatGPT under Europe's General Data Protection Regulation (GDPR) for inadequately protecting user data, especially that of minors. The lawsuit also covers OpenAI's opaque privacy policies for existing users, but it largely focuses on data scraped from the web that was never explicitly intended to be shared with ChatGPT. Through billion-dollar investments from Microsoft and subscriber revenue for ChatGPT Plus, OpenAI has profited from this data without compensating its sources.
The 15 counts in the complaint include violation of privacy, negligence for failing to protect personal data, and larceny by illegally obtaining massive amounts of personal data to train its models. Datasets like Common Crawl, Wikipedia, and Reddit, which include personal information, are publicly available as long as companies follow the protocols for purchase and use of this data. But OpenAI allegedly used this data without permission or consent of users in the context of ChatGPT. Even though people's personal information is public on social media sites, blogs, and articles, if data is used outside of the intended platform, it can be considered a violation of privacy.
In Europe, the GDPR draws a legal distinction between public domain and free-to-use data, but in the US, that's still up for debate. Nader Henein, a privacy research VP at Gartner who thinks the sentiment of the lawsuit is valid, said, "People should have control as to how their data is used, even when it is available in the public domain." But Henein is unsure if the US legal system would agree.
Ryan Clarkson, the firm's managing partner, said in the firm's blog post that it's critical to act now with existing laws instead of waiting for Executive and Judicial branches to respond with federal regulation. "We cannot afford to pay the cost of negative outcomes with AI like we’ve done with social media, or like we did with nuclear. As a society, the price we would all pay is far too steep."