On Thursday, OpenAI once again shook up the AI world with a video generation model called Sora.
The demos showed photorealistic videos with crisp detail and complexity, generated from simple text prompts. A video based on the prompt "Reflections in the window of a train traveling through the Tokyo suburbs" looked like it was filmed on a phone, shaky camera work and reflections of train passengers included. No weird distorted hands in sight.
A video from the prompt, "A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors" looked like a Christopher Nolan-Wes Anderson hybrid.
Another of golden retriever puppies playing in the snow rendered soft fur and fluffy snow so realistic you could reach out and touch it.
The 7 trillion dollar question is, how did OpenAI achieve this? We don't actually know, because OpenAI has shared barely anything about its training data. But to create a model this advanced, Sora needed lots of video data, so we can assume it was trained on video scraped from all corners of the internet. And some are speculating that the training data included copyrighted works. OpenAI did not immediately respond to a request for comment on Sora's training data.
OpenAI's technical paper largely focuses on the method for achieving these results: Sora is a diffusion model that turns visual data into "patches," or pieces of data that the model can understand. But there's scant mention of where the visual data came from.
OpenAI says it “take[s] inspiration from large language models which acquire generalist capabilities by training on internet-scale data.” The incredibly vague “taking inspiration” part is the only reference, and an evasive one, to the source of Sora’s training data. Further down in the paper, OpenAI says, “training text-to-video generation systems requires a large amount of videos with corresponding text captions.” The only place to find that much visual data is the internet, another hint at where Sora’s training data comes from.
The legal and ethical issue of how training data is acquired for AI models has been around ever since OpenAI launched ChatGPT. Both OpenAI and Google have been accused of “stealing” data to train their language models, in other words using data scraped from social media, online forums like Reddit and Quora, Wikipedia, databases of private books, and news sites.
Until now, the rationale for scraping the entirety of the internet for training data has been that it's publicly available. But publicly available doesn't always translate to public domain. Case in point: the New York Times is suing OpenAI and Microsoft for copyright infringement, alleging OpenAI's models used the Times' works word for word or incorrectly cited the stories.
Now it looks like OpenAI is doing the same thing, but with video. If this is the case, you can expect heavy-hitters in the entertainment industry to have something to say about it.
But the problem remains: We still don't know the source of Sora's training data. "The company (despite its name) has been characteristically close-lipped about what they have trained the models on," wrote Gary Marcus, an AI expert who testified at the U.S. Senate AI Oversight Committee hearing. "Many people have [speculated] that there’s probably a lot of stuff in there that is generated from game engines like Unreal. I would not at all be surprised if there also had been lots of training on YouTube, and various copyrighted materials," said Marcus, before adding, "Artists are presumably getting really screwed here."
Despite OpenAI's refusal to divulge its secrets, artists and creatives are assuming the worst. Justine Bateman, a filmmaker and SAG-AFTRA generative AI advisor, didn't mince words. "Every nanosecond of this #AIgarbage is trained on stolen work by real artists," posted Bateman on X. "Repulsive," she added.
Others in creative industries are concerned about how the rise of Sora and video-generating models will affect their jobs. "I work in film vfx, practically everyone I know is doom and gloom, panicking about what to do now," posted @jimmylanceworth.
OpenAI didn't completely ignore the explosive impact Sora might have. But its attention is largely focused on potential harms involving deepfakes and misinformation. Sora is currently in a red-teaming phase, which means it's being stress-tested for inappropriate and harmful content. Towards the end of its announcement, OpenAI said it will be "engaging policymakers, educators and artists around the world to understand their concerns and to identify positive use cases for this new technology."
But that doesn't address the harms that may have already occurred by making Sora in the first place.