What is Sora, OpenAI’s new text-to-video AI model, all about?

In a blog post, OpenAI stated, “Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt.”

By Storyboard18 | February 16, 2024, 1:33 pm
Think of GPT models that generate text based on words. Sora does something similar, but with images and videos. It breaks down videos into smaller pieces called patches. (Image source: Moneycontrol)

OpenAI, the company behind ChatGPT and DALL-E, has been testing a text-to-video model called Sora. The new AI model lets users create realistic videos from simple text prompts.

Although the model is still being tested, OpenAI has released a few videos showcasing what it says Sora can generate from a text prompt.

In a blog post, OpenAI stated, “Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt.”

Sora can also take an existing still image and generate a video from it.

“Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions,” OpenAI said in a post on X.

“We’re teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction,” the blog post said.

Think of GPT models, which generate text by working with small units of text called tokens. Sora does something similar with images and videos, breaking them down into smaller pieces called patches.
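
To make the idea of patches concrete, here is a minimal sketch in Python. It is only an illustration and not OpenAI's implementation, which the article does not describe. The function name `split_into_patches`, the patch sizes, and the toy grayscale "video" are all assumptions made for this example.

```python
def split_into_patches(video, pt, ph, pw):
    """Split a video (nested lists: frames x height x width) into patches.

    Each patch covers pt frames, ph rows, and pw columns, and is returned
    as a flat list of pixel values. Hypothetical helper for illustration.
    """
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    if frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be divisible by the patch size")

    patches = []
    for t0 in range(0, frames, pt):          # step through time
        for y0 in range(0, height, ph):      # step down the frame
            for x0 in range(0, width, pw):   # step across the frame
                patch = [
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ]
                patches.append(patch)
    return patches


# A toy 2-frame, 4x4 grayscale "video" whose pixel values are simple counters.
video = [[[t * 16 + y * 4 + x for x in range(4)] for y in range(4)] for t in range(2)]
patches = split_into_patches(video, pt=1, ph=2, pw=2)
print(len(patches))  # 8 patches: 2 frames x 2 rows x 2 columns of blocks
print(patches[0])    # [0, 1, 4, 5], the top-left 2x2 block of the first frame
```

Each patch here plays a role loosely comparable to a token in a text model: a small, fixed-size unit that a model can process in sequence.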

“Sora builds on past research in DALL·E and GPT models. It uses the recaptioning technique from DALL·E 3, which involves generating highly descriptive captions for the visual training data. As a result, the model is able to follow the user’s text instructions in the generated video more faithfully,” the company said in the blog post.

OpenAI’s CEO Sam Altman said on X that, for now, access to Sora is limited to a select few creators during the testing phase.

Altman also asked users on X to suggest prompts and posted the resulting videos on his account soon after.

One example showed two golden retrievers podcasting on top of a mountain. Another showed a half-duck, half-dragon creature flying through a sunset with a hamster dressed in adventure gear on its back.

OpenAI has shared that the current version of Sora has weaknesses such as confusing left and right or failing to maintain visual continuity throughout a video.

Safety is a key focus of Sora’s testing. OpenAI said it is bringing in dedicated testers who will deliberately try to make the model malfunction or produce inappropriate content, so that the company can put preventive measures in place. This process is known as red-teaming.

“We’ll be engaging policymakers, educators and artists around the world to understand their concerns and to identify positive use cases for this new technology,” OpenAI said.
