OpenAI announced a new flagship Gen AI model on Monday called GPT-4o. The “o” stands for “omni”, highlighting the new model’s ability to handle multiple formats, including text, speech, and images.
GPT-4o will roll out “iteratively” across the company’s developer- and consumer-facing products over the next few weeks.
Mira Murati, OpenAI’s CTO, said that GPT-4o offers GPT-4-level intelligence but goes one step further by extending GPT-4’s capabilities across multiple modalities and media.
GPT-4 Turbo, OpenAI’s previous “most advanced” model, could analyse images as well as text. GPT-4o adds speech into the mix.
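For developers, the image-plus-text side of this is already familiar from the chat completions endpoint. A minimal sketch using OpenAI’s official Python SDK, assuming an `OPENAI_API_KEY` in the environment and a placeholder image URL, might look like this:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Send a mixed text-and-image prompt to GPT-4o.
# The image URL below is purely illustrative.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The request shape is the same one used for GPT-4 Turbo with vision; only the model name changes.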
GPT-4o also improves the ChatGPT experience. The new model lets users interact with ChatGPT more like an assistant: they can ask a question and interrupt it mid-answer, and it responds in real time. OpenAI says GPT-4o can also pick up on nuances in a user’s voice and, in response, generate speech in a variety of emotive styles, including singing.
GPT-4o offers enhanced performance in around 50 languages as well.
Voice isn’t yet part of the GPT-4o API for all customers. OpenAI plans to first launch support for GPT-4o’s new audio capabilities with “a small group of trusted partners” in the next few weeks.
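Until the audio capabilities roll out more broadly, developers can still call GPT-4o for text through the existing chat completions endpoint. A minimal sketch, again assuming the official `openai` Python package and an API key in the environment:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Plain text call to GPT-4o; audio input/output is not yet
# generally available through the API at launch.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Summarise what GPT-4o can do in two sentences."}
    ],
)

print(response.choices[0].message.content)
```

In other words, existing GPT-4 Turbo integrations can switch to GPT-4o by changing the model name, while the new voice features remain limited to OpenAI’s trusted-partner group for now.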