Launch of GPT-4o: a significant advance in artificial intelligence

Sam Altman predicted the arrival of a major innovation, and now it’s here: GPT-4o has been launched, and its capabilities are astonishing.

OpenAI’s flagship model regularly generates excitement and speculation. The new sensation in the AI community is GPT-4o, an OpenAI creation promising significant improvements in capability and accessibility. This update marks a step towards a much more natural human-computer interaction.

After making the GPT Store free for everyone, OpenAI is doing its utmost to make advanced AI tools accessible to as many people as possible. With GPT-4o in ChatGPT Free, users will now have access to features such as:

GPT-4 level intelligence experience
Answers from both the model and the web
Data analysis and graphing
Discussion of photos taken by the user
Downloading files to assist in writing, summarizing, or analyzing
Discovering and using GPTs and the GPT Store
Creating an enhanced experience using the Memory function

GPT-4o Highlights

Unified Multimodal Model

GPT-4o can understand and respond using text, audio, and images simultaneously. You can talk to it, show it images, or write messages, and it will understand you perfectly. For example, if you’re talking in a noisy environment, it can understand you despite the ambient noise and might even respond with a laugh or a song if it suits the context of the conversation.

Real-time Audio and Voice Conversations

GPT-4o responds almost instantly, as quickly as a person in a conversation. This responsiveness gives the impression of chatting with a friend who responds without delay.

Improved Vision and Image Understanding

GPT-4o excels at observing and understanding images. Show it a photo of an Italian restaurant menu, and it can translate it into English, tell you the history of the dishes, and advise you on what to order according to your tastes.

Fast and Cost-effective

GPT-4o is twice as fast as the previous version, providing answers quickly without waiting. It’s also cheaper to run, allowing developers and businesses to save money while leveraging advanced AI features.

Extended Multilingual Capabilities

GPT-4o has improved understanding and expression in multiple languages. This means more people around the world can use it in their native language. For instance, it can help translate a Spanish document into English more accurately and quickly.

Advanced Voice Mode and Real-time Interaction

Coming soon, GPT-4o will have a special voice mode where you can talk to it, and it can see you on video. These updates make GPT-4o a powerful tool that’s easy to use and useful in everyday situations, whether it’s for quick translations, help in different languages, or instant responses during conversations.

GPT-4o Compared with Other Models

GPT-4o matches the performance of GPT-4 Turbo on standard text, reasoning, and coding tests, while setting new records in multilingual, audio, and visual capabilities. Here are some detailed points:

Text Assessment: New record of 87.2% on the 5-trial MMLU (general knowledge questions).
Audio ASR Performance: Significant improvement over Whisper-v3 in all languages, including less well-endowed languages.
Audio Translation: Sets a new record for speech translation and outperforms Whisper-v3 on the MLS benchmark.
M3Exam Zero-Shot Results: Outperforms GPT-4 in all languages for this multilingual and visual assessment.
Vision Comprehension: Achieves peak performance on visual perception benchmarks.

GPT-4 Turbo vs. GPT-4o

GPT-4o retains the remarkable intelligence of its predecessors but offers increased speed, greater cost-effectiveness, and higher rate limits than GPT-4 Turbo. Key differences include:

Price: GPT-4o is 50% cheaper than GPT-4 Turbo, charging $5 per million input tokens and $15 per million output tokens.
Speed: GPT-4o operates twice as fast as GPT-4 Turbo.
Vision: GPT-4o shows superior vision capabilities compared to GPT-4 Turbo in evaluations.
Multilingual Support: GPT-4o offers better support for languages other than English than GPT-4 Turbo.

GPT-4o currently has a 128k pop-up window and operates with a knowledge deadline of October 2023.

Who Can Access GPT-4o?

The answer is simple – everyone. Here’s how:

ChatGPT Free Users: GPT-4o is now available to users of the free version, with certain usage limits. Once a user reaches their message limit, GPT-4o automatically switches to GPT-3.5, allowing conversations to continue seamlessly.
Plus Subscribers: Plus subscribers benefit from five times more messages with GPT-4o than users of the free version.
Team and Enterprise Users: Team and Enterprise plan users will benefit from even higher usage limits, making GPT-4o a valuable tool for collaborative working.

Accessibility for All

One of the most impressive aspects of GPT-4o is its commitment to accessibility. In a recent presentation, Mira Murati, a leading figure at OpenAI, stressed the importance of making advanced AI tools available to everyone, free of charge. With GPT-4o, OpenAI is democratizing access to cutting-edge technology, ensuring that users from all walks of life can harness its power.

Enhanced Capabilities

At the heart of GPT-4o is unrivaled intelligence, covering text, vision, and audio. Unlike its predecessors, GPT-4o offers blazing processing speeds and improved performance across a range of tasks. With its real-time conversation capabilities, users can interact with GPT-4o in a natural and fluid way.

The hype surrounding GPT-4o seems fully justified. With its mix of accessibility, intelligence, and versatility, GPT-4o represents a significant advance in the field of artificial intelligence.