OpenAI Reveals Multimodal GPT-4o To Take On Google's Gemini AI

OpenAI has announced a new model called GPT-4o to power ChatGPT. Unlike the advancements introduced by previous models such as GPT-4, this one focuses on a massive boost to multimodal capabilities, allowing it to work with text, visuals, audio, or any combination of the three. Think of it as an AI tool with eyes and ears that can make sense of the world around you, much the way you would use something like Google Lens, but now supercharged with a generative AI chatbot on your phone.

The company claims GPT-4o can answer audio queries in about 0.2 seconds. That responsiveness allows it, for example, to facilitate two-way bilingual conversations by translating one language into another without needing a prompt at the end of each person’s speech. Notably, OpenAI says it has cut API costs in half for developers and has also dramatically reduced the token size for each request, which makes processing faster.

Say hello to GPT-4o, our new flagship model which can reason across audio, vision, and text in real time: https://t.co/MYHZB79UqN
Text and image input rolling out today in API and ChatGPT with voice and video in the coming weeks. pic.twitter.com/uuthKZyzYx
— OpenAI (@OpenAI) May 13, 2024

GPT-4o seems like a convenient all-in-one alternative to tools like Google Gemini, which is also multimodal. Notably, ChatGPT with GPT-4o has a critical advantage here. Gemini’s Nano model requires a certain hardware baseline, but ChatGPT doesn’t, because it follows an entirely cloud-based workflow and can run on any modern phone. Moreover, from what we’ve seen of ChatGPT’s new vision capabilities and how intelligently it makes sense of the world through the camera lens, AI hardware like the Rabbit R1 seems obsolete from a value and capability perspective.

What can ChatGPT vision accomplish?

In the demo videos released by OpenAI, GPT-4o can be seen identifying real-world objects and interpreting them in another language, teaching mathematics in split-screen mode based on a problem appearing in another app, identifying people and their surroundings in the camera frame, and even cracking terrible dad jokes. Unfortunately, all these fancy multimodal capabilities will take some time to land on every enthusiast’s phone. In the initial phase, with the public rollout starting today, GPT-4o will arrive with only its upgraded text and image capabilities. “With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network,” OpenAI says in an official blog post.

In the weeks to come, the company will be extensively testing the audio and vision capabilities, but even when they are released, there will be certain limitations in the early days. For example, audio output will be limited to a selection of preset voices. The most interesting element of today’s announcement, however, is that GPT-4o will be available to all users without requiring a subscription. Users with a ChatGPT Plus subscription will get a 5x higher limit for conversations powered by the new model and will also get priority access to the audio and vision capabilities in the coming weeks.

Source: http://www.slashgear.com/1581243/openai-reveals-chatgpt-4o-multimodal-visual-ai/
