Interesting
  • William

OpenAI Reveals Multimodal GPT-4o To Take On Google’s Gemini AI

OpenAI has announced a new model called GPT-4o to power ChatGPT. Unlike the upgrades delivered by previous models such as GPT-4, this one focuses on a massive boost to multimodal capabilities, allowing it to work with text, visuals, audio, or any combination of the three. Think of it as an AI tool with eyes and ears that can make sense of the world around you, much as you would use something like Google Lens, but now supercharged with a generative AI chatbot on your phone.

The company claims GPT-4o can respond to audio queries in roughly 0.2 seconds. It can, for example, facilitate a two-way bilingual conversation by translating each speaker’s words into the other language, without needing a fresh prompt after each person finishes speaking. Notably, OpenAI says it has halved API pricing for developers and has also dramatically reduced the number of tokens each request requires, which makes processing faster.
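To make the pricing point concrete, here is a minimal sketch of how a halved per-token price and a smaller token count compound. Every number in it is a hypothetical placeholder chosen for easy arithmetic, not OpenAI's actual pricing or token counts, and the `request_cost` helper is an illustrative name rather than part of any OpenAI library.

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_million, output_price_per_million):
    """Return the dollar cost of one request under per-million-token pricing."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


# Hypothetical baseline: 2,000 input and 500 output tokens,
# priced at $10 and $30 per million tokens (placeholder values).
old_cost = request_cost(2_000, 500, 10.0, 30.0)

# Hypothetical new model: prices halved, and the same request is assumed
# to need 20% fewer tokens (another placeholder assumption).
new_cost = request_cost(1_600, 400, 5.0, 15.0)

print(f"old: ${old_cost:.4f}  new: ${new_cost:.4f}  ratio: {new_cost / old_cost:.2f}")
```

Under these assumed numbers the two savings multiply rather than add: half the price on 80% of the tokens works out to 40% of the original cost.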

GPT-4o seems like a convenient all-in-one alternative to tools like Google Gemini, which is also multimodal, and ChatGPT with GPT-4o has a critical advantage here. Gemini’s Nano model requires a certain hardware baseline, but ChatGPT doesn’t, because it follows an entirely cloud-based workflow and can run on any modern phone. Moreover, judging by ChatGPT’s new vision capabilities and how intelligently it interprets what it sees through the camera lens, AI hardware like the Rabbit R1 seems obsolete from a value and capability perspective.

What can ChatGPT vision accomplish?

In the demo videos released by OpenAI, GPT-4o can be seen identifying real-world objects and naming them in another language, teaching mathematics in split-screen mode based on a problem shown in another app, identifying people and their surroundings in the camera frame, and even cracking terrible dad jokes. Unfortunately, all these fancy multimodal capabilities will take some time to land on every enthusiast’s phone. In the initial phase, which begins rolling out publicly today, GPT-4o will arrive only with its upgraded text and image capabilities. “With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network,” OpenAI says in an official blog post.

In the weeks to come, the company will be extensively testing the audio and vision capabilities, but even when they are released, there will be certain limitations in the early days. For example, audio output will be limited to a selection of preset voices. The most interesting element of today’s announcement, however, is that GPT-4o will be available to all users, with no subscription required. Users with a ChatGPT Plus subscription will get a 5x higher limit for conversations powered by the new model, along with priority access to the audio and vision capabilities in the coming weeks.


Source: http://www.slashgear.com/1581243/openai-reveals-chatgpt-4o-multimodal-visual-ai/

