OpenAI o3 Model: Lower Benchmark Scores Raise Questions About Claims, Transparency Over AI

OpenAI has long touted the capabilities of its artificial intelligence (AI) developments, especially its o-series models, which are built for reasoning and other advanced tasks.

The company made significant claims about its o3 model, which it unveiled last year, including its ability to solve complex math problems from FrontierMath and more.

OpenAI o3 Model Lower Benchmark Fiasco

OpenAI Chief Research Officer Mark Chen previously said in a livestream that the company's o3 model is advanced enough to answer over 25% of the questions in FrontierMath, which is known for a set of problems that challenges both people and machines.

However, EpochAI (via TechCrunch) argued that its recent independent benchmark test of the o3 model shows that OpenAI's claims may not actually be truthful.

The independent test found that OpenAI's o3 can answer only 10% of the mathematical problems presented to it, significantly lower than the company's claim.

EpochAI is the research firm behind FrontierMath.

Users Have Issues with OpenAI Over AI Transparency

Users are now reacting to these new benchmarks, calling out OpenAI over its transparency and supposedly erroneous claims.

However, TechCrunch reported that while the percentages differ, OpenAI's earlier claim matches the lower-bound score from Epoch. The ARC Prize Foundation also said that the public o3 model was tuned for chat use and is different from the one used in the earlier tests.

OpenAI's ChatGPT Advancements

OpenAI has been trending for several weeks now, not solely because of what its latest large language model can do, but especially because it gave ChatGPT the ability to generate images natively.

Instead of relying on DALL-E, users can go directly to the chatbot and ask it to create different kinds of images, including Ghibli-style ones or images for the trending "Barbie Box" challenge.

Apart from this, OpenAI has also given ChatGPT a major boost to its academic-focused features, particularly with the Deep Research tool. While it still requires a paid subscription, users no longer have to pay $200 per month to get this more powerful version of the chatbot, which is promoted for its ability to help with studies.
