OpenAI’s next-generation o3 model will arrive early next year


After nearly two weeks of announcements, OpenAI concluded its 12 Days of OpenAI livestream series with a preview of its next-generation frontier model. “Out of respect for our friends at Telefónica (owner of the O2 cellular network in Europe), and in the great tradition of OpenAI being really, really bad at names, it’s called o3,” OpenAI CEO Sam Altman told those watching the announcement on YouTube.

The new model is not yet ready for public use. Instead, OpenAI first made o3 available to researchers who wanted to help with safety testing. OpenAI also announced the existence of o3-mini. Altman said the company plans to launch that model “by the end of January,” with o3 following “shortly after that.”

As you might expect, o3 offers better performance than its predecessor, but how much better it is than o1 is the headline feature here. For example, when put through this year's American Invitational Mathematics Examination, o3 achieved an accuracy score of 96.7 percent. In contrast, o1 got a more modest 83.3 percent rating. “What this signifies is that o3 often misses just one question,” said Mark Chen, senior vice president of research at OpenAI. In fact, o3 performed so well on the usual suite of benchmarks OpenAI puts its models through that the company had to find more challenging tests to benchmark it against.

An ARC AGI test.

ARC Prize

One of them is ARC-AGI, a benchmark that tests the ability of an AI algorithm to intuit and learn on the fly. According to the creator of the test, the non-profit ARC Prize, an AI system that successfully beats ARC-AGI would represent “an important milestone toward artificial general intelligence.” Since the test's debut in 2019, no AI model has beaten it. The test consists of input-output puzzles that most people can solve intuitively. For instance, in the example above, the correct answer is to make squares out of the four polyominos using dark blue blocks.

In its low-compute setting, o3 scored 75.7 percent on the test. With more processing power, the model achieved a rating of 87.5 percent. “Human performance is comparable to the 85 percent threshold, so surpassing it is an important milestone,” according to Greg Kamradt, president of the ARC Prize Foundation.

A graph comparing the performance of the o3-mini against the o1, and the cost of that performance.

OpenAI

OpenAI also showed off o3-mini. The new model uses OpenAI’s recently announced Adaptive Thinking Time API to offer three different reasoning modes: Low, Medium and High. In practice, this allows users to adjust how long the software “thinks” about a problem before providing an answer. As you can see from the above graph, o3-mini achieves results comparable to OpenAI’s current o1 reasoning model, but at a fraction of the computational cost. As mentioned earlier, o3-mini will arrive for public use before o3.


