OpenAI moved to defend its market position on Friday with the release of o3-mini, a direct answer to Chinese startup DeepSeek's R1 model, which sent shockwaves through the AI industry by matching top-tier performance at a fraction of the compute cost.
“We're releasing OpenAI o3-mini, the newest, most cost-efficient model in our reasoning series, available in both ChatGPT and the API today,” OpenAI said in an official blog post. “Previewed in December 2024, this powerful and fast model advances the boundaries of what small models can achieve (…) all while maintaining the low costs and reduced latency of OpenAI o1-mini.”
OpenAI also made reasoning models available to free users for the first time, while tripling the daily message limit for paying customers, from 50 to 150, to encourage use of the new family of reasoning models.
Unlike GPT-4o and the GPT family of models, the “o” family of AI models is geared toward reasoning tasks. They are less creative, but have chain-of-thought reasoning built in, making them better at solving complex problems, backtracking on incorrect analyses, and writing better-structured code.
At the highest level, OpenAI has two main families of AI models: generative pre-trained transformers (GPT) and “Omni” (O).
- GPT is the artist of the family: a right-brained type, it is good for role playing, conversation, creative writing, summarizing, explanation, brainstorming, chatting, etc.
- o is the nerd of the family. It is bad at telling stories, but great at coding, solving mathematical equations, analyzing complex problems, planning its reasoning process step by step, and comparing research papers.
The new o3-mini comes in three versions: low, medium, and high. These subcategories give users better answers in exchange for more “inference” (which is more expensive for developers, who have to pay per token).
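In the API, the three tiers are exposed as a single model plus an effort setting. A minimal sketch of assembling such a request payload locally, assuming the parameter is named `reasoning_effort` as in OpenAI's API documentation (the actual network call is omitted):

```python
# Sketch: selecting an o3-mini effort tier via the chat completions payload.
# "reasoning_effort" is the parameter name assumed from OpenAI's API docs.
def build_request(prompt: str, effort: str = "medium") -> dict:
    if effort not in {"low", "medium", "high"}:
        raise ValueError("effort must be 'low', 'medium', or 'high'")
    return {
        "model": "o3-mini",
        "reasoning_effort": effort,  # higher effort = more inference tokens = higher cost
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Prove that the square root of 2 is irrational.", effort="high")
print(req["model"], req["reasoning_effort"])  # -> o3-mini high
```

Higher effort trades latency and token spend for answer quality, which is why the tiers matter more to API developers than to ChatGPT users.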
OpenAI o3-mini low, aimed at efficiency, is worse than OpenAI o1-mini at general knowledge and multilingual reasoning, but scores better on other tasks such as coding or factuality. The other two versions, o3-mini medium and o3-mini high, beat OpenAI o1-mini on every benchmark.
DeepSeek's breakthrough, which achieved better results than OpenAI's flagship model while using only a fraction of the computing power, triggered a massive tech selloff that wiped almost $1 trillion off U.S. markets. Nvidia alone shed $600 billion in market value as investors questioned the future demand for its expensive AI chips.
The efficiency gains came from DeepSeek's novel approach to model architecture.
While American companies focused on throwing more computing power at AI development, the DeepSeek team found ways to streamline how models process information, making them more efficient. The competitive pressure increased when Chinese tech giant Alibaba released Qwen2.5-Max, a model claimed to be even more capable than the one DeepSeek used as a base, opening the door to what could be a new wave of Chinese AI innovation.
OpenAI o3-mini tries to widen that gap again. The new model runs 24% faster than its predecessor and matches or beats older models on key benchmarks, all while costing less to run.
The pricing is also more competitive. OpenAI o3-mini's rates of $0.55 per million input tokens and $4.40 per million output tokens are still much higher than DeepSeek R1's $0.14 and $2.19 for the same volumes, but they narrow the gap between OpenAI and DeepSeek, and represent a major cut compared to the prices charged to run OpenAI o1.
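A back-of-envelope comparison using the per-million-token rates quoted above; the monthly workload figures are illustrative assumptions, not from either company:

```python
# Per-million-token rates (USD) quoted in the article: input, output.
RATES = {
    "o3-mini":     {"input": 0.55, "output": 4.40},
    "deepseek-r1": {"input": 0.14, "output": 2.19},
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total cost in dollars for a given token volume."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# Hypothetical monthly workload: 2M input tokens, 1M output tokens.
for model in RATES:
    print(f"{model}: ${cost_usd(model, 2_000_000, 1_000_000):.2f}")
```

On this assumed workload, o3-mini comes out to $5.50 against R1's $2.47, roughly a 2.2x difference, though far smaller than the gulf between R1 and o1.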

And that may be the key to its success. OpenAI o3-mini is closed source, in contrast to DeepSeek R1, which is freely available to those who run it themselves (with paid options for use on hosted servers), so the appeal of each will vary depending on the intended use.
OpenAI o3-mini medium scores 79.6 on the AIME benchmark of mathematical problems. DeepSeek R1 scores 79.8, a score beaten only by the most powerful model in the family, OpenAI o3-mini high, which scores 87.3 points.
The same pattern can be seen in other benchmarks: on GPQA, which measures proficiency across various scientific disciplines, the scores are 71.5 for DeepSeek R1, 70.6 for o3-mini low, and 79.7 for o3-mini high. R1 sits at the 96.3rd percentile on Codeforces, a benchmark for coding tasks, while o3-mini low sits at the 93rd percentile and o3-mini high at the 97th percentile.
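Gathered in one place, the figures quoted above look like this (the dictionary only includes the scores the benchmarks reported here; Codeforces values are percentiles, not raw scores):

```python
# Benchmark figures quoted above, for a side-by-side view.
SCORES = {
    "AIME":       {"deepseek-r1": 79.8, "o3-mini-medium": 79.6, "o3-mini-high": 87.3},
    "GPQA":       {"deepseek-r1": 71.5, "o3-mini-low": 70.6, "o3-mini-high": 79.7},
    "Codeforces": {"deepseek-r1": 96.3, "o3-mini-low": 93.0, "o3-mini-high": 97.0},
}

def best(benchmark: str) -> str:
    """Return the top-scoring model on a benchmark."""
    models = SCORES[benchmark]
    return max(models, key=models.get)

for b in SCORES:
    print(f"{b}: {best(b)}")  # o3-mini-high tops all three, R1 close behind
```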
So the differences exist, but in benchmark terms they can be negligible, depending on which version is chosen for a given task.
Testing OpenAI o3-mini against DeepSeek R1
We tried the model on a few tasks to see how it performed against DeepSeek R1.
The first task was a spy game to test how good it was at multi-step reasoning. We chose the same sample from the BIG-bench dataset on GitHub that we used to evaluate DeepSeek R1. (The full story is available here and involves a school trip to a remote, snowy location, where students and teachers face a series of strange disappearances; the model must figure out who the stalker is.)
OpenAI o3-mini did not do well, reaching the wrong conclusion about the story. According to the test's answer key, the stalker's name is Leo. DeepSeek R1 got it right, while OpenAI o3-mini got it wrong, saying the stalker's name was Eric. (Fun fact: we cannot share a link to the conversation because it was flagged as unsafe by OpenAI.)

The model is fairly good at logical language tasks that do not involve mathematics. For example, we asked it to write five sentences ending in a specific word, and it was able to understand the task and evaluate its results before giving the final answer. It thought for four seconds, corrected one wrong sentence, and produced a fully correct response.
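Checking this kind of output is easy to automate. A small sketch of such a verifier; the target word and sample sentences here are our own, not taken from the test:

```python
import re

def ends_with(sentence: str, word: str) -> bool:
    """True if the sentence's final word (ignoring punctuation/case) matches."""
    tokens = re.findall(r"[A-Za-z']+", sentence)
    return bool(tokens) and tokens[-1].lower() == word.lower()

# Hypothetical model output for the prompt "write sentences ending in 'snow'".
answers = [
    "The hardest part of the climb was the snow.",
    "Nothing blankets a field as quietly as snow.",
]
print([ends_with(s, "snow") for s in answers])  # -> [True, True]
```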

It is also very good at mathematics, able to solve problems considered extremely difficult by some benchmarks. The same complex problem that took DeepSeek R1 275 seconds to solve was completed by OpenAI o3-mini in just 33 seconds.


So, a pretty good effort, OpenAI. Your move, DeepSeek.
Edited by Andrew Hayward