
MiniMax, a Chinese AI startup headquartered in Shanghai, sent shockwaves through the AI industry today with the release of its new M2.5 language model in two variants, which promises to make high-end artificial intelligence so cheap you can stop worrying about the bill altogether.
It is billed as "open source," although the weights and code have not yet been posted, nor have the exact license type or terms. But that's almost beside the point given how cheaply MiniMax serves it through its API and partners.
For the past few years, using the world's most powerful AI has been like hiring an expensive consultant—it's brilliant, but you're watching the clock (and the token count) all the time. M2.5 changes that math, reducing frontier-model costs by as much as 95%.
By delivering performance that rivals top-tier models from Google and Anthropic at a fraction of the cost, and by focusing on agent-centric business tasks, including creating Microsoft Word, Excel, and PowerPoint files, MiniMax is betting that the future is not just about how smart a model is, but how often you can afford to use it.
To that end, MiniMax says it works "with senior professionals in fields such as finance, law, and social sciences" to ensure the model performs real work to their specifications and standards.
This release matters because it signals the transition from AI as a "chatbot" to AI as a "worker." When intelligence becomes "too cheap to meter," developers stop making simple Q&A tools and start building "agents": software that can spend hours autonomously coding, researching, and organizing complex projects without breaking the bank.
In fact, MiniMax has already deployed the model in its own operations: the company says 30% of all tasks at MiniMax HQ are now completed by M2.5, and a striking 80% of its new code is written with M2.5.
As the MiniMax team wrote in their release blog post, "we believe that M2.5 provides almost unlimited possibilities for the development and operation of economic agents."
The technology: sparse activation and CISPO
The secret to M2.5's efficiency lies in its Mixture of Experts (MoE) architecture. Instead of running all 230 billion parameters for each word it generates, the model activates only 10 billion. This preserves the reasoning depth of a large model while operating with the agility of a smaller one.
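The idea can be illustrated with a toy routing sketch. This is a minimal illustration only: MiniMax has not published M2.5's expert count, router design, or top-k value, so all the numbers and names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts = 23   # hypothetical: ~230B total / ~10B active per token
top_k = 1        # hypothetical router top-k
d_model = 8      # toy hidden size

def moe_layer(x, experts, router_w, top_k=1):
    """Route a token vector to its top-k experts and mix their outputs.

    Only the selected experts run for this token; the rest stay idle,
    which is why total parameters can far exceed active parameters."""
    logits = x @ router_w                 # router scores, shape (n_experts,)
    idx = np.argsort(logits)[-top_k:]     # indices of the top-k experts
    gates = np.exp(logits[idx])
    gates /= gates.sum()                  # softmax over the selected experts
    return sum(g * (x @ experts[i]) for g, i in zip(gates, idx))

experts = rng.normal(size=(n_experts, d_model, d_model))
router_w = rng.normal(size=(d_model, n_experts))
x = rng.normal(size=d_model)
y = moe_layer(x, experts, router_w, top_k)
```

With `top_k = 1`, each token touches roughly 1/23 of the expert weights, mirroring the 230B-total / 10B-active ratio described above.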
To train this complex system, MiniMax developed a proprietary Reinforcement Learning (RL) framework called Forge. MiniMax engineer Olive Song said on the ThursdAI podcast on YouTube that this technique was instrumental in scaling performance even with a relatively small number of active parameters, and that the model was trained for two months.
Forge is designed to help the model learn from "real-world environments," effectively letting the AI practice coding and using tools in thousands of simulated workplaces.
"What we know is that there is a lot of potential with a small model like this if we train its reinforcement learning with many environments and agents," Song said. "But it is not easy to do," he added, noting that this is where they spent "many hours."
To keep the model stable during this intense training, they used a mathematical method called CISPO (Clipping Importance Sampling Policy Optimization) and shared the formula on their blog.
This formula ensures the model is not overcorrected during training, allowing it to develop what MiniMax calls an "Architect's Mind": instead of jumping straight into writing code, M2.5 learns to plan the structure, features, and interface of a project first.
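The blog's formula is not reproduced here, but the general idea behind CISPO, as MiniMax described it for its earlier M1 model, is to clip the importance-sampling weight itself (treating it as a constant via stop-gradient) rather than zeroing out a token's gradient the way PPO-style clipping can. A toy numpy sketch of that difference, with illustrative clip bounds:

```python
import numpy as np

def ppo_token_weight(r, A, eps=0.2):
    """Effective per-token gradient weight under PPO-style clipping.

    When the ratio is clipped AND the advantage pushes further outside
    the clip range, the token contributes zero gradient."""
    clipped = np.clip(r, 1 - eps, 1 + eps)
    dropped = (r * A) > (clipped * A)   # the clipped branch wins the min()
    return np.where(dropped, 0.0, r)

def cispo_token_weight(r, eps_low=0.2, eps_high=0.2):
    """CISPO-style weight: clip the importance-sampling ratio itself,
    so every token keeps a nonzero, bounded contribution."""
    return np.clip(r, 1 - eps_low, 1 + eps_high)

ratios = np.array([5.0, 0.9, 1.1])       # new-vs-old policy ratios
advantages = np.array([1.0, 1.0, -1.0])  # per-token advantages

ppo_w = ppo_token_weight(ratios, advantages)
cispo_w = cispo_token_weight(ratios)
```

The first token (ratio 5.0, positive advantage) is silently dropped under PPO-style clipping but keeps a capped weight of 1.2 under CISPO, which is the stabilizing effect described above.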
State-of-the-art (and near-state-of-the-art) benchmarks
The results of this architecture can be seen in the latest industry leaderboards. M2.5 didn't just improve; it has entered the top tier of coding models, approaching the latest Anthropic model, Claude Opus 4.6, released just a week ago. It shows that Chinese companies are now only days behind the better-resourced (in terms of GPUs) labs in the US.
Here are some of the new MiniMax M2.5 benchmark highlights:

- SWE-Bench Verified: 80.2%, matching the pace of Claude Opus 4.6
- BrowseComp: 76.3%, industry-leading agentic tool use
- Multi-SWE-Bench: 51.3%, state of the art in multi-language coding
- BFCL (tool calling): 76.8%, high accuracy for agent workflows
On the ThursdAI podcast, host Alex Volkov pointed out that MiniMax M2.5 runs extremely fast and uses fewer tokens to complete tasks, on the order of $0.15 per task compared to $3.00 for Claude Opus 4.6.
Breaking the cost barrier
MiniMax offers two versions of the model through its API, both aimed at high-volume production use:

- M2.5-Lightning: optimized for speed, delivering 100 tokens per second, at $0.30 per 1M input tokens and $2.40 per 1M output tokens.
- M2.5 (standard): optimized for cost, running at 50 tokens per second, at half the Lightning price ($0.15 per 1M input tokens / $1.20 per 1M output tokens).
In plain terms: MiniMax claims you can run four AI "agents" continuously for a full year for roughly $10,000.
For business users, this price is roughly 1/10 to 1/20 the cost of competing proprietary models such as the GPT-5 or Claude 4.6 Opus.
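The $10,000 claim roughly checks out as a back-of-envelope calculation. This is a sketch under stated assumptions: each agent streams output around the clock at the standard tier's 50 tokens/sec, and input-token costs (which depend entirely on the workload) are left out, which would account for much of the remaining gap to $10K.

```python
# Back-of-envelope check of the "four agents for ~$10K/year" claim.
SECONDS_PER_YEAR = 3600 * 24 * 365
OUTPUT_TOKENS_PER_SEC = 50       # standard M2.5 tier throughput
OUTPUT_PRICE_PER_M = 1.20        # USD per 1M output tokens (standard M2.5)

tokens_per_agent = OUTPUT_TOKENS_PER_SEC * SECONDS_PER_YEAR
cost_per_agent = tokens_per_agent / 1_000_000 * OUTPUT_PRICE_PER_M
fleet_cost = 4 * cost_per_agent  # four agents running year-round

print(round(cost_per_agent, 2))  # ~1892.16 USD per agent (output only)
print(round(fleet_cost, 2))      # ~7568.64 USD for four agents
```

Output tokens alone come to about $7,600 for four nonstop agents, so $10,000 including input tokens is a plausible order of magnitude.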
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Total |
|---|---|---|---|
| Qwen 3 Turbo | $0.05 | $0.20 | $0.25 |
| DeepSeek-chat (V3.2-Exp) | $0.28 | $0.42 | $0.70 |
| DeepSeek-reasoner (V3.2-Exp) | $0.28 | $0.42 | $0.70 |
| Grok 4.1 Fast (reasoning) | $0.20 | $0.50 | $0.70 |
| Grok 4.1 Fast (non-reasoning) | $0.20 | $0.50 | $0.70 |
| MiniMax M2.5 | $0.15 | $1.20 | $1.35 |
| MiniMax M2.5-Lightning | $0.30 | $2.40 | $2.70 |
| Gemini 3 Flash Preview | $0.50 | $3.00 | $3.50 |
| Kimi-k2.5 | $0.60 | $3.00 | $3.60 |
| GLM-5 | $1.00 | $3.20 | $4.20 |
| ERNIE 5.0 | $0.85 | $3.40 | $4.25 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 |
| Qwen3-Max (2026-01-23) | $1.20 | $6.00 | $7.20 |
| Gemini 3 Pro (≤200K) | $2.00 | $12.00 | $14.00 |
| GPT-5.2 | $1.75 | $14.00 | $15.75 |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $18.00 |
| Gemini 3 Pro (>200K) | $4.00 | $18.00 | $22.00 |
| Claude Opus 4.6 | $5.00 | $25.00 | $30.00 |
| GPT-5.2 Pro | $21.00 | $168.00 | $189.00 |
Strategic implications for businesses and leaders
For technical leaders, M2.5 represents more than just a cheaper API; it changes the operations playbook for today's businesses.
The pressure to "optimize" disappears: the incentive to ration tokens to save money is gone. You can deploy high-context, high-reasoning models for routine tasks that were previously cost-prohibitive.
A 37% speed improvement in end-to-end task completion means that "agent" pipelines prized by AI orchestrators, where models talk to other models, will finally be fast enough for real-time user applications.
In addition, M2.5's high scores on financial benchmarks (74.4% on MEWC) suggest it can handle the "tacit knowledge" of specialized industries such as law and finance with less supervision.
Because M2.5 is positioned as an open-source model, organizations could run intensive, automated code audits at a scale that was previously impossible without massive human intervention, all while maintaining better control over data privacy. But until the license terms and weights are actually posted, "open source" remains just a label.
The MiniMax M2.5 is a signal that the frontier of AI is no longer about who can build the biggest brain, but who can make that brain the most useful, and cheapest, worker in the room.






