z.ai’s open source GLM-5 achieves record low hallucination rate and uses new RL ‘slime’ technique



Chinese AI startup Zhipu AI, aka z.ai, is back this week with a new frontier large language model: GLM-5.

The latest in z.ai’s ongoing and still impressive GLM series, it remains open source under the MIT License, making it well suited to business deployment, and, in one of its many notable achievements, it set a record low hallucination rate in independent testing from benchmarking firm Artificial Analysis.

With a score of -1 on the AA-Omniscience Index, a massive 35-point improvement over its predecessor, GLM-5 now leads the entire AI industry, including US competitors such as Google, OpenAI, and Anthropic, in knowledge reliability: it knows when to abstain rather than fabricate information.

Beyond reasoning skills, GLM-5 is built for high-volume knowledge work. It ships with native "Agent Mode" capabilities that allow it to turn raw prompts or source materials directly into professional office documents, including ready-to-use .docx, .pdf, and .xlsx files.

Whether creating detailed financial reports, sponsorship proposals, or complex spreadsheets, GLM-5 delivers results in real-world formats that integrate directly into business workflows.

It is also disruptively priced at approximately $0.80 per million input tokens and $2.56 per million output tokens, roughly 6x cheaper than proprietary competitors such as Claude Opus 4.6, making state-of-the-art agentic engineering more cost-effective than ever. Here’s what else business decision makers need to know about the model and its training.

Technology: scaling for agent effectiveness

At the heart of GLM-5 is a big jump in raw parameters. The model scales from 355B parameters in GLM-4.5 to a staggering 744B parameters, with 40B active per token in its Mixture-of-Experts (MoE) architecture. This growth is supported by an increase in pre-training data to 28.5T tokens.

To address training inefficiencies at this scale, z.ai developed "slime," a novel asynchronous reinforcement learning (RL) infrastructure.

Traditional RL pipelines suffer from "long tail" bottlenecks, in which the slowest trajectory in a batch holds up the entire training step; slime breaks this lockstep by allowing trajectories to be generated independently, enabling the fine-grained iteration required for complex agent behavior.

Through system-level optimizations such as Active Partial Rollouts (APRIL), slime addresses generation bottlenecks that typically consume more than 90% of RL training time, significantly speeding up the iteration cycle for complex agent tasks.

The design of the framework centers on a tripartite modular system: a high-performance training module powered by Megatron-LM, a rollout module that uses SGLang and custom routers for high-throughput data generation, and a centralized Data Buffer that manages prompt initialization and rollout data storage.
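
slime itself is open source, and its real APIs are more involved than can be shown here, but the core idea of decoupling rollout generation from training can be illustrated with a toy sketch (all names below are illustrative, not slime's actual interfaces): independent workers push trajectories into a shared buffer, and the trainer consumes whatever is ready instead of waiting for the slowest episode.

```python
import queue
import random
import threading
import time

def rollout_worker(worker_id, data_buffer, n_trajectories):
    """Rollout module: generates trajectories independently, so one
    slow ("long tail") episode never blocks the others."""
    for _ in range(n_trajectories):
        episode_len = random.randint(1, 5)   # simulate variable-length episodes
        time.sleep(episode_len * 0.001)      # longer episodes take longer to finish
        reward = random.random()
        data_buffer.put((worker_id, episode_len, reward))

def trainer(data_buffer, total_expected):
    """Training module: consumes trajectories from the shared buffer
    as they arrive, instead of waiting for a synchronized batch."""
    consumed = []
    while len(consumed) < total_expected:
        consumed.append(data_buffer.get())   # blocks only until *any* trajectory is ready
    return consumed

data_buffer = queue.Queue()  # stands in for the centralized Data Buffer
workers = [threading.Thread(target=rollout_worker, args=(i, data_buffer, 10))
           for i in range(4)]
for w in workers:
    w.start()
trajectories = trainer(data_buffer, total_expected=40)
for w in workers:
    w.join()
print(len(trajectories))  # 40
```

In a synchronous design, every training step would wait for the longest of the 40 episodes; here the trainer starts consuming as soon as the first one lands.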

By enabling adaptive verifiable environments and multi-turn compilation feedback loops, slime provides the solid, high-throughput foundation needed to move AI from simple chat interactions to rigorous, long-horizon systems engineering.

To keep the deployment manageable, GLM-5 integrates DeepSeek Sparse Attention (DSA), which preserves 200K context capacity while drastically reducing costs.
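
DSA's exact selection mechanism is documented in DeepSeek's own technical materials; the general idea behind sparse attention, that each query attends only to its top-k most relevant keys rather than the full context, can be sketched in plain Python (a generic toy illustration, not DSA's actual algorithm):

```python
import math

def sparse_attention(query, keys, values, k=2):
    """Score every key, but run softmax and value-mixing over only the
    top-k: downstream cost grows with k, not with full context length."""
    scores = [sum(q * kk for q, kk in zip(query, key)) / math.sqrt(len(query))
              for key in keys]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = {i: math.exp(scores[i]) for i in top}   # softmax over the top-k only
    z = sum(exps.values())
    weights = {i: e / z for i, e in exps.items()}
    dim = len(values[0])
    return [sum(weights[i] * values[i][d] for i in top) for d in range(dim)]

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.25, 0.25]]
out = sparse_attention([1.0, 0.0], keys, values, k=2)
print(out)
```

Production implementations like DSA do far more (learned selection, hardware-aware kernels), but the economic point is the same: attention cost no longer scales with the full 200K window.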

Finished knowledge work

z.ai frames GLM-5 as an "office" tool for the AGI era. While previous models focused on snippets, GLM-5 is built to deliver ready-to-use documents.

It can autonomously convert prompts into formatted .docx, .pdf, and .xlsx files—from financial reports to sponsorship proposals.

In practice, this means the model can decompose high-level goals into actionable subtasks and act as an "engineering agent," where humans define quality gates while the AI handles execution.

High performance

GLM-5’s benchmark results make it the new most powerful open source model in the world, according to Artificial Analysis, outperforming Chinese rival Kimi K2.5 from Moonshot AI, released just two weeks ago, and showing how close Chinese AI companies have come to catching up with their better-resourced proprietary Western rivals.

According to z.ai’s own materials shared today, GLM-5 ranks near the state-of-the-art on several key benchmarks:

SWE-bench Verified: GLM-5 achieved a score of 77.8, surpassing Gemini 3 Pro (76.2) and coming close to Claude Opus 4.6 (80.9).

Vending-Bench 2: In a simulation of running a small business, GLM-5 ranked #1 among open source models with a final balance of $4,432.12.

Beyond performance, GLM-5 aggressively undercuts the market. Live on OpenRouter as of February 11, 2026, it is priced at approximately $0.80–$1.00 per million input tokens and $2.56–$3.20 per million output tokens. That puts it mid-range among top LLMs, but given its top-tier benchmark results, it can fairly be called a steal:

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Total (1M in + 1M out) | Source |
|---|---|---|---|---|
| Qwen 3 Turbo | $0.05 | $0.20 | $0.25 | Alibaba Cloud |
| Grok 4.1 Fast (reasoning) | $0.20 | $0.50 | $0.70 | xAI |
| Grok 4.1 Fast (non-reasoning) | $0.20 | $0.50 | $0.70 | xAI |
| deepseek-chat (V3.2-Exp) | $0.28 | $0.42 | $0.70 | DeepSeek |
| deepseek-reasoner (V3.2-Exp) | $0.28 | $0.42 | $0.70 | DeepSeek |
| Gemini 3 Flash Preview | $0.50 | $3.00 | $3.50 | Google |
| Kimi K2.5 | $0.60 | $3.00 | $3.60 | Moonshot |
| GLM-5 | $1.00 | $3.20 | $4.20 | Z.ai |
| ERNIE 5.0 | $0.85 | $3.40 | $4.25 | Qianfan |
| Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 | Anthropic |
| Qwen3-Max (2026-01-23) | $1.20 | $6.00 | $7.20 | Alibaba Cloud |
| Gemini 3 Pro (≤200K) | $2.00 | $12.00 | $14.00 | Google |
| GPT-5.2 | $1.75 | $14.00 | $15.75 | OpenAI |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $18.00 | Anthropic |
| Gemini 3 Pro (>200K) | $4.00 | $18.00 | $22.00 | Google |
| Claude Opus 4.6 | $5.00 | $25.00 | $30.00 | Anthropic |
| GPT-5.2 Pro | $21.00 | $168.00 | $189.00 | OpenAI |

GLM-5 is almost 6x cheaper on input and almost 10x cheaper on output than Claude Opus 4.6 ($5/$25). The release also confirms rumors that Zhipu AI was behind "Pony Alpha," a stealth model that previously topped OpenRouter’s coding benchmarks.
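
The per-request arithmetic behind those multiples is easy to sanity-check. Using the published per-million-token rates (the low end of GLM-5's quoted range) and a hypothetical agent workload of 2M input and 500K output tokens:

```python
def cost_usd(input_tokens, output_tokens, in_rate, out_rate):
    """Cost of one workload given per-1M-token rates in USD."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

GLM5 = (0.80, 2.56)   # (input, output) rates, low end of GLM-5's quoted range
OPUS = (5.00, 25.00)  # Claude Opus 4.6

# Hypothetical workload: 2M input tokens, 500K output tokens
glm = cost_usd(2_000_000, 500_000, *GLM5)
opus = cost_usd(2_000_000, 500_000, *OPUS)
print(glm, opus)           # ≈ 2.88 vs 22.5
print(OPUS[0] / GLM5[0])   # input rate ratio: 6.25x
print(OPUS[1] / GLM5[1])   # output rate ratio: ≈ 9.77x
```

On this mixed workload the overall gap lands between the two headline ratios, at roughly 7.8x cheaper end to end.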

However, despite the high benchmarks and low cost, not all early users are enthusiastic about the model, as its raw performance does not tell the whole story.

Lukas Petersson, co-founder of the safety-focused autonomous AI startup Andon Labs, wrote on X: "After hours of reading GLM-5 traces: a more effective model, but less situational awareness. Goals are achieved through aggressive tactics, without reasoning about its own state or the user’s experience. It’s scary. This is how you get a paperclip maximizer."

The "paperclip maximizer" refers to a hypothetical scenario, first described by Oxford philosopher Nick Bostrom in 2003, in which an AI or other autonomous system inadvertently brings about an apocalyptic outcome or human extinction by pursuing a seemingly benign instruction, such as maximizing the number of paperclips produced, to an extreme degree, redirecting the resources that humans (or other life) depend on, or otherwise making life impossible through its single-minded commitment to that seemingly harmless goal.

Should your business use GLM-5?

Businesses looking to escape vendor lock-in will find GLM-5’s MIT License and open-weights availability a significant strategic advantage. Unlike closed-source competitors that keep their models behind proprietary walls, GLM-5 allows organizations to self-host frontier-level intelligence.

Adoption is not without friction. The full scale of GLM-5—744B parameters—requires a large hardware floor that may be out of reach for small companies without significant cloud or on-premise GPU clusters.

Security leaders must weigh the geopolitical implications of a flagship model from a China-based lab, especially in regulated industries where data residency and provenance are closely audited.

In addition, the shift toward more autonomous AI agents introduces new governance risks. As models move from "chat" to "work," they begin operating apps and files autonomously. Without strong agent-specific permissions and human-in-the-loop quality gates established by enterprise data leaders, the risk of autonomous error increases sharply.
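
What a "quality gate" looks like in practice is up to each team; one minimal pattern (purely illustrative, not a feature of GLM-5 or any specific framework) is to wrap every agent action in an approval check before it executes, with an audit trail for whatever was allowed or blocked:

```python
def make_gated_executor(approve, audit_log):
    """Wrap agent actions so nothing runs without passing the gate,
    and every decision is recorded for later audit."""
    def execute(action, run):
        if approve(action):
            audit_log.append(("approved", action))
            return run()
        audit_log.append(("blocked", action))
        return None
    return execute

# Toy policy: block anything mentioning deletion. A real gate might
# instead route risky actions to a human reviewer for sign-off.
def policy(action):
    return "delete" not in action

log = []
execute = make_gated_executor(policy, log)
result = execute("summarize report", lambda: "summary done")
blocked = execute("delete all files", lambda: "deleted")
print(result, blocked)  # summary done None
```

The same shape generalizes: the `approve` callable is where enterprise data leaders encode their permission model, and the log is what auditors review.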

Finally, GLM-5 is a potential game changer for organizations that have outgrown simple copilots and are ready to build a truly autonomous office.

This is a model for engineers who need to refactor a legacy backend, or who want a "self-healing" pipeline that doesn’t sleep.

While Western labs continue to optimize for "thought" and reasoning depth, z.ai optimizes for execution and scale.

Businesses adopting GLM-5 today aren’t just buying a cheaper model; they are betting on a future where the most valuable AI will be the one that can finish the project without being asked twice.


