Shortly after OpenAI released o1, its first “reasoning” AI model, people began to notice a strange phenomenon. The model would sometimes begin to “think” in Chinese, Persian, or some other language, even when asked a question in English.
Given a problem to solve, e.g. “How many R’s are in the word ‘strawberry’?”, o1 begins its “thinking” process, arriving at an answer by performing a series of reasoning steps. If the question is written in English, o1’s final answer will be in English. But the model performs some of those steps in another language before reaching its conclusion.
“(O1) randomly started thinking in Chinese in the middle,” one Reddit user said.
“Why does (o1) randomly start thinking in Chinese?” another user asked in a post on X. “No part of the conversation (5+ messages) was in Chinese.”
Why does o1 pro randomly start thinking in Chinese? No part of the conversation (5+ messages) in Chinese… very interesting… training data influence
– Rishab Jain (@RishabJainK) January 9, 2025
OpenAI hasn’t offered an explanation for o1’s strange behavior, or even acknowledged it. So what’s going on?
Well, AI experts aren’t sure. But they have some theories.
Several people on X, including Hugging Face CEO Clément Delangue, pointed to the fact that reasoning models such as o1 are trained on datasets containing large numbers of Chinese characters. Ted Xiao, a researcher at Google DeepMind, said that companies including OpenAI use third-party Chinese data labeling services, and that o1 switching to Chinese is an example of “Chinese linguistic influence on reasoning.”
“(Labs like) OpenAI and Anthropic use (third-party) data labeling services for PhD-level reasoning data for science, math, and coding,” Xiao wrote in a post on X. “(F)or expert labor availability and cost reasons, many of these data providers are based in China.”
Labels, also known as tags or annotations, help models understand and interpret data during the training process. For example, labels to train an image recognition model can take the form of markers around objects or captions that refer to each person, place, or object depicted in an image.
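To make that concrete, here is a hedged sketch of what a single training label for an image recognition model might look like; the schema, field names, and values are hypothetical, not drawn from any particular dataset or from OpenAI’s pipeline.

```python
# Hypothetical image-recognition label: a caption plus bounding-box markers
# around the objects depicted. The schema is illustrative only.
annotation = {
    "image": "street_scene_001.jpg",
    "caption": "A cyclist waits at a crosswalk beside a parked taxi",
    "objects": [
        {"label": "person",  "bbox": [412, 220, 518, 640]},  # [x1, y1, x2, y2] in pixels
        {"label": "bicycle", "bbox": [395, 430, 560, 700]},
        {"label": "taxi",    "bbox": [610, 310, 980, 690]},
    ],
}
```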
Studies show that biased labels can produce biased models. For example, the average annotator is more likely to label phrases in African-American Vernacular English (AAVE), the informal grammar used by some Black Americans, as toxic, leading AI toxicity detectors trained on those labels to see AAVE as disproportionately toxic.
Some experts don’t buy the o1 Chinese data labeling hypothesis, however. They point out that o1 is just as likely to switch to Hindi, Thai, or another language besides Chinese while working through a solution.
Rather, these experts say, o1 and other reasoning models may simply be using whatever language they find most efficient for achieving a goal (or hallucinating).
“The model doesn’t know what language is, or that languages are different,” Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, told TechCrunch. “It’s all just text to it.”
Indeed, models don’t process words directly. They use tokens instead. Tokens can be whole words, like “fantastic.” Or they can be syllables, like “fan,” “tas,” and “tic.” Or they can even be individual characters, e.g. “f,” “a,” “n,” “t,” “a,” “s,” “t,” “i,” “c.”
Like labeling, tokens can introduce biases. For example, many word-to-token converters assume that a space in a sentence denotes a new word, despite the fact that not all languages use spaces to separate words.
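A minimal sketch of that assumption, and how it breaks down, might look like the following; this is a toy illustration, not OpenAI’s actual tokenizer, which relies on subword methods rather than whitespace splitting.

```python
# Toy illustration of the whitespace assumption many word-to-token converters
# make. Real models use subword tokenizers (e.g. byte-pair encoding); this
# only shows why space-based splitting favors some languages over others.

def naive_word_tokenize(text: str) -> list[str]:
    """Split on spaces, assuming every space marks a word boundary."""
    return text.split(" ")

english = "How many R's are in the word strawberry"
chinese = "草莓这个词里有几个R"  # roughly the same question, written without spaces

print(naive_word_tokenize(english))
# ['How', 'many', "R's", 'are', 'in', 'the', 'word', 'strawberry']

print(naive_word_tokenize(chinese))
# ['草莓这个词里有几个R']  <- the whole sentence collapses into one "word"
```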
Tiezhen Wang, a software engineer at AI startup Hugging Face, agreed with Guzdial that reasoning models’ language inconsistencies may be explained by the associations the models formed during training.
“By embracing every linguistic nuance, we expand the model’s worldview and allow it to learn from the full range of human knowledge,” Wang wrote in a post on X. “For example, I prefer doing math in Chinese because each digit is just one syllable, which makes calculations crisp and efficient. But when it comes to topics like unconscious bias, I automatically switch to English, mainly because that’s where I first learned and absorbed those ideas.”
Wang’s theory is plausible. Models are probabilistic machines, after all. Trained on many examples, they learn patterns for making predictions, such as how “to whom” in an email typically precedes “it may concern.”
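As a rough illustration of what “probabilistic machine” means here, the toy sketch below counts continuations in a made-up corpus and predicts the most frequent one; the corpus and phrases are invented for illustration and have nothing to do with how o1 was actually trained.

```python
from collections import Counter, defaultdict

# Toy "probabilistic machine": count which continuation follows a phrase in a
# tiny made-up corpus, then predict the most frequent one. Illustrative only.
corpus = [
    ("to whom", "it may concern"),
    ("to whom", "it may concern"),
    ("to whom", "this letter belongs"),
]

counts: defaultdict[str, Counter] = defaultdict(Counter)
for prefix, continuation in corpus:
    counts[prefix][continuation] += 1

def predict(prefix: str) -> str:
    """Return the continuation most often seen after `prefix`."""
    return counts[prefix].most_common(1)[0][0]

print(predict("to whom"))  # prints: it may concern
```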
But Luca Soldaini, a research scientist at the nonprofit Allen Institute for AI, cautions that we can’t know for certain. “This type of observation on a deployed AI system is impossible to back up due to how opaque these models are,” he told TechCrunch. “It’s one of the many cases for why transparency in how AI systems are built is fundamental.”
Short of an answer from OpenAI, we’re left to wonder why o1 thinks of songs in French but synthetic biology in Mandarin.