NextFin News - In a significant shift for the field of artificial intelligence, researchers from Google and the University of Chicago have published a study revealing that the advanced reasoning capabilities of models like DeepSeek-R1 and Alibaba’s QwQ-32B are not merely the result of increased computational steps. Instead, these models appear to simulate a complex, multi-agent-like interaction structure during their reasoning process, a phenomenon the researchers have termed the "society of thought." The findings, published in late January 2026, suggest that when faced with difficult logical or mathematical problems, these models generate internal dialogues between different "characters" or roles that argue, correct one another, and reconcile competing views to reach a correct conclusion.
According to a report by 36Kr, the research team, led by Junsol Kim of the University of Chicago and overseen by Google Vice President Blaise Agüera y Arcas, used mechanistic interpretability methods to analyze the "reasoning trajectories" of these models. They found that DeepSeek-R1 exhibits significantly higher perspective diversity than traditional instruction-tuned models. By activating heterogeneous features tied to different personalities and domains of expertise, the model assembles a digital "debate team" within its own architecture. This internal structure manifests through specific conversational behaviors, including question-answer sequences, perspective switching, and the integration of conflicting views.
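To make the notion of "perspective diversity" concrete, the sketch below scores a reasoning trace by the average pairwise dissimilarity of its sentences. This is a minimal illustration, not the paper's actual metric: the hashed bag-of-words embedding, the sentence splitter, and the functions embed and perspective_diversity are all assumptions standing in for whatever encoder and measure the researchers used.

```python
import re
import zlib

import numpy as np


def embed(sentence: str, dim: int = 256) -> np.ndarray:
    """Toy hashed bag-of-words embedding: a deliberately crude
    stand-in for a real sentence encoder."""
    vec = np.zeros(dim)
    for token in re.findall(r"[a-z']+", sentence.lower()):
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def perspective_diversity(trace: str) -> float:
    """Score a trace by the mean pairwise cosine distance between
    its sentences; higher means more heterogeneous 'voices'."""
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", trace) if s.strip()]
    if len(sentences) < 2:
        return 0.0
    embs = np.stack([embed(s) for s in sentences])
    sims = embs @ embs.T                     # cosine sims (unit vectors)
    n = len(sentences)
    off_diag = sims[~np.eye(n, dtype=bool)]  # drop self-similarities
    return float(1.0 - off_diag.mean())


trace = ("Let me try x = 4. Wait, this can't be right. "
         "Alternatively, suppose x is negative. "
         "Combining both views, x must equal -4.")
print(f"diversity score: {perspective_diversity(trace):.3f}")
```

On this toy measure, a trace that keeps restating one line of attack scores near zero, while a trace that switches vocabulary and stance between sentences scores higher, which mirrors the contrast the study draws between instruction-tuned and reasoning models.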
The study employed the Gemini-2.5-Pro model as an evaluator to categorize these internal behaviors. The data showed that DeepSeek-R1 and QwQ-32B frequently engage in "viewpoint conflict," where the model explicitly corrects itself with phrases like "Wait, this can't be right," and "viewpoint reconciliation," where it integrates opposing insights into a coherent final answer. Furthermore, the researchers applied the Bales Interaction Process Analysis (IPA) framework to identify social-emotional roles within the reasoning traces. Unlike older models, which deliver information as a monologue, DeepSeek-R1 adopts reciprocal roles, such as seeking suggestions and expressing disagreement, mimicking the collective intelligence found in human social groups.
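As a rough illustration of what such a categorization pipeline does, the snippet below tags each sentence of a trace with behaviors like "viewpoint conflict" and "viewpoint reconciliation." The study itself used Gemini-2.5-Pro as the judge; the keyword regexes in BEHAVIORS and the helper label_segments are assumed, crude stand-ins for that classifier, and the label names echo the article rather than the authors' exact taxonomy.

```python
import re

# Marker phrases per behavior: an assumed heuristic proxy for the
# study's LLM judge, not the actual classification prompt.
BEHAVIORS = {
    "question_answer": re.compile(r"\?"),
    "viewpoint_conflict": re.compile(
        r"\b(wait|hold on|but|can't be right)\b", re.IGNORECASE),
    "perspective_switch": re.compile(
        r"\b(alternatively|on the other hand|another angle)\b",
        re.IGNORECASE),
    "viewpoint_reconciliation": re.compile(
        r"\b(combining|taken together|both views|reconciling)\b",
        re.IGNORECASE),
}


def label_segments(trace: str) -> list[tuple[str, list[str]]]:
    """Split a reasoning trace into sentences and tag each sentence
    with every behavior whose marker pattern matches it."""
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", trace) if s.strip()]
    return [(s, [name for name, pat in BEHAVIORS.items() if pat.search(s)])
            for s in sentences]


trace = ("Could the answer be 12? Wait, this can't be right. "
         "Alternatively, try factoring first. "
         "Combining both views, the answer must be 9.")
for sentence, labels in label_segments(trace):
    print(f"{labels or ['other']}: {sentence}")
```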
This structural evolution in AI reasoning has profound implications for the industry's understanding of "test-time compute." While it was previously assumed that models simply "thought longer" by performing more calculations, the Google research indicates that the quality of that thought, specifically the diversity of internal perspectives, is the true driver of accuracy. In controlled reinforcement learning experiments, the researchers found that even when accuracy alone was used as the reward signal, the models spontaneously developed these conversational behaviors. Moreover, by using a sparse autoencoder (SAE) to intervene and boost specific "conversational features," the team was able to more than double the accuracy of DeepSeek-R1 on certain tasks, raising its score on the Countdown game from 27.1% to 54.8%.
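The SAE intervention described here is, in spirit, activation steering: adding a scaled copy of a learned feature direction to a model's hidden state so the associated behavior is expressed more strongly. The sketch below shows that mechanic with random placeholder weights; the dimensions, the decoder matrix W_dec, the feature index, and the scaling factor alpha are all hypothetical, and a real intervention would load a trained SAE's decoder and apply the shift inside the model's forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features = 512, 4096   # hypothetical model / SAE sizes

# Stand-in SAE decoder: each row is one learned feature direction
# in the residual stream. Random placeholders here; a trained SAE
# would supply the actual weights.
W_dec = rng.standard_normal((n_features, d_model))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)


def boost_feature(hidden: np.ndarray, feature_id: int,
                  alpha: float = 4.0) -> np.ndarray:
    """Add a scaled SAE feature direction to one hidden state so the
    model expresses that feature's behavior more strongly downstream.
    feature_id and alpha are illustrative, not values from the paper."""
    return hidden + alpha * W_dec[feature_id]


h = rng.standard_normal(d_model)        # one token's residual state
h_steered = boost_feature(h, feature_id=1234)
print(np.linalg.norm(h_steered - h))    # equals alpha (unit-norm rows)
```

The appeal of this kind of intervention is its surgical precision: rather than retraining the model, a single sparse feature is amplified at inference time, which is what allowed the researchers to attribute the Countdown accuracy gain to specific conversational features.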
The emergence of the "society of thought" marks a transition from solitary problem-solving entities to collective reasoning architectures. This trend suggests that future AI development will focus less on raw parameter scaling and more on the structured interplay of distinct internal voices. As U.S. President Trump’s administration navigates the global competition for AI dominance, the reliance of U.S. researchers on Chinese open-weight models like DeepSeek-R1 for such fundamental discoveries underscores a shifting dynamic in academic and industrial research. According to The Star, the accessibility of these Chinese models has made them the primary subjects for interdisciplinary research in the U.S., as they offer transparency in reasoning traces that closed-source American models often lack.
Looking forward, the industry is likely to see a surge in "agentic" training methods that explicitly encourage internal debate. Google has already proposed a new research direction aimed at systematically leveraging this "collective wisdom" through advanced agent organization. As models move toward more human-like social cognition, the boundary between individual processing and group-like intelligence will continue to blur. This discovery not only explains the current performance leap in reasoning models but also provides a roadmap for achieving higher levels of artificial general intelligence by simulating the social evolutionary paths that defined human cognitive development.
Explore more exclusive insights at nextfin.ai.
