NextFin News - Alibaba Cloud has led a 2 billion yuan ($290 million) Series B funding round for ShengShu, the Beijing-based startup behind the Vidu video generation tool, signaling a strategic pivot toward "world models" as the industry confronts the inherent limitations of text-based artificial intelligence. The investment, announced Friday, includes participation from TAL Education and Baidu Ventures and comes just two months after ShengShu secured 600 million yuan in an earlier round. While the startup declined to disclose its current valuation, the scale of the capital injection underscores a growing conviction among major Chinese tech players that the next frontier of AI lies in simulating physical reality rather than merely predicting the next word in a sentence.
The shift in capital allocation follows a period of intensifying debate over the "scaling laws" of Large Language Models (LLMs). While LLMs like ChatGPT have demonstrated remarkable linguistic fluency, they often lack a fundamental grasp of cause-and-effect or the physical laws that govern the tangible world. ShengShu founder Zhu Jun stated that the company aims to build a "general world model" that bridges the gap between digital video generation and the physical requirements of robotics and autonomous driving. By training on multimodal data—including vision, audio, and touch—these models are designed to perceive and act within a three-dimensional environment, a prerequisite for the "embodied AI" that many researchers believe is the true path to artificial general intelligence.
This transition is not without its skeptics. Tim Lechleider, an analyst at American Century Investments who has closely followed the divergence between LLMs and world models, suggests that while the potential for investors is significant, the technical hurdles remain immense. Lechleider has noted that world models could serve as the "brains" for humanoid robots and self-driving vehicles, yet he maintains a cautious stance on the timeline for commercial viability. According to Lechleider, the current enthusiasm for world models is a "scenario-based projection" rather than a guaranteed outcome, as the computational costs of processing high-fidelity video data for physics simulation far exceed those of text processing.
The competitive landscape for this technology is rapidly globalizing. In early 2026, Yann LeCun, the Turing Award winner who recently departed Meta to launch AMI Labs, raised €500 million to pursue similar physics-based AI systems. Meanwhile, Google DeepMind’s release of Genie 3 and NVIDIA’s Cosmos platform have already begun providing the infrastructure for synthetic, physics-aware training data. Alibaba’s lead in the ShengShu round suggests a determination to ensure that the Chinese ecosystem does not fall behind in this architectural shift, particularly as U.S. firms like World Labs, led by Fei-Fei Li, begin commercializing world model generation.
For Alibaba, the investment serves a dual purpose: it secures a stake in a potential successor to the LLM paradigm while driving demand for its underlying cloud infrastructure. Processing the massive datasets required for world models, which ShengShu claims more naturally capture how the physical world works, requires a level of compute density that favors established cloud giants. However, the success of this bet depends on whether ShengShu can solve the "identity drift" and physics violations that still plague most AI-generated video. If these models cannot maintain consistent physical logic over long durations, their utility will remain confined to the digital realm of gaming and entertainment rather than extending to robotics.
