[AI Frontier] Alibaba Launches the Qwen-Robot Series of Embodied Large Models: Three Models Work Together to Tackle Heterogeneous Robot Adaptation

2026-06-16

On June 16, Alibaba officially launched the Qwen-Robot series of embodied intelligence large models. The series has three core components: the VLA (vision-language-action) manipulation model Qwen-RobotManip, the VLN (vision-language navigation) model Qwen-RobotNav, and the world model Qwen-RobotWorld. The release signals that major companies are pushing further into embodied-intelligence foundation models, with the three models working together across robot control, navigation, and reasoning about physical laws.

Traditional VLA models transfer poorly when the hardware or the scenario changes, a long-standing pain point for the industry. To address it, Qwen-RobotManip introduces an 80-dimensional unified action representation that defines a universal “body language” shared by different hardware forms, so the model can adapt to a new device automatically after only a few steps of feedback.

Qwen-RobotNav, the VLN model responsible for running errands and navigation, is built on Qwen-VL. It is the first to unify five task families, including language-instruction navigation, target search, and autonomous driving, within a single framework, eliminating the cost of switching between models on complex tasks.

Acting as the system's “thinking brain”, Qwen-RobotWorld gives it the ability to reason about the physical world, predicting and simulating the next action and the state that follows.

Embodied intelligence is now entering a critical phase as it moves from single scenarios toward generalization. By releasing all three models at once, Alibaba is expected to speed up the practical deployment of heterogeneous robots through a decoupled technical architecture and integrated multimodal capabilities.
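
The announcement does not say how the 80 dimensions of Qwen-RobotManip's unified action representation are laid out. As a rough illustration of the general idea only, the Python sketch below assumes one common cross-embodiment approach: each robot's native action vector is scattered into fixed slots of a shared 80-dimensional vector, and a mask records which slots that robot actually uses. The slot assignments, embodiment names, and functions (`EMBODIMENT_SLOTS`, `to_unified`, `from_unified`) are hypothetical and are not Alibaba's API.

```python
from __future__ import annotations

# Size of the shared action vector, as reported for Qwen-RobotManip.
# Everything else below (slot layout, robot names) is a hypothetical example.
UNIFIED_DIM = 80

# Hypothetical mapping: for each robot type, which unified slots its
# native action dimensions occupy, in native order.
EMBODIMENT_SLOTS: dict[str, list[int]] = {
    "single_arm_7dof": list(range(0, 7)) + [14],      # 7 joints + 1 gripper
    "dual_arm_14dof": list(range(0, 14)) + [14, 15],  # 2 x 7 joints + 2 grippers
    "mobile_base": [70, 71, 72],                      # vx, vy, yaw rate
}


def to_unified(embodiment: str, native_action: list[float]) -> tuple[list[float], list[bool]]:
    """Scatter a robot's native action into the shared vector plus a validity mask."""
    slots = EMBODIMENT_SLOTS[embodiment]
    if len(native_action) != len(slots):
        raise ValueError(
            f"{embodiment} expects {len(slots)} action dims, got {len(native_action)}"
        )
    unified = [0.0] * UNIFIED_DIM
    mask = [False] * UNIFIED_DIM
    for slot, value in zip(slots, native_action):
        unified[slot] = float(value)
        mask[slot] = True  # this slot carries a real command for this robot
    return unified, mask


def from_unified(embodiment: str, unified: list[float]) -> list[float]:
    """Gather the robot-specific action back out of the shared vector."""
    return [unified[slot] for slot in EMBODIMENT_SLOTS[embodiment]]


if __name__ == "__main__":
    arm_action = [0.1, -0.2, 0.0, 0.3, 0.0, 0.1, -0.1, 1.0]
    vec, mask = to_unified("single_arm_7dof", arm_action)
    print(len(vec), sum(mask), from_unified("single_arm_7dof", vec) == arm_action)
```

In a setup like this, the mask is what lets one policy serve robots with different bodies: slots a given robot lacks could simply be ignored when computing the training loss or decoding actions. The few-step feedback adaptation mentioned in the announcement is not modeled here.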

← 返回首页