Alibaba unveils Qwen-Robot Suite for navigation, manipulation and world modeling

Alibaba has introduced the Qwen-Robot Suite, a set of AI models for robots and tasks in the physical world: Qwen-RobotNav for navigation, Qwen-RobotManip for object manipulation, and Qwen-RobotWorld for scene prediction. The team described the project as “a full stack for embodied artificial intelligence.”

📣 Introducing the Qwen-Robot Suite — Qwen-RobotNav, Qwen-RobotManip, Qwen-RobotWorld, three foundation models, a full stack for embodied intelligence.

🧭 Qwen-RobotNav — the gateway to mobility.
• Unifies 5 navigation tasks in one model: instruction following, point-goal,… pic.twitter.com/noumjTtTeS

— Qwen (@Alibaba_Qwen) June 16, 2026

These are software models intended to help physical agents perceive their surroundings, plan actions, and execute natural-language commands. The Qwen-Robot Suite is already in pilot testing with select Alibaba Cloud enterprise customers in robotics.

Why Alibaba is bringing Qwen to the physical world

Large language and multimodal models can handle text, images, video, and speech, but that is not enough for robots. Physical agents must not only understand a command but also translate it into motion while accounting for space, object properties, sensor constraints, and the consequences of their actions.

Alibaba calls this direction physical AI, or “embodied AI.” In this approach, a model must operate not only on digital data but also in the physical environment: move, find objects, control manipulators, and predict what will happen after an action.

Qwen-RobotNav: five navigation tasks in one model

Qwen-RobotNav handles navigation. The model unifies five groups of tasks:

instruction following;
reaching a specified point;
object search;
target tracking;
autonomous driving.

According to Alibaba, Qwen-RobotNav is built on Qwen3-VL and trained on 15.6 million samples related to route planning and vision-language reasoning.

The company reported a 76.5% success rate on VLN-CE RxR and 90% on EVT-Bench. Alibaba also said the model can serve as a tool within larger agentic systems: a high-level model plans the task, and Qwen-RobotNav handles movement.
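The planner-plus-tool arrangement described above can be illustrated with a minimal sketch. Everything here is hypothetical: the article does not describe an API, so `NavTask`, `Step`, `run_plan`, and `fake_nav_tool` are illustrative names, and the stand-in tool only echoes its request instead of calling a real model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class NavTask(Enum):
    """The five task groups the article lists for Qwen-RobotNav."""
    INSTRUCTION_FOLLOWING = "instruction_following"
    POINT_GOAL = "point_goal"
    OBJECT_SEARCH = "object_search"
    TARGET_TRACKING = "target_tracking"
    AUTONOMOUS_DRIVING = "autonomous_driving"


@dataclass
class Step:
    """One step of a plan. The step kinds are hypothetical."""
    kind: str                       # "navigate" or "answer"
    payload: str                    # goal description or answer text
    task: Optional[NavTask] = None  # only set for "navigate" steps


# Hypothetical tool signature: (task, goal) -> textual result.
NavTool = Callable[[NavTask, str], str]


def run_plan(steps: List[Step], nav_tool: NavTool) -> List[str]:
    """Route movement steps to the navigation tool, handle the rest locally."""
    log = []
    for step in steps:
        if step.kind == "navigate":
            if step.task is None:
                raise ValueError("navigate step needs a NavTask")
            log.append(nav_tool(step.task, step.payload))
        else:
            log.append(f"answer: {step.payload}")
    return log


def fake_nav_tool(task: NavTask, goal: str) -> str:
    """Stand-in for a real navigation model; it only echoes the request."""
    return f"nav[{task.value}] reached: {goal}"


# A toy plan in the spirit of the "lost item" scenario.
plan = [
    Step("navigate", "find the lost keys", NavTask.OBJECT_SEARCH),
    Step("answer", "keys are on the kitchen table"),
]
results = run_plan(plan, fake_nav_tool)
```

The point of the split is that the high-level model only decides what to do and in which order, while anything involving movement is delegated through one narrow interface.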

Screenshot, 2026-06-17 at 12.01.51. Source: Qwen.

In demonstrations, Alibaba describes scenarios such as searching for a lost item indoors or checking whether a specific object in a building is open. In such tasks, the robot must not only move but also collect visual evidence and return an answer to the user.

Qwen-RobotManip: object manipulation

Qwen-RobotManip is designed for physical actions with objects. The model is intended to help robots pick, move, and place items, as well as transfer skills across different types of devices.

Screenshot, 2026-06-17 at 12.03.11. Source: Qwen-RobotManip.

One key challenge in robotics is that different robots describe actions differently. A single-arm manipulator, a two-armed platform, a robot with a hand, and a mobile system each use their own coordinates, joints, and command formats. Qwen-RobotManip seeks to bring these data into a common representation so that training on one type of robot helps another.
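To make the idea of a common representation concrete, here is a minimal sketch. It is not Alibaba's format, which the article does not describe. It assumes two hypothetical robots: one reports end-effector motion in millimeters with a binary gripper, the other reports meters with a gripper width. Both are mapped to one `UnifiedAction` (a name invented for this example).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class UnifiedAction:
    """Hypothetical shared action format used only in this sketch."""
    ee_delta_m: Tuple[float, float, float]  # end-effector displacement, meters
    gripper: float                          # 0.0 = closed, 1.0 = fully open


def from_arm_mm(dx_mm: float, dy_mm: float, dz_mm: float,
                gripper_open: bool) -> UnifiedAction:
    """Adapter for a robot that reports millimeters and a binary gripper."""
    return UnifiedAction(
        (dx_mm / 1000, dy_mm / 1000, dz_mm / 1000),
        1.0 if gripper_open else 0.0,
    )


def from_hand_width(delta_m: Tuple[float, float, float],
                    width_m: float, max_width_m: float) -> UnifiedAction:
    """Adapter for a robot that reports meters and a gripper width."""
    # Normalize the width to [0, 1] and clamp out-of-range readings.
    opening = min(max(width_m / max_width_m, 0.0), 1.0)
    return UnifiedAction(tuple(delta_m), opening)


# The same physical motion, recorded by two different robots.
a = from_arm_mm(10, 0, -5, gripper_open=True)
b = from_hand_width((0.01, 0.0, -0.005), width_m=0.04, max_width_m=0.08)
```

Once both recordings share units and value ranges, demonstrations from either robot can in principle feed the same training set, which is the effect the article attributes to the common representation.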

For training, Alibaba used more than 38,100 hours of data. This includes 11,320 hours of open robotics data, 1,933 hours of first-person human action video, and 24,808 hours of synthetic robotic demonstrations generated from such video.
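A quick check of the itemized figures: the three components listed above add up to 38,061 hours, slightly under the quoted total of more than 38,100, so the headline number is presumably rounded or includes data not broken out here. The short script below recomputes the sum and each component's share of the itemized hours; the category labels are paraphrased from the article.

```python
# Hours of training data as itemized in the article.
hours = {
    "open robotics data": 11_320,
    "first-person human video": 1_933,
    "synthetic robot demonstrations": 24_808,
}

total = sum(hours.values())
shares = {name: value / total for name, value in hours.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"itemized total: {total:,} hours")
```

By this count, synthetic demonstrations make up roughly 65% of the itemized hours, and first-person human video about 5%.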

The company said the model took first place in RoboChallenge Table30 v1 in the universal models track. According to Alibaba, Qwen-RobotManip also remained robust to new instructions and unfamiliar objects, and transferred skills across different robots.

Qwen-RobotWorld: a world model for robots

Qwen-RobotWorld is a language-driven video world model. It is intended to predict how a scene will evolve after a given action.

Screenshot, 2026-06-17 at 12.08.31. Source: Qwen-RobotWorld.

For example, the model takes the current observation and a text command, then generates a likely future state of the environment. This approach can be used for manipulation, autonomous driving, navigation, planning, and generating synthetic training data for robots.
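The observation-plus-command interface lends itself to a simple planning loop: predict the outcome of each candidate command and pick the one that lands closest to the goal. The sketch below shows that loop with a toy stand-in, not Qwen-RobotWorld itself. The real model generates video, whereas `toy_world_model` here only shifts an integer position, and all names are hypothetical.

```python
from typing import Callable, Sequence


def toy_world_model(position: int, command: str) -> int:
    """Toy stand-in: predicts the next position on a 1-D line."""
    shifts = {"move left": -1, "stay": 0, "move right": 1}
    return position + shifts[command]


def choose_command(observation: int, goal: int,
                   commands: Sequence[str],
                   predict: Callable[[int, str], int]) -> str:
    """Pick the command whose predicted outcome is closest to the goal."""
    return min(commands, key=lambda c: abs(predict(observation, c) - goal))


commands = ["move left", "stay", "move right"]
best = choose_command(observation=2, goal=5, commands=commands,
                      predict=toy_world_model)
```

With the robot at position 2 and the goal at 5, the loop selects "move right". A real system would compare predicted video frames or learned features instead of integers, but the structure is the same: the world model lets the planner evaluate actions before the robot performs them.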

To train Qwen-RobotWorld, the team assembled the Embodied World Knowledge corpus. It includes 8.6 million video-text pairs and more than 200 million frames, covering over 20 types of robotic platforms and more than 500 action categories.

Alibaba said Qwen-RobotWorld ranked first in EWMBench and DreamGen Bench, and outperformed all open models in WorldModelBench and PBench. The technical description also claims the model shows strong consistency with basic physical principles — motion, conservation of mass, fluids, and gravity.

Mass-market robots are still a long way off

Despite the reported results, the Qwen-Robot Suite remains a set of models rather than a ready-made consumer robotics platform. Real-world deployment faces sensor noise, actuator wear, non-standard situations, perception errors, and a large number of rare scenarios. Many of the benchmarks used to compare such systems run in simulation or under constrained experimental conditions.

Alibaba has also not disclosed pricing, public launch timing, or the list of customers already testing the Qwen-Robot Suite.

In April, Alibaba Cloud introduced the agentic model Qwen3.6-Plus with a 1 million-token context window and support for external tools.

Steven M. Crimmins
