On March 3, 2026, researchers released ACE-Brain-0, a generalist foundation model that unifies autonomous driving, robotics, and UAV control within a single multimodal large language model. The work argues that spatial intelligence can serve as a universal scaffold, letting one AI system operate across radically different physical embodiments.
Spatial Intelligence as Universal Foundation
The research team observed that vehicles, robots, and UAVs share a fundamental requirement despite their drastically different morphologies: each must model 3D mental space. This insight positions spatial cognition as a domain-agnostic foundation for cross-embodiment transfer. Rather than training a separate model for each physical form, ACE-Brain-0 uses this shared spatial reasoning capability as a common substrate.
Scaffold-Specialize-Reconcile Training Paradigm
The team developed the SSR paradigm to address training challenges in multi-embodiment systems. The approach first establishes a shared spatial foundation, then cultivates domain-specialized experts for specific embodiments, and finally harmonizes them through data-free model merging. This three-stage process avoids common pitfalls like gradient interference and catastrophic forgetting that plague naive multi-task training.
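The final "reconcile" stage can be illustrated with a minimal sketch of data-free merging. The article does not say which merging method ACE-Brain-0 uses, so the example below assumes a simple task-vector average (each expert's difference from the shared base, scaled and added back). The function name `merge_experts`, the parameter name `layer.w`, and all numeric values are hypothetical.

```python
def merge_experts(base, experts, weight=None):
    """Merge expert parameters into a shared base without any training data.

    ASSUMPTION: task-vector averaging, chosen for illustration only; the
    source does not specify the merging method used by ACE-Brain-0.
    `base` and each expert map parameter names to lists of floats.
    """
    if weight is None:
        weight = 1.0 / len(experts)  # equal weighting across experts
    merged = {}
    for name, base_vals in base.items():
        merged[name] = [
            # add the scaled sum of each expert's offset from the base
            b + weight * sum(e[name][i] - b for e in experts)
            for i, b in enumerate(base_vals)
        ]
    return merged


# Hypothetical toy parameters: one shared scaffold and two specialized experts.
base = {"layer.w": [0.0, 1.0]}
driving_expert = {"layer.w": [0.2, 1.0]}
uav_expert = {"layer.w": [0.0, 0.6]}

merged = merge_experts(base, [driving_expert, uav_expert])
print(merged)
```

Only the weights are touched: no gradients are computed and no samples are read, which is the property "data-free" refers to.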
Group Relative Policy Optimization Strengthens Capabilities
ACE-Brain-0 adopts Group Relative Policy Optimization (GRPO), a reinforcement learning technique first introduced in DeepSeekMath, to strengthen its capabilities across diverse tasks and embodiments. The technique helps the model balance universal generalization with domain-specific proficiency. That balance is needed to maintain performance across 24 spatial and embodiment-related benchmarks while avoiding long-tail data problems.
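As described in DeepSeekMath, GRPO scores each sampled response relative to the other responses drawn for the same prompt rather than against a learned value function. The sketch below shows only that group-relative advantage step, not the clipped policy objective or the KL penalty. The reward values are hypothetical, the use of population standard deviation is an assumption, and the article does not describe how ACE-Brain-0 defines its rewards.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    ASSUMPTION: population standard deviation plus a small epsilon for
    numerical stability; implementations differ on these details.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that beat their group's average receive positive advantages and the rest receive negative ones. If every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.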
State-of-the-Art Performance Across 24 Benchmarks
Extensive experiments show that ACE-Brain-0 achieves competitive or state-of-the-art performance across the 24 benchmarks tested, which span autonomous driving, robotic manipulation, and UAV control tasks. The results support the claim that spatial intelligence provides enough structure for a single model to perform well across fundamentally different physical systems.
Key Takeaways
- ACE-Brain-0 unifies autonomous driving, robotics, and UAV control in a single multimodal large language model using spatial intelligence as a common foundation
- The Scaffold-Specialize-Reconcile paradigm establishes shared spatial reasoning, cultivates domain experts, then merges them without additional training data
- Spatial intelligence serves as a universal scaffold because vehicles, robots, and UAVs all require 3D mental space modeling despite different morphologies
- The model achieves competitive or state-of-the-art performance across 24 spatial and embodiment-related benchmarks
- Group Relative Policy Optimization helps balance universal generalization with domain-specific proficiency, while the staged SSR training avoids the gradient interference and catastrophic forgetting of naive multi-task training