ClawGUI Framework Unifies GUI Agent Training, Evaluation, and Deployment Across Platforms

Researchers from multiple institutions have released ClawGUI, which they describe as the first complete open-source infrastructure for training, evaluating, and deploying GUI agents on mobile platforms. Published on arXiv on April 13, 2026, the framework targets a specific bottleneck: although GUI agents can in principle control any application through its visual interface, the authors argue that progress has been held back less by modeling capacity than by the lack of a coherent full-stack infrastructure.

ClawGUI Provides Three Integrated Components for End-to-End Development

The framework consists of three components that work together as a unified pipeline. ClawGUI-RL is presented as the first open-source reinforcement learning infrastructure for GUI agents with validated support for both parallel virtual environments and real physical devices; it integrates GiGPO with a Process Reward Model for dense step-level supervision. ClawGUI-Eval enforces a fully standardized evaluation pipeline across 6 benchmarks and 11+ models, achieving a 95.8% reproduction rate against official baselines. ClawGUI-Agent brings trained agents to Android, HarmonyOS, and iOS through 12+ chat platforms, with hybrid CLI-GUI control and persistent personalized memory.
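To illustrate what dense step-level supervision means in practice, the sketch below blends per-step scores from a process reward model with a sparse end-of-episode success signal and turns them into discounted returns. It is a generic illustration, not ClawGUI-RL's implementation and not the GiGPO algorithm: the function name, the `prm_weight` and `gamma` parameters, and the example scores are all assumptions.

```python
from typing import List


def dense_returns(
    prm_scores: List[float],
    task_succeeded: bool,
    prm_weight: float = 0.5,  # hypothetical weighting of step-level scores
    gamma: float = 0.99,      # hypothetical discount factor
) -> List[float]:
    """Compute a discounted return for every step of one episode.

    Each step receives a weighted process-reward score; the final step
    additionally receives the sparse outcome reward (1.0 on success).
    """
    rewards = [prm_weight * s for s in prm_scores]
    if rewards:
        rewards[-1] += 1.0 if task_succeeded else 0.0

    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the last step backwards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Three-step episode with made-up step scores; the task succeeds.
print(dense_returns([0.2, 0.8, 1.0], task_succeeded=True))
```

With only the sparse outcome reward, every step before the last would receive its signal solely through discounting; the step-level scores give each action its own feedback.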

ClawGUI-2B Achieves 17.1% Success Rate on MobileWorld Benchmark

The researchers trained ClawGUI-2B end-to-end within this pipeline and evaluated it on the MobileWorld GUI-Only benchmark. The model achieved a 17.1% success rate, compared with 11.1% for MAI-UI-2B, a baseline at the same 2B-parameter scale. That is a 6.0 percentage point improvement, or a relative gain of about 54% over the baseline.
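Both figures in this comparison follow from simple arithmetic. A minimal Python sketch, using only the two success rates quoted above (the function name is illustrative):

```python
from typing import Tuple


def improvement(new_rate: float, baseline_rate: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new_rate - baseline_rate
    relative = absolute / baseline_rate * 100.0
    return absolute, relative


# Success rates (in percent) reported for MobileWorld GUI-Only.
absolute, relative = improvement(17.1, 11.1)
print(f"{absolute:.1f} points absolute, {relative:.0f}% relative")
# prints "6.0 points absolute, 54% relative"
```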

Framework Addresses Critical Evaluation Drift Problem

The authors observe that evaluation protocols in GUI agent research "drift silently across works," which makes fair comparison of results difficult. ClawGUI-Eval addresses this by running every model through the same standardized pipeline so that comparisons are reproducible. The framework also stands out for supporting real physical devices such as smartphones, not just virtual environments, which allows testing under more realistic conditions.
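One way to make such comparisons checkable is to pin the evaluation settings in a single object and count how many reproduced scores fall within a tolerance of the officially reported ones. The sketch below is an illustration under stated assumptions, not ClawGUI-Eval's API: the `EvalProtocol` fields, the tolerance, and all model names and scores are hypothetical, and the article does not say how the 95.8% reproduction figure is computed.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class EvalProtocol:
    """Hypothetical settings that should stay fixed across all runs."""
    prompt_template: str
    image_resolution: Tuple[int, int]
    max_steps: int
    temperature: float


def reproduction_rate(
    official: Dict[Tuple[str, str], float],
    reproduced: Dict[Tuple[str, str], float],
    tolerance: float = 1.0,  # hypothetical tolerance, in percentage points
) -> float:
    """Fraction of (model, benchmark) pairs reproduced within tolerance."""
    if not official:
        raise ValueError("no official scores to compare against")
    matched = sum(
        1
        for key, score in official.items()
        if key in reproduced and abs(reproduced[key] - score) <= tolerance
    )
    return matched / len(official)


protocol = EvalProtocol("default-v1", (1080, 2400), max_steps=30, temperature=0.0)

# Placeholder scores for illustration only; not results from the paper.
official = {("model-a", "bench-1"): 50.0, ("model-b", "bench-1"): 40.0}
reproduced = {("model-a", "bench-1"): 50.4, ("model-b", "bench-1"): 37.5}
print(protocol.max_steps, reproduction_rate(official, reproduced))
```

Freezing the protocol object means a run cannot quietly change a prompt template or step limit partway through, which is the kind of silent drift the authors describe.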

Key Takeaways

ClawGUI is presented as the first complete open-source pipeline for GUI agents, covering training, evaluation, and deployment in a unified framework
ClawGUI-2B achieved a 17.1% success rate on the MobileWorld GUI-Only benchmark, a relative improvement of about 54% over MAI-UI-2B, the baseline at the same 2B scale
ClawGUI-RL supports training on real physical devices as well as virtual environments, and ClawGUI-Agent deploys agents to Android, HarmonyOS, and iOS
ClawGUI-Eval achieves 95.8% reproduction accuracy against official baselines across 6 benchmarks and 11+ models
The infrastructure integrates reinforcement learning with Process Reward Models for dense step-level supervision during training
