OpenMobile: Open-Source Mobile Agent Framework Hits 64.7% on AndroidWorld with Task Synthesis

Researchers have released OpenMobile, an open-source framework that reaches a 64.7% success rate on AndroidWorld by addressing the data gap in mobile agent research. Published on arXiv on April 16, 2026, the framework introduces a scalable task synthesis pipeline and a policy-switching trajectory rollout that captures the error-recovery data missing from standard imitation learning.

Scalable Task Synthesis Pipeline Addresses Data Scarcity

While leading mobile agent models approach 70% on AndroidWorld, their training data and synthesis methods remain closed. OpenMobile's key innovation is a three-step synthesis pipeline:

Building global environment memory from exploration
Leveraging memory to generate diverse, grounded instructions
Ensuring tasks are executable on real Android environments
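
The three steps can be sketched in a few lines of Python. This is only an illustrative outline, not OpenMobile's implementation: the names `EnvironmentMemory`, `build_memory`, `generate_instructions`, `filter_executable`, and `can_execute` are all hypothetical, the exploration log is a made-up list of (app, function) pairs, and a string template stands in for whatever model the real pipeline uses to write instructions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EnvironmentMemory:
    """Hypothetical global memory: app name -> functions seen during exploration."""
    functions: dict[str, set[str]] = field(default_factory=dict)

    def record(self, app: str, function: str) -> None:
        self.functions.setdefault(app, set()).add(function)


def build_memory(exploration_log: list[tuple[str, str]]) -> EnvironmentMemory:
    """Step 1: fold (app, function) observations into one global memory."""
    memory = EnvironmentMemory()
    for app, function in exploration_log:
        memory.record(app, function)
    return memory


def generate_instructions(memory: EnvironmentMemory) -> list[str]:
    """Step 2: write one instruction per remembered function.

    A template is used here so the sketch runs without external dependencies.
    """
    return [
        f"In {app}, {function}"
        for app in sorted(memory.functions)
        for function in sorted(memory.functions[app])
    ]


def filter_executable(
    instructions: list[str], can_execute: Callable[[str], bool]
) -> list[str]:
    """Step 3: keep only instructions that the (assumed) checker accepts."""
    return [ins for ins in instructions if can_execute(ins)]


def synthesize(
    exploration_log: list[tuple[str, str]], can_execute: Callable[[str], bool]
) -> list[str]:
    memory = build_memory(exploration_log)
    return filter_executable(generate_instructions(memory), can_execute)


log = [
    ("Clock", "set an alarm for 7:00"),
    ("Contacts", "add a new contact"),
    ("Clock", "start a 5-minute timer"),
]
# Toy checker: pretend timer tasks cannot be executed on the device.
tasks = synthesize(log, lambda ins: "timer" not in ins)
print(tasks)
```

In this toy run the timer instruction is dropped by the executability check, leaving the alarm and contact tasks. In a real setting the check would presumably involve running the task on an Android environment rather than inspecting the text.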

This approach generates training data that covers broad app functionality without requiring proprietary datasets. The researchers also conducted a transparent overlap analysis between the synthetic instructions and benchmark test sets to verify that performance gains stem from functionality coverage rather than benchmark overfitting.
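
The article does not say how the overlap analysis was computed. One common way to run such a check, shown here purely as an assumption, is to compare word n-grams between each synthetic instruction and every benchmark task and flag pairs whose Jaccard similarity exceeds a threshold. The function names, the bigram setting, and the 0.5 threshold below are all illustrative choices, not values from the OpenMobile release.

```python
import re


def ngrams(text: str, n: int = 2) -> set[tuple[str, ...]]:
    """Lowercase the text, split it into alphanumeric tokens, and return its word n-grams."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def max_overlap(instruction: str, benchmark_tasks: list[str], n: int = 2) -> float:
    """Highest n-gram similarity between one instruction and any benchmark task."""
    grams = ngrams(instruction, n)
    return max((jaccard(grams, ngrams(t, n)) for t in benchmark_tasks), default=0.0)


def flag_overlapping(
    synthetic: list[str], benchmark_tasks: list[str], threshold: float = 0.5, n: int = 2
) -> list[str]:
    """Return synthetic instructions that look too close to a benchmark task."""
    return [s for s in synthetic if max_overlap(s, benchmark_tasks, n) >= threshold]


benchmark = ["Set an alarm for 7:00 am"]
synthetic = ["Set an alarm for 7:00 am", "Add a new contact named Ana"]
flagged = flag_overlapping(synthetic, benchmark)
print(flagged)
```

Here only the instruction that copies the benchmark task is flagged. Surface n-gram matching misses paraphrases, so a fuller analysis might add embedding-based similarity or manual review.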

Policy-Switching Captures Essential Error-Recovery Data

OpenMobile introduces policy-switching trajectory rollout, which alternates between learner and expert models during data collection. When the learner model fails, the expert demonstrates recovery, creating training examples that teach agents how to handle mistakes. Standard imitation learning lacks this capability because it captures only successful trajectories.
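
The idea of handing control to an expert after a learner mistake can be illustrated with a toy rollout loop. Everything in this sketch is an assumption made for illustration: the environment is a one-dimensional walk toward a goal, an "error" is any learner step that increases the distance to the goal, and the expert keeps control for a fixed number of steps (`expert_steps`). The article does not describe OpenMobile's actual switching rule or error detector.

```python
from typing import Callable

# A policy maps (state, goal) to an action in {-1, +1}.
Policy = Callable[[int, int], int]


def expert_policy(state: int, goal: int) -> int:
    """Toy expert: always step toward the goal."""
    return 1 if goal > state else -1


def rollout_with_switching(
    learner: Policy,
    expert: Policy,
    start: int,
    goal: int,
    max_steps: int = 20,
    expert_steps: int = 2,
) -> list[tuple[int, int, str]]:
    """Collect (state, action, actor) steps, switching to the expert after learner errors."""
    state = start
    trajectory: list[tuple[int, int, str]] = []
    expert_budget = 0  # remaining steps the expert stays in control
    for _ in range(max_steps):
        if state == goal:
            break
        actor = "expert" if expert_budget > 0 else "learner"
        policy = expert if actor == "expert" else learner
        action = policy(state, goal)
        next_state = state + action
        trajectory.append((state, action, actor))
        if actor == "expert":
            expert_budget -= 1
        elif abs(goal - next_state) > abs(goal - state):
            # The learner moved away from the goal: let the expert recover.
            expert_budget = expert_steps
        state = next_state
    return trajectory


def recovery_examples(trajectory: list[tuple[int, int, str]]) -> list[tuple[int, int]]:
    """Expert steps taken from states the learner's mistake led to."""
    return [(s, a) for s, a, actor in trajectory if actor == "expert"]


def flawed_learner(state: int, goal: int) -> int:
    """Toy learner that steps the wrong way whenever it is at state 2."""
    return -1 if state == 2 else 1


traj = rollout_with_switching(flawed_learner, expert_policy, start=0, goal=4)
print(recovery_examples(traj))
```

In this run the learner errs at state 2, the expert takes the next two steps, and the learner then finishes the task. The expert's steps start from a state that only arises after a mistake, which is the kind of example that pure expert demonstrations would not contain.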

Performance results demonstrate the effectiveness of this approach:

Qwen3-VL fine-tuned on OpenMobile data: 64.7% on AndroidWorld
Qwen2.5-VL fine-tuned on OpenMobile data: 51.7% on AndroidWorld
Performance far surpasses existing open-data approaches
Competitive with leading closed models

Transparent Evaluation Across Multiple Benchmarks

The framework was evaluated on AndroidWorld and two additional dynamic mobile agent benchmarks. Unlike many mobile agent projects, OpenMobile is transparent about its synthesis pipeline and verification methods, not just the final data.

The researchers emphasize that existing approaches fall short for three reasons: closed models do not disclose their synthesis methods, standard imitation learning lacks error-recovery demonstrations, and synthetic tasks are often not grounded in real app functionality. OpenMobile addresses all three limitations.

Complete Release Facilitates Broader Research

The complete framework is available at https://njucckevin.github.io/openmobile/, including the synthesis pipeline, training data, and model checkpoints. By making high-quality mobile agent development possible without access to proprietary data, OpenMobile could accelerate mobile automation research.

The release demonstrates that transparent, open approaches can perform competitively with closed systems while enabling broader participation in advancing mobile agent capabilities.

Key Takeaways

OpenMobile achieves 64.7% on AndroidWorld using Qwen3-VL, competitive with leading closed models
Policy-switching trajectory rollout captures error-recovery data by alternating between learner and expert models during data collection
Transparent overlap analysis verifies performance comes from broad functionality coverage, not benchmark overfitting
The complete synthesis pipeline and training data are publicly available, addressing the data gap in mobile agent research
The framework makes high-quality mobile agent development accessible without requiring proprietary datasets
