Microsoft researchers have developed a scalable methodology for training AI agents on realistic, long-horizon productivity tasks by creating thousands of synthetic computer environments. Published to arXiv on April 30, 2026, the paper "Synthetic Computers at Scale for Long-Horizon Productivity Simulation" introduces an approach to generating diverse training data for agentic AI systems at scale.
Researchers Generated 1,000 User-Specific Computer Environments
The methodology creates synthetic computers populated with realistic folder hierarchies and content-rich artifacts including documents, spreadsheets, and presentations. Using large language models, researchers progressively elaborate personas into user-specific computer environments with professional-grade content tailored to different roles and professions.
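The progressive elaboration step can be pictured as a pipeline that expands a short persona into a folder hierarchy, then fills each folder with generated artifacts. The sketch below is illustrative only: the `llm` function, the `SyntheticComputer` class, and the fixed folder list are assumptions standing in for the paper's unspecified prompts and models.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for an LLM call; the paper's actual prompts
# and models are not described in this article.
def llm(prompt: str) -> str:
    return f"[generated content for: {prompt[:40]}...]"

@dataclass
class SyntheticComputer:
    persona: str
    folders: dict = field(default_factory=dict)  # path -> list of (name, content)

def elaborate_persona(persona: str) -> SyntheticComputer:
    """Progressively elaborate a short persona into a user-specific
    computer: first a folder hierarchy, then content-rich artifacts."""
    computer = SyntheticComputer(persona=persona)
    # Step 1: derive a plausible folder hierarchy for the persona
    # (fixed here for illustration; an LLM would propose these too).
    paths = ["Documents/Reports", "Documents/Spreadsheets", "Presentations"]
    # Step 2: populate each folder with professional-grade artifacts.
    for path in paths:
        name = f"{path.split('/')[-1].lower()}_q1.txt"
        computer.folders[path] = [
            (name, llm(f"Write a {path} artifact for: {persona}"))
        ]
    return computer

computer = elaborate_persona("Maria, a supply-chain analyst at a retailer")
print(len(computer.folders))  # prints 3
```

Separating hierarchy generation from artifact generation mirrors the article's description of "progressive" elaboration: each stage conditions on the output of the previous one, so content stays consistent with the persona.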
In preliminary experiments, the team created 1,000 synthetic computers and ran long-horizon simulations requiring over 8 hours of agent runtime each, spanning more than 2,000 turns on average. These simulations correspond to approximately one month of human work per computer.
Two-Agent System Creates and Completes Complex Productivity Tasks
The system employs a two-agent architecture. A setup agent creates productivity objectives tailored to each synthetic computer's user, typically requiring multiple challenging professional deliverables such as reports, spreadsheets, and presentations. A second agent then assumes the user's role, navigating the filesystem, coordinating with simulated collaborators, and producing professional artifacts until objectives are completed.
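The two-agent loop described above can be sketched as follows. Both agents are stubbed with deterministic functions here; in the actual system they are LLM agents operating over full synthetic filesystems across thousands of turns, and the function names and objective strings are assumptions for illustration.

```python
def setup_agent(persona: str) -> list[str]:
    """Create productivity objectives tailored to the synthetic user."""
    return [
        f"Write a quarterly report for {persona}",
        f"Build a budget spreadsheet for {persona}",
        f"Prepare a kickoff presentation for {persona}",
    ]

def user_agent_step(objective: str, filesystem: dict) -> None:
    """One turn of the second agent: navigate the filesystem and
    produce a professional artifact toward the objective."""
    filesystem[objective] = f"artifact completed for: {objective}"

def run_simulation(persona: str) -> dict:
    objectives = setup_agent(persona)   # agent 1: define the work
    filesystem: dict[str, str] = {}
    # Agent 2 works until all objectives are completed; real runs
    # span more than 2,000 turns and 8+ hours of agent runtime.
    for objective in objectives:
        user_agent_step(objective, filesystem)
    return filesystem

fs = run_simulation("a marketing manager")
print(len(fs))  # prints 3: one artifact per objective
```

The design choice worth noting is that the setup agent is conditioned on the synthetic computer's user, so the objectives it emits stay grounded in that user's files and role rather than being generic task templates.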
This approach generates rich experiential learning signals that the researchers validated through significant improvements in agent performance on both in-domain and out-of-domain productivity evaluations.
Methodology Can Scale to Billions of Synthetic Environments
The researchers argue that because personas exist in billion-scale abundance, the methodology can in principle scale to millions or even billions of synthetic user worlds given sufficient compute. Such scaling would enable broader coverage of diverse professions, roles, contexts, environments, and productivity needs, addressing a key bottleneck in agent training.
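The billion-scale claim rests on persona counts growing multiplicatively with the attributes that define them. The toy example below, with made-up attribute lists not drawn from the paper, shows the combinatorics: even a handful of dimensions yields a large persona space.

```python
from itertools import product

# Illustrative attribute lists (assumptions, not the paper's persona source).
roles = ["analyst", "engineer", "teacher", "nurse"]
industries = ["retail", "finance", "healthcare", "education"]
seniorities = ["junior", "senior", "lead"]

# Persona count is the product of the dimension sizes: 4 * 4 * 3 = 48.
personas = [
    f"a {s} {r} in {i}"
    for r, i, s in product(roles, industries, seniorities)
]
print(len(personas))  # prints 48
```

Extending each list to hundreds of entries and adding dimensions such as location, employer size, or tenure pushes the space toward the millions-to-billions range the researchers describe.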
The paper was authored by Tao Ge, Baolin Peng, Hao Cheng, and Jianfeng Gao from Microsoft Research and published under arXiv ID 2604.28181.
Foundation for Agent Self-Improvement and Agentic RL
The researchers position scalable synthetic computer creation, combined with at-scale simulations, as a foundational substrate for agent self-improvement and agentic reinforcement learning in long-horizon productivity scenarios. This addresses the difficulty of collecting diverse, realistic long-horizon task data—one of the primary challenges in developing capable AI agents.
Key Takeaways
- Microsoft researchers created 1,000 synthetic computers with realistic folder structures and professional-grade artifacts for AI agent training
- Each simulation run requires over 8 hours of agent runtime and spans more than 2,000 turns, corresponding to approximately one month of human work
- A two-agent system creates complex productivity objectives and then completes them through realistic interaction with synthetic computer environments
- The methodology can theoretically scale to millions or billions of synthetic user worlds, enabling coverage of diverse professions and contexts
- Experiments validated significant improvements in agent performance on both in-domain and out-of-domain productivity evaluations