Andrej Karpathy Releases Autoresearch for Autonomous LLM Training Experiments

On March 6, 2026, Andrej Karpathy released autoresearch, an open-source tool that lets AI agents run LLM pretraining experiments autonomously. The project gained 535 GitHub stars within 24 hours of release, drawing attention from the machine learning research community for its approach to automating model optimization.

Autoresearch Uses Fixed Time Budget for Fair Experiment Comparison

The system has an AI agent modify a single training file (train.py), run training for a fixed 5 minutes, check validation bits-per-byte (val_bpb), and keep the change only if val_bpb improves; otherwise the change is discarded. Because every run gets the same wall-clock budget, experiments are directly comparable: a change is judged by the val_bpb it reaches within the budget, not by a separate trade-off between loss and training speed.
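The keep-or-discard loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from the autoresearch repository: the names keep_or_discard_loop, propose, and train_and_eval are hypothetical, and the two callables stand in for the agent's edit step and the 5-minute training run.

```python
from dataclasses import dataclass
from typing import Callable, List

# Fixed wall-clock budget per run (5 minutes, as described in the article).
TIME_BUDGET_SECONDS = 5 * 60


@dataclass
class Result:
    source: str     # contents of the candidate train.py
    val_bpb: float  # validation bits-per-byte reached within the budget
    kept: bool      # whether the candidate became the new best


def keep_or_discard_loop(
    baseline_source: str,
    propose: Callable[[str], str],                # hypothetical: agent edits train.py
    train_and_eval: Callable[[str, int], float],  # hypothetical: trains, returns val_bpb
    rounds: int,
) -> List[Result]:
    """Greedy search: keep a candidate only if it lowers val_bpb."""
    best_source = baseline_source
    best_bpb = train_and_eval(best_source, TIME_BUDGET_SECONDS)
    history = [Result(best_source, best_bpb, True)]

    for _ in range(rounds):
        # Every candidate is derived from the current best version.
        candidate = propose(best_source)
        bpb = train_and_eval(candidate, TIME_BUDGET_SECONDS)
        kept = bpb < best_bpb  # lower bits-per-byte is better
        if kept:
            best_source, best_bpb = candidate, bpb
        history.append(Result(candidate, bpb, kept))

    return history
```

Since every call receives the same TIME_BUDGET_SECONDS, the comparison bpb < best_bpb is the only acceptance criterion; no separate wall-clock check is needed.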

Karpathy explained the methodology differs from his earlier work: "It's a commit that lowered val loss but increased the wall clock time so it gets rejected for being slower. must improve one, the other or both in this version. In my (new) autoresearch repo I have an alternative approach where you always train for eg 5 minutes."

The val_bpb metric is vocabulary-independent: it normalizes loss by the number of bytes of text rather than the number of tokens, so architectural changes that affect the vocabulary do not bias the measurement. Constraining modifications to a single file keeps the scope manageable for the AI agent while still allowing meaningful experimentation.
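To see why bits-per-byte is independent of vocabulary, it helps to write out the standard definition: total cross-entropy in nats, converted to bits, divided by the number of UTF-8 bytes in the evaluated text. The sketch below uses that general definition; the function name and the numbers are illustrative assumptions, and it does not reproduce how autoresearch computes the metric internally.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy (nats) over a text into bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by bytes normalizes.
    return total_nll_nats / (math.log(2) * total_bytes)


# Illustrative numbers: the same 1000-byte text under two tokenizers.
# Tokenizer A: 250 tokens at 2.0 nats/token. Tokenizer B: 500 tokens at 1.0 nats/token.
bpb_a = bits_per_byte(total_nll_nats=250 * 2.0, total_bytes=1000)
bpb_b = bits_per_byte(total_nll_nats=500 * 1.0, total_bytes=1000)
```

In this example the per-token losses differ by a factor of two, yet both models assign the same total probability to the text, so bpb_a and bpb_b are equal. A per-token metric would make tokenizer B look better purely because it splits the text into more pieces.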

Technical Requirements and Implementation Details

Autoresearch requires an NVIDIA GPU (tested on H100) and Python 3.10+. It runs on a single GPU, avoiding the complexity of distributed training. The implementation is modular, with separate files for constants, data preparation, training, and agent instructions. The project is released under the MIT license.
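A quick preflight check for these requirements might look like the following. This is a hypothetical helper, not part of autoresearch: the function name preflight is an assumption, and the presence of the nvidia-smi executable on PATH is used only as a rough heuristic for an installed NVIDIA driver.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum Python version stated in the article


def preflight(version=None, has_nvidia_smi=None):
    """Return a list of problems; an empty list means the checks passed.

    Both arguments are optional overrides so the check can be exercised
    on machines without a GPU.
    """
    if version is None:
        version = sys.version_info[:2]
    else:
        version = tuple(version[:2])
    if has_nvidia_smi is None:
        # Heuristic only: nvidia-smi ships with the NVIDIA driver.
        has_nvidia_smi = shutil.which("nvidia-smi") is not None

    problems = []
    if version < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} is older than 3.10")
    if not has_nvidia_smi:
        problems.append("nvidia-smi not found on PATH; no NVIDIA driver detected")
    return problems
```

Calling preflight() with no arguments inspects the current interpreter and PATH; passing explicit values makes the behavior easy to verify without the target hardware.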

The codebase is 82.7% Python and 17.3% Jupyter Notebook, with 9 commits at the time of writing. As of March 6, the repository had accumulated 52 forks alongside its 535 stars, indicating early adoption by researchers.

Key Takeaways

Andrej Karpathy released autoresearch on March 6, 2026, gaining 535 GitHub stars in under 24 hours
The system uses a fixed 5-minute training window for all experiments to ensure fair performance comparison
AI agents autonomously modify train.py, run experiments, and keep or discard changes based on validation bits-per-byte metrics
The tool requires an NVIDIA GPU (tested on H100) and Python 3.10+, and runs on a single GPU
Released under the MIT license with a modular implementation separating constants, data prep, training, and agent instructions
