MIT CSAIL researchers Yulu Gan and Phillip Isola have discovered that large, well-pretrained models contain numerous task-specific solutions densely packed around their original weights, challenging fundamental assumptions about model fine-tuning. Their paper, "Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights," published on arXiv on March 12, 2026, introduces RandOpt—a simple random sampling method that matches the performance of sophisticated techniques like PPO and GRPO without any gradient-based optimization.
Large Models Have Dense Expert Solutions, Small Models Do Not
The researchers treat pretrained weights not as a single starting point but as a distribution containing multiple task-specific solutions. In large, well-pretrained models, diverse task-improving specialists populate a substantial fraction of the neighborhood around pretrained weights. In contrast, smaller models have sparse expert solutions that require gradient-based optimization to discover. This fundamental difference in loss landscape geometry means the optimal post-training strategy depends heavily on model scale.
RandOpt: Random Sampling Matches Sophisticated Optimization Methods
The RandOpt method is remarkably simple: randomly sample N parameter perturbations around the pretrained weights, select the top K performers on the task, and combine them via majority voting. This fully parallel approach achieves results competitive with PPO, GRPO, and evolutionary strategies. The success of such a simple method suggests that for large models, finding good task specialists may be easier than previously assumed—reframing post-training from an optimization problem into a sampling and selection problem.
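The sample–select–vote recipe described above can be sketched on a toy task. Note this is an illustrative reconstruction, not the paper's implementation: the linear classification task, the synthetic "pretrained" weights, and the hyperparameters (`n_samples`, `k`, `sigma`) are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in task: 2-class linear classification on synthetic data.
X = rng.normal(size=(200, 8))
w_true = rng.normal(size=8)
y = (X @ w_true > 0).astype(int)

# "Pretrained" weights: a noisy but decent solution (an assumption here,
# standing in for a large model's pretrained checkpoint).
w_pre = w_true + 0.5 * rng.normal(size=8)

def accuracy(w):
    """Task score for a candidate weight vector."""
    return ((X @ w > 0).astype(int) == y).mean()

def randopt(w_pre, n_samples=500, k=10, sigma=0.3):
    """RandOpt sketch: sample perturbations around the pretrained weights,
    keep the top-k by task score, and ensemble them by majority vote.
    Every candidate can be evaluated independently, so the loop is
    embarrassingly parallel in a real setting."""
    candidates = [w_pre + sigma * rng.normal(size=w_pre.shape)
                  for _ in range(n_samples)]
    top_k = sorted(candidates, key=accuracy, reverse=True)[:k]

    def predict(X):
        # Each selected specialist votes; the majority label wins.
        votes = np.stack([(X @ w > 0).astype(int) for w in top_k])
        return (votes.mean(axis=0) > 0.5).astype(int)

    return top_k, predict

top_k, predict = randopt(w_pre)
print(f"pretrained acc: {accuracy(w_pre):.3f}  "
      f"RandOpt ensemble acc: {(predict(X) == y).mean():.3f}")
```

No gradients are computed anywhere: the only operations are sampling, scoring, sorting, and voting, which is what lets the method trade optimization machinery for parallel evaluation.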
Open Implementation and Interactive Demos Available
The researchers have released their code on GitHub at sunrainyg/RandOpt and created an interactive project website at thickets.mit.edu. The work has significant practical implications: it suggests that practitioners working with large pretrained models may be able to skip expensive gradient-based fine-tuning in favor of simple random search, potentially reducing both computational cost and implementation complexity for many tasks.
Key Takeaways
- Large pretrained models contain dense concentrations of task-specific solutions around their original weights, unlike smaller models where experts are sparse
- RandOpt achieves performance competitive with PPO and GRPO using only random sampling and majority voting, with no gradient descent required
- The finding reframes post-training as a sampling problem rather than an optimization problem for large models
- Code is available on GitHub at sunrainyg/RandOpt with interactive demos at thickets.mit.edu
- The discovery challenges fundamental assumptions about how loss landscape geometry changes with model scale