Expanse, a Y Combinator P26 company, launched on June 1, 2026, with an HPC resource prediction system that increases the effective capacity of GPU clusters running schedulers like Kubernetes and SLURM. The platform outperformed frontier language models including GPT-5.5, Claude Opus 4.8, and Gemini 3.5 Pro by 8x on resource prediction benchmarks. It targets a problem that wastes billions in compute resources annually.
Datacenters Waste 59% of Compute Through Over-Requesting
The four founders, Ismaeel, Eren, Yafet, and Nikodem, identified a critical inefficiency in HPC operations: datacenters run at roughly 30-40% effective utilization because users systematically over-request resources. Asymmetric risk drives this behavior: over-requesting wastes expensive capacity, but under-requesting kills jobs mid-run and destroys days of work.
Expanse measured one national-scale HPC cluster for a month and found:
- 122,000 jobs analyzed over the monitoring period
- 59% of compute resources were wasted
- $8.5 million in compute wasted monthly at on-demand cloud rates
- Users typically request 2-3x the resources their jobs actually need
This pattern extends across large-scale compute industries including quantitative finance, AI labs, and manufacturing.
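The reported figures can be cross-checked with simple arithmetic. The sketch below assumes the 2-3x figure means a request of 2-3 times actual usage, that every job over-requests by the same factor, and that the $8.5 million corresponds exactly to the 59% of compute left unused. None of these simplifications come from Expanse, and the function names are illustrative.

```python
# Figures reported in the article for one cluster over one month.
WASTE_FRACTION = 0.59          # share of requested compute left unused
WASTED_USD_PER_MONTH = 8.5e6   # valued at on-demand cloud rates


def waste_fraction(over_request_factor: float) -> float:
    """Fraction of a request left unused when a job asks for
    `over_request_factor` times what it actually consumes."""
    if over_request_factor < 1:
        raise ValueError("factor must be >= 1 (1 means no over-request)")
    return 1.0 - 1.0 / over_request_factor


def implied_over_request(waste: float) -> float:
    """Inverse of waste_fraction: the uniform factor that yields `waste`."""
    if not 0 <= waste < 1:
        raise ValueError("waste must be in [0, 1)")
    return 1.0 / (1.0 - waste)


# Assumption: every job over-requests by the same factor.
print(f"2x request -> {waste_fraction(2.0):.0%} wasted")
print(f"3x request -> {waste_fraction(3.0):.0%} wasted")
print(f"59% wasted -> {implied_over_request(WASTE_FRACTION):.2f}x request")

# Assumption: the dollar figure is exactly the wasted 59% of what was requested.
requested_usd = WASTED_USD_PER_MONTH / WASTE_FRACTION
used_usd = requested_usd - WASTED_USD_PER_MONTH
print(f"requested ~ ${requested_usd / 1e6:.1f}M, used ~ ${used_usd / 1e6:.1f}M")
```

Under these assumptions, requesting 2x leaves 50% unused and 3x leaves about 67% unused, so the measured 59% sits inside the reported 2-3x range, at roughly 2.4x.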
Multimodal Predictor Ingests Code, Scripts, and Hardware Telemetry
Expanse installs on every node and hooks into the SLURM or Kubernetes scheduler. The system ingests live hardware telemetry from DCGM, CUPTI, cgroups, and network and I/O monitoring to build custom embeddings of hardware performance. Before a job is submitted, Expanse scans the workload and feeds the data into deep learning models that provide accurate resource recommendations, failure detection, and optimization suggestions.
The technology originated from research at EPCC (formerly the Edinburgh Parallel Computing Centre, at the University of Edinburgh), where founder Ismaeel built the first multimodal HPC resource predictor under Adrian Jackson. The model ingests job source code, submission scripts, hardware telemetry, and cluster metadata to determine actual compute requirements. On real EPCC cluster workloads, it scored 34% better than any baseline.
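Of the inputs listed above, the submission script is the easiest to illustrate. The following is a minimal sketch of reading a user's requested resources from the `#SBATCH` directives of a SLURM batch script. It is not Expanse's ingestion code. The helper names `parse_sbatch` and `parse_walltime_seconds` and the example script are hypothetical, and only long-form options and the `D-HH:MM:SS` or `HH:MM:SS` time formats are handled.

```python
import re

# Matches "#SBATCH --key=value" and "#SBATCH --key value" (long options only).
_SBATCH = re.compile(r"^#SBATCH\s+--([\w-]+)(?:[=\s]+(\S+))?")


def parse_sbatch(script: str) -> dict[str, str]:
    """Collect long-form #SBATCH options from a submission script."""
    options: dict[str, str] = {}
    for line in script.splitlines():
        match = _SBATCH.match(line.strip())
        if match:
            key, value = match.groups()
            # Flags without a value (e.g. --exclusive) map to an empty string.
            options[key] = value if value is not None else ""
    return options


def parse_walltime_seconds(value: str) -> int:
    """Convert [D-]HH:MM:SS into seconds (other time formats are not handled)."""
    days, _, clock = value.rpartition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return (int(days or 0) * 24 + hours) * 3600 + minutes * 60 + seconds


# Hypothetical submission script, for illustration only.
example = """#!/bin/bash
#SBATCH --job-name=train
#SBATCH --gres=gpu:4
#SBATCH --mem=256G
#SBATCH --cpus-per-task=32
#SBATCH --time=2-00:00:00
srun python train.py
"""

requested = parse_sbatch(example)
print(requested)
print(parse_walltime_seconds(requested["time"]))
```

The parsed request is what a predictor would compare against a job's actual consumption, which is where the gap between requested and used resources becomes measurable.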
LLM Benchmark Shows No Correlation Between Model Size and Accuracy
Expanse benchmarked its system against Gemini 3.5 Pro, Claude Opus 4.8, GPT-5.5, and Codex 5.3. Expanse outperformed these models by 8x, and accuracy showed no correlation with model size or generation. Claude Haiku performed better than Opus on many workloads, and the coding-specific Codex 5.3 only matched GPT-5.5's accuracy rather than improving on it.
The platform provides three core capabilities:
- Resource prediction at submit time: Predicts GPU VRAM, utilization, memory, CPUs, and walltime with confidence intervals
- Live observability: Dashboard showing telemetry, with low single-digit percent overhead
- Failure diagnosis: Correlates stack profiling and hardware telemetry to surface solution-oriented logs with code-line-level suggestions
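One way a prediction with a confidence interval could be turned into a concrete request is sketched below. This is an assumption about how such output might be used, not a description of Expanse's recommendation logic. The `Prediction` shape, the 10% headroom default, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical shape of one predicted quantity (e.g. GPU memory in GiB)."""
    point: float   # most likely usage
    lower: float   # lower bound of the confidence interval
    upper: float   # upper bound of the confidence interval


def recommended_request(pred: Prediction, headroom: float = 0.10) -> float:
    """Request the interval's upper bound plus a fractional headroom.

    Sizing from the upper bound rather than the point estimate reflects the
    asymmetric risk: an oversized request wastes some capacity, while an
    undersized one can kill the job mid-run.
    """
    if not pred.lower <= pred.point <= pred.upper:
        raise ValueError("expected lower <= point <= upper")
    return pred.upper * (1.0 + headroom)


def savings_vs_user_request(user_request: float, pred: Prediction) -> float:
    """Fraction of the user's original request freed by the recommendation."""
    # Never recommend more than the user asked for in this sketch.
    recommended = min(user_request, recommended_request(pred))
    return 1.0 - recommended / user_request


vram = Prediction(point=18.0, lower=16.0, upper=22.0)  # illustrative numbers
print(f"recommended: {recommended_request(vram):.1f} GiB")
print(f"freed vs. a 48 GiB request: {savings_vs_user_request(48.0, vram):.0%}")
```

A wider interval produces a larger request, so a less certain prediction is automatically treated more conservatively.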
Founders Bring Experience from Quant Funds and National HPC Facilities
All four founders previously ran HPC and GPU training workloads at the largest quantitative funds and HPC facilities. Their direct experience with the over-requesting problem and exposure to national-scale infrastructure informed Expanse's design. The team's background in both research computing and commercial high-performance workloads positioned them to address inefficiencies across academic, financial, and AI research sectors.
Key Takeaways
- Expanse measured 59% compute waste across 122,000 jobs on a national HPC cluster, equivalent to $8.5 million monthly at cloud rates
- The platform outperformed GPT-5.5, Claude Opus 4.8, and Gemini 3.5 Pro by 8x on HPC resource prediction benchmarks
- Expanse ingests job source code, submission scripts, and live hardware telemetry to predict actual resource needs before jobs run
- The benchmark found no correlation between LLM size and accuracy on HPC prediction tasks; Claude Haiku outperformed Opus on many workloads
- The system integrates with SLURM and Kubernetes to provide resource recommendations at submission time, along with live observability and failure diagnosis