Expanse Launches HPC Resource Predictor That Outperforms Frontier LLMs 8x

Expanse, a Y Combinator P26 company, launched on June 1, 2026, with an HPC resource prediction system that increases the effective capacity of GPU clusters running schedulers like Kubernetes and SLURM. The platform outperformed frontier language models including GPT-5.5, Claude Opus 4.8, and Gemini 3.5 Pro by 8x on resource prediction benchmarks, addressing a problem that wastes billions in compute resources annually.

Datacenters Waste 59% of Compute Through Over-Requesting

The four founders (Ismaeel, Eren, Yafet, and Nikodem) identified a critical inefficiency in HPC operations: datacenters run at roughly 30-40% effective utilization because users systematically over-request resources. The risk is asymmetric: over-requesting wastes expensive capacity, but under-requesting kills jobs mid-run and destroys days of work.
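
The asymmetry can be made concrete with a small expected-cost calculation. The sketch below is illustrative only: the cost figures, the sample of peak memory usage, and the function name expected_cost are assumptions, not Expanse data or methodology. It shows that when a killed job costs far more than idle capacity, the cheapest request sits well above typical usage.

```python
# Illustrative only: every number below is a made-up assumption.

def expected_cost(request_gb, usage_samples_gb, idle_cost_per_gb, kill_cost):
    """Average cost of requesting `request_gb`, evaluated over past peak-usage samples."""
    total = 0.0
    for used in usage_samples_gb:
        if used > request_gb:
            total += kill_cost  # job exceeds its request and is killed
        else:
            total += (request_gb - used) * idle_cost_per_gb  # reserved but idle
    return total / len(usage_samples_gb)

# Hypothetical peak memory (GB) of ten earlier runs of the same job.
samples = [18, 20, 21, 22, 22, 24, 25, 27, 30, 38]

# Try requests from 16 GB to 64 GB and keep the cheapest one.
candidates = range(16, 65, 2)
best = min(candidates, key=lambda r: expected_cost(r, samples, 1.0, 500.0))
median_usage = sorted(samples)[len(samples) // 2]
print(best, median_usage)
```

With these assumed numbers the cheapest request is 38 GB against a median usage of 24 GB, so a user acting rationally on their own costs still reserves well above what a typical run needs.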

Expanse measured one national-scale HPC cluster for a month and found:

122,000 jobs analyzed over the monitoring period
59% of compute resources were wasted
$8.5 million in compute wasted monthly at on-demand cloud rates
Users typically over-request resources by 2-3x their actual needs

This pattern extends across large-scale compute industries including quantitative finance, AI labs, and manufacturing.
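
The measured figures above can be cross-checked with simple arithmetic. The sketch below assumes that "wasted" means requested but unused resources and that a single over-request factor applies uniformly, neither of which the measurement states; the function names are illustrative.

```python
def waste_fraction(over_request_factor):
    """Share of requested resources left unused when requesting `factor` times actual need."""
    return 1.0 - 1.0 / over_request_factor

def implied_over_request(waste):
    """Over-request factor that would produce a given waste fraction."""
    return 1.0 / (1.0 - waste)

for factor in (2.0, 3.0):
    print(f"{factor:.0f}x over-request -> {waste_fraction(factor):.0%} unused")

print(f"59% waste -> {implied_over_request(0.59):.2f}x over-request")

# Implied value of all requested compute if $8.5M is 59% of it (derived, not reported).
print(f"implied monthly total: ${8.5 / 0.59:.1f}M")
```

Under these assumptions a 2-3x over-request corresponds to 50-67% unused capacity, which brackets the measured 59%.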

Multimodal Predictor Ingests Code, Scripts, and Hardware Telemetry

Expanse installs on every node and hooks into the SLURM or Kubernetes scheduler. The system ingests live hardware telemetry from DCGM, CUPTI, cgroups, and network/IO monitoring to build custom embeddings of hardware performance. Before a job is submitted, Expanse scans the workload and feeds the data into deep learning models that produce resource recommendations, failure detection, and optimization suggestions.

The technology originated from research at EPCC (the Edinburgh Parallel Computing Centre), where founder Ismaeel built the first multimodal HPC resource predictor under Adrian Jackson. The model ingests job source code, submission scripts, hardware telemetry, and cluster metadata to determine actual compute requirements. On real EPCC cluster workloads, it scored 34% better than any baseline.
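
Submission scripts are one of the inputs named above. As a rough illustration of what can be read from one, the sketch below extracts the resource requests from #SBATCH directives in a SLURM batch script. It is not Expanse's ingestion code: the function name parse_sbatch and the example script are hypothetical, and only the long --option=value form is handled.

```python
import re

# Matches lines such as "#SBATCH --mem=64G" (long option form only).
SBATCH_RE = re.compile(r"^#SBATCH\s+--([\w-]+)=(\S+)")

def parse_sbatch(script_text):
    """Return a dict of the --option=value requests found in a batch script."""
    requests = {}
    for line in script_text.splitlines():
        match = SBATCH_RE.match(line.strip())
        if match:
            requests[match.group(1)] = match.group(2)
    return requests

# Hypothetical submission script used only for this example.
example_script = """#!/bin/bash
#SBATCH --job-name=train
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=16
#SBATCH --mem=256G
#SBATCH --time=48:00:00
python train.py
"""

requested = parse_sbatch(example_script)
print(requested)
```

The resulting dictionary holds what the user asked for, which is the quantity a predictor would compare against what the job is expected to need.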

LLM Benchmark Shows No Correlation Between Model Size and Accuracy

Expanse benchmarked its system against Gemini 3.5 Pro, Claude Opus 4.8, GPT-5.5, and Codex 5.3. Expanse outperformed these models by 8x, and accuracy did not correlate with model size or generation. Claude Haiku performed better than Opus on many workloads, and the coding-specific Codex 5.3 only matched GPT-5.5's accuracy rather than improving on it.

The platform provides three core capabilities:

Resource prediction at submit time: Predicts GPU VRAM, utilization, memory, CPUs, and walltime with confidence intervals
Live observability: Dashboard showcasing telemetry with low single-digit overhead
Failure diagnosis: Correlates stack profiling and hardware telemetry to surface solution-oriented logs with code-line-level suggestions
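
To show how a submit-time prediction with a confidence interval might turn into a concrete request, here is a minimal sketch. The Prediction type, the 10% headroom, and the example values are assumptions made for illustration; the article does not describe how Expanse converts intervals into recommendations.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    """Hypothetical predictor output for one resource (e.g. GPU VRAM in GB)."""
    expected: float
    lower: float
    upper: float

def recommend(prediction, headroom=0.10):
    """Request the interval's upper bound plus a safety margin."""
    return prediction.upper * (1.0 + headroom)

def savings(user_request, prediction, headroom=0.10):
    """Fraction of the user's request freed by the recommendation (never negative)."""
    rec = recommend(prediction, headroom)
    return max(0.0, 1.0 - rec / user_request)

vram = Prediction(expected=22.0, lower=18.0, upper=30.0)  # GB, made-up values
print(recommend(vram))
print(round(savings(80.0, vram), 4))
```

Sizing the request from the upper bound rather than the expected value keeps the risk of a killed job low while still releasing most of an inflated request.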

Founders Bring Experience from Quant Funds and National HPC Facilities

All four founders previously ran HPC and GPU training workloads at the largest quantitative funds and HPC facilities. Their direct experience with the over-requesting problem and exposure to national-scale infrastructure informed Expanse's design. The team's background in both research computing and commercial high-performance workloads positioned them to address inefficiencies across academic, financial, and AI research sectors.

Key Takeaways

Expanse measured 59% compute waste across 122,000 jobs on a national HPC cluster, equivalent to $8.5 million monthly at cloud rates
The platform outperformed GPT-5.5, Claude Opus 4.8, and Gemini 3.5 Pro by 8x on HPC resource prediction benchmarks
Expanse ingests job source code, submission scripts, and live hardware telemetry to predict actual resource needs before jobs run
No correlation exists between LLM size and accuracy on HPC prediction tasks; Claude Haiku outperformed Opus on many workloads
The system integrates with SLURM and Kubernetes to provide resource recommendations at submission time, along with live observability and failure diagnosis
