Developer jrandolf launched sllm.cloud on Hacker News on April 4, 2026, introducing a cohort-based pricing model that allows developers to share the cost of dedicated GPU nodes for running large language models. The Show HN post received 122 points and 62 comments, highlighting interest in alternatives to expensive dedicated hosting and traditional pay-per-token APIs.
Shared Cohorts Split $14,000/Month GPU Node Costs Among Multiple Developers
Running DeepSeek V3 (685B parameters) requires an 8×H100 GPU node costing approximately $14,000 per month, far more capacity than most developers need, since most only require 15-25 tokens per second of throughput. The sllm.cloud platform lets developers join cohorts that share a dedicated GPU node, with pricing ranging from $5 to $40 per month depending on the model and cohort size. Users reserve spots with a payment card, but the card is charged only once the cohort fills completely and the node spins up.
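As a rough illustration of the cost split, the sketch below shows what cohort sizes would be needed to reach the advertised price points if the full $14,000/month node cost were divided evenly. The cohort sizes are assumptions for arithmetic only; sllm.cloud does not publish them, and cheaper tiers presumably run smaller models on less expensive hardware.

```python
# Illustrative arithmetic only: the cohort sizes below are assumptions, not
# published figures, chosen to show how an evenly split $14,000/month node
# maps onto the advertised $5-$40/month price range.
NODE_COST_PER_MONTH = 14_000  # approximate 8xH100 node cost cited above

for cohort_size in (350, 700, 1_400, 2_800):  # hypothetical cohort sizes
    per_user = NODE_COST_PER_MONTH / cohort_size
    print(f"{cohort_size:>5} users -> ${per_user:.2f}/user/month")
# Output: 350 -> $40.00, 700 -> $20.00, 1400 -> $10.00, 2800 -> $5.00
```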
Platform Offers Six Models With Unlimited Token Usage
The service supports llama-4-scout-109b, qwen-3.5-122b, glm-5-754b, kimi-k2.5-1t, deepseek-v3.2-685b, and deepseek-r1-0528-685b. Commitment options include 1-month and 3-month terms, with throughput tiers delivering 15-35 tokens per second per user. Unlike traditional APIs, sllm.cloud provides unlimited tokens with no per-token billing. The platform runs a vLLM backend exposing OpenAI-compatible API endpoints, so developers can switch by pointing an existing OpenAI client at a new base URL without other code changes. The service promises complete privacy with no traffic logging.
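Because the endpoints follow the OpenAI API shape, switching a project over is mostly a configuration change. The sketch below uses the official openai Python SDK; the base URL and API key are placeholders, not documented sllm.cloud values, and the model name is one of the models listed above.

```python
# Minimal sketch of calling an OpenAI-compatible endpoint via the openai SDK.
# The base_url and api_key below are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sllm.cloud/v1",  # assumed endpoint, not confirmed
    api_key="YOUR_SLLM_API_KEY",           # placeholder credential
)

response = client.chat.completions.create(
    model="deepseek-v3.2-685b",  # one of the models listed by the service
    messages=[{"role": "user", "content": "Summarize cohort-based GPU pricing."}],
)
print(response.choices[0].message.content)
```

Any client library or tool that accepts a custom OpenAI-compatible base URL should work the same way.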
Novel Business Model Bridges Gap Between API and Dedicated Hosting
The Hacker News discussion focused on the trust model of committing payment up front, questions about pricing sustainability, and comparisons to traditional cloud inference pricing. Several commenters praised the approach as an innovative way to make expensive models accessible. The platform represents a cooperative middle ground between pure pay-per-token APIs and expensive dedicated hosting, targeting developers who need predictable costs and dedicated performance without bearing the full expense of GPU infrastructure.
Key Takeaways
- sllm.cloud enables developers to share dedicated GPU node costs through cohort-based pricing at $5-$40/month, versus roughly $14,000/month for a full 8×H100 node
- The platform supports six models including DeepSeek V3 (685B) and offers unlimited tokens with 15-35 tokens/second throughput per user
- Users reserve cohort spots with payment cards but are only charged once the cohort fills and the node spins up
- The service runs a vLLM backend with an OpenAI-compatible API and promises complete privacy with no traffic logging
- The model bridges the gap between pay-per-token APIs and dedicated hosting, offering predictable monthly costs for developers