sllm Wants to Split Your GPU Costs With a Cohort Sharing Model

A new platform called sllm is letting developers pool resources on dedicated GPU nodes, cutting access to models like DeepSeek V3 from $14,000 a month down to as little as five.

Running a 685-billion parameter model such as DeepSeek V3 is not cheap. You need eight H100 GPUs, and renting that kind of infrastructure runs roughly $14,000 per month. For a startup building an AI-powered product, or a solo developer experimenting with inference, that price tag puts the hardware entirely out of reach. A new project called sllm, recently shared on Hacker News, proposes a straightforward fix: split the node with other developers and share the cost.

The idea works through a cohort model. You reserve a spot on a dedicated GPU node by registering your card, but nobody is charged until the cohort fills up. Once it does, the node spins up and everyone gets access. The platform says pricing starts at $5 per month for smaller models, scaling up based on the size of the model and the compute required. The team behind sllm argues that most developers only need between 15 and 25 tokens per second for typical workloads, which means a single high-end node can comfortably serve multiple users simultaneously without noticeable degradation in performance.

What makes this more than just a clever billing trick is the privacy angle. Sllm says it does not log any traffic, positioning the service as a completely private alternative to mainstream API providers. For companies handling proprietary data, customer interactions, or sensitive code generation, that is a meaningful distinction. OpenAI, Google, and Anthropic all have enterprise agreements that address data privacy, but the default developer tiers on those platforms typically involve some level of data processing for abuse monitoring and model improvement. Sllm is betting that a subset of developers wants zero logging by default, not as a negotiated add-on.

The economics of large language model inference have become one of the defining constraints in AI development. Training costs dominated headlines throughout 2023 and early 2024, with figures like the reported $100 million-plus spent training GPT-4 setting expectations for what frontier models cost to build. But inference, the actual running of models in production, is where the recurring spend piles up. As the Financial Times recently noted, enterprise AI spending is shifting heavily toward operational inference costs as companies move from experimentation to deployment. A model that costs tens of millions to train can cost multiples of that to serve at scale over its useful lifetime.

This is where the GPU rental market has flourished. Companies like Together AI, Fireworks, and Anyscale have built businesses around making inference cheaper and more accessible. Cloud giants continue to dominate raw compute, but a growing ecosystem of smaller providers is carving out space by offering better pricing, more flexibility, or specific technical advantages. Sllm is entering that crowd with a slightly different proposition. Rather than competing on raw throughput or custom silicon, it is competing on cost efficiency through direct resource sharing. The model is closer to a timeshare than a traditional cloud service.

Technically, sllm runs vLLM under the hood, which has become a widely adopted open-source inference engine known for its efficient memory management and high throughput. The API is OpenAI-compatible, meaning developers can point their existing code at sllm by simply swapping the base URL. That is a deliberate design choice. Switching costs in AI infrastructure are already low, and any new provider that requires developers to rewrite integration code starts at a significant disadvantage. By maintaining compatibility with the de facto standard API format, sllm removes the friction of adoption entirely.

The Question of Demand and Reliability

The cohort model does introduce one obvious risk: you are dependent on other people signing up. If a cohort for a specific model never fills, the node never launches. Sllm has addressed this by not charging until the group is complete, so there is no financial loss for the user, but there is an opportunity cost. Developers who need guaranteed, immediate access to a large model may find the uncertainty frustrating compared to spinning up an on-demand instance elsewhere. The platform is currently offering a limited selection of models, which also constrains its appeal. Expanding that library will likely determine how quickly it can attract enough users to keep cohorts filling at a reasonable pace.

The broader trend, though, is clear. GPU access is no longer just a problem of supply. H100s are far more available today than they were 18 months ago. The challenge now is cost efficiency. Startups and independent developers are discovering that inference at scale can burn through funding faster than expected, and every dollar saved on compute is a dollar available for product development, hiring, or simply extending runway. Platforms that can reduce those costs without sacrificing performance or privacy will find an audience, particularly among the long tail of developers who do not have enterprise budgets but do have real production workloads. Sllm is an early experiment in what that market might look like. Whether cohort-based sharing becomes a standard model or a niche offering depends entirely on execution, but the underlying problem it addresses is not going away.