High-throughput LLM serving engine. The default choice for self-hosting open-source models at scale.