Qualifications
CUDA + GPU inference optimization
vLLM, SGLang, or TensorRT-LLM experience
KV caching, paged attention, batching, token streaming, etc.
Distributed compute experience (with GPUs a strong plus)
No degree required
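For candidates unfamiliar with the terms above: KV caching is the core decode-time optimization behind fast LLM serving. The toy sketch below (NumPy, single attention head, made-up sizes) shows the idea, which is that each new token appends one key/value row to a cache instead of recomputing projections for the whole prefix. This is an illustrative sketch, not Luminal's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy head dimension

def attend(q, K, V):
    """Single-head scaled dot-product attention over the cached K/V."""
    scores = (K @ q) / np.sqrt(D)        # (T,) similarity per cached token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # softmax over the prefix
    return weights @ V                   # (D,) attention output

# Decode loop: each step does O(1) new projection work per token,
# reusing the cache rather than O(T) recompute of all past keys/values.
K_cache = np.empty((0, D))
V_cache = np.empty((0, D))
for step in range(4):
    q = rng.standard_normal(D)           # query for the new token
    k = rng.standard_normal(D)           # its key projection
    v = rng.standard_normal(D)           # its value projection
    K_cache = np.vstack([K_cache, k])    # append, don't recompute
    V_cache = np.vstack([V_cache, v])
    out = attend(q, K_cache, V_cache)
```

Paged attention (as in vLLM) extends this by storing the cache in fixed-size blocks so memory isn't wasted on padding.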
Company
Luminal (YC S25) builds an AI compiler and serving stack that makes models 10x faster and production-ready with one line.
Role
Founding engineer, on-site in downtown SF. Ship low-latency, high-throughput model serving on Luminal Cloud.
Day to day responsibilities:
Deploy and tune models with optimizations such as KV caching, paged attention, and sequence packing
Conduct model performance reviews
Improve the scheduler, batcher, and autoscaling; profile latency, cost, and utilization
Occasionally write kernels and, yes, do some tasteful shitposting
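The scheduler/batcher work above centers on continuous batching: new requests join the running batch as soon as finished requests free a slot, rather than waiting for a whole static batch to drain. A minimal sketch (all names and sizes are hypothetical, not Luminal's scheduler):

```python
from dataclasses import dataclass

@dataclass
class Request:
    rid: int
    tokens_left: int  # decode steps still needed for this request

def schedule(pending, running, max_batch=4):
    """Greedy continuous-batching step: fill free slots from the queue."""
    while pending and len(running) < max_batch:
        running.append(pending.pop(0))
    return running

def decode_step(running):
    """One model step: every running request emits one token;
    finished requests free their slot immediately."""
    for r in running:
        r.tokens_left -= 1
    return [r for r in running if r.tokens_left > 0]

pending = [Request(i, n) for i, n in enumerate([2, 5, 1, 3, 4])]
running, steps = [], 0
while pending or running:
    running = schedule(pending, running)
    running = decode_step(running)
    steps += 1
# 15 total tokens finish in 5 model steps, because short requests
# exit early and their slots are refilled mid-flight.
```

With static batching, the same workload would idle slots on short requests until the longest one finished; continuous batching keeps utilization high, which is where the latency and cost profiling comes in.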