Systems Engineer (SF) at Butter (W25)
$150K - $250K • 1.00% - 2.00%
Muscle Memory Cache for Agents
San Francisco, CA, US • Full-time • Will sponsor • 3+ years of experience

About Butter

Butter is building an LLM proxy that records and deterministically replays tool-call trajectories. Our goal is to get LLMs out of the hot path for repetitive tasks: increasing speed, reducing variability, and eliminating token costs for the many cases that could have just been a script.

Why

We discovered this problem after experiencing it first-hand while building computer-use agents. Many process-automation tasks are deeply repetitive, simple data-transformation tasks that could run as scripts. Critically, the pull toward agents was not to replace these scripts with agents, but to use agents to discover the scripts and self-heal them when new edge cases are encountered. We believe these cases extend beyond computer use to any agent tasked with performing repeat workflows. Learn a skill once, run it forever.

As an LLM proxy, we act as the LLM, spoofing responses to deterministically guide agents down cached paths, or cleanly falling back to actual LLMs on a cache miss.

About the role

Skills: Go, Rust, Distributed Systems, Amazon Web Services (AWS)

Work on Butter's data layer, building a distributed storage and query engine that scales to thousands of RPS. The role requires extensive experience with systems programming, including compiled systems languages (Go, C++, Rust, Zig), kernel APIs and syscalls, file formats, consensus algorithms, and more. We're building a new type of database, one that creates structure out of natural language, and we need your help!

Technology Stack

Our core service is the chat-completions proxy (a.k.a. router or gateway), written in Go, backed by S3, and deployed to AWS via Pulumi. The UI is written in SvelteKit/TypeScript with Drizzle and Supabase, and deployed to Vercel. We work in a monorepo with great tests and CI (this is not our first rodeo).

Interesting Challenges

In practice, no two LLM context windows look exactly the same. Even for workflows you could deem "deterministic", dynamic content such as names, addresses, and IDs is templated into the context window, so a simple hash-lookup cache has a nearly 0% hit rate. We're building the cache to be template-aware: we need to infer and extract dynamic variables from context windows and treat them as data when serving results. Just as deterministic scripts separate data from code, we need to do the same (see the sketch below).

This raises two major challenges you'll get to help us solve:

Research: how do you split a context window into the parts that are code (for cache lookup) and the parts that are data or noise (for rendering responses)?

Engineering: how do we ingest megabytes of context data per second, organize it, compact it, and test live requests for cache hits, all while keeping overhead under 50 ms?
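To make the code/data split concrete, here is a minimal Go sketch of the idea, not Butter's implementation: dynamic values are stripped from a context window with a toy regex, the remaining template is hashed for cache lookup, and a hit is re-rendered with the current request's values while a miss falls back to the real LLM. The `templatize`, `cacheKey`, and `lookup` names, the `{var}` placeholder, and the regex are all illustrative assumptions; the real research problem is inferring those dynamic variables from traffic rather than hand-writing patterns.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
	"strings"
)

// Toy pattern for dynamic content (e.g. numeric IDs like "84721").
// In practice these variables would be inferred, not hand-written.
var dynamicPattern = regexp.MustCompile(`\b\d{3,}\b`)

// templatize splits a context window into a static template ("code")
// and the dynamic values ("data") it contained.
func templatize(context string) (template string, vars []string) {
	vars = dynamicPattern.FindAllString(context, -1)
	template = dynamicPattern.ReplaceAllString(context, "{var}")
	return template, vars
}

// cacheKey hashes only the static template, so requests that differ
// only in dynamic values land on the same cache entry.
func cacheKey(template string) string {
	sum := sha256.Sum256([]byte(template))
	return hex.EncodeToString(sum[:])
}

// cache maps template hashes to response templates with {var} placeholders.
var cache = map[string]string{}

// lookup serves a cached response re-rendered with this request's data,
// or reports a miss so the caller can fall back to a real LLM.
func lookup(context string) (string, bool) {
	template, vars := templatize(context)
	respTemplate, ok := cache[cacheKey(template)]
	if !ok {
		return "", false
	}
	out := respTemplate
	for _, v := range vars {
		out = strings.Replace(out, "{var}", v, 1)
	}
	return out, true
}

func main() {
	// Record one trajectory: the cached response references the dynamic value.
	tmpl, _ := templatize("Look up order 84721 and summarize its status.")
	cache[cacheKey(tmpl)] = "Order {var} is marked as shipped."

	// A later request with a different ID hits the same cache entry.
	if resp, ok := lookup("Look up order 99310 and summarize its status."); ok {
		fmt.Println("cache hit:", resp)
	} else {
		fmt.Println("cache miss: fall back to the real LLM")
	}
}
```

A regex-based splitter like this is only a stand-in; the hit rate, ingest throughput, and sub-50 ms lookup budget described above are exactly where the research and engineering challenges live.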