TL;DR: Build the human escalation layer that every AI agent will rely on. Founding engineer, full equity band.
Agents are now writing the majority of code in Cursor, Claude Code, Codex, and Devin. They draft contracts. They run support. They've crossed the line from demo to production.
But agents still get stuck. They make confident mistakes in unfamiliar domains. They loop on bugs they can't diagnose. They ship architectures their users will quietly regret.
Most people think this gap closes as models get better. We think it grows. As agents take on more autonomous, higher-stakes work, the long tail of "things only a domain expert can resolve" gets longer, not shorter. Every serious agentic system five years from now will have a human escalation layer underneath it.
Humwork is building that layer.
When an agent hits its limit, it calls Humwork. We match it to a verified expert — engineer, lawyer, infra specialist, designer, scientist, whoever fits — in under 30 seconds, hand off full context, and the expert works the problem inside the agent's loop. The agent resumes. The user never leaves their flow.
We launched as an MCP plugin a few months ago. We're already integrated across Claude Code, Cursor, Codex, ChatGPT, Windsurf, and Lovable. Now we're scaling.
We need a founding engineer.
What you'd own
Surfaces you'll touch from week one:
Matching engine. Embeddings + a domain matcher that routes a consultation to the right expert in seconds. Heavy on retrieval quality, candidate ranking, and tail coverage.
Realtime infra. FastAPI + Postgres + Redis + Ably streaming an agent ↔ expert session, with reconnects, ordered delivery, and AI-summarized context handoff.
The MCP plugin layer. The tools, hooks, and skills that decide when an agent should escalate and how it should describe the problem. This is the integration that ships into every host (Claude Code, Cursor, Codex, etc.) — small surface, enormous leverage.
Expert app. Expo / React Native / NativeWind. What specialists open at 2am to pick up a session.
Marketplace surfaces. Client dashboard, billing, ratings, observability: everything that turns this into a robust marketplace.
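To make the matching-engine bullet concrete, here is a minimal sketch of the routing step: embed the consultation, score it against precomputed expert embeddings, and rank candidates. All names here (`Expert`, `match_experts`, the domain boost) are illustrative assumptions, not Humwork's actual API; the production ranker obviously involves far more signals.

```python
# Hypothetical matching sketch: cosine similarity over expert profile
# embeddings, with a naive boost for an exact domain match.
from dataclasses import dataclass
from math import sqrt

@dataclass
class Expert:
    name: str
    domains: list[str]
    embedding: list[float]  # precomputed profile embedding

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def match_experts(query_emb: list[float], query_domain: str,
                  experts: list[Expert], top_k: int = 3) -> list[Expert]:
    """Rank experts by embedding similarity, boosting exact domain hits."""
    scored = []
    for e in experts:
        score = cosine(query_emb, e.embedding)
        if query_domain in e.domains:
            score += 0.2  # toy domain-match boost; real ranking is richer
        scored.append((score, e))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored[:top_k]]
```

The interesting engineering lives in the parts this sketch skips: retrieval quality at the tail, candidate availability, and doing it all in under 30 seconds.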
You won't get tickets. You'll see a problem, propose how to solve it, and ship it. We compress quarters into weeks.
We want to hear from you if
You've done the work at the frontier. OpenAI, Anthropic, DeepMind, Meta AI, Cursor, Cognition, Mercor, or somewhere similar. Or you've done equivalent work elsewhere — we'll know it when we see it.
You have range. You can move from a Postgres query plan to a tricky LLM eval to a React Native build without complaining. We hire for taste, not stack.
You have opinions about agent UX. You've watched agents fail in real workflows. You have a take on MCP, tool design, context handoff, and what an agent's "I'm stuck" signal should look like.
You ship fast. You'd rather have something running in production than a perfect design doc.
The thesis moves you. We're betting humans and AI work better together than apart. If that doesn't resonate, the comp won't make up for it.
Compensation
$120K–$150K base
5%–10% equity
Full health, dental, vision
Whatever tools, hardware, and credits you need
In-person in SF. We're 2 people who ship every day. That doesn't work over Zoom.
Founders
Yash Goenka (CEO) — patent holder (graphene supercapacitor manufacturing), 2x founder, shipping LLM products since 2021, UC Berkeley.
Rohan Datta (CTO) — built an AI voice platform that handled 1M+ minutes of automated calls, drone imaging research at Berkeley, BS+MS Civil Engineering.
Friends for 16 years. We move fast, disagree well, and care a lot about the craft.