Skills: Django, Python, Docker, PostgreSQL, Amazon Web Services (AWS)
THE ROLE
Every voice AI pipeline has at least three models (transcription, LLMs, voice activity detection, text-to-speech), and they're all non-deterministic, all changing constantly, and all need to work together. Evaluating that at scale is a distributed systems problem from day one.
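To make that concrete, here's a minimal sketch of what a single voice turn looks like when those models are chained. All names here (Turn, run_turn, the stub models) are illustrative assumptions, not Coval's actual code — the point is just that several independently changing models have to compose into one result.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch: one voice turn passes through several models,
# each independently versioned and non-deterministic in practice.
@dataclass
class Turn:
    audio: bytes
    transcript: str = ""
    reply_text: str = ""
    reply_audio: bytes = field(default=b"")

def run_turn(
    turn: Turn,
    vad: Callable[[bytes], bool],     # voice activity detection
    stt: Callable[[bytes], str],      # transcription
    llm: Callable[[str], str],        # language model
    tts: Callable[[str], bytes],      # text-to-speech
) -> Turn:
    # VAD gates whether the rest of the pipeline runs at all.
    if not vad(turn.audio):
        return turn
    turn.transcript = stt(turn.audio)
    turn.reply_text = llm(turn.transcript)
    turn.reply_audio = tts(turn.reply_text)
    return turn

# Stub models, for illustration only.
result = run_turn(
    Turn(audio=b"\x01\x02"),
    vad=lambda a: len(a) > 0,
    stt=lambda a: "hello",
    llm=lambda t: f"echo: {t}",
    tts=lambda t: t.encode(),
)
```

Evaluating a system like this means exercising every stage and every combination of model versions, which is where the distributed-systems problem comes from.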
We're already ingesting millions of calls per month and simulating hundreds of thousands of interactions. Our evaluation infrastructure gates releases and monitors production deployments for leading AI companies across healthcare, finance, and customer support. When we go down, customers feel it immediately.
You'll own core pieces of this infrastructure:
• Building production systems that hold up under real load. Our foundations were laid by engineers who've done this at Google scale before: queuing, elastic compute, reliability patterns from day one. You'll take that further — scaling our pipelines, improving throughput, and keeping the systems our Fortune 500 customers depend on running.
• Making the right calls on architecture and observability. Our systems need to be scalable and transparent. You'll build with internal monitoring and observability in mind so we know what's happening before our customers do.
• Staying close to the voice AI space. What do the latest voice architectures look like? What models are people using? This isn't idle curiosity. It directly impacts our core infrastructure, our CLI, and how we serve engineers in their development workflows.
• Knowing the difference between vibe coding and agentic engineering. We move fast with AI-assisted development, but we don't write a prompt and blindly ship the output. You'll think about what the best code you've ever seen looks like and embed that standard into your workflow, then share those practices with the team.
Senior at Coval doesn't mean years of experience or where you were before. It means you can take an ambiguous problem, break it into small achievable steps, and balance short-term speed with long-term foundations.
WHAT WE'RE LOOKING FOR
• You've built and operated backend systems at meaningful scale and you know what excellent production infrastructure looks like.
• You can take ambiguity and turn it into a plan, then execute on it without waiting for someone to hand you a spec.
• You have a high bar for engineering quality: reliability, maintainability, observability. You've felt the cost of cutting the wrong corners.
• You're deeply curious about AI and the voice space. Not just as a user, but in how it shapes the systems you build.
• You think about what the engineering team needs today, what it'll need in four months, and how to keep steering in the right direction.
• You're excited about AI-assisted engineering done right. You know how to move faster with it without creating AI slop.
WHAT YOU'LL WORK WITH
You'll work across our Python backend and cloud infrastructure (AWS), with modern observability tooling, containerized deployments, and infrastructure-as-code. We invest in the right tools and practices to keep production-grade systems reliable.