About us
Infer is building the operating system for insurance agencies. We make AI agents (including voice agents) that handle the work agencies have always done by hand: qualifying inbound leads, helping producers during live calls, auditing calls afterward, running renewals, and winning back churned customers.
Our long bet is that AI eventually sells insurance directly. Agencies are the wedge because that is where the work, the data, and the customer relationships actually live. Get good there, and the rest follows.
We are a YC company and have raised from Stellaris Venture Partners and others. The founders are Vaibhav, Urvin, and Suneel. Vaibhav was an architect and AI researcher (at Purdue) and is now a licensed insurance agent. Urvin worked at BCG and is a surfer with six-pack abs. Suneel is an IITian and a philomath.
A few reasons to join us:
We like pushing each other as a team to test our limits, because that's when you rediscover yourself.
We're paranoid about making customers succeed (we challenge what's already good).
We love people who question, challenge, and build.
We're highly transparent founders to work with, and we love getting challenged.
Finally, we love people who are interdisciplinary.
About the role
You'll own the model quality bar for our voice AI platform, building the evals that tell us if we're getting better, and driving real, measurable improvements in transcription accuracy and TTS quality. This role sits at the intersection of applied ML, audio, and rigorous experimentation: if you ship a change, you'll know exactly what it bought us.
What you'll do
Build and maintain the eval framework that scores voice agent quality end-to-end: transcription, response quality, TTS, and full-conversation outcomes (a minimal sketch of how scoring could be wired appears after this list)
Design voice agent behavior: system prompts, tool use, conversation flow, error recovery, and guardrails for real-time interactions
Drive transcription accuracy improvements across STT providers and configurations (Deepgram, Whisper, AssemblyAI, Nvidia, etc.)
Drive TTS quality improvements: voice selection, latency vs. fidelity tradeoffs, prosody, and edge cases
Curate and grow our evaluation datasets, including hard-case mining from production traffic
Run rigorous A/B experiments and report results that the team can actually act on
Partner with backend engineers to wire eval signals into CI so regressions get caught before they ship
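To make the eval framework concrete, here is a minimal sketch of how transcription accuracy could be scored across STT providers. It assumes the jiwer package for word error rate (WER) and treats every provider as a plain "audio in, transcript out" callable; the Utterance dataclass and provider stubs are illustrative, not our actual interfaces.

```python
# Minimal sketch: compare STT providers on a shared eval set by word error rate (WER).
# Provider clients (Deepgram, Whisper, AssemblyAI, ...) are stubbed behind one interface.
from dataclasses import dataclass
from typing import Callable
from jiwer import wer  # pip install jiwer

@dataclass
class Utterance:
    audio_path: str   # path to the audio clip
    reference: str    # ground-truth transcript

# A provider is just a function from an audio path to a transcript string.
Transcriber = Callable[[str], str]

def evaluate(provider: Transcriber, dataset: list[Utterance]) -> float:
    """Average WER for one provider over the eval set (lower is better)."""
    scores = [wer(u.reference, provider(u.audio_path)) for u in dataset]
    return sum(scores) / len(scores)

def compare(providers: dict[str, Transcriber], dataset: list[Utterance]) -> dict[str, float]:
    """Score every provider on the same clips so results are directly comparable."""
    return {name: evaluate(fn, dataset) for name, fn in providers.items()}
```

The same pattern extends to response quality, TTS checks, and full-conversation outcomes by swapping the metric behind evaluate, which is also how eval signals get wired into CI.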
Must-haves
ML engineering experience shipping production systems
Strong Python and a working ML stack (PyTorch, Huggingface, pandas, scikit-learn)
Hands-on experience designing LLM-based agents: prompting, tool/function calling, multi-turn state, structured outputs
Hands-on experience building evals or eval frameworks for ML, LLM, or voice systems; you've built LLM-as-judge eval pipelines and know their failure modes (see the sketch after this list)
Practical experience with ASR/STT: comparing providers, fine-tuning, or running open models like Whisper
Practical experience with TTS systems (ElevenLabs or open models)
Comfortable working with audio data: sample rates, codecs, noise, alignment
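As a taste of the LLM-as-judge work, here is a hedged sketch of a pairwise response-quality judge with one guard against a common failure mode (position bias). The judge callable and JUDGE_PROMPT are placeholders for illustration, not our production prompt or model client.

```python
# Minimal sketch of an LLM-as-judge pairwise check with a position-bias guard:
# the judge is asked twice with candidate order swapped, and only consistent
# verdicts count. `judge` is any callable that maps a prompt to a JSON string.
import json
from typing import Callable

JUDGE_PROMPT = """You are grading two candidate replies from an insurance voice agent.
Conversation so far:
{context}

Candidate A: {a}
Candidate B: {b}

Return JSON: {{"winner": "A" | "B" | "tie", "reason": "<one sentence>"}}"""

def pairwise_judge(judge: Callable[[str], str], context: str, a: str, b: str) -> str:
    """Return 'A', 'B', or 'tie'; order-inconsistent verdicts count as 'tie'."""
    first = json.loads(judge(JUDGE_PROMPT.format(context=context, a=a, b=b)))["winner"]
    # Swap candidate order and re-ask; a position-biased judge tends to flip here.
    second = json.loads(judge(JUDGE_PROMPT.format(context=context, a=b, b=a)))["winner"]
    second = {"A": "B", "B": "A"}.get(second, "tie")  # map back to original labels
    return first if first == second else "tie"
```

Similar counter-checks apply to other known judge failure modes, such as verbosity bias.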
Nice-to-haves
Designed voice agents specifically: handling barge-in, interruption recovery, disfluencies, and natural turn-taking at the prompt/behavior layer
Experience with diarization, VAD, or endpointing models
Audio dataset curation, labeling, or annotation pipelines
Trained or fine-tuned ASR or TTS models from scratch or on domain audio
Experience with active learning or data-flywheel patterns over production traffic
Open-source contributions to AI/ML frameworks
Familiarity with cost/latency tradeoffs across model providers for real-time voice