Role Summary
Own the end-to-end lifecycle of memory features, from research to production. You'll fine-tune models for extraction, updates, consolidation, forgetting, and conflict resolution. You'll turn customer pain points into research hypotheses, implement and benchmark ideas from papers, and ship with Engineering to hit latency, reliability, and cost targets. You'll also build evaluation at scale (offline metrics + online A/B tests) and close the loop with real-world feedback to continuously improve quality.
This is not a pure research role. You'll read papers on Monday, prototype on Tuesday, benchmark on Wednesday, and ship to production by Friday. If that pace sounds right, keep reading.
What You'll Ship Early
First 30 days: Deeply understand the memory pipeline and identify the highest-leverage quality gap. Have a concrete research plan with baselines and success metrics.
First 60 days: Ship your first model or retrieval improvement to production and measure the impact. Run your first A/B test on real traffic.
First 90 days: Own a core research workstream end-to-end. Be the person the team turns to for "how do we make memory quality better and how do we know it's working?"
What You'll Do
Train models for memory extraction, updates, consolidation/forgetting, and conflict resolution. Iterate based on data and outcomes, not vibes.
Read, reproduce, and implement research. Quickly prototype paper ideas, benchmark against baselines, and productionize what wins.
Build evaluation at scale. Automated relevance/accuracy/consistency metrics, gold sets, online A/B and interleaving experiments, and clear dashboards.
Work closely with customers to uncover pain points, turn them into research hypotheses, and validate solutions through field trials.
Partner with Engineering to ship. Design APIs and data contracts, plan safe rollouts, and maintain low latency, high reliability, and reasonable cost at scale.
Minimum Qualifications
Experience in RAG or information retrieval (retrieval, ranking, query understanding) for real products, not just toy demos.
Model training/fine-tuning experience (LLMs/encoders) with a strong footing in experimental design and iteration.
Strong Python. Deep experience with PyTorch and familiarity with vLLM and modern serving frameworks.
Built evaluation pipelines for complex AI tasks (gold sets, offline metrics, online tests).
Able to orchestrate data pipelines to run models in production with low-latency SLAs (batch + streaming).
Clear, concise communication with stakeholders across engineering, product, GTM, and customers.
Nice to Have
Publications at venues such as CVPR, NeurIPS, ICML, or ACL.
Experience with privacy-preserving ML (redaction, differential privacy, data governance).
Deep familiarity with memory/retrieval literature or prior work on memory systems.
Expertise with embeddings, vector DB internals, deduplication, and contradiction detection.
Experience building custom RAG architectures beyond off-the-shelf LangChain/LlamaIndex setups.
Compensation
Salary: $150K to $250K base (depending on experience)
Equity: 0.10% to 0.50%
Location: San Francisco (in-person)
About Mem0
We're building the memory layer for AI agents: long-term memory that lets AI remember conversations, learn from interactions, and build context over time. We already power millions of memory operations daily across companies building AI-native products.
Mem0 is a Y Combinator (S24) company, backed by top-tier investors including Peak XV and Basis Set Ventures. We raised $24M to make this the default memory infrastructure for AI.
The Founders
Deshraj Yadav, Co-founder and CTO. Led the AI Platform at Tesla Autopilot, enabling large-scale training, model evaluation, and observability for Tesla's full self-driving development. MS in CS from Georgia Tech (ML specialization). As his master's thesis, created EvalAI, an open-source ML evaluation platform used by researchers at CMU, Stanford, Facebook, and Google. Published at CVPR, ECCV, and AAAI.
Taranjeet Singh, Co-founder and CEO. Started as a software engineer at Paytm, then built an AI-powered tutoring app at Gradeup (acquired by Byju's) that was featured at Google I/O. Joined Khatabook (YC S18) as its first growth engineer and became Senior PM. Built CookupAI, the first GPT app store, and scaled it to 1M+ users with zero marketing spend. Co-authored an O'Reilly book chapter on industrial NLP alongside researchers from Google AI, CMU, and Microsoft Research.
Together, Deshraj and Taranjeet co-created EvalAI and later built Embedchain, an open-source RAG framework with 2M+ downloads. While building Embedchain, they saw firsthand how LLMs forget everything between sessions, leading to repetitive, impersonal interactions. Mem0 was born to fix that: a hybrid memory architecture combining graph, vector, and key-value stores that makes AI applications stateful, personalized, and cost-efficient.
How We Work
Office-first in SF. Hallway chats, whiteboard sessions, and shared meals. The best ideas happen in person.
Velocity with craftsmanship. We ship fast but build for the long term. Every system needs to be fast, reliable, and elegant.
We debug retrieval quality over lunch. Half our Slack is embedding comparisons. If you've ever argued about chunk sizes at 11pm, you'll fit right in.
Data-driven, not ego-driven. The best solution wins, whether it comes from a founder or an engineer who joined yesterday. Results and metrics guide decisions.
Small team, big leverage. You'll work directly with the founders and a tight research + backend team. No layers, no committees.