Skills: Distributed Systems, Machine Learning, Reinforcement Learning (RL)
Founding Research Engineer, Model Training
Location: New York City
Type: Full-time
About CellType
CellType is building foundation models and agent systems for biology.
We believe the next major advances in biotech AI will come from models trained to reason over biological data, experiments, and translational outcomes, not from lightweight wrappers around generic models. We work with pharma and biotech partners on problems such as preclinical-to-clinical translation, response prediction, biomarker discovery, and scientific reasoning across complex biological datasets. Our core technology was originally developed at Yale in collaboration with Google DeepMind and has been published at top ML venues, including ICML.
We are building the core intelligence layer for biology.
About the role
We are hiring a Founding Research Engineer to build and scale the systems that improve our models.
This role sits at the boundary of research and engineering. You will work on training, post-training, evaluation, performance optimization, and the systems needed to support all of that. You should be excited by both novel model development and the operational reality of making training systems run reliably.
What you'll do
Build and improve training and post-training systems for biological foundation models and agentic model workflows
Design and run experiments across supervised fine-tuning, reinforcement learning, tool use, evaluation, and model behavior optimization
Build and maintain distributed RL and post-training infrastructure
Improve the reliability of rollout, evaluation, and reward pipelines
Own critical parts of the model training stack, including performance, reliability, observability, and debugging
Investigate and resolve issues across the full stack, from training dynamics and evaluation infrastructure to distributed systems and hardware bottlenecks
Profile and eliminate performance bottlenecks across GPU, networking, and storage layers
Build clean abstractions for experiments, model evaluation, and distributed training workflows
Improve training efficiency, stability, and throughput
Work closely with founders and domain experts to translate biological problems into model tasks, environments, and evaluation frameworks
Help turn research improvements into real product and customer advantage
You may be a fit if you
Have hands-on experience training or materially improving serious LLM or generative ML systems
Have strong software engineering and distributed systems fundamentals
Have deep experience with Python and modern ML frameworks such as PyTorch, JAX, or equivalent systems
Have experience with reinforcement learning or post-training methods
Have built evaluation systems for tool-using or open-ended models
Have a deep understanding of GPU execution constraints and memory trade-offs
Have experience debugging performance issues in production ML systems
Can reason about system-level trade-offs between latency, throughput, and cost
Have a track record of owning critical production infrastructure
Can balance research exploration with engineering implementation
Have experience with distributed systems, large-scale training, or performance-sensitive ML workloads
Care about code quality, testing, performance, and maintainability
Are comfortable in a small team where priorities shift toward whatever matters most
Communicate clearly and collaborate well under both normal and high-pressure conditions
Want broad ownership rather than a narrow role boundary
This role will directly shape the quality and speed of CellType's core model systems. The right person will help determine not only how good our models become, but how fast we can improve them and how confidently we can deploy them.
If you want to work on difficult model problems with real scientific and commercial consequences, we'd love to talk.