We build world models that simulate manipulation scenes faithfully enough to validate policies, and one day train them, without touching a robot. You'll develop the generative models that make this possible, with the controllability and physical fidelity to match real-robot behavior.
What you'll do:
Train video and dynamics models: Develop action-conditioned world models for manipulation policies.
Push long-horizon coherence: Develop architectures and training methods that keep rollouts coherent over longer horizons on hard physical tasks.
Own training infrastructure: Run multi-GPU clusters, write custom CUDA kernels, and debug at scale.
Build the world-model data engine: Design, implement, and improve a data engine that lets the world model compound learning across customers and manipulation tasks.
Requirements:
Coding: Very strong Python and PyTorch (or a similar framework).
Video generation: Deep experience training image or video generation models end-to-end.
Large-scale training: A track record of operating training runs at cluster scale.
3D vision: Working knowledge of multi-view geometry, scene reconstruction, and physical priors.