Senior Software Engineer, Product Infrastructure at Atlas (S21)
$76K - $84K • 0.50% - 1.00%
Operating system for restaurants, focusing on Southeast Asia
SG
Full-time
US citizenship/visa not required
6+ years

About Atlas
Atlas is building the operating system for restaurants. Atlas is the easiest way to start, run and grow any restaurant online and offline. We are the team that previously built Grain, a venture-backed online restaurant, to millions in revenue. Team and investors are from Grain, Accenture, Microsoft, Udacity, McKinsey, Salesforce, Y Combinator and others.
https://atlas.kitchen

About the role
Skills: Kubernetes, Google Cloud, Ruby on Rails, PostgreSQL

Atlas is building the operating system for restaurants: the easiest way to start, run, and grow any restaurant, both online and offline. The team at Atlas previously built Grain, a venture-backed online restaurant that grew to millions in revenue.
- Atlas helps restaurants power online storefronts, POS systems, third-party logistics, sync with food platforms, connect with other tools, leverage AI, and much more
- Existing customers include SaladStop, Killiney, and Haidilao, and we’re adding brands like Casa Vostra, Artichoke, and Wewa, with new restaurants joining every week
- The team and investors come from Grain, Accenture, Microsoft, Udacity, McKinsey, Salesforce, Y Combinator, and others
- Read our hiring memo here

Role
The product infrastructure engineers exist to make every engineer at Atlas move faster. You build the systems that make shipping safer, faster, and more predictable.
You’ll work at the intersection of infrastructure and product. The systems you design will directly power core experiences from compute, databases and APIs to deploy pipelines and measurement frameworks.
Your work won’t just support scale; it will define how Atlas evolves as a product.

What you’ll do
- Develop resilient infrastructure for multi-tenant compute, databases, queues, and observability
- Evolve deployment pipelines, feature gating, and canary rollouts to make shipping safe and fast
- Scale shared services and core platform components used across Atlas
- Create internal tools for monitoring, metrics, and experimentation to drive learning and reliability
- Partner with product engineers to design for scale, performance, and fault tolerance from day one
- Rethink abstractions and defaults that limit speed or resilience

Required skills and experience
- 6+ years of experience in Software Engineering and Site Reliability Engineering (or Infrastructure Engineering)
- Experience with container orchestration platforms and tools such as Docker and Kubernetes
- Experience with infrastructure as code and configuration management tools
- Experience leading incident response, with strong incident management skills
- Experience working with Google Cloud Platform services and tools
- Experience working with modern observability platforms such as Prometheus, Grafana, and ScoutAPM
- Experience working with Ruby on Rails and PostgreSQL is a bonus

You’ll do great here if
- You care about speed and craft equally
- You’ve built something that improves both the product and the way we build it
- You think in systems and understand how code, data, and infra shape the product
- You enjoy working close to product, designing APIs, shaping architecture, and helping ship features end-to-end

What's in it for you
- Work with a fast-growing yet lean team to make a real-world impact
- Have a lot of ownership and drive your own results and progression
- Work alongside smart people who sweat the details and push for the highest standards
- Training and in-house opportunities to help you grow
- Other benefits include a competitive compensation package, birthday leave and sick leave

Technology
Languages: Ruby, JavaScript
Frameworks (frontend): React, React Native, Tailwind CSS, GraphQL
Frameworks (backend): Ruby on Rails, Node.js, GraphQL
DevOps: Google Cloud Platform, Cloud Run, Docker, Terraform, Cloudflare Workers, Firebase, Craco
Storage: PostgreSQL (Cloud SQL), Workers KV
Payments: Adyen, Stripe

Technologies we are experimenting with:
- Cloudflare Durable Objects for real-time applications
- Migrating our app to Vite for faster build times
- Micro-mono repo to scale out different services using the same application

Our application is a multi-tenant SaaS product with support for isolated databases and compute instances: each tenant gets a dedicated database (isolated data) and dedicated compute (highly scalable) to process requests. We built a custom API router (like DNS) to direct each request to the right destination.

We primarily write code in Ruby and use Rails as our backend framework. For the frontend applications we use React (with Ant Design and Tailwind CSS). We run our Rails applications on Cloud Run and serve our frontend applications using a mix of Firebase and Cloudflare Workers; frontend and backend communicate primarily through GraphQL. Our custom API router is built on Cloudflare Workers and Workers KV for high availability and performance. We use managed Postgres (Cloud SQL) and Workers KV to store application data.

Interview Process
What’s the interview process like?
- Quick 45-minute screening interview to make sure we are on the same page. Here we try to briefly understand cultural fit, where you’re at, what you’re optimising for, strengths and weaknesses, relevant experience, etc.
- 1-1.5-hour deeper interview with the functional lead. Some questions may overlap here because, to keep each interview independent, we don’t compare notes between interviews (for technical roles).
- 6-8 hour practical interview (for technical roles).
- Reference interviews (both from you and us, finding mutual connections).
Optional, but we also like to have a “cultural immersion” with the rest of the team over a meal/drink if possible.
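
To make the custom API router described under Technology more concrete, here is a minimal TypeScript sketch of the general idea: look up the tenant for an incoming hostname in a key-value store, then rewrite the request URL to that tenant's dedicated backend. It is an illustration only, not Atlas's actual code. The store interface, the in-memory stand-in, the function names, and the `.test` hostnames are all hypothetical; the only detail borrowed from Workers KV is that a lookup is asynchronous and returns a string or null.

```typescript
// Hypothetical sketch: the names, hostnames, and key layout below are
// assumptions for illustration, not Atlas's implementation.

// Minimal read-only view of a key-value store. It mirrors the shape of an
// async get(key) that resolves to a string, or null when the key is absent.
interface TenantStore {
  get(key: string): Promise<string | null>;
}

// In-memory stand-in so the sketch runs without a Workers runtime.
class InMemoryTenantStore implements TenantStore {
  private readonly entries: Map<string, string>;

  constructor(entries: Map<string, string>) {
    this.entries = entries;
  }

  async get(key: string): Promise<string | null> {
    return this.entries.get(key) ?? null;
  }
}

// Replace the scheme and host of the incoming URL with the tenant's origin,
// keeping the original path and query string.
function rewriteToOrigin(incomingUrl: string, origin: string): string {
  const incoming = new URL(incomingUrl);
  const target = new URL(origin);
  target.pathname = incoming.pathname;
  target.search = incoming.search;
  return target.toString();
}

// Resolve the tenant by hostname, much like a DNS lookup, and return the
// URL the request should be forwarded to, or null for an unknown tenant.
async function resolveTenantUrl(
  incomingUrl: string,
  store: TenantStore,
): Promise<string | null> {
  const hostname = new URL(incomingUrl).hostname;
  const origin = await store.get(hostname);
  return origin === null ? null : rewriteToOrigin(incomingUrl, origin);
}
```

In a real edge deployment the caller would forward the request to the resolved URL and return a 404 (or similar) when the lookup yields null. Keeping the URL rewrite as a pure function separate from the lookup makes the routing rule easy to test without any network or storage.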