Native iOS & Android AI Engineer - Swift & Kotlin - India at Skarvo
Your Location and Job
You will work in a remote-first work environment from anywhere in India.
We offer a competitive salary.
Skarvo's founders are MIT graduates who previously worked at Apple as senior engineers and designers. They are based in Silicon Valley, California.
The Role
We're looking for a Mobile AI Engineer expert in Swift and Kotlin to build the runtime infrastructure that makes our on-device LLM an agent — the skill execution engine, native-to-WebView bridges, tool dispatch pipeline, and the chat UI that renders rich interactive AI outputs.
You'll integrate pre-trained and pre-quantized models (Gemma family via LiteRT-LM on Android and MLX Swift on iOS), wire them to a modular skill system, and build the full agentic loop — all running locally on the device with zero cloud dependency.
This is a live product with real users in 175 countries. You'll work directly with the founding team and ship features into existing Swift and Kotlin native codebases.
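The agentic loop described above can be sketched as follows: the local model returns either plain text or a tool call; tool calls are dispatched to a skill registry, and their results are appended to the conversation before the next inference step. This is an illustrative sketch only — the function and type names are assumptions, not Skarvo APIs, and `infer` stands in for on-device LLM inference via LiteRT-LM or MLX Swift.

```javascript
// Hypothetical agentic loop: dispatch tool calls until the model emits text.
function runAgentLoop(infer, skills, userMessage, maxSteps = 5) {
  const history = [`user: ${userMessage}`];
  for (let step = 0; step < maxSteps; step++) {
    const out = infer(history); // stands in for on-device model inference
    if (out.type === "text") return out.content;
    // out.type === "tool_call": execute the named skill and record the result
    const skill = skills[out.name];
    const result = skill ? skill(out.args) : `error: unknown tool '${out.name}'`;
    history.push(`tool(${out.name}): ${result}`);
  }
  return "error: step budget exhausted";
}
```

The step budget bounds runaway tool chains, which matters on-device where every inference pass costs latency and battery.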
Key Requirements
5+ years of professional mobile engineering experience
Strong experience with Swift/SwiftUI and Kotlin/Jetpack Compose, with the ability to work across both platforms
Experience with async/concurrent programming — Kotlin Coroutines and Swift Concurrency (async/await, actors)
Hands-on experience with on-device ML inference SDKs — Apple MLX / MLX Swift, Core ML, LiteRT-LM, LiteRT (TensorFlow Lite), MediaPipe, or ExecuTorch
Experience with small language models for on-device inference — Gemma, Qwen, Bonsai, LFM, Ministral 3, and similar
Experience building native-to-WebView bridges — WKWebView on iOS or WebView + @JavascriptInterface on Android
Strong JavaScript proficiency — async/await, DOM APIs, fetch, Canvas, Web Audio
Understanding of LLM function calling, tool use, and skills, and how tool schemas interact with model inference
Familiarity with model quantization tradeoffs and on-device memory/latency constraints (you select and benchmark models, not train them)
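The JavaScript side of such a native-to-WebView bridge can be sketched roughly as below. It assumes the native layer injects either `window.AndroidBridge` (via `addJavascriptInterface` on Android) or `window.webkit.messageHandlers.native` (via a WKScriptMessageHandler on iOS); the handler name `native` and the `AndroidBridge` object name are illustrative assumptions, as is the request/response message shape.

```javascript
// Hedged sketch of a bidirectional JS <-> native bridge shim.
function createBridge(global = globalThis) {
  const pending = new Map();
  let nextId = 0;

  function send(payload) {
    const json = JSON.stringify(payload);
    if (global.AndroidBridge) {
      global.AndroidBridge.postMessage(json); // Android path
    } else if (global.webkit?.messageHandlers?.native) {
      global.webkit.messageHandlers.native.postMessage(json); // iOS path
    } else {
      throw new Error("no native bridge available");
    }
  }

  return {
    // JS -> native request; resolves when native later calls deliver(id, ...)
    call(method, params) {
      const id = nextId++;
      return new Promise((resolve) => {
        pending.set(id, resolve);
        send({ id, method, params });
      });
    },
    // Native -> JS response, typically invoked via evaluateJavascript
    deliver(id, result) {
      const resolve = pending.get(id);
      if (resolve) {
        pending.delete(id);
        resolve(result);
      }
    },
  };
}
```

Correlating requests and responses by id is what lets a single WebView multiplex many concurrent native calls without blocking the JS thread.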
Preferred
Familiarity with agentic AI concepts — planning loops, function calling, tool use, multi-step reasoning
Understanding of text embeddings, vector search, and semantic retrieval — ideally on-device using SQLite with vector extensions (sqlite-vec, SQLite-Vector), FAISS, or similar
Experience designing RAG (Retrieval-Augmented Generation) pipelines — combining embedding models, vector indexing, and language model inference, ideally in an on-device or resource-constrained environment
Familiarity with AI agent orchestration and infrastructure — systems that wire agents together, manage tool dispatch, memory, and multi-agent coordination, such as NullClaw, LangGraph, CrewAI, AutoGen, or similar — and understanding of how these patterns apply to on-device, resource-constrained environments
Awareness of agentic protocols — MCP, A2A, AP2
Startup or high-growth company experience preferred
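The retrieval step of an on-device RAG pipeline, as mentioned above, can be reduced to a minimal sketch: brute-force cosine similarity over precomputed embeddings, standing in for a real vector index such as sqlite-vec. The embeddings would come from an on-device embedding model (e.g. EmbeddingGemma); the vectors here are toy values, not real model output.

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the top-k documents most similar to the query embedding.
function retrieve(queryEmbedding, corpus, k = 2) {
  return corpus
    .map(({ id, embedding }) => ({ id, score: cosine(queryEmbedding, embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A vector extension like sqlite-vec replaces the brute-force scan with an indexed query, which is the practical difference at on-device corpus sizes.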
Key Skills
Track record of shipping on-device ML features on real devices — not just demos
Track record of shipping complex mobile architectures involving embedded web engines, native bridges, dynamic UI rendering, and local model inference
Ability to design systems that span three async boundaries: model inference, native UI, and WebView JavaScript execution
Systems thinking for agent architecture — tool registries, planning loops, state persistence, error recovery
Performance intuition for on-device constraints — memory budgets, battery impact, context window limits, WebView lifecycle
Ability to work autonomously on ambiguous problems
Clear communicator who can explain architecture decisions to engineers and non-engineers
Experience with AI development tools — Claude Code, Codex, Copilot — integrated into daily workflow
Responsibilities
Own on-device AI architecture — drive model selection, inference pipeline design, and build-vs-buy decisions for all AI features
Own the agentic runtime end-to-end — model integration, function-calling pipeline, skill execution engine, and rich chat UI on both iOS and Android
Integrate pre-trained models using Apple MLX / MLX Swift on iOS and LiteRT-LM on Android
Design and build the AI skill system — a modular architecture where the LLM discovers, loads, and executes skills that extend its capabilities with JavaScript logic, interactive UI, and native device actions
Build the native-to-JavaScript bridge — a sandboxed WebView execution environment with bidirectional communication on both platforms
Implement the chat rendering layer — heterogeneous message types including embedded interactive WebViews, images, progress panels, and native action confirmations
Architect the async orchestration pipeline that coordinates LLM inference, tool execution, WebView JS, and UI updates
Ship on-device AI features — local AI chat, tool calling, semantic search, smart suggestions, and agentic capabilities on the product roadmap
Benchmark and select models for Skarvo's on-device requirements — evaluate new Gemma releases, quantization variants, and inference configurations
Build agentic AI systems — design and implement on-device agents that can plan, call functions, use tools, and act on behalf of users — all locally with no cloud calls
Design on-device RAG pipelines — build local embedding, vector indexing, and retrieval systems that power semantic search and context-aware AI features
Collaborate with iOS and Android engineers — integrate ML pipelines into the native Swift codebase using Core ML, MLX Swift, and Swift Concurrency on iOS, and into the native Kotlin codebase using LiteRT-LM (TensorFlow Lite) and Kotlin Coroutines on Android
Stay current — evaluate new models, frameworks, and techniques as the on-device AI landscape evolves rapidly
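The modular skill system described in the responsibilities above can be sketched as a registry where each skill declares a name, a JSON-style parameter schema (exposed to the model for function calling), and an execute function. The names and schema shape below are assumptions for illustration, not Skarvo's actual skill format.

```javascript
// Illustrative skill registry: the LLM discovers skills via describe()
// and the runtime routes its function calls through dispatch().
class SkillRegistry {
  constructor() {
    this.skills = new Map();
  }
  register(skill) {
    this.skills.set(skill.name, skill);
  }
  // Schemas the model sees when deciding which tool to call
  describe() {
    return [...this.skills.values()].map(({ name, description, parameters }) => ({
      name,
      description,
      parameters,
    }));
  }
  // Execute a tool call emitted by the model
  dispatch(name, args) {
    const skill = this.skills.get(name);
    if (!skill) throw new Error(`unknown skill: ${name}`);
    return skill.execute(args);
  }
}
```

Keeping discovery (`describe`) separate from execution (`dispatch`) is what lets skills be loaded dynamically without changing the inference pipeline.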
Tech Stack
On-device inference (iOS): Apple MLX, MLX Swift, Core ML
On-device inference (Android): LiteRT-LM, MediaPipe
Models: Gemma family — Gemma 4 (E2B, E4B), FunctionGemma, EmbeddingGemma
iOS: Swift 6, SwiftUI, WKWebView, Swift Concurrency
Android: Kotlin, Jetpack Compose, WebView, Kotlin Coroutines
JavaScript: Vanilla JS, Web APIs, CDN library integration
Agentic protocols: MCP, function-calling schemas
Embeddings & search: sqlite-vec, on-device embedding models
CI/CD: GitHub Actions
Backend: Firebase (ephemeral relay only — zero server persistence)