Data / ML Engineer at Floe Labs
Why this role exists
Floe is the spend layer for AI agents: operators connect once, their agents pay across thousands of APIs through one proxy, and spend is governed with programmable, context-aware budgets — walletless, no crypto. Every governed transaction produces something nobody else has: per-agent, per-vendor behavioral spend data, keyed on agent identity rather than raw wallet.
The reputation graph is the asset that turns that data exhaust into an underwriting substrate — the thing that lets Floe eventually replace locked-up collateral with a creditworthiness score. It is in active design and early build (v0 spec and the agent-creditworthiness scoring service are in review; the activity/data foundation that feeds it is shipping). You would own the data and ML layer of that build.
This is the single highest-leverage technical hire on the credit roadmap and, candidly, our largest execution risk — we are hiring someone who can de-risk it.
To be clear about scope and stage: you are building this graph, not maintaining a mature one. There is real greenfield here and real ambiguity. The behavioral dataset is accruing but is still small relative to what a defensible model needs. If you want a finished pipeline to optimize, this isn't it. If you want to define the schema, the features, and the first scoring models that a new asset class gets underwritten on, it is.
What you'll own
You'll build the pipeline that ingests every agent action through Floe — x402 tool-call payments, facility loans, on/off-ramp events, wallet transfers, key lifecycle events — into a unified, agent-id-keyed behavioral graph, and the models that score it.
Concretely, in the first two quarters:
Design the feature store and event schema on top of the activity data layer (a discriminated-union activity stream already exists; you'll shape it into model-ready features: spend patterns, velocity, counterparty mix, repayment timing, utilization, anomaly signals).
Build the agent→wallet identity resolution layer — aggregating multiple wallets to a single agent/operator entity, which is the part of the signal genuinely proprietary to Floe and the hardest data problem in the stack.
Stand up the creditworthiness scoring service: a v1 that starts as a transparent, defensible heuristic mapping behavior to a collateral-requirement multiplier, with a provider interface so external signals are swappable inputs rather than dependencies.
Define and instrument the model evaluation framework — the KPIs that matter are repayment-rate prediction and graph predictive accuracy, and right now neither has a baseline. You'll establish them honestly, including being the person who says when the data isn't yet sufficient to support a given limit.
Build the backtesting and monitoring harness so scores can be validated against realized repayment outcomes as the loan book grows, with drift detection and clear provenance for every score (it must be explainable for dispute resolution and, eventually, third-party reads).
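To give a feel for the shape of this work, here is a minimal Python sketch of two of the pieces above: collapsing linked wallets into one entity, and a transparent heuristic that maps repayment behavior to a collateral-requirement multiplier with swappable external signals. It is an illustration under stated assumptions, not Floe's actual design: every name in it (WalletResolver, BehaviorFeatures, SignalProvider, collateral_multiplier) and every threshold is hypothetical, and union-find is simply one common way to do this kind of aggregation.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class WalletResolver:
    """Union-find: wallets linked by any evidence collapse into one entity."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def find(self, wallet: str) -> str:
        """Return the representative wallet for the entity owning `wallet`."""
        self._parent.setdefault(wallet, wallet)
        root = wallet
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited wallet straight at the root.
        while self._parent[wallet] != root:
            next_wallet = self._parent[wallet]
            self._parent[wallet] = root
            wallet = next_wallet
        return root

    def link(self, a: str, b: str) -> None:
        """Record that wallets `a` and `b` belong to the same agent/operator."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a


@dataclass(frozen=True)
class BehaviorFeatures:
    on_time_repayments: int
    late_or_missed_repayments: int
    utilization: float  # fraction of the current limit in use, 0.0 to 1.0


class SignalProvider(Protocol):
    """External signal: returns an additive adjustment to the multiplier."""

    def adjustment(self, agent_id: str) -> float: ...


# All constants below are hypothetical placeholders, not real policy.
MIN_HISTORY = 5            # below this many repayments, refuse to score
FLOOR, CEILING = 0.5, 1.5  # 1.0 means the baseline collateral requirement


def collateral_multiplier(
    agent_id: str,
    features: BehaviorFeatures,
    providers: Sequence[SignalProvider] = (),
) -> tuple[float, list[str]]:
    """Return (multiplier, reasons); the reasons list is the score's provenance."""
    total = features.on_time_repayments + features.late_or_missed_repayments
    if total < MIN_HISTORY:
        # Thin data: say so and fall back to the most conservative value.
        return CEILING, [f"only {total} repayments observed; need {MIN_HISTORY}"]

    on_time_rate = features.on_time_repayments / total
    multiplier = CEILING - (CEILING - FLOOR) * on_time_rate
    reasons = [f"on-time rate {on_time_rate:.2f}"]

    if features.utilization > 0.8:
        multiplier += 0.1
        reasons.append("utilization above 0.8: +0.10")

    # External signals are optional inputs; the score works without them.
    for provider in providers:
        delta = provider.adjustment(agent_id)
        multiplier += delta
        reasons.append(f"{type(provider).__name__}: {delta:+.2f}")

    return min(CEILING, max(FLOOR, multiplier)), reasons
```

In this sketch the features would be computed per resolved entity (the root returned by find), and the reasons list is what makes a score explainable in a dispute. Returning the ceiling when history is thin is one way to avoid over-claiming on sparse data.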
What we're looking for
- 4+ years building production data pipelines and/or ML systems, with real ownership of something end-to-end (ingestion → features → model → serving), not just model training in notebooks.
- Strong applied background in credit risk, fraud, anomaly detection, recommender/graph systems, or behavioral scoring: anywhere you've turned messy event streams into a calibrated score that drove a real decision.
- Fluency in Python and the modern data stack (e.g. a warehouse or columnar store, dbt-style transforms or equivalent, an orchestration layer, a feature store or hand-rolled equivalent). Comfort with TypeScript is a plus, since the SDK and serving layer are TS.
- Real judgment about evaluation and calibration: you understand the difference between a model that looks good offline and one that holds up against realized outcomes, and you'd rather ship a transparent heuristic you can defend than a black box you can't.
- Comfort with sparse-data and cold-start problems: you've built scoring systems where the training signal was thin at launch and grew over time, and you have opinions about how to bootstrap responsibly without over-claiming.
Strong pluses (not required)
- Experience with graph data models / graph ML (entity resolution, node embeddings, link signals).
- Background in lending, payments, treasury, or financial-services data: you've seen what underwriting data actually needs to look like.
- Familiarity with blockchain data and on-chain indexing (Base, USDC flows, x402, ERC-8004 identity). You do not need to be crypto-native (most of our users aren't either), but you should be willing to learn the on-chain side of the data.
- Experience making model outputs explainable and externally consumable (the longer-term goal is a machine-readable, standardized score third parties can read, analogous to a credit bureau).
What this role is not
It is not a research role producing papers, and it is not a pure infra role. It sits at the seam between data engineering and applied ML, at an early-stage company where the schema you design and the first scores you ship become the foundation everything else is underwritten on. You will work directly with both founders, set your own tooling, and be held to two numbers: the quality and coverage of the behavioral dataset, and the predictive accuracy of the scores it produces.
Compensation
Competitive salary + meaningful equity + USDC option. Remote-first, async-default.
How to Apply
Step 1 is a dogfood gate. Send us:
What your agent does, in one paragraph. How does it pay? What APIs does it use, if any?
Star https://github.com/Floe-Labs/floe-guard/ and try it on an agent circuit you've shipped (a link or a 60-second Loom is fine). This is free; no deposit required.
2–3 things you wish your agent's spend layer did differently.
Your framework: CrewAI, LangChain, custom, etc.