agents, copilots, RAG

AI Software Development

AI products that ship to production. Agents that triage your queue, copilots embedded in your app, RAG over the docs your customers actually search. We build them in your repo, with evals you can defend in a board meeting.

Start a project → See pricing

Glyph ▲

Cadence 4–16 wks

Starting $96k · 4-week sprint · fixed

Ownership You, day one

01 · What you get

Six things in every engagement.

Not a checklist of deliverables. Things that show up because we work the way we work.

Eval-first, always

Every build starts with a gold set and a precision/latency target. If we can't measure it, we don't ship it. You get the eval harness on day one.

In your codebase

No black-box "AI platform" rental. We commit to your monorepo. The code reads like the rest of your service. Your team can hire against it tomorrow.

Models you can swap

OpenAI, Anthropic, Bedrock, Vertex, open-weight. The model is a config line, not an architectural decision. We benchmark them on your eval set.

Cost & latency budgeted

Every feature ships with a cost/run and p95 latency number. We blow the whistle if a prompt change pushes either past your budget.

Safety & redaction baked

PII redaction, prompt-injection defenses, audit logs, refusal traces. Not a checkbox at the end — wired in from the first prototype.

Handoff in week one

Your engineers are in our PRs from day one. By week 4 they're reviewing ours. By week 8 they're shipping without us in the loop.

02 · How it runs

A typical engagement, laid out.

Every engagement varies. The shape stays the same.

week 0–1
Discovery & eval set

Stakeholder interviews, prior-art audit, a written discovery doc, and a labelled gold set. End of week: a clickable prototype you can show leadership.
week 2
Prototype → harness

Eval harness in CI. First end-to-end pipeline in your repo. Cost & latency tracked. Internal demo on Friday.
week 3
Build to threshold

Iterate on prompts, retrieval, and tools until we hit the precision / latency / cost gates we agreed on.
week 4–6
Stage & ship

Staging deploy. Red-teaming. Rollout plan. Production behind a feature flag. Postmortem-style handoff doc.

03 · Sample SOW

What we'd write down before we started.

A real-shaped excerpt. Names changed. Numbers honest.

SOW · sample-ai-software.txt ● fixed price

engagement intake-agent = {
  outcome:   'agent that triages support tickets',
  success:   '≥85% precision · <2s p95 · $0.04/run',
  team:      ['@rin (lead)', '@marco', '@anika'],
  stack:     ['your repo', 'your VPC', 'your eval set'],
  duration:  '4 weeks · fixed',
  price:     '$96,000',
  ownership: 'you · MIT-licensed code, day 1',
}

04 · What we won't take on

The honest list.

Saving us both a bad-fit call.

Demos we can't put behind an eval gate.
"Replace our entire support team" briefs.
Workloads that require us to fine-tune frontier models from scratch.
Vendor-lock-in plays. We won't build you into a corner you can't exit.

05 · Common questions

The ones we hear on most discovery calls.

Will you sign our MSA / DPA?

Yes. We also publish our own short-form MSA if you'd like a starting point.

Do you build inside our VPC?

Default yes. Most clients run inference inside their own cloud. We adapt to your model provider, key store, and observability stack.

What if precision doesn't hit the bar?

You don't pay the final milestone. We've only had to invoke this twice in 38 sprints — both times we ended up shipping a smaller scope that did clear the bar.

← Previous service Performance Marketing Next service → Software Development

Start with AI Software Development.

A 30-minute call. A written discovery doc in a week. Fixed-price SOW by day 10.

Start a project → See all services