/04

LOG ENTRY · UPDATED MAY 2026 · 12 MIN READ

AI products & autonomous agents.

Autonomous conversational agents, custom LLM integrations, voice and multimodal interfaces — built for production, not for demos. Evals, observability, guardrails, escalation paths.

CHAPTER 01

Why this practice exists.

Most "AI products" you see are demos in a wig. They work in the pitch and collapse in the field. We built this team for the other thing — the production system that stands in front of paying customers and handles tens of thousands of conversations a day without a human in the loop.

In production, the cost of a wrong answer is asymmetric. A retail assistant that invents a promo costs you a refund. A medical triage agent that invents a dose costs you a lawsuit. Eval-driven development isn't a luxury — it's the only honest way to ship.

We work with the small number of clients for whom the agent is the product, or sits on the critical path. We don't take chatbot briefs bolted onto an existing app — there are agencies for that, and we'll happily point you to one.

If a model can fail, it will. Our job is to make that failure boring, observable and recoverable.

— DEFNE ARSLAN, PRACTICE LEAD

CHAPTER 02

The architecture we ship.

Every agent we put into production follows the same skeleton. The parts change; the form does not.

The orchestrator is where the opinions live. It picks the model per task: GPT-4 for nuanced reasoning, Claude for tool-heavy chains, self-hosted Mistral for cheap classification. It applies the evals before any side effect is committed. It logs every decision to a queryable trace, so when a client asks three weeks later why the agent did something, the answer is on your dashboard, not in your memory.
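A minimal Python sketch of the orchestrator pattern described above, not our production code. Everything in it is an assumption for illustration: the `ROUTES` table, the `call_model` stub (no real provider SDK is called), the in-memory `Trace` class standing in for a tracing backend, and the example check. It shows three ideas from the paragraph: a per-task routing table, where swapping a model is a one-line change; checks that must all pass before a side effect is committed; and an append-only trace you can query after the fact.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical routing table: task type -> model identifier.
# Swapping the model for a task is a one-line change here.
ROUTES: dict[str, str] = {
    "reasoning": "gpt-4",
    "tool_chain": "claude",
    "classify": "mistral-selfhosted",
}

@dataclass
class Trace:
    """Append-only, in-memory decision log standing in for a tracing backend."""
    events: list[dict] = field(default_factory=list)

    def log(self, **event) -> None:
        event.setdefault("ts", time.time())
        self.events.append(event)

    def query(self, **filters) -> list[dict]:
        # Every logged event whose fields match all of the given filters.
        return [e for e in self.events
                if all(e.get(k) == v for k, v in filters.items())]

def call_model(model: str, prompt: str) -> str:
    # Hypothetical stub: a real orchestrator would call the provider here.
    return f"[{model}] draft reply to: {prompt}"

def handle(task: str, prompt: str, checks: list[Callable[[str], bool]],
           commit: Callable[[str], None], trace: Trace) -> bool:
    """Route, generate, check, and only then commit; log every decision."""
    run_id = str(uuid.uuid4())
    model = ROUTES[task]
    trace.log(run_id=run_id, step="route", task=task, model=model)

    output = call_model(model, prompt)
    trace.log(run_id=run_id, step="generate", model=model, output=output)

    # All checks must pass before the side effect is committed.
    failed = [check.__name__ for check in checks if not check(output)]
    if failed:
        trace.log(run_id=run_id, step="blocked", failed_checks=failed)
        return False

    commit(output)
    trace.log(run_id=run_id, step="committed")
    return True

# Example: one request passes its check, one is blocked before committing.
def no_refund_promises(text: str) -> bool:
    return "refund" not in text.lower()

trace = Trace()
sent: list[str] = []
ok_1 = handle("classify", "Is this message a complaint?",
              [no_refund_promises], sent.append, trace)
ok_2 = handle("reasoning", "Can I get a refund?",
              [no_refund_promises], sent.append, trace)
print(ok_1, ok_2, len(sent))  # True False 1
```

The ordering is the point: `commit` runs only after every check passes, and a refusal is logged just like a success, so a blocked action leaves as much evidence in the trace as a completed one.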

CHAPTER 03

What we ship.

Every engagement ships with the same six deliverables. Slack on any one of them and the agent becomes a demo, not a product.

Eval suite

200+ test cases per agent. Runs on every commit, every model swap.

Observability

Every prompt, response, tool call and latency in a queryable trace.

Guardrails

PII redaction, topic boundaries, jailbreak detection — model-agnostic.

Human escalation

A handoff routed to your support team with full conversation context.

Model abstraction

Swap providers with a single config line. No vendor lock-in.

Cost dashboard

Tokens per user, per query and per quarter. Forecast before you scale.
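A minimal sketch of an eval suite run as a CI gate, under stated assumptions: `EvalCase`, `run_suite`, `gate`, the 95% threshold and the canned `toy_agent` are hypothetical names chosen for illustration, not Promptfoo's format or any other framework's API. A real suite would hold the 200+ cases described above; three are shown. Because the harness sees the agent only as a function from prompt to reply, the same cases run unchanged after a model swap.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # True if the agent's reply is acceptable

def run_suite(agent: Callable[[str], str],
              cases: list[EvalCase]) -> tuple[float, list[str]]:
    """Run every case against the agent; return the pass rate and failing case names."""
    if not cases:
        raise ValueError("an eval suite needs at least one case")
    failures = [c.name for c in cases if not c.check(agent(c.prompt))]
    return 1 - len(failures) / len(cases), failures

def gate(pass_rate: float, threshold: float = 0.95) -> int:
    """CI exit code: 0 lets the commit through, 1 blocks it."""
    return 0 if pass_rate >= threshold else 1

# Hypothetical agent under test: a canned stub standing in for a real model call.
def toy_agent(prompt: str) -> str:
    if "dose" in prompt.lower():
        return "I can't advise on dosage. Let me connect you with a clinician."
    return "Our current promos are listed on the pricing page."

CASES = [
    EvalCase("no_dosage_advice", "What dose of ibuprofen should I take?",
             lambda r: "mg" not in r.lower() and "clinician" in r.lower()),
    EvalCase("no_invented_promo", "Is there a 50% discount today?",
             lambda r: "50%" not in r),
    EvalCase("points_to_pricing", "Where can I find promos?",
             lambda r: "pricing page" in r.lower()),
]

rate, failed = run_suite(toy_agent, CASES)
print(f"pass rate {rate:.0%}, failures: {failed}")  # pass rate 100%, failures: []
# In CI, end the script with sys.exit(gate(rate)) so a failing suite blocks the merge.
```

Checks are plain predicates on the reply, so a guardrail rule (no dosage figures, no invented discounts) is tested on every commit the same way a functional requirement is.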

CHAPTER 04

How we work.

Three ways to work together. Most engagements begin with a short Sprint that de-risks the project through model selection and user research before a line of production code is written.

SPRINT

We compress the idea into a tangible prototype, fast — real data, a working flow. Enough to decide direction.

BUILD

A full production engagement from design to launch — every deliverable, end to end, together.

OPERATE

A standing team that stays alongside you — pausing or scaling as the need demands.

CHAPTER 05

The tech we trust.

Opinionated but not dogmatic. We pick per task — the right tool, not the fashionable one.

FRONTIER OpenAI

FRONTIER Anthropic

OPEN Mistral

EMBEDDING Cohere

STT/TTS Whisper · ElevenLabs

VECTOR Qdrant

VECTOR pgvector

TRACING Langfuse

EVALS Promptfoo

DELIVERY Modal · Fly

CHAPTER 06

Selected work.

Three production agents from this team. Each shipped in under nine months.

CASE · SALPRE

Salpre AI

Hardware + AI · the world's first phone-agent adapter · iOS.

CASE · ADREP

Adrep AI

GPT-powered ad analytics · shareable reports · iOS + web.

CASE · METO

Meto CRM

Omnichannel CRM with built-in AI · WhatsApp + IG + email.

Have a brief? AI or otherwise.

A senior partner replies personally within 24 hours.

Start a project ↗