/04
LOG ENTRY · UPDATED MAY 2026 · 12 MIN READ

AI products & autonomous agents.

Autonomous conversational agents, custom LLM integrations, voice and multimodal interfaces — built for production, not for demos. Evals, observability, guardrails, escalation paths.

CHAPTER 01

Why this practice exists.

Most "AI products" you see are demos in a wig. They work in the pitch and collapse in the field. We built this team for the other thing — the production system that stands in front of paying customers and handles tens of thousands of conversations a day without a human in the loop.

In production, the cost of a wrong answer is asymmetric. A retail assistant that invents a promo costs you a refund. A medical triage agent that invents a dose costs you a lawsuit. Eval-driven development isn't a luxury — it's the only honest way to ship.

We work with the small number of clients for whom the agent is the product, or sits on the critical path. We don't take chatbot briefs bolted onto an existing app — there are agencies for that, and we'll happily point you to one.

If a model can fail, it will. Our job is to make that failure boring, observable and recoverable.

— DEFNE ARSLAN, PRACTICE LEAD
CHAPTER 02

The architecture we ship.

Every agent we put into production follows the same skeleton. The parts change; the form does not.

The orchestrator is where the opinions live. It picks the model per task: GPT-4 for nuanced reasoning, Claude for tool-heavy chains, self-hosted Mistral for cheap classification. It runs the eval checks before any side effect is committed. It logs every decision to a queryable trace, so when a client asks "why did the agent do that?" three weeks later, the answer is on a dashboard, not in someone's memory.
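A minimal sketch of that orchestrator loop, under loudly stated assumptions. The `ROUTES` table and its model identifiers are placeholders, not real provider model names. `call_model`, `check`, `side_effect` and `escalate` are hypothetical callables you would wire to your own provider SDK, eval checks, tools and support queue. The sketch shows three properties from the paragraph above: the model is picked per task from one config table, the check runs before any side effect is committed, and every decision lands in a trace as one JSON record per line.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical routing table: task type -> placeholder model identifier.
# Switching providers for a task means editing one entry here.
ROUTES: dict[str, str] = {
    "reasoning": "gpt-4",
    "tool_chain": "claude",
    "classify": "mistral-selfhosted",
}


@dataclass
class Trace:
    """In-memory stand-in for a tracing backend: one dict per decision."""
    records: list[dict] = field(default_factory=list)

    def log(self, **record) -> None:
        record["ts"] = time.time()  # timestamp each decision
        self.records.append(record)

    def dump(self) -> str:
        # JSON Lines: one object per line, easy to load into a queryable store.
        return "\n".join(json.dumps(r, sort_keys=True) for r in self.records)


def run_step(
    task: str,
    prompt: str,
    call_model: Callable[[str, str], str],
    check: Callable[[str], bool],
    side_effect: Callable[[str], None],
    escalate: Callable[[str], None],
    trace: Trace,
) -> str:
    """Route, call, check, then either commit the side effect or escalate."""
    model = ROUTES[task]                # per-task model choice from config
    output = call_model(model, prompt)  # provider call is injected, not hard-coded
    passed = check(output)              # runs before anything is committed
    trace.log(task=task, model=model, prompt=prompt, output=output, check_passed=passed)
    if passed:
        side_effect(output)
        return "committed"
    escalate(output)                    # hand off to a human instead of acting
    return "escalated"
```

Injecting the callables instead of importing an SDK keeps the sketch provider-agnostic, and a failed check routes the output to a human rather than letting the agent act on it. A real deployment would ship the trace records to a tracing backend instead of holding them in memory.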

CHAPTER 03

What we ship.

Every engagement ships with the same six deliverables. Slack on any one of them and the agent becomes a demo, not a product.

01
Eval suite
200+ test cases per agent. Runs on every commit, every model swap.
02
Observability
Every prompt, response, tool call and latency in a queryable trace.
03
Guardrails
PII redaction, topic boundaries, jailbreak detection — model-agnostic.
04
Human escalation
A handoff routed to your support team with full conversation context.
05
Model abstraction
Swap providers with a single config line. No vendor lock-in.
06
Cost dashboard
Tokens per user, per query, per quarter. Forecast before you scale.
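A minimal sketch of how deliverables 01 and 03 fit together: a guardrail plus the eval cases that pin its behaviour, small enough to run on every commit. The regex patterns, the `[EMAIL]`/`[PHONE]` placeholders and the three cases are illustrative assumptions and nowhere near a complete PII detector. `redact`, `Case` and `run_evals` are hypothetical names, not the API of Promptfoo or any other tool.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only; production PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}


def redact(text: str) -> str:
    """Replace each PII match with a typed placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


@dataclass
class Case:
    name: str
    input: str
    expected: str


def run_evals(fn, cases: list[Case]) -> tuple[int, list[str]]:
    """Run every case through fn; return the pass count and the names of failures."""
    failures = [c.name for c in cases if fn(c.input) != c.expected]
    return len(cases) - len(failures), failures


CASES = [
    Case("email", "write to ayse@example.com", "write to [EMAIL]"),
    Case("phone", "call +90 532 123 45 67 today", "call [PHONE] today"),
    Case("clean", "what are your opening hours?", "what are your opening hours?"),
]

if __name__ == "__main__":
    passed, failed = run_evals(redact, CASES)
    print(f"{passed}/{len(CASES)} passed, failures: {failed}")
```

In CI, the script would exit non-zero whenever `failed` is non-empty, so a regression in the guardrail blocks the merge. The same loop can wrap a full agent call instead of `redact`; only the exact-match comparison would need to become more tolerant.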
CHAPTER 04

How we work.

There are a few ways to work together. Most engagements begin with a short Sprint that de-risks model selection and covers user research before a single line of production code is written.

CHAPTER 05

The tech we trust.

Opinionated but not dogmatic. We pick per task — the right tool, not the fashionable one.

FRONTIER OpenAI
FRONTIER Anthropic
OPEN Mistral
EMBEDDING Cohere
STT/TTS Whisper · 11Labs
VECTOR Qdrant
VECTOR pgvector
TRACING Langfuse
EVALS Promptfoo
DELIVERY Modal · Fly

Have a brief? AI or otherwise.

A senior partner replies personally within 24 hours.
