
The Chorus

Multi-Agent Architecture for High-Stakes AI Decisions


Brief Overview


The Chorus is a multi-agent system designed for contract obligation extraction—a problem domain where "mostly right" creates real consequences. Five specialized agents coordinate through a deterministic state machine with verification checkpoints, error recovery, and procedural memory that compounds over time.


This demonstrates production-grade thinking for agentic automation: observable state management, behavioral contracts, systematic reliability, and architecture that acknowledges uncertainty instead of hiding it.

Problem Statement


You ask: "What's my liability if there's a data breach?"

The contract says: "Liability capped at $2M."

Most AI systems stop there. Clean answer. Fast response. Completely wrong.

Because buried 40 pages later: "EXCEPT for data breaches, gross negligence, or willful misconduct—no cap."


The pattern repeats everywhere complex reasoning matters:


  • Stated terms that look reasonable until you find the exceptions

  • Rules that modify each other in ways that flip the original meaning

  • Temporal logic scattered across documents creating hidden obligations

  • Buried conditions that turn a $2M cap into $20M+ exposure


The simple answer exists. It's just surrounded by complexity that changes what it means.

The architectural challenge: Systems that look intelligent in demos have no internal structure for reasoning under uncertainty. They improvise brilliantly until they don't, and when they fail, no one can explain why.

My Approach


1. Right Before Fast

Reliability compounds. Speed doesn't. The system prioritizes verification checkpoints over throughput—every agent output passes through honesty gates before progressing. This comes from building regulated healthcare AI (United Healthcare/Optum, 2M+ calls/week) where getting it right matters more than getting it fast.


2. Deterministic State Machine (Not Agentic Workflows)

20 states with explicit transitions. Every state change logged. Auditable in <5 minutes. Recoverable. Agentic workflows optimize for autonomy; I optimized for observable governance.
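To make the pattern concrete, here is a minimal sketch in Python. The state names, transition table, and log format are illustrative stand-ins, not the production specification:

```python
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("orchestrator")

# Illustrative subset of states; the real system defines 20.
STATES = {"INGEST", "EXTRACT", "VERIFY_EXTRACTION", "REASON", "DONE"}

# Transitions are an explicit allowlist; anything else is rejected.
TRANSITIONS = {
    ("INGEST", "EXTRACT"),
    ("EXTRACT", "VERIFY_EXTRACTION"),
    ("VERIFY_EXTRACTION", "REASON"),
    ("VERIFY_EXTRACTION", "EXTRACT"),  # checkpoint failed: retry extraction
    ("REASON", "DONE"),
}

class Orchestrator:
    def __init__(self) -> None:
        self.state = "INGEST"
        self.audit_trail: list[dict] = []

    def transition(self, new_state: str, reason: str) -> None:
        if new_state not in STATES or (self.state, new_state) not in TRANSITIONS:
            raise ValueError(f"Illegal transition: {self.state} -> {new_state}")
        entry = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "from": self.state,
            "to": new_state,
            "reason": reason,
        }
        self.audit_trail.append(entry)  # every state change is recorded
        log.info("%(from)s -> %(to)s (%(reason)s)", entry)
        self.state = new_state
```

Because the transition table is data rather than emergent behavior, auditing a run reduces to reading the audit trail in order.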


3. Agents as Fallible Collaborators

Rather than treating agents as black boxes, I designed each with:

  • Explicit behavioral contracts defining scope and decision boundaries (sketched in code below)

  • Named error taxonomies with severity levels and recovery protocols

  • Character-driven design (Leslie Knope for orchestration, Hermione for extraction, Atticus Finch for reasoning) to encode reliable behavioral patterns


Why characters? "Be like Hermione" means: prioritize accuracy over speed, cite sources meticulously, flag ambiguities immediately. This creates predictable collaboration between agents.
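As a sketch of what a behavioral contract looks like in code (field names and the example contract are illustrative, not the production schema):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class BehavioralContract:
    """What an agent may decide, must refuse, and how it is allowed to fail."""
    agent: str
    scope: str                                   # the agent's decision boundary
    must: list[str] = field(default_factory=list)
    must_not: list[str] = field(default_factory=list)
    known_errors: list[str] = field(default_factory=list)  # names from the taxonomy

RETRIEVAL = BehavioralContract(
    agent="Retrieval",
    scope="Extract verbatim clauses with citations; never paraphrase.",
    must=[
        "cite page and section for every extraction",
        "flag ambiguities immediately rather than resolving them",
    ],
    must_not=[
        "interpret meaning (Reasoning's job)",
        "estimate exposure (Risk's job)",
    ],
    known_errors=["CITATION_DRIFT"],
)
```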

System Architecture 


Five Specialized Agents:

Orchestrator → State machine coordinator, audit trail manager, verification gatekeeper
Retrieval → Forensic extraction with 95%+ verbatim accuracy
Reasoning → Plain language translation, temporal logic, ensemble debate
Risk → Multi-layer exposure calculation, exception detection
Seanchaí → Procedural memory that compounds over time (Contract 100 takes 50% less time than Contract 1; see the sketch below)
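One way to picture the Seanchaí is as a keyed store of "how we handled this clause pattern last time." A minimal sketch, assuming a flat pattern → playbook lookup (the real matching is presumably richer):

```python
import json
from pathlib import Path

class ProceduralMemory:
    """Persists solved clause patterns so later contracts start warm."""

    def __init__(self, path: str = "seanchai_memory.json") -> None:
        self.path = Path(path)
        self.playbooks: dict[str, dict] = (
            json.loads(self.path.read_text()) if self.path.exists() else {}
        )

    def recall(self, clause_pattern: str) -> dict | None:
        # On contract 1 nothing hits; by contract 100 most patterns do,
        # which is where the time savings come from.
        return self.playbooks.get(clause_pattern)

    def record(self, clause_pattern: str, playbook: dict) -> None:
        self.playbooks[clause_pattern] = playbook
        self.path.write_text(json.dumps(self.playbooks, indent=2))

memory = ProceduralMemory()
if memory.recall("liability_cap_with_carveouts") is None:
    # ... run the full pipeline, then persist what worked:
    memory.record(
        "liability_cap_with_carveouts",
        {"steps": ["scan exceptions section before accepting any cap"]},
    )
```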


Three Verification Checkpoints (sketched in code after this list):

  • After Extraction (citations verifiable in <30 seconds?)

  • After Interpretation (party attribution preserved?)

  • After Risk Calculation (confidence intervals disclosed?)
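Each checkpoint can be read as a hard predicate on an agent's output: pass it forward or fail loudly with a named error. A minimal sketch (the dict fields are assumptions about the output shape, not the real interface):

```python
def gate_extraction(out: dict) -> dict:
    # After Extraction: every quoted clause needs a citation a reviewer
    # can verify in under 30 seconds (page + section at minimum).
    if not all(c.get("page") and c.get("section") for c in out["citations"]):
        raise ValueError("CITATION_DRIFT: citation missing page or section")
    return out

def gate_interpretation(out: dict) -> dict:
    # After Interpretation: the party said to owe an obligation must be
    # one of the parties actually extracted from the contract.
    if out["obligated_party"] not in out["extracted_parties"]:
        raise ValueError("PARTY_ATTRIBUTION_UNCLEAR")
    return out

def gate_risk(out: dict) -> dict:
    # After Risk Calculation: no point estimate without a disclosed range.
    if not {"exposure_low", "exposure_high"} <= out.keys():
        raise ValueError("confidence interval not disclosed")
    return out
```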


Error Recovery Paths: Named errors (CITATION_DRIFT, PARTY_ATTRIBUTION_UNCLEAR, FALSE_CAP_DETECTED) with specific recovery protocols.

When Retrieval drifts from verbatim text → automatic correction. When Risk detects buried exceptions → escalation with evidence.
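In code, a named error taxonomy is just errors carrying their severity and recovery protocol with them. The pairings below are illustrative; the error names are the ones listed above:

```python
from enum import Enum

class ContractError(Enum):
    # (severity, recovery protocol) — pairings are illustrative
    CITATION_DRIFT = ("minor", "re-extract verbatim text, then re-run the citation gate")
    PARTY_ATTRIBUTION_UNCLEAR = ("major", "halt interpretation and request human review")
    FALSE_CAP_DETECTED = ("critical", "escalate with evidence of the buried exception")

    @property
    def severity(self) -> str:
        return self.value[0]

    @property
    def recovery(self) -> str:
        return self.value[1]

def handle(error: ContractError) -> None:
    action = "ESCALATE" if error.severity == "critical" else "RECOVER"
    print(f"{action}: {error.name} -> {error.recovery}")
```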

Key Insights

Architecture Is What Handles the Mess

Demos work in controlled environments. Production has documents that contradict themselves, missing information that can't be inferred, nested logic creating circular dependencies. Architecture is what lets you say "I don't know" when confidence is low instead of hallucinating certainty.
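In code, "saying I don't know" is a threshold check applied before any answer leaves the system. A toy sketch; the 0.8 floor is an invented number, not a tuned value:

```python
CONFIDENCE_FLOOR = 0.8  # illustrative threshold, not a tuned value

def answer_or_abstain(answer: str, confidence: float) -> str:
    # Below the floor, the honest output is an abstention with the
    # reason attached, not a fluent guess.
    if confidence < CONFIDENCE_FLOOR:
        return f"I don't know: confidence {confidence:.2f} is below {CONFIDENCE_FLOOR}"
    return answer
```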


Verification Checkpoints > End-to-End Validation

Errors caught early cost less. If Retrieval extracts incorrectly, catching it before Reasoning interprets prevents cascade failures. Honesty checkpoints after each agent create natural recovery points.


Bounded Autonomy Creates Reliability

Plan → Authorization → Execution separation. The system proposes; humans decide. Not because I don't trust AI, but because clarity creates predictability. When each part knows exactly what it should and shouldn't do, the whole system becomes reliable.
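The separation is mechanical: nothing executes without an explicit approval step in between. A minimal sketch (the plan steps and the console prompt are stand-ins for the real authorization surface):

```python
from dataclasses import dataclass

@dataclass
class Plan:
    steps: list[str]
    approved: bool = False

def propose(question: str) -> Plan:
    # The system proposes a course of action...
    return Plan(steps=[
        f"extract clauses relevant to: {question}",
        "interpret with party attribution preserved",
        "calculate exposure with a disclosed confidence interval",
    ])

def authorize(plan: Plan) -> Plan:
    # ...and a human decides. Nothing runs without this gate.
    reply = input(f"Approve plan {plan.steps}? [y/N] ")
    plan.approved = reply.strip().lower() == "y"
    return plan

def execute(plan: Plan) -> None:
    if not plan.approved:
        raise PermissionError("Execution requires explicit human authorization")
    for step in plan.steps:
        print(f"running: {step}")

execute(authorize(propose("What's my liability if there's a data breach?")))
```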


70% Transfers Across Domains

Whether analyzing contracts, insurance policies, healthcare regulations, or financial compliance, you need the same capabilities: finding buried exceptions, calculating temporal logic, surfacing conflicts, maintaining audit trails. The domain model changes. The reasoning infrastructure stays the same.

What This Demonstrates for Agentic Product Automation

✓ Multi-agent coordination with observable state management
✓ Behavioral contracts that define interaction boundaries
✓ Systematic reliability through error taxonomies and verification
✓ Production-grade thinking: audit trails, recovery protocols, learning mechanisms
✓ Compound intelligence via procedural memory

The core insight: When AI starts making decisions that matter, the question isn't "how do we make it smarter?" The question is: "how do we make it trustworthy?"

Trust doesn't come from better prompts. It comes from architecture that respects complexity, acknowledges uncertainty, and preserves human judgment.

Explore the System

Interactive Demo
Visualize the 20-state workflow, explore agent prompts with behavioral excerpts, and review system-level design artifacts.


→ Detailed State Machine (Available in interview discussion)
Production specification with sub-states, error recovery trees, and validation layers.

© Alana Cortes 2025. All rights reserved.
