Case Study · Side Project · Agentic AI Platform
OrchestraAI: Designing a Multi-Agent ML Platform.
A production-architecture multi-agent platform built as a fully importable Git project. The goal: validate the architectural patterns for human-in-the-loop agentic workflows outside of a client environment — no political constraints, no legacy systems. Pure architecture decisions.
01 / The Design Question
"How do you build an agentic ML platform where autonomous agents can act at speed, but humans can intercept, correct, and resume at any point — without architectural friction?"
The existing literature offered two extremes: fully autonomous pipelines (fast, brittle) or human-approval-at-every-step systems (safe, slow). The interesting architecture is the space between — agents that can operate autonomously on low-stakes decisions and pause for humans on high-stakes ones, with the boundary defined explicitly in the state machine.
02 / System Architecture
State Machine Orchestrator
Central coordinator
LangGraph-based state machine with typed transitions, checkpoint persistence, and explicit HITL gates. Every state is observable. Every transition is logged. A workflow can restart from any checkpoint.
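A minimal sketch of the checkpoint-and-resume idea, in plain Python rather than LangGraph's own API. The stage list, the `Orchestrator` class, and the in-memory `store` are hypothetical stand-ins for illustration; a real deployment would presumably persist snapshots to a database.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical linear stage order, for illustration only.
STAGES = ["data_scout", "analyzer", "architect", "validator", "done"]


@dataclass
class Orchestrator:
    store: dict = field(default_factory=dict)  # checkpoint id -> JSON snapshot
    log: list = field(default_factory=list)    # (from_stage, to_stage) pairs

    def run(self, state: dict, stop_before: Optional[str] = None) -> dict:
        """Advance stage by stage, logging and checkpointing every transition."""
        while state["stage"] != "done" and state["stage"] != stop_before:
            current = state["stage"]
            nxt = STAGES[STAGES.index(current) + 1]
            state = {**state, "stage": nxt, "history": state["history"] + [current]}
            self.log.append((current, nxt))
            self.store[f"ckpt-{len(self.store)}"] = json.dumps(state)
        return state

    def resume(self, checkpoint_id: str) -> dict:
        """Reload a snapshot and continue from exactly that point."""
        return self.run(json.loads(self.store[checkpoint_id]))
```

Because each snapshot is a complete serialized state, resuming from an older checkpoint replays the remaining stages without touching the ones already recorded in `history`.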
Domain Expert Agents
4 specialized agents
DataScout (data discovery + quality), Analyzer (statistical analysis + anomaly detection), Architect (feature engineering + model selection), Validator (evaluation + deployment decision). Each agent has a constrained tool set and a defined output schema.
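One way a constrained tool set and a defined output schema could be enforced, sketched in plain Python. Only the agent name DataScout comes from the project; the tool names, output fields, and helper functions below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    name: str
    allowed_tools: frozenset   # tool names this agent may call
    output_fields: frozenset   # keys its output must contain, exactly


# Hypothetical tools and fields, chosen only to make the example concrete.
DATA_SCOUT = AgentSpec(
    name="DataScout",
    allowed_tools=frozenset({"list_datasets", "profile_columns"}),
    output_fields=frozenset({"dataset_id", "quality_score"}),
)


def call_tool(spec: AgentSpec, tool_name: str, registry: dict, **kwargs):
    """Dispatch a tool call only if it is inside the agent's allowlist."""
    if tool_name not in spec.allowed_tools:
        raise PermissionError(f"{spec.name} may not call {tool_name!r}")
    return registry[tool_name](**kwargs)


def validate_output(spec: AgentSpec, output: dict) -> dict:
    """Reject outputs with missing or unexpected keys."""
    if set(output) != spec.output_fields:
        raise ValueError(f"{spec.name} output keys {sorted(output)} do not match schema")
    return output
```

With this shape, a tool call the agent was never granted fails at the dispatch boundary instead of reaching the tool.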
HITL Control Layer
Human intercept interface
React dashboard with real-time agent state visualization. A human can approve, reject, or modify any agent output before the state machine continues. Approval is a typed state transition, not a webhook.
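A sketch of what "approval as a typed state transition" might look like. The stage names (`awaiting_review`, `continue`, `rerun_agent`) and the choice to send a rejection back to the agent are assumptions, not the project's actual states.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    MODIFY = "modify"


@dataclass(frozen=True)
class Review:
    decision: Decision
    edited_output: Optional[dict] = None  # required only for MODIFY


def apply_review(state: dict, review: Review) -> dict:
    """Pure transition out of the (hypothetical) 'awaiting_review' stage."""
    if state["stage"] != "awaiting_review":
        raise RuntimeError("no review is pending")
    if review.decision is Decision.APPROVE:
        return {**state, "stage": "continue"}
    if review.decision is Decision.MODIFY:
        if review.edited_output is None:
            raise ValueError("MODIFY requires edited_output")
        return {**state, "stage": "continue", "output": review.edited_output}
    # REJECT: assumed here to send the work back to the producing agent.
    return {**state, "stage": "rerun_agent"}
```

The human decision is an input to the same transition function the orchestrator uses everywhere else, so it is logged and checkpointed like any other step.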
API Layer
FastAPI + async
Async FastAPI backend with WebSocket for real-time agent state streaming to the frontend. RESTful endpoints for workflow management, agent configuration, and audit log retrieval.
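The streaming side can be sketched with the standard library alone. The `StateBroadcaster` class and the event fields are hypothetical; in the FastAPI backend, each WebSocket connection would presumably hold one subscriber queue and forward every event to the client.

```python
import asyncio


class StateBroadcaster:
    """Fan out agent state events to every connected subscriber."""

    def __init__(self):
        self._subscribers = set()

    def subscribe(self) -> asyncio.Queue:
        """Register a new listener, e.g. one per dashboard connection."""
        queue = asyncio.Queue()
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)

    def publish(self, event: dict) -> None:
        """Deliver the event to every subscriber without blocking the agent."""
        for queue in self._subscribers:
            queue.put_nowait(event)


async def demo() -> list:
    hub = StateBroadcaster()
    dashboard = hub.subscribe()
    hub.publish({"agent": "Analyzer", "status": "running"})
    hub.publish({"agent": "Analyzer", "status": "awaiting_review"})
    return [await dashboard.get(), await dashboard.get()]
```

Running `asyncio.run(demo())` returns the two events in publish order, which is the property a dashboard relies on to render state changes correctly.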
03 / State Machine — Simplified Flow
HITL gates are first-class state transitions — not event handlers bolted on the outside.
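A minimal sketch of such a flow as a transition table. The stage order and the placement of the two review gates are assumptions for illustration; only the four agent names come from the architecture described earlier.

```python
from enum import Enum


class S(Enum):
    DATA_SCOUT = "data_scout"
    ANALYZER = "analyzer"
    ARCHITECT = "architect"
    REVIEW_DESIGN = "review_design"   # HITL gate (assumed placement)
    VALIDATOR = "validator"
    REVIEW_DEPLOY = "review_deploy"   # HITL gate (assumed placement)
    DEPLOYED = "deployed"
    REJECTED = "rejected"


HITL_GATES = {S.REVIEW_DESIGN, S.REVIEW_DEPLOY}

# Every legal edge is declared here; anything else is an error.
TRANSITIONS = {
    S.DATA_SCOUT: {S.ANALYZER},
    S.ANALYZER: {S.ARCHITECT},
    S.ARCHITECT: {S.REVIEW_DESIGN},
    S.REVIEW_DESIGN: {S.VALIDATOR, S.ARCHITECT},  # approve, or send back
    S.VALIDATOR: {S.REVIEW_DEPLOY},
    S.REVIEW_DEPLOY: {S.DEPLOYED, S.REJECTED},
}


def advance(current: S, target: S, human_signal: bool = False) -> S:
    """Move along a declared edge; leaving a gate requires a human signal."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"undeclared transition {current.value} -> {target.value}")
    if current in HITL_GATES and not human_signal:
        raise PermissionError(f"{current.value} is a HITL gate")
    return target
```

The gates appear in the same table as the agent stages, so leaving one is checked by the same function as every other move.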
04 / Architecture Decisions
Explicit state machine over implicit LLM routing
LLM routing feels elegant until an agent hallucinates a non-existent tool call or loops silently. An explicit state machine makes every transition observable, testable, and restartable from a checkpoint. In production, you choose debuggability over elegance every time.
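To illustrate the testability point: once the graph is explicit data, it can be checked before any agent runs, and a proposed path can be replayed against it. The graph, the `max_visits` budget, and both function names below are hypothetical.

```python
from collections import Counter

# Hypothetical graph: state -> allowed next states.
GRAPH = {
    "analyze": {"design"},
    "design": {"review"},
    "review": {"design", "validate"},  # a rejection loops back to design
    "validate": {"done"},
    "done": set(),
}


def can_finish(graph: dict, start: str, terminal: str = "done") -> bool:
    """Depth-first search: is the terminal state reachable from start?"""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node == terminal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False


def replay(graph: dict, path: list, max_visits: int = 3) -> str:
    """Replay a proposed path, rejecting undeclared edges and runaway loops."""
    visits = Counter()
    for current, nxt in zip(path, path[1:]):
        if nxt not in graph[current]:
            raise ValueError(f"undeclared transition {current} -> {nxt}")
        visits[nxt] += 1
        if visits[nxt] > max_visits:
            raise RuntimeError(f"loop detected at {nxt!r}")
    return path[-1]
```

A hallucinated target fails as an undeclared transition, and a silent loop fails once a state exceeds its visit budget. Both become ordinary exceptions that a unit test can assert on.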
Human-in-the-loop as first-class architecture primitive
Bolting HITL onto an existing autonomous pipeline creates race conditions between human decisions and agent continuations. Designing checkpoints into the state graph from day one meant human approval was a typed state, not an afterthought. No agent could proceed past a HITL gate without an explicit human signal.
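One way such a race could be closed, sketched under a stated assumption: each approval carries the version of the output the human actually reviewed, and the gate ignores signals for any other version. The `Gate` class and its versioning scheme are illustrative, not the project's implementation.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    """A pending HITL gate; `version` changes whenever the output under review does."""
    version: int = 0
    is_open: bool = False

    def resubmit(self) -> int:
        """The agent produced a new output: invalidate any in-flight approval."""
        self.version += 1
        self.is_open = False
        return self.version

    def approve(self, reviewed_version: int) -> bool:
        """Open the gate only if the human reviewed the current version."""
        if reviewed_version != self.version:
            return False  # stale signal: the human saw an older output
        self.is_open = True
        return True


def proceed(gate: Gate) -> str:
    """Continuation is impossible until the gate holds a valid approval."""
    if not gate.is_open:
        raise PermissionError("gate closed: no explicit human signal")
    return "next_stage"
```

An approval that arrives after the agent has resubmitted is dropped rather than applied to an output the human never saw.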
4 specialized agents over 1 generalist
A single agent with 20 tools degrades in quality as context grows. Specialized agents with constrained tool sets maintain reasoning quality within their domain. The orchestrator's job becomes routing and sequencing — not reasoning. Separation of concerns applied to agents.
05 / What This Validates for Production
OrchestraAI is not a product — it's an architecture proof. The patterns validated here translate directly to enterprise agentic deployments:
- Explicit state machines make agentic workflows auditable and restartable, which is critical for regulated environments.
- HITL as a state (not a webhook) eliminates the race condition between human and agent continuations.
- Specialized agents with constrained tools outperform generalist agents on domain-specific reasoning at scale.
- WebSocket state streaming to a human dashboard is the minimal viable observability layer for HITL workflows.
Stack