
The Delivery Squad

A small team of engineers and AI agents that owns one feature end-to-end — team shape, the human/agent split, and the daily routine.

Purpose: Define the smallest team that can take a feature from business intent to verified value with AI agents doing the heavy lifting.
Pairs with: The Delivery Model, The Delivery Workflow.
Key principle: Humans make the decisions. Agents do the work. The squad gives both a shared surface to work on.

What’s in a squad

A delivery squad is one Squad Lead, a few Squad Engineers who each run several AI agent sessions at the same time, and the agent sessions themselves. The squad is small enough to fit in a single planning conversation, and complete enough to ship a feature end-to-end without handing work to another team.

Squad Lead (1 ×, senior engineer)
Runs the squad. Reviews agent PRs for intent alignment, not just correctness. Records significant decisions in the decision log. Coaches the Squad Engineers.

Squad Engineers (a small group)
Each runs several concurrent agent sessions. They feed agents the right context, review drafts before PR, and write the integration tests agents struggle with.

Agent sessions (concurrent)
Draft code, tests, docs. Scaffold structure. Refactor. Generate candidate implementations the humans then shape and verify.

Shape, not headcount. The exact numbers (one lead, how many engineers, how many concurrent agents) depend on the feature, the domain complexity, and how mature the shared context is. Start with one senior engineer plus two mid-level engineers, then increase agent concurrency until the Squad Lead can no longer review the output within a day. The last level at which the Squad Lead still kept up is your ceiling.
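As a rough illustration of the review-capacity ceiling, here is a minimal sketch. The function name and both inputs (how many PRs the Squad Lead can review per day, and how many PRs one agent session produces per day) are hypothetical placeholders to be measured on your own squad; they are not figures from this model.

```python
import math


def max_agent_sessions(reviews_per_day: float, prs_per_session_per_day: float) -> int:
    """Largest number of concurrent agent sessions whose combined daily
    output the Squad Lead can still review within one day.

    Both arguments are assumptions the squad has to measure for itself.
    """
    if reviews_per_day < 0:
        raise ValueError("reviews_per_day must not be negative")
    if prs_per_session_per_day <= 0:
        raise ValueError("prs_per_session_per_day must be positive")
    # Round down: a fractional session would push review past one day.
    return math.floor(reviews_per_day / prs_per_session_per_day)


# Example: a lead who can review 12 PRs a day, with sessions that each
# produce about 1.5 PRs a day, can keep up with 8 concurrent sessions.
ceiling = max_agent_sessions(12, 1.5)
```

In practice the ceiling is found empirically by raising concurrency step by step; a calculation like this only gives a starting guess.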

What the human decides vs. what the agent does

The whole model rests on this split. Miss it, and you get either glacial progress (humans trying to do everything) or expensive rework (agents making judgment calls they shouldn’t).

| Holder | Decision | Why |
|---|---|---|
| Human | What to build and for whom | Product intent lives in people and stakeholders, not in code or history. |
| Human | What “correct” means in context | Correctness is domain-specific. An agent cannot infer the unwritten rule that always applies in your world. |
| Human | Whether a change is safe to ship | Risk judgment integrates signals from history, relationships, and business context agents don’t see. |
| Human | When to stop an agent | Knowing a generation is drifting before it’s obvious is a senior-engineer skill. The Squad Lead and Engineering Practice Lead coach this explicitly. |
| Both | How to structure the work | Humans propose the shape of a feature. Agents draft the task breakdown. Humans approve and edit. |
| Both | Spec content | Humans own intent; agents draft, pressure-test, and surface gaps. Ceremony makes the split explicit. |
| Agent | Generating candidate code and tests | This is where agents are strongest: synthesizing patterns from context and producing reviewable drafts quickly. |
| Agent | Scaffolding repeatable structure | Folder layouts, boilerplate, migrations between patterns, repetitive refactors. Faster and more consistent than humans. |
| Agent | Running the regression suite | Mechanical verification. A good place to push agent autonomy. |
| Agent | Drafting documentation from code | Agents produce high-quality first drafts of API docs, changelogs, and runbooks. Humans edit for voice and omissions. |
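The split can also be written down as data, so that tooling can check who owns a given decision. The sketch below is one possible encoding: the key names, the `Holder` enum, and the rule that unknown decisions default to a human are all illustrative assumptions, not part of the model itself.

```python
from enum import Enum


class Holder(Enum):
    HUMAN = "human"
    BOTH = "both"
    AGENT = "agent"


# Hypothetical keys, one per decision in the human/agent split.
OWNERSHIP = {
    "what_to_build": Holder.HUMAN,
    "definition_of_correct": Holder.HUMAN,
    "safe_to_ship": Holder.HUMAN,
    "when_to_stop_agent": Holder.HUMAN,
    "work_structure": Holder.BOTH,
    "spec_content": Holder.BOTH,
    "candidate_code_and_tests": Holder.AGENT,
    "scaffolding": Holder.AGENT,
    "regression_suite": Holder.AGENT,
    "docs_from_code": Holder.AGENT,
}


def needs_human_signoff(decision: str) -> bool:
    """True when a human must decide or approve.

    Assumption: a decision that is not listed is treated as human-owned,
    so new kinds of judgment calls never default to agent autonomy.
    """
    holder = OWNERSHIP.get(decision, Holder.HUMAN)
    return holder is not Holder.AGENT
```

For example, `needs_human_signoff("safe_to_ship")` is true, while `needs_human_signoff("regression_suite")` is false.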

How a squad’s day runs

A squad’s routine is built around a simple fact: agents can work overnight; humans work during the day. The rituals match that cycle.

Morning: triage the overnight

  • The Squad Lead reviews agent drafts and overnight PRs first thing.
  • Anything clean goes into the review queue; anything confused is flagged for the Context Manager.
  • Priorities for the day are adjusted based on what landed.
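The morning triage can be sketched as a simple two-way sort. What counts as "clean" is a judgment the Squad Lead makes; the two boolean fields below (`tests_pass`, `within_brief`) are hypothetical stand-ins for that judgment, and the `OvernightDraft` type is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class OvernightDraft:
    title: str
    tests_pass: bool    # assumed signal: the draft's own tests are green
    within_brief: bool  # assumed signal: the draft stayed inside its briefed scope


def triage(drafts: list[OvernightDraft]) -> tuple[list[OvernightDraft], list[OvernightDraft]]:
    """Split overnight output into (review_queue, context_flags)."""
    review_queue: list[OvernightDraft] = []
    context_flags: list[OvernightDraft] = []
    for draft in drafts:
        if draft.tests_pass and draft.within_brief:
            review_queue.append(draft)   # clean: goes to human review
        else:
            context_flags.append(draft)  # confused: flag for the Context Manager
    return review_queue, context_flags


overnight = [
    OvernightDraft("Add config API endpoint", tests_pass=True, within_brief=True),
    OvernightDraft("Refactor scheduler hooks", tests_pass=True, within_brief=False),
    OvernightDraft("Draft config UI form", tests_pass=False, within_brief=True),
]
review_queue, context_flags = triage(overnight)
```

Here one draft lands in the review queue and two are flagged, which is the input for adjusting the day's priorities.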

Midday: co-work

  • Squad Engineers pair with their agent sessions on in-flight tasks.
  • Context gaps surface immediately and get fed to the Context Manager.
  • The Squad Lead unblocks, reviews tricky PRs, and updates the decision log.

Late afternoon: set up the night

  • Agents are briefed on the next batch of tasks, with explicit scope and references to the spec.
  • The Squad Lead spot-checks the briefs; bad briefs mean bad overnight output.
  • Any work that needs human judgment stays with humans for tomorrow.
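A brief can be spot-checked mechanically before a human reads it. The sketch below assumes a hypothetical `AgentBrief` shape with an explicit scope, a list of spec references, and a flag for work that needs human judgment; none of these field names come from the model, and the checks cover only what the three bullets above require.

```python
from dataclasses import dataclass, field


@dataclass
class AgentBrief:
    task: str
    scope: str = ""
    spec_refs: list[str] = field(default_factory=list)
    needs_human_judgment: bool = False


def brief_problems(brief: AgentBrief) -> list[str]:
    """Return reasons this brief should not go to an overnight agent."""
    problems: list[str] = []
    if not brief.scope.strip():
        problems.append("missing explicit scope")
    if not brief.spec_refs:
        problems.append("no reference to the spec")
    if brief.needs_human_judgment:
        problems.append("needs human judgment: keep for tomorrow")
    return problems


good = AgentBrief(
    task="Scaffold config API handlers",
    scope="Handlers and unit tests only; no schema changes",
    spec_refs=["spec section on the config API"],  # placeholder reference
)
bad = AgentBrief(task="Decide retry policy", needs_human_judgment=True)
```

An empty result from `brief_problems` does not make a brief good; it only clears the way for the Squad Lead's own spot-check.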

Weekly: value check-in

  • The squad demos what shipped this week against the feature’s success metrics.
  • Context retrospective: what confused the agents, and what change to the shared context would have prevented it.
  • The Squad Lead pairs with the Context Manager on the top one or two context updates.

What a squad owns, what it doesn’t

A squad owns a feature, not a layer. That means end-to-end delivery of a bounded slice of the product — data model, business logic, API, UI, tests — not “all the backend” or “all the frontend.” Squads that own layers have handoffs; handoffs are where intent dies.

Good squad boundaries. The squad owns “the feature that lets users configure and run a scheduled calculation.” That includes the config UI, the config API, the scheduler integration, and the calculation engine plumbing. One team. One spec. One merge.
Bad squad boundaries. The squad owns “the calculation service.” Every feature needs multiple squads to ship. Coordination overhead multiplies. Context fragments across repositories. Agents lose the plot.

When to add another squad

Scaling the delivery model is a question of squad count, not squad size. Growing a squad past the point where the Squad Lead can review everyone’s work within a day breaks the model.

A common mistake. Growing a squad to a dozen people like a traditional team. The Squad Lead becomes a bottleneck or stops reviewing carefully; either way, agent output drifts. Hold the line on squad size; add squads instead.