The 101 post covered working with coding agents one-on-one. A human watches, nudges, clears context when things drift. Those habits work well when someone is paying attention. The question this series asks: what changes when nobody is?

Running AI agents unsupervised in CI, processing hundreds of items per batch through multi-agent pipelines, turns every context engineering principle up to eleven. Compaction doesn’t just make the agent forgetful; it makes the agent confidently wrong. Context contamination doesn’t just degrade quality; it makes one agent adopt another agent’s identity. And creativity, the very thing that makes agents useful, becomes the thing that makes them dangerous when there are no guardrails and no one watching.
The metaphor for this series is a flock of clever sheep. The farmer has gone to bed. The sheep are skilled, helpful, and absolutely going to do things their own way the moment the farmhouse lights go off. Each post covers a different aspect of managing the flock:
The Sheep That Picked the Lock: The creativity paradox. Agents delete tests, build lookup tables, and destroy production databases, all while optimizing for exactly the goal you gave them. How constraints turn creative liability into creative asset.
The Sheep That Forgot the Way Home: Agent memory is unreliable. Compaction destroys state mid-run and the agent invents replacements with complete confidence. Why everything must live on disk.
One Stray Leads the Whole Flock Astray: Context contamination between agents. One agent’s residual context poisons the next agent’s judgment, sometimes to the point of identity adoption. Why every agent needs a clean room, and why an agent can’t be trusted to review its own work.
One Gate Per Meadow: When fine-grained tool access hurts more than it helps. MCP (Model Context Protocol) works beautifully for interactive exploration but creates inconsistency in pipelines. Purpose-built scripts for purpose-built tasks.
Fences the Flock Can’t Talk Around: Some rules can’t be left to the agent’s judgment, because the agent will argue its way past them. Hard enforcement in deterministic code for constraints that must hold without exception.
Counting Sheep, Getting Different Numbers: Scoring with LLMs at scale requires calibration examples, not just rubric descriptions. Five agents reading the same rubric will produce five different scores unless you anchor them.
The Wolf in Sheep’s Clothing: Every data path is a potential injection surface. Instructions and data share the same context window with no protocol-level separation. Defense in depth for a world where the attack surface is text.
The Shepherd’s Night Vision: Agent observability, self-debugging loops, and why real data catches bugs that synthetic tests miss. The same capability that causes the problems is also the fastest way to find them.
The series draws on lessons from building multi-agent CI pipelines, my own experience with coding agents, and published accounts from the broader community. Much of this thinking grew out of our work at Red Hat, where teams are pushing AI agents into production pipelines and learning, sometimes the hard way, what it takes to make them reliable. The first post is already up. The rest will follow over the coming weeks.
The farmer installed a sheep-proof latch. The sheep found a different way around. The lesson isn’t to build a better latch. It’s to design the farm for sheep that think.
Author: Roland Huß (AI attribution: AIA HAb CeNc Hin R Claude Opus 4.6 v1.0)
