
The Flock

The context engineering 101 post covered working with coding agents one-on-one. A human watches, nudges, and clears context when things drift. Those habits work well when someone is paying attention. The question this series asks is what changes when nobody is.

Watercolor illustration of a farm at dusk seen from a hilltop. Eight meadows divided by stone walls and fences, each with sheep doing something different: one nudging a gate, one lost in fog, one touching an electric fence, a drone hovering over sleeping sheep. The farmhouse glows warmly in the background.

Running AI agents unsupervised in CI, processing hundreds of items per batch through multi-agent pipelines, turns every context engineering principle up to eleven. Compaction doesn’t just make the agent forgetful; it makes the agent confidently wrong. Context contamination doesn’t just degrade quality; it makes one agent adopt another agent’s identity. And creativity, the very thing that makes agents useful, becomes the thing that makes them dangerous when there are no guardrails and no one watching.

The metaphor for this series is a flock of clever sheep. The farmer has gone to bed. The sheep are skilled, helpful, and absolutely going to do things their own way the moment the farmhouse lights go off. Each post covers a different aspect of managing the flock:

  1. The Sheep That Picked the Lock: The creativity paradox. Agents delete tests, build lookup tables, and destroy production databases, all while optimizing for exactly the goal you gave them. How constraints turn a creative liability into a creative asset.

  2. The Sheep That Forgot the Way Home: Agent memory is unreliable. Compaction destroys state mid-run and the agent invents replacements with complete confidence. Why everything must live on disk.

  3. One Stray Leads the Whole Flock Astray: Context contamination between agents. One agent’s residual context poisons the next agent’s judgment, sometimes to the point of identity adoption. Why every agent needs a clean room, and why the authoring agent can’t be its own reviewer.

  4. One Gate Per Meadow: When fine-grained tool access hurts more than it helps. MCP (Model Context Protocol) works beautifully for interactive exploration but creates inconsistency in pipelines. Purpose-built scripts for purpose-built tasks.

  5. Fences the Flock Can’t Talk Around: Some rules can’t be left to the agent’s judgment, because the agent will argue its way past them. Hard enforcement in deterministic code for constraints that must hold without exception.

  6. Counting Sheep, Getting Different Numbers: Scoring with LLMs at scale requires calibration examples, not just rubric descriptions. Five agents reading the same rubric will produce five different scores unless you anchor them.

  7. The Wolf in Sheep’s Clothing: Every data path is a potential injection surface. Instructions and data share the same context window with no protocol-level separation. Defense in depth for a world where the attack surface is text.

  8. The Shepherd’s Night Vision: Agent observability, self-debugging loops, and why real data catches bugs that synthetic tests miss. The same capability that causes the problems is also the fastest way to find them.

The series draws on lessons from building multi-agent CI pipelines, my own experience with coding agents, and published accounts from the broader community. Much of this thinking grew out of our work at Red Hat, where teams are pushing AI agents into production pipelines and learning, sometimes the hard way, what it takes to make them reliable. The first post is already up. The rest will follow over the coming weeks.

The farmer installed a sheep-proof latch. The sheep found a different way around. The lesson isn’t to build a better latch. It’s to design the farm for sheep that think.
