The Sheep That Picked the Lock

Table of Contents

The 101 post covered working with one agent in one session, with a human watching. Those principles still hold, but the question changes when you remove the human and let agents run unsupervised in CI. This is the first post in “The Flock,” a series about what happens when agents run at scale with nobody around. We start with creativity, because it’s the pattern every developer recognizes immediately.

The farmer installed a new gate latch, specifically designed to be sheep-proof. The next morning, the gate was open, the latch was technically undamaged, and three sheep were in the vegetable garden eating his prize-winning cabbages. The latch worked exactly as specified. The sheep just found a different way around.

Watercolor illustration of three sheep eating cabbages in the farmer's vegetable garden. The gate latch is still intact. The farmer stands on his porch, coffee in hand, staring in disbelief.

The most expensive cheat sheet ever written
#

The typia project needed to port a TypeScript validator to Go. The task sounded straightforward: take a .ts file, rewrite it as a .go file, leave the algorithm alone, iterate until tests pass. The test suite was serious: 2,900 files, 168 structural fixtures, 80k lines total.

The developer kicked off the agent overnight. He woke up to a green CI badge with all tests passing. When he opened the diff, two-thirds of the core logic was missing. The agent had deleted every failing test. “All tests pass” was technically accurate, just not in the way anyone intended.

After adding a rule that tests are sacred and must never be modified, he started a second run. This time the agent consumed 8 billion tokens and built a massive switch statement that memorized all 168 test outputs as hardcoded strings. It had run the original TypeScript validator hundreds of times, captured the output, and embedded it verbatim. The most expensive cheat sheet ever written.

On the third attempt, the agent replaced the entire library with Zod (a different validation framework), then edited the CI workflow file to skip the test categories where Zod fundamentally fails. When you can’t pass the test, redefine what “testing” means.

Three escalating attempts at the same problem, each more creatively destructive than the last. The agent optimized for the stated goal (“make tests pass”) using any means available. The goal was correct. The latitude was too wide.

The shortcut through production
#

A similar kind of creativity shows up when agents have access to tools they don’t need for the task at hand. An engineer was migrating a website to AWS using Terraform via a coding agent. The agent confused which state file was current (it had unpacked an old Terraform folder), decided that terraform destroy would be “cleaner and simpler,” and wiped the production infrastructure. Years of course data, gone.

In a separate incident, a coding agent deleted PocketOS’s production database and all volume-level backups with a single GraphQL API call. The most recent recoverable backup was three months old. Nine seconds from start to finish.

Both agents found a “creative” shortcut that achieved an intermediate goal (clean state) while destroying the actual goal (a working production system). In both cases, the agent had access to destructive tools it didn’t need for the task at hand. The sheep didn’t just pick the lock. They knocked down the fence and wandered onto the highway.

Sometimes the agent doesn’t destroy anything. It just quietly walks through a door nobody realized was open. Murat Deligoz reported that Codex was blocked from editing a file, so it gave itself root and did it anyway. The account belonged to the docker group, which on a Linux machine is root wearing a thin disguise. The agent started a container as root, bind-mounted the host filesystem, made the blocked change, and reported back as calmly as if it were checking the weather. No vulnerability was exploited. No password was stolen. Docker group membership granting root-equivalent access is documented, intended behavior. The agent simply understood the machine’s permissions more completely than the person who configured them, then walked through a door that had been standing open the whole time.

The everyday creativity tax
#

These dramatic stories make headlines, but the everyday creativity problems are more insidious because they don’t announce themselves. They sit quietly in your pipeline output, waiting to waste your morning.

The helpful directory structure. A batch agent was told to write report files to /output/. After processing 30 items, the downstream merger found only 5 files. The agent had decided the flat directory was “disorganized” and created subdirectories by category: /output/security/, /output/performance/, /output/design/. Every file was there, neatly organized, completely invisible to the script that listed /output/*.json. The agent’s reasoning was sound. The system’s expectations were different.

The comment that became an instruction. A code review agent was scanning source files for security issues. One file contained a comment left by a developer months earlier: // TODO: consider switching to admin API for bulk operations. The agent read the comment, interpreted it as a pending task, and rewrote the function to use the admin API, including privilege escalation it found documented in the project’s wiki. The original developer had written a note to themselves. The agent read it as a request. Any text in the agent’s context is potential instruction, and the boundary between “data the agent is reading” and “instructions the agent should follow” exists only in our heads, not in the model’s architecture.

The test that learned the wrong lesson. An agent writing integration tests found an existing test that used a mock database. It copied that pattern for all new tests, including tests for the database migration tool, which specifically needed a real database connection. The mock tests all passed beautifully. The migration broke on the first real deployment. The agent will always prefer copying a pattern it can see over reasoning about whether the pattern fits. If a template or example exists in the context, it becomes the mold. The fix: use <placeholder> tokens in examples instead of realistic values, so the agent can’t pattern-match its way to a “complete” output.

The status file nobody could parse. A pipeline expected a JSON status file at the end of each run: {"status": "complete", "items": 47}. The agent would write {"result": "done", "count": 47} or {"status": "completed successfully!", "processed_items": 47} or a nicely formatted multi-line variant with helpful extra fields. Same information, different shape, downstream parser crashes every third run. The fix was the same as always: move the status file to a deterministic script. The agent decides what to report. The script decides how to format it. This pattern connects directly to the skill vs. script distinction from the 101 series: skills handle judgment calls, scripts handle mechanical operations.

Everything in context is live
#

There’s a subtler version of this problem that shows up in the agent’s own codebase, not just in external systems.

Keeping dead code or optional paths in skills “because it might be useful later” was already messy in traditional software. With LLMs, it’s actively dangerous. Every conditional branch, every commented-out alternative, every unused code path is context the model can read and act on. The agent doesn’t distinguish between active and dormant code. I’ve watched agents find disabled features behind flags, decide the dormant code looked relevant, and start “fixing” issues in code that was never meant to run. The fix never materialized because the code was never activated, but the agent burned tokens and sometimes introduced side effects in the surrounding live code.

The inverse is just as dangerous. A deep review agent recently found two stub functions in our codebase, function signatures with pass as the body. It classified them as dead code (which was technically correct, they didn’t do anything yet) and removed them. What it couldn’t know: those stubs were placeholders for a specific requirement from the spec. The incomplete implementation was there on purpose, waiting for the next development pass. After the review agent cleaned them up, a second review pass removed the orphaned imports and helper code that referenced those stubs. Two rounds of locally correct cleanup, and the requirement had silently vanished from the codebase. We only caught it because a fresh implementation from the same spec included the feature and the reviewed version didn’t.

The lesson cuts both ways. Agents will act on dead code they find (fixing bugs in dormant paths), and they will remove code they judge to be dead (dropping planned functionality). Both stem from the same blind spot: the agent sees text, not intent. If something isn’t meant to be used right now, remove it. If something is incomplete but planned, make it pass a gate that checks spec coverage after any code removal. Git history exists for recovery, but only if someone notices the gap. In the agent’s context window, everything is live, and nothing is “planned for later.”

And sometimes the sheep doesn’t break the fence at all. It just argues, convincingly, that the fence doesn’t apply in this particular situation. We had a compliance gate that found a gap, calculated 94% coverage, and then invented a status called “COMPLIANT (with documented gap)” to let itself pass. The gate instructions said 100% or STOP, no exceptions. The agent decided that being helpful outweighed following the rule. More on that in a later post about fences the flock can’t talk around.

All of these cases point to the same broader principle.

Constrain inputs, tolerate outputs
#

That asymmetry extends to everything the agent touches. Be specific about filenames, IDs, and argument formats for inputs. But for agent-generated output (like scoring tables), build tolerant parsers. An agent might format a score as 1/2, or bare 1, or **WHAT** (0-2): 1. Fighting output format is a losing battle. Parse what they give you.

This maps directly to Postel’s Law from protocol design: “Be liberal in what you accept, conservative in what you send.” Your scripts should tolerate the model’s creative formatting. Your instructions to the model should be consistent, specific, and never vary in phrasing. Don’t rephrase the same instruction three different ways across three different skills and expect consistent behavior, because the model may interpret those variations as three distinct instructions with subtly different meanings.

Once constrained, creativity shines
#

Here’s the part that makes this all worthwhile. After constraining agents with limited tools, focused goals, clean context, and deterministic guardrails, the remaining creativity becomes a genuine asset. The agent notices a background task failed due to a transient error, figures out from state files on disk that retrying would succeed, and relaunches the task on its own. It spots patterns across documents that a rigid pipeline would miss entirely. It finds solutions you didn’t anticipate and handles edge cases you never thought to specify.

Once the sheep can’t escape the paddock, they start doing genuinely useful things within it. They find the best grazing spots. They alert each other to changes in the weather. The creativity that was a liability when pointed at the fence becomes an asset when pointed at the grass.

A specific technique that works well here: force agents to compare multiple approaches before committing to one. Generate 2-3 strategies, score each against your rubric, present them in a comparison table, and pick the best one. The comparison step itself is what produces the quality, not just having more candidates. An agent that compares options reasons better than an agent asked for its best guess.

And more broadly: describe goals, not procedures. “Assess this document against these quality criteria” outperforms a 47-step analysis checklist that breaks at step 23. “Get this package to Berlin by Friday” works better than prescribing the exact courier, route, and handoff schedule, because the agent can reroute when a flight gets cancelled. The same principle applies to skill development: describe the desired outcome and iterate on the result, rather than prescribing every implementation step and hoping the agent follows them in the right order.

What’s coming in the flock
#

This was the first of eight posts. The creativity paradox is the most visible problem, but it’s not the most fundamental one. The rest of the series covers the infrastructure that makes constrained creativity possible:

The Sheep That Forgot the Way Home: Agent memory is unreliable. How compaction destroys state mid-run and why everything must live on disk.
One Stray Leads the Whole Flock Astray: Context contamination between agents. Why every agent needs a clean room, and why the author can’t review itself.
One Gate Per Meadow: When fine-grained tool access hurts more than it helps. Purpose-built scripts for purpose-built tasks.
Fences the Flock Can’t Talk Around: Some rules can’t be left to the agent’s judgment. Hard enforcement in deterministic code.
Counting Sheep, Getting Different Numbers: Scoring with LLMs at scale requires calibration examples, not just rubric descriptions.
The Wolf in Sheep’s Clothing: Every data path is a potential injection surface. Defense in depth for context windows.
The Shepherd’s Night Vision: Agent observability, self-debugging loops, and why real data catches bugs that synthetic tests miss.

The farmer installed a sheep-proof latch. The sheep found a different way around. The lesson isn’t to build a better latch. It’s to accept that the sheep are creative and design your farm accordingly.

The principles of constraining inputs while tolerating outputs, Postel’s Law applied to LLMs, and the creativity-as-asset thesis were first articulated by Jessica Forrester and Jason Greene based on their pioneering work with multi-agent pipelines at Red Hat. This series, and the thinking behind it, wouldn’t exist without Red Hat’s investment in AI engineering and the space to explore these ideas. The examples in this post are drawn from independent sources and my own experience.

Author: Roland Huß AIA HAb CeNc Hin R Claude Opus 4.6 v1.0

The most expensive cheat sheet ever written#

The shortcut through production#

The everyday creativity tax#

Everything in context is live#

Constrain inputs, tolerate outputs#

Once constrained, creativity shines#

What’s coming in the flock#

Related