Ralph Wiggum is the simple-minded boy from The Simpsons who says things like "I'm learnding!" and eats glue. Of all people, he is now the namesake of a technique for autonomous code generation. The idea behind it: if the thought of letting code be generated autonomously turns your stomach, that is exactly the feeling the technique systematically addresses.
Geoffrey Huntley, an Australian open-source developer, came up with the idea and gave it its name in mid-2025. The driving question behind it was simple: how far can you get if you just let AI agents run without constantly intervening?
The Loop
The Ralph Loop is not a replacement for a structured development approach, but an execution engine for one you already have. The basic idea is about as simple as it gets:
```shell
while has_more_todos; do
  code-agent --prompt "Work on the next task from todo.md" --non-interactive --yolo
done
```
A script starts the AI agent and hands it a prompt. As soon as the agent finishes and exits, the script starts it again. Same prompt, fresh context. After each run it checks whether there are still open tasks. If not, the loop exits.
In practice this works with agents like Claude Code or OpenCode. They can be started in a non-interactive mode. Prompt in, work autonomously, terminate. For the agent to work on its own, it has to have all the permissions and be allowed to execute everything. That is --yolo mode. Writing files, running shell commands, making changes — without asking. Sandboxing therefore becomes essential. The agent needs an isolated environment in which it can't do any damage.
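One way to get that isolation is a throwaway container per run. The sketch below is an assumption, not a recipe: `agent-image` and `code-agent` are placeholder names, and the command is only printed (dry run) so you can inspect it before actually executing anything.

```shell
# Hypothetical sketch: run each agent iteration in a disposable Docker
# container so --yolo mode cannot touch the host. "agent-image" and
# "code-agent" are placeholder names, not a real image or CLI.
cmd=(docker run --rm \
  -v "$PWD:/work" -w /work \
  agent-image \
  code-agent --prompt "Work on the next task from todo.md" --non-interactive --yolo)

# Dry run: print the command instead of executing it.
echo "${cmd[@]}"
```

The container is deleted after every run (`--rm`), which fits the fresh-context idea: nothing survives an iteration except what the agent wrote into the mounted project directory.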
For simple projects a minimal script is enough. For more complex workflows you can have the AI generate the script for you. Maybe your loop has multiple steps per run, such as implementation and review. Maybe you also use a spec framework like BMad. The basic idea stays the same.
Why a Loop? The Fresh-Context Principle
Context windows are the "RAM" of an LLM during a session. Their size is limited. The quality of the results decreases the more of it is used. On top of that, details get lost when the LLM has to summarize the context (compacting). The model loses the thread. Hallucinations increase, earlier decisions get forgotten.
The Ralph Wiggum Loop solves this problem. Each iteration starts a new process with a fresh, empty context. Instead of accumulating ever more context, every iteration starts from zero. Only the specs and the implementation plan land in the context, everything else is gone. One task per run, then reset.
How to Work With the Ralph Wiggum Loop
Specifications as the Foundation
The loop is just the automation. The foundation is a good specification with a checkable task list. In the Ralph loop, exactly one task is implemented per iteration, marked as done, and the agent is restarted. The tasks have to be well specified and clearly bounded.
How you arrive at this structure is up to you. You create the specification in dialogue with the AI. "I want to build X. Ask me questions. Create a specification and an implementation plan." Or you use a framework like BMad, which formalizes this process and ultimately produces stories and tasks that can be worked through.
The format is secondary. What matters is: one task, one run.
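As a runnable illustration of "one task, one run", here is a toy version of the loop. The `code_agent` function is only a stub standing in for the real non-interactive CLI call; it marks one checkbox per iteration so the mechanics are visible end to end.

```shell
# Toy task list in a checkbox format the loop can inspect.
cat > todo.md <<'EOF'
- [ ] Set up project skeleton
- [ ] Implement quote API
- [ ] Add voting endpoint
EOF

# Stub standing in for the real agent invocation. In a real loop this
# would start the agent with a fresh context; here it just marks the
# first open task as done (GNU sed).
code_agent() {
  sed -i '0,/- \[ \]/s//- [x]/' todo.md
}

runs=0
while grep -q '^- \[ \]' todo.md; do
  code_agent          # fresh run, exactly one task
  runs=$((runs + 1))
done
echo "loop finished after $runs runs"
```

With three open tasks the loop runs three times, then exits because `grep` no longer finds an unchecked box. That exit condition is the entire control logic.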
On the Loop
Kief Morris describes on martinfowler.com three models for how humans collaborate with AI agents. Out of the Loop means the human only defines the goal. The agent does the rest on its own. That is "vibe coding". In the Loop means the human checks every single output of the agent. That sounds safe, but it doesn't scale. Agents generate code faster than humans can review it. The human becomes the bottleneck.
The third model is On the Loop. Instead of inspecting every output, the human builds the framework in which the agent operates: specifications, automated quality checks, workflow rules. When the result isn't right, you don't fix the code by hand — you improve the agent so the problem doesn't reappear.
Harness Engineering
The AI makes mistakes. The question is not how to prevent every mistake, but how the agent gets feedback and fixes the mistakes itself.
Imagine a rocket meant to reach a distant celestial body. Every deviation means it sails far past its target. What it needs are automatic course corrections. Automated tests make sure the application works. Security scans prevent insecure dependencies from making it into the application. Code-quality checks catch errors before they end up in the build. When a check fails, the agent gets the feedback and fixes the error.
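A minimal sketch of such a correction step, under stated assumptions: `run_checks` stands in for your real test, scan, and lint commands, and `feedback.md` is an assumed convention that the next iteration's prompt would tell the agent to read first.

```shell
# Deterministic toy state for this sketch.
rm -f quotes.json feedback.md

# run_checks stands in for the project's real quality gates, e.g.
#   npm test && npm audit && npx eslint .
# Here it is a toy check so the sketch runs anywhere.
run_checks() {
  [ -f quotes.json ]
}

if run_checks; then
  echo "checks passed"
else
  # Feed the failure back: the next loop iteration's prompt can
  # instruct the agent to address feedback.md before anything else.
  echo "Fix: quotes.json is missing" > feedback.md
  echo "checks failed, feedback written"
fi
```

The point is that the failure does not land on a human's desk. It lands in the loop, where the next run has to resolve it before new work begins.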
Morris, Anthropic and OpenAI call it harness engineering. Huntley calls it back pressure engineering. The terms differ. The core message is the same: the better the framework, the more reliable the agents.
From Attended to Unattended
Back to the rocket. At the start of the trajectory, course corrections are particularly critical, because small deviations multiply over the distance. In the beginning you start the loop manually and watch every run. You assess whether the automatic feedback mechanisms are doing their job. If not, you adjust the specs or the prompt. That is "attended". You sit next to it and watch.
Over time the course corrections become more reliable. You invest in automated feedback rather than correcting the agent by hand. At some point you start the loop in the evening and look in the morning at what it built. That is "unattended". You only check the result.
The transition is gradual and requires trust. There is no fixed rule for when you are ready to let the AI work on its own. Only the experience you collect by watching.
An Example: A Ralph Wiggum Quote App
To test the Ralph loop in practice, I had a small app built: Umpossible. A web app in which you can browse and vote on Ralph Wiggum quotes.
I created the spec in dialogue with the AI. My initial conditions were:
```markdown
## Umpossible – Ralph Wiggum Quote App
- Shows random Ralph Wiggum quotes with season and episode
- Voting: upvotes per quote, one vote per session
- Quote overview with filtering and sorting
- Admin area for managing quotes
- Dark mode with system detection
- Responsive, accessible (WCAG 2.1 AA)
```
I worked out all the further details together with the AI. The finished spec contains the tech stack, page structure, accessibility requirements and more. From it I had the AI generate an implementation plan with 16 phases. From the project structure through the backend API, frontend components and accessibility, all the way to tests and documentation.
Each phase was worked on in its own loop run. The script for it is simple: it checks whether there are still open phases in the plan, starts Claude Code with the prompt, and repeats this until everything is done. A fresh context per phase, no baggage from previous iterations. After 16 runs I had a working, tested app. After roughly four hours the loop had completed all phases. The API costs for the entire project, from specification to finished implementation, came to around 70 euros.
I Bent My Wookiee
Ralph stumbles, falls, and then says: "I bent my Wookiee." Anyone who has ever let an AI agent work too long inside a single session knows the feeling. At some point everything bends out of shape, and then it falls over. What I took away from the experiment can be boiled down to two principles. Fresh contexts keep the agent on course. The harness catches it when it stumbles anyway. Both sound trivial. The discipline to actually pull it off consistently is not. Pick a small project, write a spec, and let the loop run. The feeling of looking at working code in the morning that you didn't write yourself is something you have to experience at least once.
Blog author
Johannes Barop
Do you still have questions? Just send me a message.