# Ralph Analysis: Autonomous AI Agent Loop System

## Executive Summary

**Ralph** is an autonomous AI agent loop system designed to run AI coding tools (Amp or Claude Code) repeatedly until all Product Requirements Document (PRD) items are complete. Based on [Geoffrey Huntley's Ralph pattern](https://ghuntley.com/ralph/), it represents a paradigm for autonomous software development in which each iteration spawns a fresh AI instance with clean context. Between runs, it relies on git history, a progress log, and a structured PRD JSON file for persistence.

The core philosophy is simple: break work into small, independently completable stories, run AI agents in a loop, and let structured persistence mechanisms carry context forward. This approach works around AI context-window limits by treating each iteration as a stateless worker that reads from and writes to well-defined artifacts.

---

## Architecture Overview

### High-Level Flow

```
┌──────────────────────────────────────────────────────────────────┐
│                           SETUP PHASE                            │
├──────────────────────────────────────────────────────────────────┤
│  1. User writes a PRD (markdown)                                 │
│  2. Convert PRD to prd.json (structured user stories)            │
│  3. Run ralph.sh (starts autonomous loop)                        │
└──────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│                          EXECUTION LOOP                          │
├──────────────────────────────────────────────────────────────────┤
│  4. AI picks highest priority story where passes: false          │
│  5. Implements the story (writes code, runs tests)               │
│  6. Commits changes (if tests pass)                              │
│  7. Updates prd.json (sets passes: true)                         │
│  8. Logs learnings to progress.txt                               │
│  9. Updates AGENTS.md/CLAUDE.md with reusable patterns           │
│ 10. Check: More stories? → Loop back to step 4                   │
└──────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│                            COMPLETION                            │
├──────────────────────────────────────────────────────────────────┤
│  Output: COMPLETE and exit                                       │
└──────────────────────────────────────────────────────────────────┘
```

### Core Components

| Component | Purpose | Persistence |
|-----------|---------|-------------|
| `ralph.sh` | Bash loop that spawns fresh AI instances | N/A (orchestrator) |
| `prd.json` | Task list with status tracking | Git-tracked JSON |
| `progress.txt` | Append-only learnings log | Git-tracked text |
| `AGENTS.md` / `CLAUDE.md` | Reusable patterns for future iterations | Git-tracked markdown |
| `prompt.md` | Instructions template for Amp | Static config |
| Skills (`prd`, `ralph`) | PRD generation and conversion helpers | Static config |

---

## Key Features

### 1. **Stateless Iteration Model**

Each iteration spawns a completely fresh AI instance with no memory of previous work. Context is rebuilt from:

- Git history (what was committed)
- `progress.txt` (learnings and context)
- `prd.json` (which stories are done)

**Key insight**: This sidesteps the AI context-window limit by treating each run as independent, with structured artifacts serving as the "memory."

### 2. **Structured Task Management (prd.json)**

```json
{
  "project": "MyApp",
  "branchName": "ralph/task-priority",
  "description": "Task Priority System - Add priority levels to tasks",
  "userStories": [
    {
      "id": "US-001",
      "title": "Add priority field to database",
      "description": "As a developer, I need to store task priority...",
      "acceptanceCriteria": [
        "Add priority column to tasks table",
        "Typecheck passes"
      ],
      "priority": 1,
      "passes": false,
      "notes": ""
    }
  ]
}
```

**Design decisions:**

- Priority-based ordering handles dependencies, provided prerequisite stories are given higher priority
- `passes: false/true` provides clear completion tracking
- Acceptance criteria are verifiable (not vague)
- Stories are sized to fit within one context window

### 3. **Progressive Learning System**

The dual-file learning system distinguishes between:

**`progress.txt`**: an append-only chronological log:

```
## [Date/Time] - [Story ID]
- What was implemented
- Files changed
- **Learnings for future iterations:**
  - Patterns discovered
  - Gotchas encountered
  - Useful context
---
```

**`AGENTS.md` / `CLAUDE.md`**: consolidated reusable patterns:

```
## Codebase Patterns
- Use `sql` template for aggregations
- Always use `IF NOT EXISTS` for migrations
- Export types from actions.ts for UI components
```

**Key insight**: Chronological learnings support debugging; consolidated patterns provide quick reference.

### 4. **Branch-Based Run Isolation**

- Each feature uses a dedicated branch (`ralph/feature-name`)
- When a new feature starts, previous runs are archived to `archive/YYYY-MM-DD-feature-name/`
- Clean separation between features prevents context pollution

### 5. **Quality Feedback Loops**

Ralph requires feedback loops to function:

- Typecheck catches type errors
- Tests verify behavior
- CI must stay green (broken code compounds across iterations)

Stories must include verifiable acceptance criteria such as "Typecheck passes" and "Tests pass."

### 6. **Browser Verification for UI Stories**

Frontend stories include "Verify in browser using dev-browser skill" as an acceptance criterion. This ensures UI changes are verified visually, not just compiled.

### 7. **Stop Condition Protocol**

The loop terminates when all stories have `passes: true`. The AI then outputs:

```
COMPLETE
```

`ralph.sh` greps each iteration's output for this magic string to detect completion.

### 8. **Multi-Tool Support**

Ralph supports both Amp and Claude Code:

```bash
./ralph.sh --tool amp [max_iterations]      # Default
./ralph.sh --tool claude [max_iterations]
```

Each tool has its own prompt template (`prompt.md` for Amp, `CLAUDE.md` for Claude Code).

### 9. **Skills System for PRD Workflow**

Two skills automate PRD creation:

**`prd` skill**: Generates structured PRDs with clarifying questions
- Asks 3-5 essential questions with lettered options (for quick "1A, 2C, 3B" responses)
- Creates a markdown PRD with user stories, functional requirements, and non-goals

**`ralph` skill**: Converts markdown PRDs to JSON
- Enforces story sizing (completable in one iteration)
- Orders by dependencies (schema → backend → UI)
- Adds standard criteria ("Typecheck passes", "Verify in browser")

---

## Notable Patterns and Design Decisions

### 1. **Single Story Per Iteration**

**Design**: Each AI run handles exactly ONE user story, never more.

**Rationale**:
- Ensures complete focus on a single task
- Prevents context exhaustion mid-feature
- Creates clean commit boundaries
- Simplifies failure recovery (retry a single story, not several)

### 2. **Append-Only Progress Log**

**Design**: `progress.txt` is append-only, never overwritten.

**Rationale**:
- Preserves full history for debugging
- Enables pattern discovery over time
- Prevents accidental loss of learnings
- Supports consolidation into AGENTS.md when patterns emerge

### 3. **Story Sizing Rules**

**Design**: Stories must be small enough for one context window.

**Right-sized examples:**
- Add a database column and migration
- Add a UI component to an existing page
- Update a server action with new logic
- Add a filter dropdown to a list

**Too big (must split):**
- "Build the entire dashboard"
- "Add authentication"
- "Refactor the API"

**Rule of thumb**: If you can't describe the change in 2-3 sentences, it's too big.

### 4. **Dependency-Ordered Execution**

**Design**: Stories execute in priority order, so earlier stories must not depend on later ones.

**Correct order:**
1. Schema/database changes (migrations)
2. Server actions / backend logic
3. UI components that use the backend
4. Dashboard/summary views that aggregate data

### 5. **Commit Discipline**

**Design**: Only commit when tests pass, with structured messages:

```
feat: [Story ID] - [Story Title]
```

**Rationale**: A clean git history lets future iterations recover context.

### 6. **Verifiable Acceptance Criteria**

**Design**: Every criterion must be testable, never vague.

**Good**: "Button shows confirmation dialog before deleting"

**Bad**: "Works correctly", "Good UX", "Handles edge cases"

### 7. **Archiving Previous Runs**

**Design**: When `branchName` changes, archive the previous `prd.json` and `progress.txt` to `archive/YYYY-MM-DD-feature-name/`.

**Rationale**: Clean separation between features, while preserving history for reference.

---

## Context Management Strategy

Ralph's context management is its most innovative aspect:

### Between Runs (Persistence)

| Mechanism | What It Carries | Format |
|-----------|-----------------|--------|
| Git commits | Code changes, file structure | Versioned files |
| `prd.json` | Task completion status | Structured JSON |
| `progress.txt` | Learnings, gotchas, patterns | Structured text |
| `AGENTS.md` | Consolidated reusable patterns | Markdown |

### Within a Run (Instructions)

The AI receives:
1. Instructions from `prompt.md` or `CLAUDE.md`
2. The `prd.json` file content
3. The `progress.txt` file (especially its Codebase Patterns section)
4. Access to read any file via the AI tool's capabilities

### Context Recovery Pattern

Each iteration:
1. Reads the Codebase Patterns section of `progress.txt` first (quick reference)
2. Reads `prd.json` to find the next incomplete story
3. Checks that the git branch matches the expected branch
4. Implements the story
5. Appends learnings to `progress.txt`
6. Optionally consolidates patterns into AGENTS.md

---

## Agent Orchestration Model

### Single-Agent Loop (Not Multi-Agent)

Ralph is NOT a multi-agent system. It is a single-agent loop where:
- One AI instance runs at a time
- Each instance is independent (no inter-agent communication)
- Coordination happens via file-based state (prd.json, progress.txt)

### Orchestration via Bash Script

At its core, `ralph.sh` is a simple bash loop (shown here simplified to the Amp path):

```bash
MAX_ITERATIONS=${1:-10}   # iteration limit (default 10)
for i in $(seq 1 "$MAX_ITERATIONS"); do
  # Fresh Amp instance each iteration; tee shows output live while capturing it
  OUTPUT=$(amp --dangerously-allow-all < prompt.md 2>&1 | tee /dev/stderr) || true
  if echo "$OUTPUT" | grep -q "COMPLETE"; then
    echo "Ralph completed all tasks!"
    exit 0
  fi
  sleep 2   # pause between iterations
done

echo "Ralph reached max iterations ($MAX_ITERATIONS) without completing all tasks." >&2
exit 1
```

**Key points:**
- Uses `--dangerously-allow-all` (Amp) or `--dangerously-skip-permissions` (Claude) for autonomous operation
- Output is piped through `tee` for visibility
- Completion is detected by grepping for the magic string
- A 2-second sleep separates iterations
- Exhausting the iteration limit exits with a non-zero status

---

## Error Handling and Recovery

### Implicit Error Handling

Ralph has minimal explicit error handling. Instead:
- If tests fail, the story isn't committed
- If the AI can't complete a story, it logs learnings and the next iteration retries
- If max iterations are reached, the script exits with an error
- Human intervention is expected for complex failures

### Recovery via Progress Log

Failed attempts are documented in `progress.txt`:

```
## [Date/Time] - [Story ID]
- Attempted to implement X
- Failed because Y
- **Learnings:**
  - Don't do Z
  - Instead try W
---
```

The next iteration reads these learnings and avoids the same mistakes.
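
The "only commit when tests pass" rule (Implicit Error Handling and Commit Discipline) fits in a few lines of shell. This is a minimal sketch, not code taken from `ralph.sh`: the function name `commit_if_green` and the `CHECK_CMD` variable are hypothetical, and `CHECK_CMD` should hold whatever checks your project runs, for example a typecheck followed by the test suite.

```bash
#!/usr/bin/env bash
# Hypothetical quality gate: commit the current story only if checks pass.
# CHECK_CMD (hypothetical) holds the project's checks, e.g.
#   CHECK_CMD="npm run typecheck && npm test"

commit_if_green() {
  local story_id="$1" story_title="$2"

  # Refuse to commit at all if no checks were configured.
  if [ -z "${CHECK_CMD:-}" ]; then
    echo "CHECK_CMD is not set; refusing to commit unchecked work" >&2
    return 1
  fi

  # Run the feedback loop in a child shell; a non-zero exit blocks the commit.
  if ! bash -c "$CHECK_CMD"; then
    echo "Checks failed for $story_id; leaving changes uncommitted" >&2
    return 1
  fi

  # Stage everything and use the "feat: [Story ID] - [Story Title]" format.
  git add -A
  git commit -q -m "feat: $story_id - $story_title"
}
```

Keeping the check command outside the function lets the same gate work for any stack, and a failed check leaves the changes uncommitted rather than adding a broken commit to the history.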
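
Writing an entry in the `progress.txt` format needs nothing more than `>>` redirection, which only ever appends and therefore never rewrites earlier learnings. A minimal sketch: `append_progress` and its argument order are hypothetical, and the "Files changed" line is left out for brevity.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append one entry to a progress log.
# Usage: append_progress FILE STORY_ID SUMMARY [LEARNING...]

append_progress() {
  local file="$1" story_id="$2" summary="$3"
  shift 3  # any remaining arguments are individual learnings
  local learning
  {
    printf '## %s - %s\n' "$(date '+%Y-%m-%d %H:%M')" "$story_id"
    printf -- '- %s\n' "$summary"
    printf -- '- **Learnings for future iterations:**\n'
    for learning in "$@"; do
      printf -- '  - %s\n' "$learning"
    done
    printf -- '---\n'
  } >> "$file"  # ">>" appends; it never truncates existing entries
}
```

A failed attempt is logged the same way, with the summary describing what went wrong and the learnings listing what to avoid next time.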
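
A standalone version of the completion check from the Stop Condition Protocol might look like this. It is a sketch that makes one deliberate assumption: it accepts `COMPLETE` only as a whole line (`grep -x`). The plain `grep -q "COMPLETE"` in the orchestration loop is a substring match, so it would also fire on output such as "INCOMPLETE".

```bash
#!/usr/bin/env bash
# Hypothetical completion check for one iteration's captured output.
# -x: a line must be exactly COMPLETE; -q: report via exit status only.

is_complete() {
  printf '%s\n' "$1" | grep -qx 'COMPLETE'
}
```

Inside the loop, `if is_complete "$OUTPUT"; then ... fi` would take the place of the inline `grep`.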
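
The `--tool amp|claude [max_iterations]` interface from Multi-Tool Support could be parsed as follows. This is a hypothetical sketch rather than the actual script's parser: the variable names `TOOL`, `MAX_ITERATIONS`, and `PROMPT_FILE` are illustrative, while the defaults (Amp, 10 iterations) and the per-tool prompt files come from the usage notes in this analysis.

```bash
#!/usr/bin/env bash
# Hypothetical parser for: ./ralph.sh [--tool amp|claude] [max_iterations]
# Sets TOOL, MAX_ITERATIONS and PROMPT_FILE (illustrative names).

parse_args() {
  TOOL="amp"          # Amp is the default tool
  MAX_ITERATIONS=10   # default iteration limit
  while [ $# -gt 0 ]; do
    case "$1" in
      --tool)
        if [ $# -lt 2 ]; then
          echo "--tool needs a value (amp or claude)" >&2
          return 1
        fi
        TOOL="$2"
        shift 2
        ;;
      *)
        if [[ "$1" =~ ^[0-9]+$ ]]; then
          MAX_ITERATIONS="$1"
          shift
        else
          echo "Unknown argument: $1" >&2
          return 1
        fi
        ;;
    esac
  done
  # Each tool reads its own prompt template.
  case "$TOOL" in
    amp) PROMPT_FILE="prompt.md" ;;
    claude) PROMPT_FILE="CLAUDE.md" ;;
    *) echo "Unsupported tool: $TOOL" >&2; return 1 ;;
  esac
}
```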
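
The Verifiable Acceptance Criteria rule can be partly automated with a phrase blacklist. This is a crude, hypothetical lint seeded with the "Bad" examples from that section (`VAGUE_PATTERNS` and `check_criteria` are illustrative names). It only catches phrases it already knows, so it complements a human review of the criteria rather than replacing one.

```bash
#!/usr/bin/env bash
# Hypothetical lint: read one acceptance criterion per line on stdin and
# print any that contain a known vague phrase. Exit status 1 if any found.

VAGUE_PATTERNS=(
  'works correctly'
  'good ux'
  'handles edge cases'
)

check_criteria() {
  local line pattern status=0
  # '|| [ -n "$line" ]' also processes a last line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    for pattern in "${VAGUE_PATTERNS[@]}"; do
      # Case-insensitive, fixed-string match (-i, -F).
      if printf '%s\n' "$line" | grep -qiF -- "$pattern"; then
        echo "Vague criterion: $line"
        status=1
        break
      fi
    done
  done
  return "$status"
}
```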
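
Archiving Previous Runs reduces to a branch-name comparison plus a copy. A hedged sketch under these assumptions: the caller supplies the previous and current `branchName` values (reading them from `prd.json`, for example with `jq`, is not shown), the archive folder name drops the `ralph/` prefix, and `progress.txt` is emptied so the new feature starts with a fresh log. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical archiving step, run when prd.json's branchName changes.
# Usage: archive_if_branch_changed DIR LAST_BRANCH NEW_BRANCH
# Prints the archive directory when something was archived.

archive_if_branch_changed() {
  local dir="$1" last_branch="$2" new_branch="$3"
  # First run, or same feature as before: nothing to archive.
  if [ -z "$last_branch" ] || [ "$last_branch" = "$new_branch" ]; then
    return 0
  fi
  # archive/YYYY-MM-DD-feature-name, with the "ralph/" prefix stripped.
  local dest
  dest="$dir/archive/$(date +%Y-%m-%d)-${last_branch#ralph/}"
  mkdir -p "$dest" || return 1
  local f
  for f in prd.json progress.txt; do
    if [ -f "$dir/$f" ]; then
      cp "$dir/$f" "$dest/" || return 1
    fi
  done
  # Start the new feature with an empty learnings log.
  : > "$dir/progress.txt"
  echo "$dest"
}
```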
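
Step 3 of the Context Recovery Pattern, checking that the git branch matches the expected one, might look like the hypothetical `ensure_branch` helper below. It switches to the branch if it exists and creates it otherwise. Passing in the expected name (rather than reading `branchName` from `prd.json`) is an assumption of the sketch, and a detached HEAD is treated as an error.

```bash
#!/usr/bin/env bash
# Hypothetical helper for step 3 of the context recovery pattern:
# make sure the working tree is on the branch named in prd.json.
# Usage: ensure_branch EXPECTED_BRANCH

ensure_branch() {
  local expected="$1" current
  # symbolic-ref fails on a detached HEAD, which we treat as an error.
  current=$(git symbolic-ref --short HEAD 2>/dev/null) || return 1
  if [ "$current" = "$expected" ]; then
    return 0                          # already on the right branch
  elif git show-ref --verify --quiet "refs/heads/$expected"; then
    git checkout -q "$expected"       # branch exists: switch to it
  else
    git checkout -q -b "$expected"    # first run: create it
  fi
}
```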

---

## Configuration and Customization

### Per-Project Customization

After copying the prompt template to your project:
- Add project-specific quality check commands
- Include codebase conventions
- Add common gotchas for your stack

### Amp Auto-Handoff Configuration

For large stories that approach context limits:

```json
{
  "amp.experimental.autoHandoff": { "context": 90 }
}
```

This enables automatic handoff when context fills up.

### Iteration Limits

```bash
./ralph.sh [max_iterations]  # Default: 10
```

---

## Comparison to Typical Orchestration Approaches

| Aspect | Ralph | Typical Orchestration |
|--------|-------|----------------------|
| **Memory** | File-based (git, JSON, text) | In-memory state, databases |
| **Coordination** | Sequential loop | Often parallel/concurrent |
| **Agent Communication** | Via files | Direct messaging, queues |
| **Complexity** | Simple bash script (~100 LOC) | Often complex frameworks |
| **Failure Recovery** | Retry from last good state | Explicit retry logic, checkpoints |
| **Context Management** | Fresh context per iteration | Persistent context, context windows |
| **Task Decomposition** | Pre-planned user stories | Often dynamic planning |
| **Human Oversight** | Minimal during run | Often requires approval gates |

### Key Differentiators

1. **Simplicity**: Ralph is a bash script, not a framework
2. **Statelessness**: Each iteration is independent
3. **Git-Native**: Uses git as the primary state management
4. **AI-Tool Agnostic**: Works with both Amp and Claude Code
5. **Human-Readable Artifacts**: All state is in human-readable files

---

## Implications for Makima

### Features to Consider Adopting

1. **Structured PRD-to-JSON workflow** with skills
2. **Append-only progress logging** for context between runs
3. **Story sizing enforcement** (completable in one context window)
4. **Dependency-ordered task execution**
5. **Branch-based run isolation** with archiving
6. **Consolidated patterns file** (AGENTS.md equivalent)
7. **Magic string completion protocol** (`COMPLETE`)
8. **Verifiable acceptance criteria** enforcement
9. **Browser verification** for UI stories

### Optional Features (Flag-Controlled)

1. `--max-iterations` limit
2. `--auto-handoff` for context management
3. `--archive-previous` for run isolation
4. `--require-tests` for quality gates
5. `--single-story-per-run` mode

### Opinionated Features

1. Task decomposition must result in context-window-sized stories
2. Progress logs must be append-only
3. All commits must pass quality checks
4. Acceptance criteria must be verifiable
5. Dependencies must be ordered correctly

---

## Appendix: File Structure Reference

```
project/
├── scripts/ralph/
│   ├── ralph.sh          # Main loop script
│   ├── prompt.md         # Amp instructions
│   ├── CLAUDE.md         # Claude Code instructions
│   ├── prd.json          # Active task list
│   ├── progress.txt      # Append-only learnings
│   └── archive/          # Previous run archives
│       └── YYYY-MM-DD-feature-name/
│           ├── prd.json
│           └── progress.txt
├── skills/
│   ├── prd/
│   │   └── SKILL.md      # PRD generation skill
│   └── ralph/
│       └── SKILL.md      # PRD-to-JSON conversion skill
└── AGENTS.md             # Codebase-wide patterns
```

---

## References

- [Ralph GitHub Repository](https://github.com/snarktank/ralph)
- [Geoffrey Huntley's Ralph Article](https://ghuntley.com/ralph/)
- [Amp Documentation](https://ampcode.com/manual)
- [Claude Code Documentation](https://docs.anthropic.com/en/docs/claude-code)