author soryu <soryu@soryu.co> 2026-01-23 02:57:13 +0000
committer GitHub <noreply@github.com> 2026-01-23 02:57:13 +0000
commit 5595a2fce2e426fd9f1b6224df467a2300f06238
tree 57b4e20335f6dbab641f1474f34d048960802188 /ralph-analysis.md
parent 1ed362424dafec690f919154f5116471951cda9c
docs: Add ralph analysis and feature specification (#22)
- ralph-analysis.md: Comprehensive analysis of ralph repository
  - Stateless AI loop pattern with file-based persistence
  - prd.json for task tracking, progress.txt for learnings
  - AGENTS.md for consolidated patterns
- makima-architecture.md: Analysis of makima's current architecture
  - Existing COMPLETION_GATE, circuit breaker, autonomous loop
  - Extension points for ralph-inspired features
- ralph-features-spec.md: Detailed feature specification
  - 6 opinionated features (always enabled)
  - 7 optional features (flag-controlled)
  - Implementation priorities and migration path

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Diffstat (limited to 'ralph-analysis.md')
-rw-r--r-- ralph-analysis.md 455
1 files changed, 455 insertions, 0 deletions
diff --git a/ralph-analysis.md b/ralph-analysis.md
new file mode 100644
index 0000000..89df62c
--- /dev/null
+++ b/ralph-analysis.md
@@ -0,0 +1,455 @@
+# Ralph Analysis: Autonomous AI Agent Loop System
+
+## Executive Summary
+
+**Ralph** is an autonomous AI agent loop system designed to run AI coding tools (Amp or Claude Code) repeatedly until all Product Requirements Document (PRD) items are complete. Based on [Geoffrey Huntley's Ralph pattern](https://ghuntley.com/ralph/), it represents a paradigm for autonomous software development where each iteration spawns a fresh AI instance with clean context, relying on git history, a progress log, and a structured PRD JSON file for persistence between runs.
+
+The core philosophy is simple yet powerful: break work into small, independently completable stories, run AI agents in a loop, and let structured persistence mechanisms carry context forward. This approach solves the fundamental problem of AI context limits by treating each iteration as a stateless worker that reads from and writes to well-defined artifacts.
+
+---
+
+## Architecture Overview
+
+### High-Level Flow
+
+```
+┌──────────────────────────────────────────────────────────────────┐
+│                           SETUP PHASE                            │
+├──────────────────────────────────────────────────────────────────┤
+│  1. User writes a PRD (markdown)                                 │
+│  2. Convert PRD to prd.json (structured user stories)            │
+│  3. Run ralph.sh (starts autonomous loop)                        │
+└──────────────────────────────────────────────────────────────────┘
+                                  │
+                                  ▼
+┌──────────────────────────────────────────────────────────────────┐
+│                          EXECUTION LOOP                          │
+├──────────────────────────────────────────────────────────────────┤
+│  4. AI picks highest priority story where passes: false          │
+│  5. Implements the story (writes code, runs tests)               │
+│  6. Commits changes (if tests pass)                              │
+│  7. Updates prd.json (sets passes: true)                         │
+│  8. Logs learnings to progress.txt                               │
+│  9. Updates AGENTS.md/CLAUDE.md with reusable patterns           │
+│ 10. Check: More stories? → Loop back to step 4                   │
+└──────────────────────────────────────────────────────────────────┘
+                                  │
+                                  ▼
+┌──────────────────────────────────────────────────────────────────┐
+│                            COMPLETION                            │
+├──────────────────────────────────────────────────────────────────┤
+│  Output: <promise>COMPLETE</promise> and exit                    │
+└──────────────────────────────────────────────────────────────────┘
+```
+
+### Core Components
+
+| Component | Purpose | Persistence |
+|-----------|---------|-------------|
+| `ralph.sh` | Bash loop that spawns fresh AI instances | N/A (orchestrator) |
+| `prd.json` | Task list with status tracking | Git-tracked JSON |
+| `progress.txt` | Append-only learnings log | Git-tracked text |
+| `AGENTS.md` / `CLAUDE.md` | Reusable patterns for future iterations | Git-tracked markdown |
+| `prompt.md` | Instructions template for Amp | Static config |
+| Skills (`prd`, `ralph`) | PRD generation and conversion helpers | Static config |
+
+---
+
+## Key Features
+
+### 1. **Stateless Iteration Model**
+
+Each iteration spawns a completely fresh AI instance with no memory of previous work. Context is rebuilt from:
+- Git history (what was committed)
+- `progress.txt` (learnings and context)
+- `prd.json` (which stories are done)
+
+**Key insight**: This sidesteps the AI context window limit by treating each run as independent, with structured artifacts serving as the "memory."
+
+### 2. **Structured Task Management (prd.json)**
+
+```json
+{
+  "project": "MyApp",
+  "branchName": "ralph/task-priority",
+  "description": "Task Priority System - Add priority levels to tasks",
+  "userStories": [
+    {
+      "id": "US-001",
+      "title": "Add priority field to database",
+      "description": "As a developer, I need to store task priority...",
+      "acceptanceCriteria": [
+        "Add priority column to tasks table",
+        "Typecheck passes"
+      ],
+      "priority": 1,
+      "passes": false,
+      "notes": ""
+    }
+  ]
+}
+```
+
+**Design decisions:**
+- Priority-based ordering handles dependencies, provided priorities are assigned in dependency order (schema before backend before UI)
+- `passes: false/true` provides clear completion tracking
+- Acceptance criteria are verifiable (not vague)
+- Stories are sized to fit within one context window
+
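The selection step implied by these fields can be sketched in a few lines. This is a minimal illustration, not Ralph's own code: the helper name `next_story` is hypothetical, and it assumes a lower `priority` number means the story runs earlier, which matches the example above (`"priority": 1` on the first story) but is not stated outright.

```python
import json
from typing import Optional


def next_story(prd: dict) -> Optional[dict]:
    """Return the unfinished story with the lowest priority number, or None."""
    pending = [story for story in prd["userStories"] if not story["passes"]]
    # Assumption: priority 1 runs before priority 2, and so on.
    return min(pending, key=lambda story: story["priority"], default=None)


# Trimmed-down prd.json content with only the fields the selection needs.
prd = json.loads("""
{
  "userStories": [
    {"id": "US-001", "priority": 1, "passes": true},
    {"id": "US-002", "priority": 2, "passes": false},
    {"id": "US-003", "priority": 3, "passes": false}
  ]
}
""")

story = next_story(prd)
print(story["id"] if story else "all stories pass")
```

Returning `None` when nothing is pending gives the caller a natural place to trigger the completion path.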
+### 3. **Progressive Learning System**
+
+The dual-file learning system distinguishes between:
+
+**`progress.txt`** - Append-only chronological log:
+```
+## [Date/Time] - [Story ID]
+- What was implemented
+- Files changed
+- **Learnings for future iterations:**
+  - Patterns discovered
+  - Gotchas encountered
+  - Useful context
+---
+```
+
+**`AGENTS.md` / `CLAUDE.md`** - Consolidated reusable patterns:
+```
+## Codebase Patterns
+- Use `sql<number>` template for aggregations
+- Always use `IF NOT EXISTS` for migrations
+- Export types from actions.ts for UI components
+```
+
+**Key insight**: Chronological learnings for debugging, consolidated patterns for quick reference.
+
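As a rough sketch of the append-only side, the snippet below formats an entry in the `progress.txt` layout shown above and appends it to a file. The function names, the timestamp format, and the sample file name `db/schema.ts` are illustrative assumptions; in Ralph itself the AI writes these entries as instructed by the prompt.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def format_entry(when, story_id, implemented, files, learnings):
    """Render one log entry in the progress.txt layout."""
    lines = [
        f"## {when:%Y-%m-%d %H:%M} - {story_id}",
        f"- {implemented}",
        f"- Files changed: {', '.join(files)}",
        "- **Learnings for future iterations:**",
    ]
    lines += [f"  - {item}" for item in learnings]
    lines.append("---")
    return "\n".join(lines) + "\n"


def append_entry(log_path, entry):
    # Mode "a" only adds to the end of the file; earlier entries stay untouched.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(entry)


# Demo in a throwaway directory: two iterations append one entry each.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "progress.txt"
    for story_id in ("US-001", "US-002"):
        entry = format_entry(
            datetime(2026, 1, 23, 3, 0),
            story_id,
            "Implemented the story",
            ["db/schema.ts"],
            ["Always use IF NOT EXISTS for migrations"],
        )
        append_entry(log_path, entry)
    log_text = log_path.read_text(encoding="utf-8")

print(log_text)
```

Opening the file in append mode is what keeps the log chronological: a later iteration cannot rewrite what an earlier one recorded.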
+### 4. **Branch-Based Run Isolation**
+
+- Each feature uses a dedicated branch (`ralph/feature-name`)
+- When starting a new feature, previous runs are archived to `archive/YYYY-MM-DD-feature-name/`
+- Clean separation between features prevents context pollution
+
+### 5. **Quality Feedback Loops**
+
+Ralph requires feedback loops to function:
+- Typecheck catches type errors
+- Tests verify behavior
+- CI must stay green (broken code compounds)
+
+Stories must include verifiable acceptance criteria like "Typecheck passes" and "Tests pass."
+
+### 6. **Browser Verification for UI Stories**
+
+Frontend stories include "Verify in browser using dev-browser skill" as an acceptance criterion, so UI changes are checked visually rather than only compiled.
+
+### 7. **Stop Condition Protocol**
+
+The loop terminates when all stories have `passes: true`. The AI outputs:
+```
+<promise>COMPLETE</promise>
+```
+
+This magic string is grep'd by `ralph.sh` to detect completion.
+
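From the iteration's point of view, the decision to emit the marker reduces to a check over `prd.json`. The sketch below is an assumption about how that check could look, with hypothetical helper names; in Ralph the AI makes this decision itself by following its prompt.

```python
COMPLETE_MARKER = "<promise>COMPLETE</promise>"


def all_stories_pass(prd: dict) -> bool:
    """True when every user story has passes: true."""
    return all(story["passes"] for story in prd["userStories"])


def final_message(prd: dict) -> str:
    """Hypothetical helper: the last line an iteration would print."""
    if all_stories_pass(prd):
        return COMPLETE_MARKER
    remaining = sum(1 for story in prd["userStories"] if not story["passes"])
    return f"stories remaining: {remaining}"


done = {"userStories": [{"id": "US-001", "passes": True}]}
pending = {
    "userStories": [
        {"id": "US-001", "passes": True},
        {"id": "US-002", "passes": False},
    ]
}

print(final_message(done))
print(final_message(pending))
```

Note that `all()` over an empty story list is `True`, so a real check would probably also want to reject an empty PRD.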
+### 8. **Multi-Tool Support**
+
+Ralph supports both Amp and Claude Code:
+```bash
+./ralph.sh --tool amp [max_iterations] # Default
+./ralph.sh --tool claude [max_iterations]
+```
+
+Each tool has its own prompt template (`prompt.md` for Amp, `CLAUDE.md` for Claude Code).
+
+### 9. **Skills System for PRD Workflow**
+
+Two skills automate PRD creation:
+
+**`prd` skill**: Generates structured PRDs with clarifying questions
+- Asks 3-5 essential questions with lettered options (for quick "1A, 2C, 3B" responses)
+- Creates markdown PRD with user stories, functional requirements, non-goals
+
+**`ralph` skill**: Converts markdown PRDs to JSON
+- Enforces story sizing (completable in one iteration)
+- Orders by dependencies (schema → backend → UI)
+- Adds standard criteria ("Typecheck passes", "Verify in browser")
+
+---
+
+## Notable Patterns and Design Decisions
+
+### 1. **Single Story Per Iteration**
+
+**Design**: Each AI run handles exactly ONE user story, never more.
+
+**Rationale**:
+- Ensures complete focus on a single task
+- Prevents context exhaustion mid-feature
+- Creates clean commit boundaries
+- Simplifies failure recovery (retry a single story, not multiple)
+
+### 2. **Append-Only Progress Log**
+
+**Design**: `progress.txt` is append-only, never overwritten.
+
+**Rationale**:
+- Preserves full history for debugging
+- Enables pattern discovery over time
+- Prevents accidental loss of learnings
+- Supports consolidation into AGENTS.md when patterns emerge
+
+### 3. **Story Sizing Rules**
+
+**Design**: Stories must be small enough for one context window.
+
+**Right-sized examples:**
+- Add a database column and migration
+- Add a UI component to an existing page
+- Update a server action with new logic
+- Add a filter dropdown to a list
+
+**Too big (must split):**
+- "Build the entire dashboard"
+- "Add authentication"
+- "Refactor the API"
+
+**Rule of thumb**: If you can't describe the change in 2-3 sentences, it's too big.
+
+### 4. **Dependency-Ordered Execution**
+
+**Design**: Stories execute in priority order, so an earlier story must never depend on a later one.
+
+**Correct order:**
+1. Schema/database changes (migrations)
+2. Server actions / backend logic
+3. UI components that use the backend
+4. Dashboard/summary views that aggregate data
+
+### 5. **Commit Discipline**
+
+**Design**: Only commit when tests pass, with structured messages.
+
+```
+feat: [Story ID] - [Story Title]
+```
+
+**Rationale**: Clean git history provides context recovery for future iterations.
+
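A structured subject line is also easy to read back mechanically. The following sketch assumes the bracketed parts of the template are placeholders (so the real subject reads like `feat: US-001 - Add priority field to database`); the regular expression and function names are illustrative only.

```python
import re

# Assumed shape of a Ralph commit subject: "feat: <story id> - <story title>"
COMMIT_PATTERN = re.compile(r"^feat: (?P<id>\S+) - (?P<title>.+)$")


def commit_subject(story: dict) -> str:
    """Build the commit subject for a completed story."""
    return f"feat: {story['id']} - {story['title']}"


def parse_subject(subject: str):
    """Recover (story id, title) from a subject line, or None if it does not match."""
    match = COMMIT_PATTERN.match(subject)
    return (match["id"], match["title"]) if match else None


story = {"id": "US-001", "title": "Add priority field to database"}
subject = commit_subject(story)
print(subject)
print(parse_subject(subject))
print(parse_subject("fix typo in README"))
```

Being able to map commits back to story IDs is one way a later iteration (or a human) could reconstruct which stories touched which code.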
+### 6. **Verifiable Acceptance Criteria**
+
+**Design**: Every criterion must be testable, never vague.
+
+**Good**: "Button shows confirmation dialog before deleting"
+**Bad**: "Works correctly", "Good UX", "Handles edge cases"
+
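One cheap way to enforce this is a phrase-based lint over the acceptance criteria. The sketch below is a heuristic, not something Ralph ships: the phrase list is taken from the "Bad" examples above, and `vague_criteria` is a hypothetical helper.

```python
# Phrases copied from the "Bad" examples; a real list would need tuning.
VAGUE_PHRASES = ("works correctly", "good ux", "handles edge cases")


def vague_criteria(story: dict) -> list:
    """Return the acceptance criteria that contain a known vague phrase."""
    return [
        criterion
        for criterion in story["acceptanceCriteria"]
        if any(phrase in criterion.lower() for phrase in VAGUE_PHRASES)
    ]


story = {
    "id": "US-004",
    "acceptanceCriteria": [
        "Button shows confirmation dialog before deleting",
        "Works correctly",
        "Typecheck passes",
    ],
}

print(vague_criteria(story))
```

A lint like this only catches known wording; it cannot confirm that the remaining criteria are actually testable.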
+### 7. **Archiving Previous Runs**
+
+**Design**: When `branchName` changes, archive previous `prd.json` and `progress.txt` to `archive/YYYY-MM-DD-feature-name/`.
+
+**Rationale**: Keeps features cleanly separated and preserves each run's history for reference.
+
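The archiving step could look roughly like the sketch below. It assumes the caller already knows the previous run's branch name (how Ralph detects the change is not covered here), that the `ralph/` prefix is dropped from the folder name, and that files are copied rather than moved; `archive_run` is a hypothetical name.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def archive_run(run_dir: Path, previous_branch: str, today: date) -> Path:
    """Copy the previous run's state files into archive/YYYY-MM-DD-feature-name/."""
    # "ralph/task-priority" -> "task-priority" (assumed naming rule)
    feature = previous_branch.split("/", 1)[-1]
    target = run_dir / "archive" / f"{today.isoformat()}-{feature}"
    target.mkdir(parents=True, exist_ok=True)
    for name in ("prd.json", "progress.txt"):
        source = run_dir / name
        if source.exists():
            shutil.copy2(source, target / name)
    return target


# Demo in a throwaway directory standing in for scripts/ralph/.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    (run_dir / "prd.json").write_text('{"branchName": "ralph/task-priority"}')
    (run_dir / "progress.txt").write_text("## earlier learnings\n")
    target = archive_run(run_dir, "ralph/task-priority", date(2026, 1, 23))
    archived_name = target.name
    archived_files = sorted(path.name for path in target.iterdir())

print(archived_name, archived_files)
```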
+---
+
+## Context Management Strategy
+
+Ralph's context management is its most innovative aspect: state persists between runs in files, and each run rebuilds its working context from them.
+
+### Between Runs (Persistence)
+
+| Mechanism | What It Carries | Format |
+|-----------|-----------------|--------|
+| Git commits | Code changes, file structure | Versioned files |
+| `prd.json` | Task completion status | Structured JSON |
+| `progress.txt` | Learnings, gotchas, patterns | Structured text |
+| `AGENTS.md` | Consolidated reusable patterns | Markdown |
+
+### Within a Run (Instructions)
+
+The AI receives:
+1. Instructions from `prompt.md` or `CLAUDE.md`
+2. The `prd.json` file content
+3. The `progress.txt` file (especially its Codebase Patterns section)
+4. Access to read any file via AI tool capabilities
+
+### Context Recovery Pattern
+
+Each iteration:
+1. Reads the Codebase Patterns section of `progress.txt` first (quick reference)
+2. Reads `prd.json` to find next incomplete story
+3. Checks git branch matches expected branch
+4. Implements story
+5. Appends learnings to `progress.txt`
+6. Optionally consolidates patterns to AGENTS.md
+
+---
+
+## Agent Orchestration Model
+
+### Single-Agent Loop (Not Multi-Agent)
+
+Ralph is NOT a multi-agent system. It's a single-agent loop where:
+- One AI instance runs at a time
+- Each instance is independent (no inter-agent communication)
+- Coordination happens via file-based state (prd.json, progress.txt)
+
+### Orchestration via Bash Script
+
+`ralph.sh` is a simple bash loop:
+```bash
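+for i in $(seq 1 "$MAX_ITERATIONS"); do
+  # Feed the prompt to a fresh Amp instance; "|| true" keeps the loop alive if the agent fails
+  OUTPUT=$(cat prompt.md | amp --dangerously-allow-all 2>&1 | tee /dev/stderr) || true
+
+  # Stop as soon as the agent prints the completion marker
+  if echo "$OUTPUT" | grep -q "<promise>COMPLETE</promise>"; then
+    echo "Ralph completed all tasks!"
+    exit 0
+  fi
+
+  sleep 2  # brief pause between iterations
+done
+# Marker never seen within the iteration budget: exit with an error
+echo "Reached max iterations without completing all tasks." >&2
+exit 1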
+```
+
+**Key points:**
+- Uses `--dangerously-allow-all` (Amp) or `--dangerously-skip-permissions` (Claude) for autonomous operation
+- Outputs are piped through `tee` for visibility
+- Completion detected via grep for magic string
+- 2-second sleep between iterations
+
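The same control flow can be expressed as a small function, which makes the exit conditions easy to test. This is a sketch under assumptions: `run_agent` is a hypothetical callable standing in for one fresh invocation of the AI tool (in practice it would wrap the CLI call), and the return values mirror the script's exit codes.

```python
import time
from typing import Callable

COMPLETE_MARKER = "<promise>COMPLETE</promise>"


def run_loop(
    run_agent: Callable[[int], str],
    max_iterations: int = 10,
    pause_seconds: float = 2.0,
) -> int:
    """Run fresh agent instances until one prints the marker or the budget runs out."""
    for iteration in range(1, max_iterations + 1):
        output = run_agent(iteration)  # each call is a fresh, stateless instance
        if COMPLETE_MARKER in output:
            return 0  # all stories done
        time.sleep(pause_seconds)  # pause between iterations
    return 1  # max iterations reached without completion


def fake_agent(iteration: int) -> str:
    """Stand-in agent that finishes the last story on its third run."""
    if iteration == 3:
        return f"final story committed\n{COMPLETE_MARKER}"
    return f"iteration {iteration}: one story committed"


exit_code = run_loop(fake_agent, pause_seconds=0)
gave_up = run_loop(lambda i: "still working", max_iterations=2, pause_seconds=0)
print(exit_code, gave_up)
```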
+---
+
+## Error Handling and Recovery
+
+### Implicit Error Handling
+
+Ralph has minimal explicit error handling. Instead:
+- If tests fail, the story isn't committed
+- If the AI can't complete a story, it logs learnings and the next iteration retries
+- If max iterations are reached, the script exits with an error
+- Human intervention is expected for complex failures
+
+### Recovery via Progress Log
+
+Failed attempts are documented in `progress.txt`:
+```
+## [Date/Time] - [Story ID]
+- Attempted to implement X
+- Failed because Y
+- **Learnings:**
+  - Don't do Z
+  - Instead try W
+---
+```
+
+The next iteration reads these learnings and avoids the same mistakes.
+
+---
+
+## Configuration and Customization
+
+### Per-Project Customization
+
+After copying the prompt template to your project:
+- Add project-specific quality check commands
+- Include codebase conventions
+- Add common gotchas for your stack
+
+### Amp Auto-Handoff Configuration
+
+For large stories that approach context limits:
+```json
+{
+  "amp.experimental.autoHandoff": { "context": 90 }
+}
+```
+
+This enables automatic handoff when context fills up.
+
+### Iteration Limits
+
+```bash
+./ralph.sh [max_iterations] # Default: 10
+```
+
+---
+
+## Comparison to Typical Orchestration Approaches
+
+| Aspect | Ralph | Typical Orchestration |
+|--------|-------|----------------------|
+| **Memory** | File-based (git, JSON, text) | In-memory state, databases |
+| **Coordination** | Sequential loop | Often parallel/concurrent |
+| **Agent Communication** | Via files | Direct messaging, queues |
+| **Complexity** | Simple bash script (~100 LOC) | Often complex frameworks |
+| **Failure Recovery** | Retry from last good state | Explicit retry logic, checkpoints |
+| **Context Management** | Fresh context per iteration | Persistent context, context windows |
+| **Task Decomposition** | Pre-planned user stories | Often dynamic planning |
+| **Human Oversight** | Minimal during run | Often requires approval gates |
+
+### Key Differentiators
+
+1. **Simplicity**: Ralph is a bash script, not a framework
+2. **Statelessness**: Each iteration is independent
+3. **Git-Native**: Uses git as the primary state management
+4. **Multi-Tool**: Works with both Amp and Claude Code
+5. **Human-Readable Artifacts**: All state is in human-readable files
+
+---
+
+## Implications for Makima
+
+### Features to Consider Adopting
+
+1. **Structured PRD-to-JSON workflow** with skills
+2. **Append-only progress logging** for context between runs
+3. **Story sizing enforcement** (completable in one context window)
+4. **Dependency-ordered task execution**
+5. **Branch-based run isolation** with archiving
+6. **Consolidated patterns file** (AGENTS.md equivalent)
+7. **Magic string completion protocol** (`<promise>COMPLETE</promise>`)
+8. **Verifiable acceptance criteria** enforcement
+9. **Browser verification** for UI stories
+
+### Optional Features (Flag-Controlled)
+
+1. `--max-iterations` limit
+2. `--auto-handoff` for context management
+3. `--archive-previous` for run isolation
+4. `--require-tests` for quality gates
+5. `--single-story-per-run` mode
+
+### Opinionated Features
+
+1. Task decomposition must result in context-window-sized stories
+2. Progress logs must be append-only
+3. All commits must pass quality checks
+4. Acceptance criteria must be verifiable
+5. Dependencies must be ordered correctly
+
+---
+
+## Appendix: File Structure Reference
+
+```
+project/
+├── scripts/ralph/
+│   ├── ralph.sh              # Main loop script
+│   ├── prompt.md             # Amp instructions
+│   ├── CLAUDE.md             # Claude Code instructions
+│   ├── prd.json              # Active task list
+│   ├── progress.txt          # Append-only learnings
+│   └── archive/              # Previous run archives
+│       └── YYYY-MM-DD-feature-name/
+│           ├── prd.json
+│           └── progress.txt
+├── skills/
+│   ├── prd/
+│   │   └── SKILL.md          # PRD generation skill
+│   └── ralph/
+│       └── SKILL.md          # PRD-to-JSON conversion skill
+└── AGENTS.md                 # Codebase-wide patterns
+```
+
+---
+
+## References
+
+- [Ralph GitHub Repository](https://github.com/snarktank/ralph)
+- [Geoffrey Huntley's Ralph Article](https://ghuntley.com/ralph/)
+- [Amp Documentation](https://ampcode.com/manual)
+- [Claude Code Documentation](https://docs.anthropic.com/en/docs/claude-code)