| author | soryu <soryu@soryu.co> | 2026-01-23 02:57:13 +0000 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2026-01-23 02:57:13 +0000 |
| commit | 5595a2fce2e426fd9f1b6224df467a2300f06238 (patch) | |
| tree | 57b4e20335f6dbab641f1474f34d048960802188 /ralph-analysis.md | |
| parent | 1ed362424dafec690f919154f5116471951cda9c (diff) | |
docs: Add ralph analysis and feature specification (#22)
- ralph-analysis.md: Comprehensive analysis of ralph repository
  - Stateless AI loop pattern with file-based persistence
  - prd.json for task tracking, progress.txt for learnings
  - AGENTS.md for consolidated patterns
- makima-architecture.md: Analysis of makima's current architecture
  - Existing COMPLETION_GATE, circuit breaker, autonomous loop
  - Extension points for ralph-inspired features
- ralph-features-spec.md: Detailed feature specification
  - 6 opinionated features (always enabled)
  - 7 optional features (flag-controlled)
  - Implementation priorities and migration path
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Diffstat (limited to 'ralph-analysis.md')
| -rw-r--r-- | ralph-analysis.md | 455 |
1 files changed, 455 insertions, 0 deletions
diff --git a/ralph-analysis.md b/ralph-analysis.md
new file mode 100644
index 0000000..89df62c
--- /dev/null
+++ b/ralph-analysis.md
@@ -0,0 +1,455 @@
+# Ralph Analysis: Autonomous AI Agent Loop System
+
+## Executive Summary
+
+**Ralph** is an autonomous AI agent loop system designed to run AI coding tools (Amp or Claude Code) repeatedly until all Product Requirements Document (PRD) items are complete. Based on [Geoffrey Huntley's Ralph pattern](https://ghuntley.com/ralph/), it represents a paradigm for autonomous software development where each iteration spawns a fresh AI instance with clean context, relying on git history, a progress log, and a structured PRD JSON file for persistence between runs.
+
+The core philosophy is simple yet powerful: break work into small, independently completable stories, run AI agents in a loop, and let structured persistence mechanisms carry context forward. This approach solves the fundamental problem of AI context limits by treating each iteration as a stateless worker that reads from and writes to well-defined artifacts.
+
+---
+
+## Architecture Overview
+
+### High-Level Flow
+
+```
+┌──────────────────────────────────────────────────────────────────┐
+│                           SETUP PHASE                            │
+├──────────────────────────────────────────────────────────────────┤
+│  1. User writes a PRD (markdown)                                 │
+│  2. Convert PRD to prd.json (structured user stories)            │
+│  3. Run ralph.sh (starts autonomous loop)                        │
+└──────────────────────────────────────────────────────────────────┘
+                                  │
+                                  ▼
+┌──────────────────────────────────────────────────────────────────┐
+│                          EXECUTION LOOP                          │
+├──────────────────────────────────────────────────────────────────┤
+│  4. AI picks highest priority story where passes: false          │
+│  5. Implements the story (writes code, runs tests)               │
+│  6. Commits changes (if tests pass)                              │
+│  7. Updates prd.json (sets passes: true)                         │
+│  8. Logs learnings to progress.txt                               │
+│  9. Updates AGENTS.md/CLAUDE.md with reusable patterns           │
+│  10. Check: More stories? → Loop back to step 4                  │
+└──────────────────────────────────────────────────────────────────┘
+                                  │
+                                  ▼
+┌──────────────────────────────────────────────────────────────────┐
+│                            COMPLETION                            │
+├──────────────────────────────────────────────────────────────────┤
+│  Output: <promise>COMPLETE</promise> and exit                    │
+└──────────────────────────────────────────────────────────────────┘
+```
+
+### Core Components
+
+| Component | Purpose | Persistence |
+|-----------|---------|-------------|
+| `ralph.sh` | Bash loop that spawns fresh AI instances | N/A (orchestrator) |
+| `prd.json` | Task list with status tracking | Git-tracked JSON |
+| `progress.txt` | Append-only learnings log | Git-tracked text |
+| `AGENTS.md` / `CLAUDE.md` | Reusable patterns for future iterations | Git-tracked markdown |
+| `prompt.md` | Instructions template for Amp | Static config |
+| Skills (`prd`, `ralph`) | PRD generation and conversion helpers | Static config |
+
+---
+
+## Key Features
+
+### 1. **Stateless Iteration Model**
+
+Each iteration spawns a completely fresh AI instance with no memory of previous work. Context is rebuilt from:
+- Git history (what was committed)
+- `progress.txt` (learnings and context)
+- `prd.json` (which stories are done)
+
+**Key insight**: This sidesteps the AI context window limit by treating each run as independent, with structured artifacts serving as the "memory."
+
+### 2. **Structured Task Management (prd.json)**
+
+```json
+{
+  "project": "MyApp",
+  "branchName": "ralph/task-priority",
+  "description": "Task Priority System - Add priority levels to tasks",
+  "userStories": [
+    {
+      "id": "US-001",
+      "title": "Add priority field to database",
+      "description": "As a developer, I need to store task priority...",
+      "acceptanceCriteria": [
+        "Add priority column to tasks table",
+        "Typecheck passes"
+      ],
+      "priority": 1,
+      "passes": false,
+      "notes": ""
+    }
+  ]
+}
+```
+
+**Design decisions:**
+- Priority-based ordering ensures dependencies are handled correctly
+- `passes: false/true` provides clear completion tracking
+- Acceptance criteria are verifiable (not vague)
+- Stories are sized to fit within one context window
+
+### 3. **Progressive Learning System**
+
+The dual-file learning system distinguishes between:
+
+**`progress.txt`** - Append-only chronological log:
+```
+## [Date/Time] - [Story ID]
+- What was implemented
+- Files changed
+- **Learnings for future iterations:**
+  - Patterns discovered
+  - Gotchas encountered
+  - Useful context
+---
+```
+
+**`AGENTS.md` / `CLAUDE.md`** - Consolidated reusable patterns:
+```
+## Codebase Patterns
+- Use `sql<number>` template for aggregations
+- Always use `IF NOT EXISTS` for migrations
+- Export types from actions.ts for UI components
+```
+
+**Key insight**: Chronological learnings for debugging, consolidated patterns for quick reference.
+
+### 4. **Branch-Based Run Isolation**
+
+- Each feature uses a dedicated branch (`ralph/feature-name`)
+- When starting a new feature, previous runs are archived to `archive/YYYY-MM-DD-feature-name/`
+- Clean separation between features prevents context pollution
+
+### 5. **Quality Feedback Loops**
+
+Ralph requires feedback loops to function:
+- Typecheck catches type errors
+- Tests verify behavior
+- CI must stay green (broken code compounds)
+
+Stories must include verifiable acceptance criteria like "Typecheck passes" and "Tests pass."
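
The quality feedback loop described above could be enforced with a small shell gate that runs each check in order and refuses to continue after the first failure. This is only a sketch: the function name `run_quality_gates` and the example check commands are hypothetical, since the analysis leaves the concrete typecheck and test commands to each project.

```shell
# Hypothetical helper (not part of ralph.sh): run each quality check in order
# and stop at the first failure. Each argument is one shell command line.
run_quality_gates() {
  local check
  for check in "$@"; do
    echo "Running: $check"
    if ! bash -c "$check"; then
      echo "Quality gate failed: $check" >&2
      return 1
    fi
  done
  return 0
}

# Usage sketch (assumed commands; substitute the project's own checks):
# run_quality_gates "npm run typecheck" "npm test" \
#   && git commit -am "feat: US-001 - Add priority field to database"
```

Chaining the commit behind `&&` keeps the "only commit when checks pass" rule in one place, so a failing typecheck or test leaves the working tree uncommitted for the next iteration.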
+
+### 6. **Browser Verification for UI Stories**
+
+Frontend stories include "Verify in browser using dev-browser skill" as acceptance criteria. This ensures visual verification of UI changes, not just code compilation.
+
+### 7. **Stop Condition Protocol**
+
+The loop terminates when all stories have `passes: true`. The AI outputs:
+```
+<promise>COMPLETE</promise>
+```
+
+This magic string is grep'd by `ralph.sh` to detect completion.
+
+### 8. **Multi-Tool Support**
+
+Ralph supports both Amp and Claude Code:
+```bash
+./ralph.sh --tool amp [max_iterations]  # Default
+./ralph.sh --tool claude [max_iterations]
+```
+
+Each tool has its own prompt template (`prompt.md` for Amp, `CLAUDE.md` for Claude Code).
+
+### 9. **Skills System for PRD Workflow**
+
+Two skills automate PRD creation:
+
+**`prd` skill**: Generates structured PRDs with clarifying questions
+- Asks 3-5 essential questions with lettered options (for quick "1A, 2C, 3B" responses)
+- Creates markdown PRD with user stories, functional requirements, non-goals
+
+**`ralph` skill**: Converts markdown PRDs to JSON
+- Enforces story sizing (completable in one iteration)
+- Orders by dependencies (schema → backend → UI)
+- Adds standard criteria ("Typecheck passes", "Verify in browser")
+
+---
+
+## Notable Patterns and Design Decisions
+
+### 1. **Single Story Per Iteration**
+
+**Design**: Each AI run handles exactly ONE user story, never more.
+
+**Rationale**:
+- Ensures complete focus on a single task
+- Prevents context exhaustion mid-feature
+- Creates clean commit boundaries
+- Simplifies failure recovery (retry a single story, not multiple)
+
+### 2. **Append-Only Progress Log**
+
+**Design**: `progress.txt` is append-only, never overwritten.
+
+**Rationale**:
+- Preserves full history for debugging
+- Enables pattern discovery over time
+- Prevents accidental loss of learnings
+- Supports consolidation into AGENTS.md when patterns emerge
+
+### 3. **Story Sizing Rules**
+
+**Design**: Stories must be small enough for one context window.
+
+**Right-sized examples:**
+- Add a database column and migration
+- Add a UI component to an existing page
+- Update a server action with new logic
+- Add a filter dropdown to a list
+
+**Too big (must split):**
+- "Build the entire dashboard"
+- "Add authentication"
+- "Refactor the API"
+
+**Rule of thumb**: If you can't describe the change in 2-3 sentences, it's too big.
+
+### 4. **Dependency-Ordered Execution**
+
+**Design**: Stories execute in priority order, earlier stories can't depend on later ones.
+
+**Correct order:**
+1. Schema/database changes (migrations)
+2. Server actions / backend logic
+3. UI components that use the backend
+4. Dashboard/summary views that aggregate data
+
+### 5. **Commit Discipline**
+
+**Design**: Only commit when tests pass, with structured messages.
+
+```
+feat: [Story ID] - [Story Title]
+```
+
+**Rationale**: Clean git history provides context recovery for future iterations.
+
+### 6. **Verifiable Acceptance Criteria**
+
+**Design**: Every criterion must be testable, never vague.
+
+**Good**: "Button shows confirmation dialog before deleting"
+**Bad**: "Works correctly", "Good UX", "Handles edge cases"
+
+### 7. **Archiving Previous Runs**
+
+**Design**: When `branchName` changes, archive previous `prd.json` and `progress.txt` to `archive/YYYY-MM-DD-feature-name/`.
+
+**Rationale**: Clean separation between features, preserves history for reference.
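
The archiving rule above might be implemented roughly as follows. This is a minimal sketch under stated assumptions, not the script's actual code: the function name `archive_previous_run`, its three arguments, stripping the `ralph/` prefix to derive the folder name, and copying (rather than moving) the files are all illustrative choices.

```shell
# Hypothetical helper: archive_previous_run <run_dir> <previous_branch> <current_branch>
# Copies prd.json and progress.txt into <run_dir>/archive/YYYY-MM-DD-feature-name/
# when the branch name has changed, and prints the archive path.
archive_previous_run() {
  local run_dir="$1" previous_branch="$2" current_branch="$3"

  # Nothing to do on the first run or when the feature has not changed.
  if [ -z "$previous_branch" ] || [ "$previous_branch" = "$current_branch" ]; then
    return 0
  fi

  # "ralph/task-priority" -> "task-priority" (assumed naming convention).
  local feature="${previous_branch#ralph/}"
  local dest
  dest="$run_dir/archive/$(date +%Y-%m-%d)-$feature"
  mkdir -p "$dest"

  # Copy whichever state files exist; a missing file is not treated as an error.
  local f
  for f in prd.json progress.txt; do
    if [ -f "$run_dir/$f" ]; then
      cp "$run_dir/$f" "$dest/"
    fi
  done
  echo "$dest"
}
```

A caller would typically record the last-seen branch name somewhere in the run directory and pass it in as `previous_branch`; how that value is stored is left open here.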
+
+---
+
+## Context Management Strategy
+
+Ralph's context management is its most innovative aspect:
+
+### Between Runs (Persistence)
+
+| Mechanism | What It Carries | Format |
+|-----------|-----------------|--------|
+| Git commits | Code changes, file structure | Versioned files |
+| `prd.json` | Task completion status | Structured JSON |
+| `progress.txt` | Learnings, gotchas, patterns | Structured text |
+| `AGENTS.md` | Consolidated reusable patterns | Markdown |
+
+### Within a Run (Instructions)
+
+The AI receives:
+1. Instructions from `prompt.md` or `CLAUDE.md`
+2. The `prd.json` file content
+3. The `progress.txt` file (especially Codebase Patterns section)
+4. Access to read any file via AI tool capabilities
+
+### Context Recovery Pattern
+
+Each iteration:
+1. Reads `progress.txt` Codebase Patterns section first (quick reference)
+2. Reads `prd.json` to find next incomplete story
+3. Checks git branch matches expected branch
+4. Implements story
+5. Appends learnings to `progress.txt`
+6. Optionally consolidates patterns to AGENTS.md
+
+---
+
+## Agent Orchestration Model
+
+### Single-Agent Loop (Not Multi-Agent)
+
+Ralph is NOT a multi-agent system. It's a single-agent loop where:
+- One AI instance runs at a time
+- Each instance is independent (no inter-agent communication)
+- Coordination happens via file-based state (prd.json, progress.txt)
+
+### Orchestration via Bash Script
+
+`ralph.sh` is a simple bash loop:
+```bash
+for i in $(seq 1 $MAX_ITERATIONS); do
+  OUTPUT=$(cat prompt.md | amp --dangerously-allow-all 2>&1 | tee /dev/stderr) || true
+
+  if echo "$OUTPUT" | grep -q "<promise>COMPLETE</promise>"; then
+    echo "Ralph completed all tasks!"
+    exit 0
+  fi
+done
+```
+
+**Key points:**
+- Uses `--dangerously-allow-all` (Amp) or `--dangerously-skip-permissions` (Claude) for autonomous operation
+- Outputs are piped through `tee` for visibility
+- Completion detected via grep for magic string
+- 2-second sleep between iterations
+
+---
+
+## Error Handling and Recovery
+
+### Implicit Error Handling
+
+Ralph has minimal explicit error handling. Instead:
+- If tests fail, the story isn't committed
+- If the AI can't complete a story, it logs learnings and the next iteration retries
+- If max iterations are reached, the script exits with an error
+- Human intervention is expected for complex failures
+
+### Recovery via Progress Log
+
+Failed attempts are documented in `progress.txt`:
+```
+## [Date/Time] - [Story ID]
+- Attempted to implement X
+- Failed because Y
+- **Learnings:**
+  - Don't do Z
+  - Instead try W
+---
+```
+
+The next iteration reads these learnings and avoids the same mistakes.
+
+---
+
+## Configuration and Customization
+
+### Per-Project Customization
+
+After copying the prompt template to your project:
+- Add project-specific quality check commands
+- Include codebase conventions
+- Add common gotchas for your stack
+
+### Amp Auto-Handoff Configuration
+
+For large stories that approach context limits:
+```json
+{
+  "amp.experimental.autoHandoff": { "context": 90 }
+}
+```
+
+This enables automatic handoff when context fills up.
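
Independently of handoff settings, the loop behaviour described earlier under "Orchestration via Bash Script" and "Implicit Error Handling" (completion check, pause between iterations, error exit at the iteration limit) can be sketched with the agent invocation stubbed out. The function name `ralph_loop` and the `RALPH_SLEEP_SECONDS` variable are hypothetical; the 2-second default and the non-zero exit at the limit follow the description in the analysis.

```shell
# Hypothetical wrapper: ralph_loop <max_iterations> <agent_command...>
# The agent command stands in for the real Amp or Claude Code invocation.
ralph_loop() {
  local max_iterations="$1"
  shift
  local i output
  for i in $(seq 1 "$max_iterations"); do
    echo "Iteration $i of $max_iterations" >&2
    # A failing agent run must not abort the loop; the next iteration retries.
    output=$("$@" 2>&1) || true
    if printf '%s\n' "$output" | grep -q "<promise>COMPLETE</promise>"; then
      echo "Ralph completed all tasks!"
      return 0
    fi
    # Pause between iterations (2 seconds unless overridden).
    sleep "${RALPH_SLEEP_SECONDS:-2}"
  done
  echo "Reached max iterations ($max_iterations) without completing all stories." >&2
  return 1
}
```

Passing the agent as arguments keeps the loop testable with a stub such as `echo`, and lets the same loop drive either tool.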
+
+### Iteration Limits
+
+```bash
+./ralph.sh [max_iterations]  # Default: 10
+```
+
+---
+
+## Comparison to Typical Orchestration Approaches
+
+| Aspect | Ralph | Typical Orchestration |
+|--------|-------|----------------------|
+| **Memory** | File-based (git, JSON, text) | In-memory state, databases |
+| **Coordination** | Sequential loop | Often parallel/concurrent |
+| **Agent Communication** | Via files | Direct messaging, queues |
+| **Complexity** | Simple bash script (~100 LOC) | Often complex frameworks |
+| **Failure Recovery** | Retry from last good state | Explicit retry logic, checkpoints |
+| **Context Management** | Fresh context per iteration | Persistent context, context windows |
+| **Task Decomposition** | Pre-planned user stories | Often dynamic planning |
+| **Human Oversight** | Minimal during run | Often requires approval gates |
+
+### Key Differentiators
+
+1. **Simplicity**: Ralph is a bash script, not a framework
+2. **Statelessness**: Each iteration is independent
+3. **Git-Native**: Uses git as the primary state management
+4. **AI-Tool Agnostic**: Works with both Amp and Claude Code
+5. **Human-Readable Artifacts**: All state is in human-readable files
+
+---
+
+## Implications for Makima
+
+### Features to Consider Adopting
+
+1. **Structured PRD-to-JSON workflow** with skills
+2. **Append-only progress logging** for context between runs
+3. **Story sizing enforcement** (completable in one context window)
+4. **Dependency-ordered task execution**
+5. **Branch-based run isolation** with archiving
+6. **Consolidated patterns file** (AGENTS.md equivalent)
+7. **Magic string completion protocol** (`<promise>COMPLETE</promise>`)
+8. **Verifiable acceptance criteria** enforcement
+9. **Browser verification** for UI stories
+
+### Optional Features (Flag-Controlled)
+
+1. `--max-iterations` limit
+2. `--auto-handoff` for context management
+3. `--archive-previous` for run isolation
+4. `--require-tests` for quality gates
+5. `--single-story-per-run` mode
+
+### Opinionated Features
+
+1. Task decomposition must result in context-window-sized stories
+2. Progress logs must be append-only
+3. All commits must pass quality checks
+4. Acceptance criteria must be verifiable
+5. Dependencies must be ordered correctly
+
+---
+
+## Appendix: File Structure Reference
+
+```
+project/
+├── scripts/ralph/
+│   ├── ralph.sh              # Main loop script
+│   ├── prompt.md             # Amp instructions
+│   ├── CLAUDE.md             # Claude Code instructions
+│   ├── prd.json              # Active task list
+│   ├── progress.txt          # Append-only learnings
+│   └── archive/              # Previous run archives
+│       └── YYYY-MM-DD-feature-name/
+│           ├── prd.json
+│           └── progress.txt
+├── skills/
+│   ├── prd/
+│   │   └── SKILL.md          # PRD generation skill
+│   └── ralph/
+│       └── SKILL.md          # PRD-to-JSON conversion skill
+└── AGENTS.md                 # Codebase-wide patterns
+```
+
+---
+
+## References
+
+- [Ralph GitHub Repository](https://github.com/snarktank/ralph)
+- [Geoffrey Huntley's Ralph Article](https://ghuntley.com/ralph/)
+- [Amp Documentation](https://ampcode.com/manual)
+- [Claude Code Documentation](https://docs.anthropic.com/en/docs/claude-code)
