# Ralph Analysis: Autonomous AI Agent Loop System

## Executive Summary

**Ralph** is an autonomous AI agent loop system designed to run AI coding tools (Amp or Claude Code) repeatedly until all Product Requirements Document (PRD) items are complete. Based on [Geoffrey Huntley's Ralph pattern](https://ghuntley.com/ralph/), it represents a paradigm for autonomous software development where each iteration spawns a fresh AI instance with clean context, relying on git history, a progress log, and a structured PRD JSON file for persistence between runs.

The core philosophy is simple: break work into small, independently completable stories, run AI agents in a loop, and let structured persistence mechanisms carry context forward. This approach works around AI context limits by treating each iteration as a stateless worker that reads from and writes to well-defined artifacts.

---

## Architecture Overview

### High-Level Flow

```
┌──────────────────────────────────────────────────────────────────┐
│                         SETUP PHASE                              │
├──────────────────────────────────────────────────────────────────┤
│  1. User writes a PRD (markdown)                                 │
│  2. Convert PRD to prd.json (structured user stories)            │
│  3. Run ralph.sh (starts autonomous loop)                        │
└──────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────────┐
│                         EXECUTION LOOP                           │
├──────────────────────────────────────────────────────────────────┤
│  4. AI picks highest priority story where passes: false          │
│  5. Implements the story (writes code, runs tests)               │
│  6. Commits changes (if tests pass)                              │
│  7. Updates prd.json (sets passes: true)                         │
│  8. Logs learnings to progress.txt                               │
│  9. Updates AGENTS.md/CLAUDE.md with reusable patterns           │
│ 10. Check: More stories? → Loop back to step 4                   │
└──────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────────┐
│                         COMPLETION                               │
├──────────────────────────────────────────────────────────────────┤
│  Output: <promise>COMPLETE</promise> and exit                    │
└──────────────────────────────────────────────────────────────────┘
```

### Core Components

| Component | Purpose | Persistence |
|-----------|---------|-------------|
| `ralph.sh` | Bash loop that spawns fresh AI instances | N/A (orchestrator) |
| `prd.json` | Task list with status tracking | Git-tracked JSON |
| `progress.txt` | Append-only learnings log | Git-tracked text |
| `AGENTS.md` / `CLAUDE.md` | Reusable patterns for future iterations | Git-tracked markdown |
| `prompt.md` | Instructions template for Amp | Static config |
| Skills (`prd`, `ralph`) | PRD generation and conversion helpers | Static config |

---

## Key Features

### 1. **Stateless Iteration Model**

Each iteration spawns a completely fresh AI instance with no memory of previous work. Context is rebuilt from:
- Git history (what was committed)
- `progress.txt` (learnings and context)
- `prd.json` (which stories are done)

**Key insight**: This sidesteps the AI context window limit by treating each run as independent, with structured artifacts serving as the "memory."

### 2. **Structured Task Management (prd.json)**

```json
{
  "project": "MyApp",
  "branchName": "ralph/task-priority",
  "description": "Task Priority System - Add priority levels to tasks",
  "userStories": [
    {
      "id": "US-001",
      "title": "Add priority field to database",
      "description": "As a developer, I need to store task priority...",
      "acceptanceCriteria": [
        "Add priority column to tasks table",
        "Typecheck passes"
      ],
      "priority": 1,
      "passes": false,
      "notes": ""
    }
  ]
}
```

**Design decisions:**
- Priority-based ordering handles dependencies correctly, provided priorities are assigned in dependency order
- `passes: false/true` provides clear completion tracking
- Acceptance criteria are verifiable (not vague)
- Stories are sized to fit within one context window
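The selection rule implied by this structure can be sketched in a few lines. This is an illustration only: in Ralph the agent makes this choice itself while reading `prd.json`. The helper name `next_story` is hypothetical, and the sketch assumes that a lower `priority` number runs first, which the example's `"priority": 1` suggests but the source does not state.

```python
import json
from typing import Optional


def next_story(prd: dict) -> Optional[dict]:
    """Return the pending story with the lowest priority number, or None."""
    pending = [s for s in prd.get("userStories", []) if not s.get("passes", False)]
    # Assumption: a smaller "priority" value means "do this earlier".
    return min(pending, key=lambda s: s["priority"], default=None)


prd = json.loads("""
{
  "userStories": [
    {"id": "US-001", "priority": 1, "passes": true},
    {"id": "US-002", "priority": 2, "passes": false},
    {"id": "US-003", "priority": 3, "passes": false}
  ]
}
""")
print(next_story(prd)["id"])  # US-002
```

Returning `None` when nothing is pending gives the caller a natural signal that the run is finished.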

### 3. **Progressive Learning System**

The dual-file learning system distinguishes between:

**`progress.txt`** - Append-only chronological log:
```
## [Date/Time] - [Story ID]
- What was implemented
- Files changed
- **Learnings for future iterations:**
  - Patterns discovered
  - Gotchas encountered
  - Useful context
---
```

**`AGENTS.md` / `CLAUDE.md`** - Consolidated reusable patterns:
```
## Codebase Patterns
- Use `sql<number>` template for aggregations
- Always use `IF NOT EXISTS` for migrations
- Export types from actions.ts for UI components
```

**Key insight**: Chronological learnings for debugging, consolidated patterns for quick reference.

### 4. **Branch-Based Run Isolation**

- Each feature uses a dedicated branch (`ralph/feature-name`)
- When starting a new feature, previous runs are archived to `archive/YYYY-MM-DD-feature-name/`
- Clean separation between features prevents context pollution

### 5. **Quality Feedback Loops**

Ralph requires feedback loops to function:
- Typecheck catches type errors
- Tests verify behavior
- CI must stay green (broken code compounds)

Stories must include verifiable acceptance criteria like "Typecheck passes" and "Tests pass."
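A quality gate of this kind could look roughly like the following sketch. The function name `checks_pass` is hypothetical, and the commands are stand-ins that only set an exit code; a real project would list its own typecheck and test commands instead.

```python
import subprocess
import sys
from typing import Sequence


def checks_pass(commands: Sequence[Sequence[str]]) -> bool:
    """Run each check in order; stop at the first non-zero exit code."""
    for cmd in commands:
        if subprocess.run(cmd).returncode != 0:
            return False
    return True


# Stand-in commands so the sketch runs anywhere (assumption, not Ralph's config).
ok = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

print(checks_pass([ok, ok]))       # True
print(checks_pass([ok, failing]))  # False
```

Committing only when such a gate returns `True` is what keeps a broken state from being inherited by the next iteration.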

### 6. **Browser Verification for UI Stories**

Frontend stories include "Verify in browser using dev-browser skill" as an acceptance criterion. This ensures that UI changes are verified visually, not just that the code compiles.

### 7. **Stop Condition Protocol**

The loop terminates when all stories have `passes: true`. The AI outputs:
```
<promise>COMPLETE</promise>
```

`ralph.sh` greps each iteration's output for this magic string to detect completion.
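As a sketch, the marker check is a plain substring test. The second helper, `all_stories_pass`, is not part of Ralph as described here; it is a hypothetical cross-check against `prd.json` that a port might add to guard against an agent printing the marker too early.

```python
COMPLETE_MARKER = "<promise>COMPLETE</promise>"


def is_complete(agent_output: str) -> bool:
    """True if the captured agent output contains the completion marker."""
    return COMPLETE_MARKER in agent_output


def all_stories_pass(prd: dict) -> bool:
    """Hypothetical cross-check: every story in the PRD is marked as passing."""
    # Note: an empty story list also counts as "all pass".
    return all(s.get("passes", False) for s in prd.get("userStories", []))


output = "Finished US-003.\n<promise>COMPLETE</promise>\n"
prd = {"userStories": [{"id": "US-001", "passes": True}, {"id": "US-002", "passes": False}]}
print(is_complete(output))    # True
print(all_stories_pass(prd))  # False
```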

### 8. **Multi-Tool Support**

Ralph supports both Amp and Claude Code:
```bash
./ralph.sh --tool amp [max_iterations]   # Default
./ralph.sh --tool claude [max_iterations]
```

Each tool has its own prompt template (`prompt.md` for Amp, `CLAUDE.md` for Claude Code).

### 9. **Skills System for PRD Workflow**

Two skills automate PRD creation:

**`prd` skill**: Generates structured PRDs with clarifying questions
- Asks 3-5 essential questions with lettered options (for quick "1A, 2C, 3B" responses)
- Creates markdown PRD with user stories, functional requirements, non-goals

**`ralph` skill**: Converts markdown PRDs to JSON
- Enforces story sizing (completable in one iteration)
- Orders by dependencies (schema → backend → UI)
- Adds standard criteria ("Typecheck passes", "Verify in browser")

---

## Notable Patterns and Design Decisions

### 1. **Single Story Per Iteration**

**Design**: Each AI run handles exactly ONE user story, never more.

**Rationale**:
- Ensures complete focus on a single task
- Prevents context exhaustion mid-feature
- Creates clean commit boundaries
- Simplifies failure recovery (retry a single story, not multiple)

### 2. **Append-Only Progress Log**

**Design**: `progress.txt` is append-only, never overwritten.

**Rationale**:
- Preserves full history for debugging
- Enables pattern discovery over time
- Prevents accidental loss of learnings
- Supports consolidation into AGENTS.md when patterns emerge
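A minimal sketch of an append-only writer for this log follows. In Ralph the agent writes these entries itself; the helper name `append_progress` and its parameters are hypothetical, and the entry layout follows the `progress.txt` template shown earlier in this document.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def append_progress(log_path: Path, story_id: str, done: list[str], learnings: list[str]) -> None:
    """Append one entry in the progress.txt layout; never truncates the file."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    lines = [f"## {stamp} - {story_id}"]
    lines += [f"- {item}" for item in done]
    lines.append("- **Learnings for future iterations:**")
    lines += [f"  - {item}" for item in learnings]
    lines.append("---")
    # Mode "a" only ever writes at the end, which keeps earlier entries intact.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "progress.txt"
    append_progress(log, "US-001", ["Added priority column"], ["Migrations need IF NOT EXISTS"])
    append_progress(log, "US-002", ["Added priority badge"], ["Reuse the existing badge component"])
    print(log.read_text(encoding="utf-8").count("---\n"))  # 2
```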

### 3. **Story Sizing Rules**

**Design**: Stories must be small enough for one context window.

**Right-sized examples:**
- Add a database column and migration
- Add a UI component to an existing page
- Update a server action with new logic
- Add a filter dropdown to a list

**Too big (must split):**
- "Build the entire dashboard"
- "Add authentication"
- "Refactor the API"

**Rule of thumb**: If you can't describe the change in 2-3 sentences, it's too big.

### 4. **Dependency-Ordered Execution**

**Design**: Stories execute in priority order, so earlier stories must not depend on later ones.

**Correct order:**
1. Schema/database changes (migrations)
2. Server actions / backend logic
3. UI components that use the backend
4. Dashboard/summary views that aggregate data
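The ordering rule can be checked mechanically if dependencies are written down. The sketch below assumes a `dependsOn` field, which is hypothetical: the `prd.json` format described in this document has no such field, and Ralph relies on the priorities being assigned correctly up front. It also assumes that lower priority numbers run first.

```python
def ordering_violations(stories: list[dict]) -> list[tuple[str, str]]:
    """Return (story, dependency) pairs where the dependency would not run first."""
    priority = {s["id"]: s["priority"] for s in stories}
    violations = []
    for s in stories:
        # "dependsOn" is a hypothetical field used only for this illustration.
        for dep in s.get("dependsOn", []):
            if dep not in priority or priority[dep] >= s["priority"]:
                violations.append((s["id"], dep))
    return violations


stories = [
    {"id": "US-001", "priority": 1},                           # schema
    {"id": "US-002", "priority": 2, "dependsOn": ["US-001"]},  # backend
    {"id": "US-003", "priority": 2, "dependsOn": ["US-004"]},  # UI, scheduled too early
    {"id": "US-004", "priority": 3, "dependsOn": ["US-001"]},  # backend
]
print(ordering_violations(stories))  # [('US-003', 'US-004')]
```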

### 5. **Commit Discipline**

**Design**: Only commit when tests pass, with structured messages.

```
feat: [Story ID] - [Story Title]
```

**Rationale**: Clean git history provides context recovery for future iterations.
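Because the message format is fixed, it can be both generated and parsed back out of the commit log. The sketch below shows one way to do that; the helper names and the regular expression are assumptions, not part of Ralph.

```python
import re
from typing import Optional

# Matches subjects of the form "feat: [Story ID] - [Story Title]".
COMMIT_RE = re.compile(r"^feat: (?P<id>\S+) - (?P<title>.+)$")


def commit_message(story: dict) -> str:
    """Build the commit subject for a completed story."""
    return f"feat: {story['id']} - {story['title']}"


def parse_commit(subject: str) -> Optional[dict]:
    """Recover the story ID and title from a commit subject, if it matches."""
    match = COMMIT_RE.match(subject)
    return match.groupdict() if match else None


story = {"id": "US-001", "title": "Add priority field to database"}
subject = commit_message(story)
print(subject)                      # feat: US-001 - Add priority field to database
print(parse_commit(subject)["id"])  # US-001
```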

### 6. **Verifiable Acceptance Criteria**

**Design**: Every criterion must be testable, never vague.

**Good**: "Button shows confirmation dialog before deleting"
**Bad**: "Works correctly", "Good UX", "Handles edge cases"
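A crude lint for vague criteria might look like the sketch below. It is a heuristic under stated assumptions: the phrase list contains only the three "bad" examples above, and the function name `vague_criteria` is hypothetical. It cannot judge whether a criterion that passes is actually testable.

```python
VAGUE_PHRASES = ("works correctly", "good ux", "handles edge cases")


def vague_criteria(criteria: list[str]) -> list[str]:
    """Return the criteria that contain a known vague phrase (case-insensitive)."""
    return [c for c in criteria if any(p in c.lower() for p in VAGUE_PHRASES)]


criteria = [
    "Button shows confirmation dialog before deleting",
    "Works correctly",
    "Typecheck passes",
]
print(vague_criteria(criteria))  # ['Works correctly']
```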

### 7. **Archiving Previous Runs**

**Design**: When `branchName` changes, archive previous `prd.json` and `progress.txt` to `archive/YYYY-MM-DD-feature-name/`.

**Rationale**: Clean separation between features, preserves history for reference.
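The archiving step can be sketched as follows. The function name `archive_run` is hypothetical, and stripping the `ralph/` prefix from the branch name is an assumption inferred from the `ralph/feature-name` and `archive/YYYY-MM-DD-feature-name/` conventions described in this document.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def archive_run(ralph_dir: Path, branch_name: str, today: date) -> Path:
    """Copy prd.json and progress.txt into archive/YYYY-MM-DD-feature-name/."""
    feature = branch_name.removeprefix("ralph/")  # requires Python 3.9+
    target = ralph_dir / "archive" / f"{today.isoformat()}-{feature}"
    target.mkdir(parents=True, exist_ok=True)
    for name in ("prd.json", "progress.txt"):
        src = ralph_dir / name
        if src.exists():
            shutil.copy2(src, target / name)
    return target


with tempfile.TemporaryDirectory() as tmp:
    ralph_dir = Path(tmp)
    (ralph_dir / "prd.json").write_text("{}", encoding="utf-8")
    (ralph_dir / "progress.txt").write_text("", encoding="utf-8")
    target = archive_run(ralph_dir, "ralph/task-priority", date(2025, 1, 15))
    print(target.name)  # 2025-01-15-task-priority
```

The sketch copies rather than moves the files, so resetting `prd.json` and `progress.txt` for the new feature stays a separate, explicit step.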

---

## Context Management Strategy

Ralph's context management is its most innovative aspect:

### Between Runs (Persistence)

| Mechanism | What It Carries | Format |
|-----------|-----------------|--------|
| Git commits | Code changes, file structure | Versioned files |
| `prd.json` | Task completion status | Structured JSON |
| `progress.txt` | Learnings, gotchas, patterns | Structured text |
| `AGENTS.md` | Consolidated reusable patterns | Markdown |

### Within a Run (Instructions)

The AI receives:
1. Instructions from `prompt.md` or `CLAUDE.md`
2. The `prd.json` file content
3. The `progress.txt` file (especially Codebase Patterns section)
4. Access to read any file via AI tool capabilities

### Context Recovery Pattern

Each iteration:
1. Reads `progress.txt` Codebase Patterns section first (quick reference)
2. Reads `prd.json` to find next incomplete story
3. Checks git branch matches expected branch
4. Implements story
5. Appends learnings to `progress.txt`
6. Optionally consolidates patterns to AGENTS.md
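Step 1 of this pattern, reading the Codebase Patterns section first, can be illustrated with a small parser. This is a sketch: it assumes the section is introduced by a `## Codebase Patterns` heading and ends at the next `## ` heading, and the function name is hypothetical. In Ralph the agent simply reads the file.

```python
def codebase_patterns(progress_text: str) -> list[str]:
    """Return the bullet items under the '## Codebase Patterns' heading."""
    patterns, in_section = [], False
    for line in progress_text.splitlines():
        if line.startswith("## "):
            # Any level-2 heading either opens or closes the section.
            in_section = line.strip() == "## Codebase Patterns"
            continue
        if in_section and line.startswith("- "):
            patterns.append(line[2:].strip())
    return patterns


sample = """## Codebase Patterns
- Use `sql<number>` template for aggregations
- Always use `IF NOT EXISTS` for migrations
---
## 2025-01-15 10:00 - US-001
- Added priority column
"""
print(len(codebase_patterns(sample)))  # 2
```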

---

## Agent Orchestration Model

### Single-Agent Loop (Not Multi-Agent)

Ralph is NOT a multi-agent system. It's a single-agent loop where:
- One AI instance runs at a time
- Each instance is independent (no inter-agent communication)
- Coordination happens via file-based state (prd.json, progress.txt)

### Orchestration via Bash Script

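`ralph.sh` is a simple bash loop, shown here in simplified form for the Amp case:
```bash
for i in $(seq 1 "$MAX_ITERATIONS"); do
    # Fresh Amp instance per iteration; "|| true" keeps the loop alive on tool errors
    OUTPUT=$(cat prompt.md | amp --dangerously-allow-all 2>&1 | tee /dev/stderr) || true

    # Stop as soon as the agent prints the completion marker
    if echo "$OUTPUT" | grep -q "<promise>COMPLETE</promise>"; then
        echo "Ralph completed all tasks!"
        exit 0
    fi

    sleep 2  # pause between iterations
done

echo "Ralph reached max iterations without completing all tasks." >&2
exit 1
```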

**Key points:**
- Uses `--dangerously-allow-all` (Amp) or `--dangerously-skip-permissions` (Claude) for autonomous operation
- Outputs are piped through `tee` for visibility
- Completion detected via grep for magic string
- 2-second sleep between iterations
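The control flow described in this section (iteration cap, marker check, pause between runs, error exit when the cap is hit) can be restated as a testable sketch. The agent call is injected as a function so the loop runs without Amp or Claude Code installed; `run_loop` and `fake_agent` are hypothetical names, not part of Ralph.

```python
import time
from typing import Callable

COMPLETE_MARKER = "<promise>COMPLETE</promise>"


def run_loop(run_agent: Callable[[int], str], max_iterations: int = 10,
             pause_seconds: float = 2.0) -> int:
    """Call run_agent until its output contains the marker; return an exit code."""
    for i in range(1, max_iterations + 1):
        output = run_agent(i)  # each call stands for one fresh agent process
        if COMPLETE_MARKER in output:
            return 0
        time.sleep(pause_seconds)
    return 1  # iteration budget exhausted without seeing the marker


def fake_agent(iteration: int) -> str:
    """Stand-in agent that reports completion on its third run."""
    return COMPLETE_MARKER if iteration == 3 else "still working"


print(run_loop(fake_agent, pause_seconds=0))                                   # 0
print(run_loop(lambda i: "still working", max_iterations=2, pause_seconds=0))  # 1
```

Injecting the agent call is a design choice for testability; a real port would replace `fake_agent` with a subprocess call to the chosen tool.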

---

## Error Handling and Recovery

### Implicit Error Handling

Ralph has minimal explicit error handling. Instead:
- If tests fail, the story isn't committed
- If the AI can't complete a story, it logs learnings and the next iteration retries
- If max iterations are reached, the script exits with an error
- Human intervention is expected for complex failures

### Recovery via Progress Log

Failed attempts are documented in `progress.txt`:
```
## [Date/Time] - [Story ID]
- Attempted to implement X
- Failed because Y
- **Learnings:**
  - Don't do Z
  - Instead try W
---
```

The next iteration reads these learnings and avoids the same mistakes.

---

## Configuration and Customization

### Per-Project Customization

After copying the prompt template to your project:
- Add project-specific quality check commands
- Include codebase conventions
- Add common gotchas for your stack

### Amp Auto-Handoff Configuration

For large stories that approach context limits:
```json
{
  "amp.experimental.autoHandoff": { "context": 90 }
}
```

This enables automatic handoff when context fills up.

### Iteration Limits

```bash
./ralph.sh [max_iterations]  # Default: 10
```

---

## Comparison to Typical Orchestration Approaches

| Aspect | Ralph | Typical Orchestration |
|--------|-------|----------------------|
| **Memory** | File-based (git, JSON, text) | In-memory state, databases |
| **Coordination** | Sequential loop | Often parallel/concurrent |
| **Agent Communication** | Via files | Direct messaging, queues |
| **Complexity** | Simple bash script (~100 LOC) | Often complex frameworks |
| **Failure Recovery** | Retry from last good state | Explicit retry logic, checkpoints |
| **Context Management** | Fresh context per iteration | Persistent context, context windows |
| **Task Decomposition** | Pre-planned user stories | Often dynamic planning |
| **Human Oversight** | Minimal during run | Often requires approval gates |

### Key Differentiators

1. **Simplicity**: Ralph is a bash script, not a framework
2. **Statelessness**: Each iteration is independent
3. **Git-Native**: Uses git as the primary state management
4. **AI-Tool Agnostic**: Works with both Amp and Claude Code
5. **Human-Readable Artifacts**: All state is in human-readable files

---

## Implications for Makima

### Features to Consider Adopting

1. **Structured PRD-to-JSON workflow** with skills
2. **Append-only progress logging** for context between runs
3. **Story sizing enforcement** (completable in one context window)
4. **Dependency-ordered task execution**
5. **Branch-based run isolation** with archiving
6. **Consolidated patterns file** (AGENTS.md equivalent)
7. **Magic string completion protocol** (`<promise>COMPLETE</promise>`)
8. **Verifiable acceptance criteria** enforcement
9. **Browser verification** for UI stories

### Optional Features (Flag-Controlled)

1. `--max-iterations` limit
2. `--auto-handoff` for context management
3. `--archive-previous` for run isolation
4. `--require-tests` for quality gates
5. `--single-story-per-run` mode

### Opinionated Features

1. Task decomposition must result in context-window-sized stories
2. Progress logs must be append-only
3. All commits must pass quality checks
4. Acceptance criteria must be verifiable
5. Dependencies must be ordered correctly

---

## Appendix: File Structure Reference

```
project/
├── scripts/ralph/
│   ├── ralph.sh            # Main loop script
│   ├── prompt.md           # Amp instructions
│   ├── CLAUDE.md           # Claude Code instructions
│   ├── prd.json            # Active task list
│   ├── progress.txt        # Append-only learnings
│   └── archive/            # Previous run archives
│       └── YYYY-MM-DD-feature-name/
│           ├── prd.json
│           └── progress.txt
├── skills/
│   ├── prd/
│   │   └── SKILL.md        # PRD generation skill
│   └── ralph/
│       └── SKILL.md        # PRD-to-JSON conversion skill
└── AGENTS.md               # Codebase-wide patterns
```

---

## References

- [Ralph GitHub Repository](https://github.com/snarktank/ralph)
- [Geoffrey Huntley's Ralph Article](https://ghuntley.com/ralph/)
- [Amp Documentation](https://ampcode.com/manual)
- [Claude Code Documentation](https://docs.anthropic.com/en/docs/claude-code)