ralph-features-spec.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773

# Ralph-Inspired Features for Makima

## Overview

This specification outlines features derived from the [ralph](https://github.com/snarktank/ralph) autonomous AI agent loop system that can be implemented in makima to reduce manual steering and improve context management between runs.

---

## Part 1: Opinionated Features (Always Enabled)

These features represent best practices that should be core to makima's behavior.

### 1.1 Structured Progress Logging

**Name:** `progress-log`
**Priority:** HIGH

**Description:**
Implement an append-only progress log file (`progress.txt` or similar) that persists learnings, patterns, and context across task iterations.

**Motivation:**
- Ralph's most powerful feature is its dual-file learning system
- Captures context that survives Claude's context window limits
- Enables pattern discovery over time
- Provides debugging history

**Current State in Makima:**
- `progress_summary` field exists but is per-task, not persistent
- Task events stored in database but not summarized
- No cross-task learning mechanism

**Implementation Approach:**
1. Add `progress.log` file to each task's worktree
2. Append structured entries at task completion:
   ```
   ## [Timestamp] - Task [ID]: [Name]
   - Status: [done/failed]
   - Files changed: [list]
   - **Learnings:**
     - [Pattern discovered]
     - [Gotcha encountered]
   ---
   ```
3. Inject progress.log contents into new task prompts
4. Periodic consolidation into `AGENTS.md` equivalent

**Configuration:**
```toml
[progress_log]
enabled = true  # Always on
max_entries_injected = 20  # Limit for prompt injection
consolidation_threshold = 50  # Trigger consolidation
```

**Integration Points:**
- `daemon/task/manager.rs` → `on_completion()` hook
- `daemon/process/claude.rs` → `inject_system_prompt()`
- New file: `daemon/task/progress_log.rs`

---

### 1.2 Context Recovery Pattern

**Name:** `context-recovery`
**Priority:** HIGH

**Description:**
Standardize how context is rebuilt when tasks resume or restart, ensuring Claude can quickly orient itself.

**Motivation:**
- Ralph's stateless model works because context recovery is systematic
- Each iteration reads from well-defined artifacts
- Reduces confusion and repeated work

**Current State in Makima:**
- `conversation_state` stored for resumption
- `--continue` flag relies on Claude's session state
- No structured "where we left off" pattern

**Implementation Approach:**
1. Create standard context recovery header for task prompts:
   ```
   ## Context Recovery
   - Current branch: [branch name]
   - Git status: [uncommitted changes summary]
   - Last checkpoint: [timestamp, message]
   - Progress log (recent): [last 5 entries]
   - Current phase: [research/specify/plan/execute/review]
   ```
2. Auto-generate on task start/resume
3. Include in system prompt before user plan

**Integration Points:**
- `daemon/task/manager.rs` → `build_context_recovery()`
- `daemon/process/claude.rs` → Prepend to injected prompt
- New file: `daemon/task/context_recovery.rs`

---

### 1.3 Dependency-Ordered Task Execution

**Name:** `dependency-ordering`
**Priority:** MEDIUM

**Description:**
Enforce that tasks execute in dependency order: schema changes → backend → UI.

**Motivation:**
- Ralph explicitly orders stories: database → server → UI → dashboard
- Prevents tasks from failing due to missing dependencies
- Creates clean commit boundaries

**Current State in Makima:**
- Tasks have `priority` field but no dependency inference
- Supervisors manually order task creation
- No validation of execution order

**Implementation Approach:**
1. Add `depends_on: Vec<Uuid>` field to tasks
2. Validate dependencies before marking task as runnable
3. Auto-detect dependency patterns:
   - Migration files → backend code
   - Types/models → consumers
   - APIs → UI components
4. Warn if a task seems out of order based on file patterns

**Configuration:**
```toml
[dependency_ordering]
enabled = true
auto_detect = true
warn_on_violation = true
```

**Integration Points:**
- `db/models.rs` → Task model extension
- `daemon/task/manager.rs` → `can_start_task()` validation
- New file: `daemon/task/dependency_analysis.rs`

---

### 1.4 Verifiable Acceptance Criteria

**Name:** `acceptance-criteria`
**Priority:** MEDIUM

**Description:**
Require that all tasks have verifiable (not vague) acceptance criteria, and automatically validate them.

**Motivation:**
- Ralph requires criteria like "Typecheck passes", "Tests pass"
- Prevents "done" status on incomplete work
- Provides clear success definition

**Current State in Makima:**
- `COMPLETION_GATE` signals readiness
- No structured criteria validation
- Manual interpretation of "ready"

**Implementation Approach:**
1. Parse task plans for acceptance criteria section
2. Identify verifiable vs vague criteria:
   - **Good:** "All tests pass", "No TypeScript errors"
   - **Bad:** "Works correctly", "Good UX"
3. Auto-append standard criteria if missing:
   - "No uncommitted changes remain"
   - "CI/linting passes" (if configured)
4. Validate criteria satisfaction before marking complete

**Configuration:**
```toml
[acceptance_criteria]
enabled = true
require_verifiable = true
auto_append_standard = ["no_uncommitted_changes", "tests_pass"]
```

**Integration Points:**
- `daemon/task/completion_gate.rs` → Extend validation
- `llm/task_output.rs` → Parse criteria from plan
- New file: `daemon/task/criteria_validator.rs`

---

### 1.5 Task Sizing Validation

**Name:** `task-sizing`
**Priority:** MEDIUM

**Description:**
Warn or prevent tasks that are likely too large to complete in one context window.

**Motivation:**
- Ralph's story sizing is crucial: "If you can't describe it in 2-3 sentences, it's too big"
- Large tasks exhaust context, require handoffs
- Smaller tasks = cleaner commits, easier recovery

**Current State in Makima:**
- No task size estimation
- `auto_handoff` exists but reactive
- Manual task breakdown by supervisors

**Implementation Approach:**
1. Estimate task complexity from plan text:
   - Number of files mentioned
   - Scope words ("entire", "all", "refactor")
   - Estimated token count
2. Warn if task exceeds thresholds
3. Suggest breakdown for large tasks

**Thresholds:**
- Files mentioned > 10 → Warning
- Plan length > 500 words → Warning
- Scope words detected → Strong warning

**Configuration:**
```toml
[task_sizing]
enabled = true
max_files_mentioned = 10
max_plan_words = 500
warn_on_scope_words = ["entire", "all", "complete", "refactor"]
```

**Integration Points:**
- `daemon/task/manager.rs` → `validate_task_size()`
- Supervisor prompts → Include sizing guidance
- New file: `daemon/task/sizing_validator.rs`

---

### 1.6 Commit Discipline

**Name:** `commit-discipline`
**Priority:** HIGH

**Description:**
Enforce structured commit messages and only allow commits when quality checks pass.

**Motivation:**
- Ralph: "Only commit when tests pass"
- Clean git history aids context recovery
- Structured messages enable automation

**Current State in Makima:**
- Checkpoints create commits automatically
- No quality gate before commit
- Commit messages not standardized

**Implementation Approach:**
1. Standardize commit message format:
   ```
   feat/fix/chore: [Task ID] - [Summary]

   [Optional body]

   Co-Authored-By: Claude <noreply@anthropic.com>
   ```
2. Run quality checks before checkpoint commit:
   - TypeScript/lint (if configured)
   - Tests (if configured)
3. Reject commit if checks fail, provide feedback

**Configuration:**
```toml
[commit_discipline]
enabled = true
require_tests = false  # Optional
require_lint = false   # Optional
message_format = "conventional"  # conventional, simple
```

**Integration Points:**
- `daemon/worktree/manager.rs` → `create_checkpoint()`
- `daemon/task/manager.rs` → Pre-commit hooks
- New file: `daemon/task/commit_validator.rs`

---

## Part 2: Optional Features (Flag-Controlled)

These features provide advanced control and should be opt-in via CLI flags or configuration.

### 2.1 Maximum Iterations Limit

**Name:** `--max-iterations`
**Priority:** HIGH
**Flag:** `--max-iterations <N>` or `-i <N>`

**Description:**
Limit the number of autonomous loop iterations before stopping.

**Motivation:**
- Ralph uses `max_iterations` (default 10)
- Prevents runaway loops that waste tokens
- Provides predictable behavior

**Current State in Makima:**
- Circuit breaker has `iteration_count` limit (10)
- Not configurable at task/contract level
- No per-run override

**Implementation Approach:**
1. Add `--max-iterations` flag to contract/task creation
2. Store in task metadata
3. Check count in autonomous loop logic
4. Exit cleanly with message when limit reached

**CLI Usage:**
```bash
makima contract create --max-iterations 5 "Feature X"
makima supervisor spawn "Task" "Plan" --max-iterations 3
```

**Configuration:**
```toml
[autonomous_loop]
default_max_iterations = 10
hard_limit = 50  # Absolute maximum
```

**Integration Points:**
- `daemon/task/manager.rs` → Loop control
- `db/models.rs` → Task field
- CLI argument parsing

---

### 2.2 Single-Story-Per-Run Mode

**Name:** `--single-task`
**Priority:** MEDIUM
**Flag:** `--single-task` or `-1`

**Description:**
Execute exactly one task per Claude invocation, then stop (don't auto-continue).

**Motivation:**
- Ralph's model: one story per iteration
- Ensures complete focus
- Creates clean boundaries
- Simplifies failure recovery

**Current State in Makima:**
- Tasks can run multiple iterations
- Supervisor can spawn multiple concurrent tasks
- No single-task mode

**Implementation Approach:**
1. When `--single-task` enabled:
   - Execute one task
   - Parse completion gate
   - Stop regardless of `ready` status
   - Report status and exit
2. User reviews, then manually continues or adjusts

**CLI Usage:**
```bash
makima contract create --single-task "Feature X"
```

**Configuration:**
```toml
[execution]
single_task_mode = false  # Default
```

**Integration Points:**
- `daemon/task/manager.rs` → Execution loop
- `server/handlers/contract_daemon.rs` → Contract options
- CLI flags

---

### 2.3 Archive Previous Runs

**Name:** `--archive-previous`
**Priority:** LOW
**Flag:** `--archive-previous` or `--archive`

**Description:**
When starting a new feature/contract, archive the previous run's artifacts.

**Motivation:**
- Ralph archives to `archive/YYYY-MM-DD-feature-name/`
- Clean separation between features
- Preserves history for reference
- Prevents context pollution

**Current State in Makima:**
- Worktrees are per-task but ephemeral
- No archiving mechanism
- Old task data in database but hard to access

**Implementation Approach:**
1. On contract creation with `--archive`:
   - Find previous contract with same name/goal
   - Copy key artifacts to `archive/` directory:
     - progress.log
     - Final checkpoint
     - Summary document
2. Archive structure:
   ```
   archive/
   └── 2026-01-22-feature-name/
       ├── progress.log
       ├── summary.md
       └── final-diff.patch
   ```

**CLI Usage:**
```bash
makima contract create --archive-previous "Feature X v2"
```

**Integration Points:**
- `daemon/worktree/manager.rs` → Archive logic
- `server/handlers/contracts.rs` → Archive on create
- New file: `daemon/archive/manager.rs`

---

### 2.4 Require Tests Quality Gate

**Name:** `--require-tests`
**Priority:** MEDIUM
**Flag:** `--require-tests` or `--tests`

**Description:**
Block task completion unless tests pass.

**Motivation:**
- Ralph: stories require "Tests pass" in acceptance criteria
- Ensures quality before merge
- Catches regressions early

**Current State in Makima:**
- Completion gate is self-reported by Claude
- No actual test execution
- Circuit breaker is reactive, not proactive

**Implementation Approach:**
1. Detect test framework from project:
   - `package.json` scripts
   - `pytest`, `cargo test`, etc.
2. Run tests before accepting completion
3. Parse test output for pass/fail
4. If failed:
   - Don't mark complete
   - Inject failure info into next prompt
   - Increment failure counter

**CLI Usage:**
```bash
makima contract create --require-tests "Feature X"
```

**Configuration:**
```toml
[quality_gates]
require_tests = false
test_command = "npm test"  # Auto-detected if not set
test_timeout_secs = 300
```

**Integration Points:**
- `daemon/task/completion_gate.rs` → Test validation
- `daemon/process/` → Test runner
- New file: `daemon/quality/test_runner.rs`

---

### 2.5 PRD Mode

**Name:** `--prd-mode`
**Priority:** MEDIUM
**Flag:** `--prd-mode` or `--prd`

**Description:**
Enable ralph-style PRD workflow with structured JSON task tracking.

**Motivation:**
- Ralph's `prd.json` provides clear task breakdown
- Structured format aids automation
- Priority-based execution
- Clear pass/fail tracking

**Current State in Makima:**
- Plans are free-form text
- Task status is in database, not file-based
- No structured PRD format

**Implementation Approach:**
1. When `--prd-mode` enabled:
   - Create `prd.json` in worktree:
     ```json
     {
       "project": "Contract Name",
       "branchName": "makima/feature",
       "description": "Contract goal",
       "userStories": [
         {
           "id": "US-001",
           "title": "Story title",
           "description": "As a...",
           "acceptanceCriteria": ["Criterion 1"],
           "priority": 1,
           "passes": false,
           "notes": ""
         }
       ]
     }
     ```
   - Tasks update `passes` field on completion
   - Supervisor reads PRD to find next incomplete story
2. Sync between database and `prd.json`

**CLI Usage:**
```bash
makima contract create --prd-mode "Feature X"
```

**Configuration:**
```toml
[prd_mode]
enabled = false
auto_generate_from_plan = true
sync_to_database = true
```

**Integration Points:**
- `daemon/task/manager.rs` → PRD sync
- New file: `daemon/prd/manager.rs`
- New file: `daemon/prd/models.rs`

---

### 2.6 Learning Mode

**Name:** `--learn`
**Priority:** LOW
**Flag:** `--learn` or `-l`

**Description:**
Enable cross-task learning that extracts patterns and improves future prompts.

**Motivation:**
- Ralph's AGENTS.md consolidates patterns
- Learning from success improves future runs
- Learning from failure prevents repeating mistakes

**Current State in Makima:**
- No cross-task learning
- Each task starts fresh
- Patterns not extracted or reused

**Implementation Approach:**
1. On task completion, extract:
   - Files commonly modified together
   - Commands that succeeded/failed
   - Error patterns and solutions
2. Store in `learnings.db` (SQLite) per repository
3. Inject relevant learnings into future task prompts:
   ```
   ## Learned Patterns for this codebase:
   - Always run `npm run typecheck` before commit
   - The auth middleware is in src/middleware/auth.ts
   - Database migrations require `npx prisma generate` after
   ```

**CLI Usage:**
```bash
makima contract create --learn "Feature X"
```

**Configuration:**
```toml
[learning]
enabled = false
extract_file_patterns = true
extract_command_patterns = true
max_learnings_injected = 10
```

**Integration Points:**
- `daemon/task/manager.rs` → Learning extraction
- `daemon/process/claude.rs` → Learning injection
- New file: `daemon/learning/extractor.rs`
- New file: `daemon/learning/database.rs`

---

### 2.7 Browser Verification for UI

**Name:** `--browser-verify`
**Priority:** LOW
**Flag:** `--browser-verify` or `--ui`

**Description:**
For UI-related tasks, require browser verification before completion.

**Motivation:**
- Ralph includes "Verify in browser" as acceptance criteria
- Visual verification catches issues that tests miss
- Ensures UI actually works, not just compiles

**Current State in Makima:**
- No browser verification
- UI tasks treated same as backend
- No visual testing integration

**Implementation Approach:**
1. Detect UI-related tasks from file patterns:
   - `*.tsx`, `*.vue`, `*.svelte`
   - `components/`, `pages/`, `views/`
2. When completing, prompt for verification:
   - Launch dev server if needed
   - Open browser to relevant URL
   - Wait for user confirmation or screenshot analysis
3. Alternative: Integrate with Playwright for visual testing

**CLI Usage:**
```bash
makima contract create --browser-verify "Add login page"
```

**Configuration:**
```toml
[browser_verify]
enabled = false
auto_detect_ui_tasks = true
dev_server_command = "npm run dev"
base_url = "http://localhost:3000"
```

**Integration Points:**
- `daemon/task/completion_gate.rs` → Browser check
- New file: `daemon/quality/browser_verify.rs`

---

## Part 3: Implementation Priorities

### Phase 1: Foundation (High Priority)
1. **Structured Progress Logging** - Core to context management
2. **Context Recovery Pattern** - Enables stateless iterations
3. **Commit Discipline** - Ensures quality git history
4. **Maximum Iterations Limit** - Prevents runaway loops

### Phase 2: Quality (Medium Priority)
5. **Verifiable Acceptance Criteria** - Improves completion reliability
6. **Dependency-Ordered Execution** - Prevents out-of-order failures
7. **Task Sizing Validation** - Catches too-large tasks early
8. **Require Tests Quality Gate** - Ensures working code
9. **Single-Story Mode** - Ralph's core pattern

### Phase 3: Advanced (Low Priority)
10. **PRD Mode** - Full ralph-style workflow
11. **Learning Mode** - Cross-task intelligence
12. **Archive Previous** - Run isolation
13. **Browser Verification** - UI quality

---

## Part 4: Configuration Summary

### New Configuration File Sections

```toml
# makima-daemon.toml additions

[progress_log]
enabled = true
max_entries_injected = 20
consolidation_threshold = 50

[context_recovery]
enabled = true
include_git_status = true
include_recent_progress = true

[dependency_ordering]
enabled = true
auto_detect = true
warn_on_violation = true

[acceptance_criteria]
enabled = true
require_verifiable = true
auto_append_standard = ["no_uncommitted_changes"]

[task_sizing]
enabled = true
max_files_mentioned = 10
max_plan_words = 500
warn_on_scope_words = ["entire", "all", "complete", "refactor"]

[commit_discipline]
enabled = true
require_tests = false
require_lint = false
message_format = "conventional"

[autonomous_loop]
default_max_iterations = 10
hard_limit = 50

[execution]
single_task_mode = false

[quality_gates]
require_tests = false
test_command = ""  # Auto-detected
test_timeout_secs = 300

[prd_mode]
enabled = false
auto_generate_from_plan = true
sync_to_database = true

[learning]
enabled = false
extract_file_patterns = true
extract_command_patterns = true
max_learnings_injected = 10

[browser_verify]
enabled = false
auto_detect_ui_tasks = true
dev_server_command = "npm run dev"
base_url = "http://localhost:3000"
```

### CLI Flag Summary

| Flag | Short | Feature | Default |
|------|-------|---------|---------|
| `--max-iterations` | `-i` | Iteration limit | 10 |
| `--single-task` | `-1` | One task per run | false |
| `--archive-previous` | `--archive` | Archive old runs | false |
| `--require-tests` | `--tests` | Test quality gate | false |
| `--prd-mode` | `--prd` | PRD-style workflow | false |
| `--learn` | `-l` | Cross-task learning | false |
| `--browser-verify` | `--ui` | UI verification | false |

---

## Part 5: Migration Path

### For Existing Contracts

1. Progress logging starts fresh (no historical data)
2. Context recovery applies to new tasks only
3. Existing tasks not affected by new validation
4. Can opt-in to optional features per-contract

### Backward Compatibility

- All opinionated features have graceful defaults
- Optional features are off by default
- No breaking changes to existing CLI/API
- Configuration is additive

---

## Conclusion

These features address the core challenges mentioned in the contract goal:
- **Manual steering** → Progress logging, context recovery, learning mode
- **Context between runs** → Structured persistence, progress.txt pattern
- **Handholding** → Verifiable criteria, commit discipline, quality gates

The opinionated features make makima more reliable out of the box, while optional features provide ralph-style workflows for users who want them.