# Ralph-Inspired Features for Makima

## Overview

This specification outlines features derived from the [ralph](https://github.com/snarktank/ralph) autonomous AI agent loop system that can be implemented in makima to reduce manual steering and improve context management between runs.

---

## Part 1: Opinionated Features (Always Enabled)

These features represent best practices that should be core to makima's behavior.

### 1.1 Structured Progress Logging

**Name:** `progress-log`
**Priority:** HIGH

**Description:**
Implement an append-only progress log file (`progress.log`, modeled on ralph's `progress.txt`) that persists learnings, patterns, and context across task iterations.

**Motivation:**
- Ralph's most powerful feature is its dual-file learning system (an append-only progress log plus a consolidated `AGENTS.md`)
- Captures context that survives Claude's context window limits
- Enables pattern discovery over time
- Provides debugging history

**Current State in Makima:**
- `progress_summary` field exists but is per-task, not persistent
- Task events stored in database but not summarized
- No cross-task learning mechanism

**Implementation Approach:**
1. Add `progress.log` file to each task's worktree
2. Append structured entries at task completion:
   ```
   ## [Timestamp] - Task [ID]: [Name]
   - Status: [done/failed]
   - Files changed: [list]
   - **Learnings:**
     - [Pattern discovered]
     - [Gotcha encountered]
   ---
   ```
3. Inject progress.log contents into new task prompts
4. Periodically consolidate accumulated entries into an `AGENTS.md`-equivalent file
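
The entry format and the prompt-injection limit above can be sketched as follows. This is a minimal illustration, not makima's actual code: the `ProgressEntry` struct and the `format_entry`, `append_entry`, and `recent_entries` functions are hypothetical names, and the timestamp is assumed to be supplied by the caller as a preformatted string.

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

/// Hypothetical in-memory form of one progress-log entry.
pub struct ProgressEntry {
    pub timestamp: String, // preformatted by the caller (assumption)
    pub task_id: String,
    pub task_name: String,
    pub succeeded: bool,
    pub files_changed: Vec<String>,
    pub learnings: Vec<String>,
}

/// Render an entry using the template from step 2.
pub fn format_entry(e: &ProgressEntry) -> String {
    let status = if e.succeeded { "done" } else { "failed" };
    let mut out = format!("## {} - Task {}: {}\n", e.timestamp, e.task_id, e.task_name);
    out.push_str(&format!("- Status: {}\n", status));
    out.push_str(&format!("- Files changed: {}\n", e.files_changed.join(", ")));
    out.push_str("- **Learnings:**\n");
    for learning in &e.learnings {
        out.push_str(&format!("  - {}\n", learning));
    }
    out.push_str("---\n");
    out
}

/// Append an entry to the log file, creating the file if needed.
pub fn append_entry(path: &Path, e: &ProgressEntry) -> std::io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(format_entry(e).as_bytes())
}

/// Return at most `max` of the newest entries, oldest first.
/// Entries are assumed to be separated by a line containing only `---`.
pub fn recent_entries(log: &str, max: usize) -> Vec<String> {
    let entries: Vec<String> = log
        .split("\n---\n")
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect();
    let start = entries.len().saturating_sub(max);
    entries[start..].to_vec()
}
```

Calling `recent_entries(&log, 20)` would correspond to the `max_entries_injected = 20` setting below; consolidation is left out of the sketch.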

**Configuration:**
```toml
[progress_log]
enabled = true  # Always on
max_entries_injected = 20  # Limit for prompt injection
consolidation_threshold = 50  # Trigger consolidation
```

**Integration Points:**
- `daemon/task/manager.rs` → `on_completion()` hook
- `daemon/process/claude.rs` → `inject_system_prompt()`
- New file: `daemon/task/progress_log.rs`

---

### 1.2 Context Recovery Pattern

**Name:** `context-recovery`
**Priority:** HIGH

**Description:**
Standardize how context is rebuilt when tasks resume or restart, ensuring Claude can quickly orient itself.

**Motivation:**
- Ralph's stateless model works because context recovery is systematic
- Each iteration reads from well-defined artifacts
- Reduces confusion and repeated work

**Current State in Makima:**
- `conversation_state` stored for resumption
- `--continue` flag relies on Claude's session state
- No structured "where we left off" pattern

**Implementation Approach:**
1. Create standard context recovery header for task prompts:
   ```
   ## Context Recovery
   - Current branch: [branch name]
   - Git status: [uncommitted changes summary]
   - Last checkpoint: [timestamp, message]
   - Progress log (recent): [last 5 entries]
   - Current phase: [research/specify/plan/execute/review]
   ```
2. Auto-generate on task start/resume
3. Include in system prompt before user plan
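
A minimal sketch of how the header could be generated and placed before the user plan. The `RecoveryContext` struct, the `Phase` enum, and `build_prompt` are hypothetical; collecting the branch name and git status is assumed to happen elsewhere.

```rust
/// Workflow phases named in the header template.
#[derive(Debug, Clone, Copy)]
pub enum Phase {
    Research,
    Specify,
    Plan,
    Execute,
    Review,
}

impl Phase {
    pub fn as_str(self) -> &'static str {
        match self {
            Phase::Research => "research",
            Phase::Specify => "specify",
            Phase::Plan => "plan",
            Phase::Execute => "execute",
            Phase::Review => "review",
        }
    }
}

/// Hypothetical inputs, gathered by the caller before a task starts or resumes.
pub struct RecoveryContext {
    pub branch: String,
    pub git_status: String,
    pub last_checkpoint: Option<(String, String)>, // (timestamp, message)
    pub recent_progress: Vec<String>,              // one summary line per entry
    pub phase: Phase,
}

/// Render the context recovery header, keeping only the last 5 progress entries.
pub fn build_context_recovery(ctx: &RecoveryContext) -> String {
    let checkpoint = match &ctx.last_checkpoint {
        Some((ts, msg)) => format!("{}, {}", ts, msg),
        None => String::from("none"),
    };
    let start = ctx.recent_progress.len().saturating_sub(5);
    let recent = &ctx.recent_progress[start..];
    let progress = if recent.is_empty() {
        String::from("none")
    } else {
        recent.join("; ")
    };
    format!(
        "## Context Recovery\n- Current branch: {}\n- Git status: {}\n- Last checkpoint: {}\n- Progress log (recent): {}\n- Current phase: {}\n",
        ctx.branch,
        ctx.git_status,
        checkpoint,
        progress,
        ctx.phase.as_str()
    )
}

/// Place the header before the user plan, as step 3 describes.
pub fn build_prompt(ctx: &RecoveryContext, user_plan: &str) -> String {
    format!("{}\n{}", build_context_recovery(ctx), user_plan)
}
```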

**Integration Points:**
- `daemon/task/manager.rs` → `build_context_recovery()`
- `daemon/process/claude.rs` → Prepend to injected prompt
- New file: `daemon/task/context_recovery.rs`

---

### 1.3 Dependency-Ordered Task Execution

**Name:** `dependency-ordering`
**Priority:** MEDIUM

**Description:**
Enforce that tasks execute in dependency order: schema changes → backend → UI.

**Motivation:**
- Ralph explicitly orders stories: database → server → UI → dashboard
- Prevents tasks from failing due to missing dependencies
- Creates clean commit boundaries

**Current State in Makima:**
- Tasks have `priority` field but no dependency inference
- Supervisors manually order task creation
- No validation of execution order

**Implementation Approach:**
1. Add `depends_on: Vec<Uuid>` field to tasks
2. Validate dependencies before marking task as runnable
3. Auto-detect dependency patterns:
   - Migration files → backend code
   - Types/models → consumers
   - APIs → UI components
4. Warn if a task seems out of order based on file patterns
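
The runnable check and the file-pattern heuristic might look roughly like this. It is a sketch under stated assumptions: `TaskId` is a `u64` stand-in for `Uuid` so the example needs no external crate, and the path patterns in `infer_layer` are illustrative guesses rather than a vetted rule set.

```rust
use std::collections::HashSet;

/// Stand-in for `Uuid` (assumption, keeps the sketch dependency-free).
pub type TaskId = u64;

pub struct Task {
    pub id: TaskId,
    pub depends_on: Vec<TaskId>,
}

/// A task is runnable only when every dependency has completed.
pub fn can_start_task(task: &Task, completed: &HashSet<TaskId>) -> bool {
    task.depends_on.iter().all(|dep| completed.contains(dep))
}

/// Layers in the order they should execute: schema, then backend, then UI.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Layer {
    Schema,
    Backend,
    Ui,
}

/// Guess a layer from a file path. The patterns here are hypothetical examples.
pub fn infer_layer(path: &str) -> Layer {
    let p = path.to_lowercase();
    if p.contains("migrations/") || p.ends_with(".sql") {
        Layer::Schema
    } else if p.ends_with(".tsx")
        || p.ends_with(".vue")
        || p.ends_with(".svelte")
        || p.contains("components/")
    {
        Layer::Ui
    } else {
        Layer::Backend
    }
}

/// True when a task scheduled earlier touches a later layer than the task after it.
pub fn is_out_of_order(earlier: Layer, later: Layer) -> bool {
    earlier > later
}
```

Because `Layer` derives `Ord` in declaration order, comparing two layers is enough to produce the warning in step 4.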

**Configuration:**
```toml
[dependency_ordering]
enabled = true
auto_detect = true
warn_on_violation = true
```

**Integration Points:**
- `db/models.rs` → Task model extension
- `daemon/task/manager.rs` → `can_start_task()` validation
- New file: `daemon/task/dependency_analysis.rs`

---

### 1.4 Verifiable Acceptance Criteria

**Name:** `acceptance-criteria`
**Priority:** MEDIUM

**Description:**
Require that all tasks have verifiable (not vague) acceptance criteria, and automatically validate them.

**Motivation:**
- Ralph requires criteria like "Typecheck passes", "Tests pass"
- Prevents "done" status on incomplete work
- Provides clear success definition

**Current State in Makima:**
- `COMPLETION_GATE` signals readiness
- No structured criteria validation
- Manual interpretation of "ready"

**Implementation Approach:**
1. Parse task plans for acceptance criteria section
2. Identify verifiable vs vague criteria:
   - **Good:** "All tests pass", "No TypeScript errors"
   - **Bad:** "Works correctly", "Good UX"
3. Auto-append standard criteria if missing:
   - "No uncommitted changes remain"
   - "CI/linting passes" (if configured)
4. Validate criteria satisfaction before marking complete
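
One possible way to separate verifiable from vague criteria is a keyword heuristic, sketched below. The hint list and the function names are assumptions made for illustration; a keyword match will misclassify some phrasings (for example "handles errors gracefully"), so a real validator would likely need something stricter.

```rust
#[derive(Debug, PartialEq)]
pub enum CriterionKind {
    Verifiable,
    Vague,
}

/// Hypothetical hints that a criterion can be checked mechanically.
const VERIFIABLE_HINTS: &[&str] = &["pass", "error", "compile", "uncommitted"];

/// Classify a criterion by looking for a mechanical-check keyword.
pub fn classify(criterion: &str) -> CriterionKind {
    let lowered = criterion.to_lowercase();
    if VERIFIABLE_HINTS.iter().any(|hint| lowered.contains(hint)) {
        CriterionKind::Verifiable
    } else {
        CriterionKind::Vague
    }
}

/// Append each standard criterion that is not already present (case-insensitive).
pub fn with_standard_criteria(mut criteria: Vec<String>, standard: &[&str]) -> Vec<String> {
    for &s in standard {
        if !criteria.iter().any(|c| c.eq_ignore_ascii_case(s)) {
            criteria.push(s.to_string());
        }
    }
    criteria
}
```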

**Configuration:**
```toml
[acceptance_criteria]
enabled = true
require_verifiable = true
auto_append_standard = ["no_uncommitted_changes", "tests_pass"]
```

**Integration Points:**
- `daemon/task/completion_gate.rs` → Extend validation
- `llm/task_output.rs` → Parse criteria from plan
- New file: `daemon/task/criteria_validator.rs`

---

### 1.5 Task Sizing Validation

**Name:** `task-sizing`
**Priority:** MEDIUM

**Description:**
Warn about, or block, tasks that are likely too large to complete in one context window.

**Motivation:**
- Ralph's story sizing is crucial: "If you can't describe it in 2-3 sentences, it's too big"
- Large tasks exhaust the context window and force handoffs
- Smaller tasks = cleaner commits, easier recovery

**Current State in Makima:**
- No task size estimation
- `auto_handoff` exists but reactive
- Manual task breakdown by supervisors

**Implementation Approach:**
1. Estimate task complexity from plan text:
   - Number of files mentioned
   - Scope words ("entire", "all", "refactor")
   - Estimated token count
2. Warn if task exceeds thresholds
3. Suggest breakdown for large tasks

**Thresholds:**
- Files mentioned > 10 → Warning
- Plan length > 500 words → Warning
- Scope words detected → Strong warning
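
The thresholds above could be applied to plan text along these lines. This is a sketch, not a definitive implementation: `SizingConfig`, `SizeWarning`, and the `looks_like_file` heuristic (a token with a 2 to 5 letter extension) are hypothetical, and token-count estimation is omitted.

```rust
use std::collections::HashSet;

/// Mirrors the `[task_sizing]` settings (field names are assumptions).
pub struct SizingConfig {
    pub max_files_mentioned: usize,
    pub max_plan_words: usize,
    pub scope_words: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum SizeWarning {
    TooManyFiles(usize),
    PlanTooLong(usize),
    ScopeWord(String),
}

/// Heuristic: a token counts as a file if it ends in a 2-5 letter extension.
fn looks_like_file(token: &str) -> bool {
    match token.rsplit_once('.') {
        Some((stem, ext)) => {
            !stem.is_empty()
                && (2..=5).contains(&ext.len())
                && ext.chars().all(|c| c.is_ascii_alphabetic())
        }
        None => false,
    }
}

pub fn validate_task_size(plan: &str, cfg: &SizingConfig) -> Vec<SizeWarning> {
    let tokens: Vec<&str> = plan.split_whitespace().collect();
    let mut warnings = Vec::new();

    // Count distinct file-like tokens, ignoring surrounding punctuation.
    let files: HashSet<&str> = tokens
        .iter()
        .map(|t| t.trim_matches(|c: char| !(c.is_alphanumeric() || "._/-".contains(c))))
        .map(|t| t.trim_end_matches('.'))
        .filter(|t| looks_like_file(t))
        .collect();
    if files.len() > cfg.max_files_mentioned {
        warnings.push(SizeWarning::TooManyFiles(files.len()));
    }

    if tokens.len() > cfg.max_plan_words {
        warnings.push(SizeWarning::PlanTooLong(tokens.len()));
    }

    // Scope words are matched as whole words, case-insensitively.
    let lowered: HashSet<String> = tokens
        .iter()
        .map(|t| t.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
        .collect();
    for word in &cfg.scope_words {
        if lowered.contains(&word.to_lowercase()) {
            warnings.push(SizeWarning::ScopeWord(word.clone()));
        }
    }
    warnings
}
```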

**Configuration:**
```toml
[task_sizing]
enabled = true
max_files_mentioned = 10
max_plan_words = 500
warn_on_scope_words = ["entire", "all", "complete", "refactor"]
```

**Integration Points:**
- `daemon/task/manager.rs` → `validate_task_size()`
- Supervisor prompts → Include sizing guidance
- New file: `daemon/task/sizing_validator.rs`

---

### 1.6 Commit Discipline

**Name:** `commit-discipline`
**Priority:** HIGH

**Description:**
Enforce structured commit messages and only allow commits when quality checks pass.

**Motivation:**
- Ralph: "Only commit when tests pass"
- Clean git history aids context recovery
- Structured messages enable automation

**Current State in Makima:**
- Checkpoints create commits automatically
- No quality gate before commit
- Commit messages not standardized

**Implementation Approach:**
1. Standardize commit message format:
   ```
   feat/fix/chore: [Task ID] - [Summary]

   [Optional body]

   Co-Authored-By: Claude <noreply@anthropic.com>
   ```
2. Run quality checks before checkpoint commit:
   - TypeScript/lint (if configured)
   - Tests (if configured)
3. Reject commit if checks fail, provide feedback
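
A minimal sketch of the message format and the pre-commit gate. The `CommitType`, `Check`, and function names are hypothetical, and running the lint or test commands themselves is assumed to happen elsewhere; only the decision logic is shown.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CommitType {
    Feat,
    Fix,
    Chore,
}

impl CommitType {
    pub fn as_str(self) -> &'static str {
        match self {
            CommitType::Feat => "feat",
            CommitType::Fix => "fix",
            CommitType::Chore => "chore",
        }
    }
}

const CO_AUTHOR_TRAILER: &str = "Co-Authored-By: Claude <noreply@anthropic.com>";

/// Build a commit message in the format from step 1.
pub fn format_commit_message(
    kind: CommitType,
    task_id: &str,
    summary: &str,
    body: Option<&str>,
) -> String {
    let mut msg = format!("{}: {} - {}\n\n", kind.as_str(), task_id, summary);
    if let Some(b) = body {
        msg.push_str(b.trim());
        msg.push_str("\n\n");
    }
    msg.push_str(CO_AUTHOR_TRAILER);
    msg.push('\n');
    msg
}

/// Check that a subject line follows `<type>: <task id> - <summary>`.
pub fn is_valid_subject(subject: &str) -> bool {
    let (kind, rest) = match subject.split_once(": ") {
        Some(parts) => parts,
        None => return false,
    };
    if !matches!(kind, "feat" | "fix" | "chore") {
        return false;
    }
    match rest.split_once(" - ") {
        Some((task_id, summary)) => !task_id.trim().is_empty() && !summary.trim().is_empty(),
        None => false,
    }
}

/// Result of one quality check (lint, tests, ...), produced by the caller.
pub struct Check {
    pub name: String,
    pub required: bool,
    pub passed: bool,
}

/// Allow the commit only if every required check passed; otherwise list the failures.
pub fn commit_allowed(checks: &[Check]) -> Result<(), Vec<String>> {
    let failed: Vec<String> = checks
        .iter()
        .filter(|c| c.required && !c.passed)
        .map(|c| c.name.clone())
        .collect();
    if failed.is_empty() {
        Ok(())
    } else {
        Err(failed)
    }
}
```

The `Err` list is what would be fed back to the agent when a checkpoint commit is rejected.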

**Configuration:**
```toml
[commit_discipline]
enabled = true
require_tests = false  # Optional
require_lint = false   # Optional
message_format = "conventional"  # conventional, simple
```

**Integration Points:**
- `daemon/worktree/manager.rs` → `create_checkpoint()`
- `daemon/task/manager.rs` → Pre-commit hooks
- New file: `daemon/task/commit_validator.rs`

---

## Part 2: Optional Features (Flag-Controlled)

These features provide advanced control and should be opt-in via CLI flags or configuration.

### 2.1 Maximum Iterations Limit

**Name:** `--max-iterations`
**Priority:** HIGH
**Flag:** `--max-iterations <N>` or `-i <N>`

**Description:**
Limit the number of autonomous loop iterations before stopping.

**Motivation:**
- Ralph uses `max_iterations` (default 10)
- Prevents runaway loops that waste tokens
- Provides predictable behavior

**Current State in Makima:**
- Circuit breaker has `iteration_count` limit (10)
- Not configurable at task/contract level
- No per-run override

**Implementation Approach:**
1. Add `--max-iterations` flag to contract/task creation
2. Store in task metadata
3. Check count in autonomous loop logic
4. Exit cleanly with message when limit reached
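
The limit resolution and loop check could be as small as the sketch below. The function and enum names are hypothetical; the two numeric parameters correspond to `default_max_iterations` and `hard_limit` in the configuration that follows.

```rust
/// Resolve the per-run limit: use the flag if given, else the default,
/// and never exceed the configured hard limit.
pub fn effective_max_iterations(requested: Option<u32>, default_max: u32, hard_limit: u32) -> u32 {
    requested.unwrap_or(default_max).min(hard_limit)
}

#[derive(Debug, PartialEq)]
pub enum LoopDecision {
    Continue,
    StopLimitReached,
}

/// Decide whether another iteration may start after `completed` iterations.
pub fn check_iteration(completed: u32, max: u32) -> LoopDecision {
    if completed >= max {
        LoopDecision::StopLimitReached
    } else {
        LoopDecision::Continue
    }
}
```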

**CLI Usage:**
```bash
makima contract create --max-iterations 5 "Feature X"
makima supervisor spawn "Task" "Plan" --max-iterations 3
```

**Configuration:**
```toml
[autonomous_loop]
default_max_iterations = 10
hard_limit = 50  # Absolute maximum
```

**Integration Points:**
- `daemon/task/manager.rs` → Loop control
- `db/models.rs` → Task field
- CLI argument parsing

---

### 2.2 Single-Story-Per-Run Mode

**Name:** `--single-task`
**Priority:** MEDIUM
**Flag:** `--single-task` or `-1`

**Description:**
Execute exactly one task per Claude invocation, then stop (don't auto-continue).

**Motivation:**
- Ralph's model: one story per iteration
- Ensures complete focus
- Creates clean boundaries
- Simplifies failure recovery

**Current State in Makima:**
- Tasks can run multiple iterations
- Supervisor can spawn multiple concurrent tasks
- No single-task mode

**Implementation Approach:**
1. When `--single-task` enabled:
   - Execute one task
   - Parse completion gate
   - Stop regardless of `ready` status
   - Report status and exit
2. User reviews, then manually continues or adjusts

**CLI Usage:**
```bash
makima contract create --single-task "Feature X"
```

**Configuration:**
```toml
[execution]
single_task_mode = false  # Default
```

**Integration Points:**
- `daemon/task/manager.rs` → Execution loop
- `server/handlers/contract_daemon.rs` → Contract options
- CLI flags

---

### 2.3 Archive Previous Runs

**Name:** `--archive-previous`
**Priority:** LOW
**Flag:** `--archive-previous` or `--archive`

**Description:**
When starting a new feature/contract, archive the previous run's artifacts.

**Motivation:**
- Ralph archives to `archive/YYYY-MM-DD-feature-name/`
- Clean separation between features
- Preserves history for reference
- Prevents context pollution

**Current State in Makima:**
- Worktrees are per-task but ephemeral
- No archiving mechanism
- Old task data in database but hard to access

**Implementation Approach:**
1. On contract creation with `--archive`:
   - Find previous contract with same name/goal
   - Copy key artifacts to `archive/` directory:
     - progress.log
     - Final checkpoint
     - Summary document
2. Archive structure:
   ```
   archive/
   └── 2026-01-22-feature-name/
       ├── progress.log
       ├── summary.md
       └── final-diff.patch
   ```
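
Deriving the archive directory name from a date and a contract name might look like this. Both `slugify` and `archive_dir` are hypothetical helpers, and the date is assumed to arrive as a preformatted `YYYY-MM-DD` string.

```rust
use std::path::{Path, PathBuf};

/// Lowercase the name and collapse every run of non-alphanumeric
/// characters into a single dash, with no leading or trailing dash.
pub fn slugify(name: &str) -> String {
    let mut slug = String::new();
    let mut prev_dash = true; // suppresses a leading dash
    for c in name.chars() {
        if c.is_ascii_alphanumeric() {
            slug.push(c.to_ascii_lowercase());
            prev_dash = false;
        } else if !prev_dash {
            slug.push('-');
            prev_dash = true;
        }
    }
    while slug.ends_with('-') {
        slug.pop();
    }
    slug
}

/// Build `<root>/archive/<date>-<slug>` for a contract.
pub fn archive_dir(root: &Path, date: &str, contract_name: &str) -> PathBuf {
    root.join("archive")
        .join(format!("{}-{}", date, slugify(contract_name)))
}
```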

**CLI Usage:**
```bash
makima contract create --archive-previous "Feature X v2"
```

**Integration Points:**
- `daemon/worktree/manager.rs` → Archive logic
- `server/handlers/contracts.rs` → Archive on create
- New file: `daemon/archive/manager.rs`

---

### 2.4 Require Tests Quality Gate

**Name:** `--require-tests`
**Priority:** MEDIUM
**Flag:** `--require-tests` or `--tests`

**Description:**
Block task completion unless tests pass.

**Motivation:**
- Ralph: stories require "Tests pass" in acceptance criteria
- Ensures quality before merge
- Catches regressions early

**Current State in Makima:**
- Completion gate is self-reported by Claude
- No actual test execution
- Circuit breaker is reactive, not proactive

**Implementation Approach:**
1. Detect test framework from project:
   - `package.json` scripts
   - `pytest`, `cargo test`, etc.
2. Run tests before accepting completion
3. Parse test output for pass/fail
4. If failed:
   - Don't mark complete
   - Inject failure info into next prompt
   - Increment failure counter
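
A rough sketch of the detection and gating steps. Several simplifications are assumptions: detection only checks for well-known marker files instead of parsing `package.json` scripts, `run_tests` shells out through `sh -c` (Unix only) and does not enforce `test_timeout_secs`, and all names are hypothetical.

```rust
use std::path::Path;
use std::process::Command;

/// Guess a test command from the file names found in the project root.
pub fn detect_test_command(root_files: &[&str]) -> Option<&'static str> {
    let has = |name: &str| root_files.iter().any(|f| *f == name);
    if has("Cargo.toml") {
        Some("cargo test")
    } else if has("package.json") {
        Some("npm test")
    } else if has("pyproject.toml") || has("pytest.ini") {
        Some("pytest")
    } else {
        None
    }
}

/// Run the test command in `dir`; returns (passed, combined output).
pub fn run_tests(command: &str, dir: &Path) -> std::io::Result<(bool, String)> {
    let out = Command::new("sh").arg("-c").arg(command).current_dir(dir).output()?;
    let mut text = String::from_utf8_lossy(&out.stdout).into_owned();
    text.push_str(&String::from_utf8_lossy(&out.stderr));
    Ok((out.status.success(), text))
}

#[derive(Debug, PartialEq)]
pub enum GateOutcome {
    Complete,
    Retry { failure_count: u32, feedback: String },
}

/// Step 4: on failure, keep the task open, bump the counter,
/// and carry the last 20 output lines into the next prompt.
pub fn apply_test_gate(passed: bool, output: &str, failure_count: u32) -> GateOutcome {
    if passed {
        return GateOutcome::Complete;
    }
    let mut tail: Vec<&str> = output.lines().rev().take(20).collect();
    tail.reverse();
    GateOutcome::Retry {
        failure_count: failure_count + 1,
        feedback: format!("Tests failed. Last output lines:\n{}", tail.join("\n")),
    }
}
```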

**CLI Usage:**
```bash
makima contract create --require-tests "Feature X"
```

**Configuration:**
```toml
[quality_gates]
require_tests = false
test_command = "npm test"  # Auto-detected if not set
test_timeout_secs = 300
```

**Integration Points:**
- `daemon/task/completion_gate.rs` → Test validation
- `daemon/process/` → Test runner
- New file: `daemon/quality/test_runner.rs`

---

### 2.5 PRD Mode

**Name:** `--prd-mode`
**Priority:** MEDIUM
**Flag:** `--prd-mode` or `--prd`

**Description:**
Enable ralph-style PRD workflow with structured JSON task tracking.

**Motivation:**
- Ralph's `prd.json` provides clear task breakdown
- Structured format aids automation
- Priority-based execution
- Clear pass/fail tracking

**Current State in Makima:**
- Plans are free-form text
- Task status is in database, not file-based
- No structured PRD format

**Implementation Approach:**
1. When `--prd-mode` enabled:
   - Create `prd.json` in worktree:
     ```json
     {
       "project": "Contract Name",
       "branchName": "makima/feature",
       "description": "Contract goal",
       "userStories": [
         {
           "id": "US-001",
           "title": "Story title",
           "description": "As a...",
           "acceptanceCriteria": ["Criterion 1"],
           "priority": 1,
           "passes": false,
           "notes": ""
         }
       ]
     }
     ```
   - Tasks update `passes` field on completion
   - Supervisor reads PRD to find next incomplete story
2. Sync between database and `prd.json`
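
Selecting the next incomplete story and recording a pass could be sketched as follows. The struct covers only the fields needed here, JSON (de)serialization is left out because it would require an external crate, and treating a lower `priority` number as more urgent is an assumption.

```rust
/// Subset of a `prd.json` user story used for scheduling.
#[derive(Debug, Clone)]
pub struct UserStory {
    pub id: String,
    pub title: String,
    pub priority: u32, // assumption: 1 is the most urgent
    pub passes: bool,
}

/// Pick the incomplete story with the lowest priority number.
/// On ties, the story listed first wins.
pub fn next_story(stories: &[UserStory]) -> Option<&UserStory> {
    stories.iter().filter(|s| !s.passes).min_by_key(|s| s.priority)
}

/// Mark a story as passing; returns false if the id is unknown.
pub fn mark_passed(stories: &mut [UserStory], id: &str) -> bool {
    match stories.iter_mut().find(|s| s.id == id) {
        Some(story) => {
            story.passes = true;
            true
        }
        None => false,
    }
}
```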

**CLI Usage:**
```bash
makima contract create --prd-mode "Feature X"
```

**Configuration:**
```toml
[prd_mode]
enabled = false
auto_generate_from_plan = true
sync_to_database = true
```

**Integration Points:**
- `daemon/task/manager.rs` → PRD sync
- New file: `daemon/prd/manager.rs`
- New file: `daemon/prd/models.rs`

---

### 2.6 Learning Mode

**Name:** `--learn`
**Priority:** LOW
**Flag:** `--learn` or `-l`

**Description:**
Enable cross-task learning that extracts patterns and improves future prompts.

**Motivation:**
- Ralph's AGENTS.md consolidates patterns
- Learning from success improves future runs
- Learning from failure prevents repeating mistakes

**Current State in Makima:**
- No cross-task learning
- Each task starts fresh
- Patterns not extracted or reused

**Implementation Approach:**
1. On task completion, extract:
   - Files commonly modified together
   - Commands that succeeded/failed
   - Error patterns and solutions
2. Store in `learnings.db` (SQLite) per repository
3. Inject relevant learnings into future task prompts:
   ```
   ## Learned Patterns for this codebase:
   - Always run `npm run typecheck` before commit
   - The auth middleware is in src/middleware/auth.ts
   - Database migrations require `npx prisma generate` after
   ```
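
The injection step might be sketched like this. Storage in SQLite is omitted, and the `Learning` struct with its `hits` counter (how often a pattern was observed) is a hypothetical ranking signal, not something the design above specifies.

```rust
/// One extracted pattern plus a count of how often it was observed (assumption).
#[derive(Debug, Clone)]
pub struct Learning {
    pub text: String,
    pub hits: u32,
}

/// Render the `max` most frequently observed learnings as a prompt section.
/// Returns an empty string when there is nothing to inject.
pub fn render_learnings(learnings: &[Learning], max: usize) -> String {
    if learnings.is_empty() || max == 0 {
        return String::new();
    }
    let mut ranked: Vec<&Learning> = learnings.iter().collect();
    // Stable sort: equally frequent learnings keep their original order.
    ranked.sort_by(|a, b| b.hits.cmp(&a.hits));
    let mut out = String::from("## Learned Patterns for this codebase:\n");
    for learning in ranked.into_iter().take(max) {
        out.push_str("- ");
        out.push_str(&learning.text);
        out.push('\n');
    }
    out
}
```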

**CLI Usage:**
```bash
makima contract create --learn "Feature X"
```

**Configuration:**
```toml
[learning]
enabled = false
extract_file_patterns = true
extract_command_patterns = true
max_learnings_injected = 10
```

**Integration Points:**
- `daemon/task/manager.rs` → Learning extraction
- `daemon/process/claude.rs` → Learning injection
- New file: `daemon/learning/extractor.rs`
- New file: `daemon/learning/database.rs`

---

### 2.7 Browser Verification for UI

**Name:** `--browser-verify`
**Priority:** LOW
**Flag:** `--browser-verify` or `--ui`

**Description:**
For UI-related tasks, require browser verification before completion.

**Motivation:**
- Ralph includes "Verify in browser" as an acceptance criterion
- Visual verification catches issues that tests miss
- Ensures UI actually works, not just compiles

**Current State in Makima:**
- No browser verification
- UI tasks treated same as backend
- No visual testing integration

**Implementation Approach:**
1. Detect UI-related tasks from file patterns:
   - `*.tsx`, `*.vue`, `*.svelte`
   - `components/`, `pages/`, `views/`
2. When completing, prompt for verification:
   - Launch dev server if needed
   - Open browser to relevant URL
   - Wait for user confirmation or screenshot analysis
3. Alternative: Integrate with Playwright for visual testing
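
Step 1's file-pattern detection can be sketched as a pure function over changed paths. The function names are hypothetical, and paths are assumed to use `/` as the separator.

```rust
/// True if the path has a UI file extension or sits under a UI directory.
pub fn is_ui_path(path: &str) -> bool {
    const EXTENSIONS: [&str; 3] = [".tsx", ".vue", ".svelte"];
    const DIRECTORIES: [&str; 3] = ["components", "pages", "views"];
    EXTENSIONS.iter().any(|ext| path.ends_with(ext))
        || path.split('/').any(|segment| DIRECTORIES.contains(&segment))
}

/// A task needs browser verification if any changed file looks UI-related.
pub fn needs_browser_verification(changed_files: &[&str]) -> bool {
    changed_files.iter().any(|p| is_ui_path(p))
}
```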

**CLI Usage:**
```bash
makima contract create --browser-verify "Add login page"
```

**Configuration:**
```toml
[browser_verify]
enabled = false
auto_detect_ui_tasks = true
dev_server_command = "npm run dev"
base_url = "http://localhost:3000"
```

**Integration Points:**
- `daemon/task/completion_gate.rs` → Browser check
- New file: `daemon/quality/browser_verify.rs`

---

## Part 3: Implementation Priorities

### Phase 1: Foundation (High Priority)
1. **Structured Progress Logging** - Core to context management
2. **Context Recovery Pattern** - Enables stateless iterations
3. **Commit Discipline** - Ensures quality git history
4. **Maximum Iterations Limit** - Prevents runaway loops

### Phase 2: Quality (Medium Priority)
5. **Verifiable Acceptance Criteria** - Improves completion reliability
6. **Dependency-Ordered Execution** - Prevents out-of-order failures
7. **Task Sizing Validation** - Catches too-large tasks early
8. **Require Tests Quality Gate** - Ensures working code
9. **Single-Story Mode** - Ralph's core pattern

### Phase 3: Advanced (Medium to Low Priority)
10. **PRD Mode** - Full ralph-style workflow
11. **Learning Mode** - Cross-task intelligence
12. **Archive Previous** - Run isolation
13. **Browser Verification** - UI quality

---

## Part 4: Configuration Summary

### New Configuration File Sections

```toml
# makima-daemon.toml additions

[progress_log]
enabled = true
max_entries_injected = 20
consolidation_threshold = 50

[context_recovery]
enabled = true
include_git_status = true
include_recent_progress = true

[dependency_ordering]
enabled = true
auto_detect = true
warn_on_violation = true

[acceptance_criteria]
enabled = true
require_verifiable = true
auto_append_standard = ["no_uncommitted_changes", "tests_pass"]

[task_sizing]
enabled = true
max_files_mentioned = 10
max_plan_words = 500
warn_on_scope_words = ["entire", "all", "complete", "refactor"]

[commit_discipline]
enabled = true
require_tests = false
require_lint = false
message_format = "conventional"

[autonomous_loop]
default_max_iterations = 10
hard_limit = 50

[execution]
single_task_mode = false

[quality_gates]
require_tests = false
test_command = ""  # Empty means auto-detect
test_timeout_secs = 300

[prd_mode]
enabled = false
auto_generate_from_plan = true
sync_to_database = true

[learning]
enabled = false
extract_file_patterns = true
extract_command_patterns = true
max_learnings_injected = 10

[browser_verify]
enabled = false
auto_detect_ui_tasks = true
dev_server_command = "npm run dev"
base_url = "http://localhost:3000"
```

### CLI Flag Summary

| Flag | Short / Alias | Feature | Default |
|------|-------|---------|---------|
| `--max-iterations` | `-i` | Iteration limit | 10 |
| `--single-task` | `-1` | One task per run | false |
| `--archive-previous` | `--archive` | Archive old runs | false |
| `--require-tests` | `--tests` | Test quality gate | false |
| `--prd-mode` | `--prd` | PRD-style workflow | false |
| `--learn` | `-l` | Cross-task learning | false |
| `--browser-verify` | `--ui` | UI verification | false |

---

## Part 5: Migration Path

### For Existing Contracts

1. Progress logging starts fresh; no historical data is backfilled
2. Context recovery applies to new tasks only
3. Existing tasks are not affected by the new validation
4. Each contract can opt in to optional features individually

### Backward Compatibility

- All opinionated features have graceful defaults
- Optional features are off by default
- No breaking changes to existing CLI/API
- Configuration is additive

---

## Conclusion

These features address the core challenges mentioned in the contract goal:
- **Manual steering** → Progress logging, context recovery, learning mode
- **Context between runs** → Structured persistence via `progress.log` (ralph's `progress.txt` pattern)
- **Handholding** → Verifiable criteria, commit discipline, quality gates

The opinionated features make makima more reliable out of the box, while optional features provide ralph-style workflows for users who want them.