summaryrefslogtreecommitdiff
path: root/ralph-features-spec.md
diff options
context:
space:
mode:
Diffstat (limited to 'ralph-features-spec.md')
-rw-r--r--ralph-features-spec.md773
1 files changed, 773 insertions, 0 deletions
diff --git a/ralph-features-spec.md b/ralph-features-spec.md
new file mode 100644
index 0000000..f25a8fe
--- /dev/null
+++ b/ralph-features-spec.md
@@ -0,0 +1,773 @@
+# Ralph-Inspired Features for Makima
+
+## Overview
+
+This specification outlines features derived from the [ralph](https://github.com/snarktank/ralph) autonomous AI agent loop system that can be implemented in makima to reduce manual steering and improve context management between runs.
+
+---
+
+## Part 1: Opinionated Features (Always Enabled)
+
+These features represent best practices that should be core to makima's behavior.
+
+### 1.1 Structured Progress Logging
+
+**Name:** `progress-log`
+**Priority:** HIGH
+
+**Description:**
+Implement an append-only progress log file (`progress.txt` or similar) that persists learnings, patterns, and context across task iterations.
+
+**Motivation:**
+- Ralph's most powerful feature is its dual-file learning system
+- Captures context that survives Claude's context window limits
+- Enables pattern discovery over time
+- Provides debugging history
+
+**Current State in Makima:**
+- `progress_summary` field exists but is per-task, not persistent
+- Task events stored in database but not summarized
+- No cross-task learning mechanism
+
+**Implementation Approach:**
+1. Add `progress.log` file to each task's worktree
+2. Append structured entries at task completion:
+ ```
+ ## [Timestamp] - Task [ID]: [Name]
+ - Status: [done/failed]
+ - Files changed: [list]
+ - **Learnings:**
+ - [Pattern discovered]
+ - [Gotcha encountered]
+ ---
+ ```
+3. Inject progress.log contents into new task prompts
+4. Periodic consolidation into `AGENTS.md` equivalent
+
+**Configuration:**
+```toml
+[progress_log]
+enabled = true # Always on
+max_entries_injected = 20 # Limit for prompt injection
+consolidation_threshold = 50 # Trigger consolidation
+```
+
+**Integration Points:**
+- `daemon/task/manager.rs` → `on_completion()` hook
+- `daemon/process/claude.rs` → `inject_system_prompt()`
+- New file: `daemon/task/progress_log.rs`
+
+---
+
+### 1.2 Context Recovery Pattern
+
+**Name:** `context-recovery`
+**Priority:** HIGH
+
+**Description:**
+Standardize how context is rebuilt when tasks resume or restart, ensuring Claude can quickly orient itself.
+
+**Motivation:**
+- Ralph's stateless model works because context recovery is systematic
+- Each iteration reads from well-defined artifacts
+- Reduces confusion and repeated work
+
+**Current State in Makima:**
+- `conversation_state` stored for resumption
+- `--continue` flag relies on Claude's session state
+- No structured "where we left off" pattern
+
+**Implementation Approach:**
+1. Create standard context recovery header for task prompts:
+ ```
+ ## Context Recovery
+ - Current branch: [branch name]
+ - Git status: [uncommitted changes summary]
+ - Last checkpoint: [timestamp, message]
+ - Progress log (recent): [last 5 entries]
+ - Current phase: [research/specify/plan/execute/review]
+ ```
+2. Auto-generate on task start/resume
+3. Include in system prompt before user plan
+
+**Integration Points:**
+- `daemon/task/manager.rs` → `build_context_recovery()`
+- `daemon/process/claude.rs` → Prepend to injected prompt
+- New file: `daemon/task/context_recovery.rs`
+
+---
+
+### 1.3 Dependency-Ordered Task Execution
+
+**Name:** `dependency-ordering`
+**Priority:** MEDIUM
+
+**Description:**
+Enforce that tasks execute in dependency order: schema changes → backend → UI.
+
+**Motivation:**
+- Ralph explicitly orders stories: database → server → UI → dashboard
+- Prevents tasks from failing due to missing dependencies
+- Creates clean commit boundaries
+
+**Current State in Makima:**
+- Tasks have `priority` field but no dependency inference
+- Supervisors manually order task creation
+- No validation of execution order
+
+**Implementation Approach:**
+1. Add `depends_on: Vec<Uuid>` field to tasks
+2. Validate dependencies before marking task as runnable
+3. Auto-detect dependency patterns:
+ - Migration files → backend code
+ - Types/models → consumers
+ - APIs → UI components
+4. Warn if a task seems out of order based on file patterns
+
+**Configuration:**
+```toml
+[dependency_ordering]
+enabled = true
+auto_detect = true
+warn_on_violation = true
+```
+
+**Integration Points:**
+- `db/models.rs` → Task model extension
+- `daemon/task/manager.rs` → `can_start_task()` validation
+- New file: `daemon/task/dependency_analysis.rs`
+
+---
+
+### 1.4 Verifiable Acceptance Criteria
+
+**Name:** `acceptance-criteria`
+**Priority:** MEDIUM
+
+**Description:**
+Require that all tasks have verifiable (not vague) acceptance criteria, and automatically validate them.
+
+**Motivation:**
+- Ralph requires criteria like "Typecheck passes", "Tests pass"
+- Prevents "done" status on incomplete work
+- Provides clear success definition
+
+**Current State in Makima:**
+- `COMPLETION_GATE` signals readiness
+- No structured criteria validation
+- Manual interpretation of "ready"
+
+**Implementation Approach:**
+1. Parse task plans for acceptance criteria section
+2. Identify verifiable vs vague criteria:
+ - **Good:** "All tests pass", "No TypeScript errors"
+ - **Bad:** "Works correctly", "Good UX"
+3. Auto-append standard criteria if missing:
+ - "No uncommitted changes remain"
+ - "CI/linting passes" (if configured)
+4. Validate criteria satisfaction before marking complete
+
+**Configuration:**
+```toml
+[acceptance_criteria]
+enabled = true
+require_verifiable = true
+auto_append_standard = ["no_uncommitted_changes", "tests_pass"]
+```
+
+**Integration Points:**
+- `daemon/task/completion_gate.rs` → Extend validation
+- `llm/task_output.rs` → Parse criteria from plan
+- New file: `daemon/task/criteria_validator.rs`
+
+---
+
+### 1.5 Task Sizing Validation
+
+**Name:** `task-sizing`
+**Priority:** MEDIUM
+
+**Description:**
+Warn or prevent tasks that are likely too large to complete in one context window.
+
+**Motivation:**
+- Ralph's story sizing is crucial: "If you can't describe it in 2-3 sentences, it's too big"
+- Large tasks exhaust context, require handoffs
+- Smaller tasks = cleaner commits, easier recovery
+
+**Current State in Makima:**
+- No task size estimation
+- `auto_handoff` exists but reactive
+- Manual task breakdown by supervisors
+
+**Implementation Approach:**
+1. Estimate task complexity from plan text:
+ - Number of files mentioned
+ - Scope words ("entire", "all", "refactor")
+ - Estimated token count
+2. Warn if task exceeds thresholds
+3. Suggest breakdown for large tasks
+
+**Thresholds:**
+- Files mentioned > 10 → Warning
+- Plan length > 500 words → Warning
+- Scope words detected → Strong warning
+
+**Configuration:**
+```toml
+[task_sizing]
+enabled = true
+max_files_mentioned = 10
+max_plan_words = 500
+warn_on_scope_words = ["entire", "all", "complete", "refactor"]
+```
+
+**Integration Points:**
+- `daemon/task/manager.rs` → `validate_task_size()`
+- Supervisor prompts → Include sizing guidance
+- New file: `daemon/task/sizing_validator.rs`
+
+---
+
+### 1.6 Commit Discipline
+
+**Name:** `commit-discipline`
+**Priority:** HIGH
+
+**Description:**
+Enforce structured commit messages and only allow commits when quality checks pass.
+
+**Motivation:**
+- Ralph: "Only commit when tests pass"
+- Clean git history aids context recovery
+- Structured messages enable automation
+
+**Current State in Makima:**
+- Checkpoints create commits automatically
+- No quality gate before commit
+- Commit messages not standardized
+
+**Implementation Approach:**
+1. Standardize commit message format:
+ ```
+ feat/fix/chore: [Task ID] - [Summary]
+
+ [Optional body]
+
+ Co-Authored-By: Claude <noreply@anthropic.com>
+ ```
+2. Run quality checks before checkpoint commit:
+ - TypeScript/lint (if configured)
+ - Tests (if configured)
+3. Reject commit if checks fail, provide feedback
+
+**Configuration:**
+```toml
+[commit_discipline]
+enabled = true
+require_tests = false # Optional
+require_lint = false # Optional
+message_format = "conventional" # conventional, simple
+```
+
+**Integration Points:**
+- `daemon/worktree/manager.rs` → `create_checkpoint()`
+- `daemon/task/manager.rs` → Pre-commit hooks
+- New file: `daemon/task/commit_validator.rs`
+
+---
+
+## Part 2: Optional Features (Flag-Controlled)
+
+These features provide advanced control and should be opt-in via CLI flags or configuration.
+
+### 2.1 Maximum Iterations Limit
+
+**Name:** `--max-iterations`
+**Priority:** HIGH
+**Flag:** `--max-iterations <N>` or `-i <N>`
+
+**Description:**
+Limit the number of autonomous loop iterations before stopping.
+
+**Motivation:**
+- Ralph uses `max_iterations` (default 10)
+- Prevents runaway loops that waste tokens
+- Provides predictable behavior
+
+**Current State in Makima:**
+- Circuit breaker has `iteration_count` limit (10)
+- Not configurable at task/contract level
+- No per-run override
+
+**Implementation Approach:**
+1. Add `--max-iterations` flag to contract/task creation
+2. Store in task metadata
+3. Check count in autonomous loop logic
+4. Exit cleanly with message when limit reached
+
+**CLI Usage:**
+```bash
+makima contract create --max-iterations 5 "Feature X"
+makima supervisor spawn "Task" "Plan" --max-iterations 3
+```
+
+**Configuration:**
+```toml
+[autonomous_loop]
+default_max_iterations = 10
+hard_limit = 50 # Absolute maximum
+```
+
+**Integration Points:**
+- `daemon/task/manager.rs` → Loop control
+- `db/models.rs` → Task field
+- CLI argument parsing
+
+---
+
+### 2.2 Single-Story-Per-Run Mode
+
+**Name:** `--single-task`
+**Priority:** MEDIUM
+**Flag:** `--single-task` or `-1`
+
+**Description:**
+Execute exactly one task per Claude invocation, then stop (don't auto-continue).
+
+**Motivation:**
+- Ralph's model: one story per iteration
+- Ensures complete focus
+- Creates clean boundaries
+- Simplifies failure recovery
+
+**Current State in Makima:**
+- Tasks can run multiple iterations
+- Supervisor can spawn multiple concurrent tasks
+- No single-task mode
+
+**Implementation Approach:**
+1. When `--single-task` enabled:
+ - Execute one task
+ - Parse completion gate
+ - Stop regardless of `ready` status
+ - Report status and exit
+2. User reviews, then manually continues or adjusts
+
+**CLI Usage:**
+```bash
+makima contract create --single-task "Feature X"
+```
+
+**Configuration:**
+```toml
+[execution]
+single_task_mode = false # Default
+```
+
+**Integration Points:**
+- `daemon/task/manager.rs` → Execution loop
+- `server/handlers/contract_daemon.rs` → Contract options
+- CLI flags
+
+---
+
+### 2.3 Archive Previous Runs
+
+**Name:** `--archive-previous`
+**Priority:** LOW
+**Flag:** `--archive-previous` or `--archive`
+
+**Description:**
+When starting a new feature/contract, archive the previous run's artifacts.
+
+**Motivation:**
+- Ralph archives to `archive/YYYY-MM-DD-feature-name/`
+- Clean separation between features
+- Preserves history for reference
+- Prevents context pollution
+
+**Current State in Makima:**
+- Worktrees are per-task but ephemeral
+- No archiving mechanism
+- Old task data in database but hard to access
+
+**Implementation Approach:**
+1. On contract creation with `--archive`:
+ - Find previous contract with same name/goal
+ - Copy key artifacts to `archive/` directory:
+ - progress.log
+ - Final checkpoint
+ - Summary document
+2. Archive structure:
+ ```
+ archive/
+ └── 2026-01-22-feature-name/
+ ├── progress.log
+ ├── summary.md
+ └── final-diff.patch
+ ```
+
+**CLI Usage:**
+```bash
+makima contract create --archive-previous "Feature X v2"
+```
+
+**Integration Points:**
+- `daemon/worktree/manager.rs` → Archive logic
+- `server/handlers/contracts.rs` → Archive on create
+- New file: `daemon/archive/manager.rs`
+
+---
+
+### 2.4 Require Tests Quality Gate
+
+**Name:** `--require-tests`
+**Priority:** MEDIUM
+**Flag:** `--require-tests` or `--tests`
+
+**Description:**
+Block task completion unless tests pass.
+
+**Motivation:**
+- Ralph: stories require "Tests pass" in acceptance criteria
+- Ensures quality before merge
+- Catches regressions early
+
+**Current State in Makima:**
+- Completion gate is self-reported by Claude
+- No actual test execution
+- Circuit breaker is reactive, not proactive
+
+**Implementation Approach:**
+1. Detect test framework from project:
+ - `package.json` scripts
+ - `pytest`, `cargo test`, etc.
+2. Run tests before accepting completion
+3. Parse test output for pass/fail
+4. If failed:
+ - Don't mark complete
+ - Inject failure info into next prompt
+ - Increment failure counter
+
+**CLI Usage:**
+```bash
+makima contract create --require-tests "Feature X"
+```
+
+**Configuration:**
+```toml
+[quality_gates]
+require_tests = false
+test_command = "npm test" # Auto-detected if not set
+test_timeout_secs = 300
+```
+
+**Integration Points:**
+- `daemon/task/completion_gate.rs` → Test validation
+- `daemon/process/` → Test runner
+- New file: `daemon/quality/test_runner.rs`
+
+---
+
+### 2.5 PRD Mode
+
+**Name:** `--prd-mode`
+**Priority:** MEDIUM
+**Flag:** `--prd-mode` or `--prd`
+
+**Description:**
+Enable ralph-style PRD workflow with structured JSON task tracking.
+
+**Motivation:**
+- Ralph's `prd.json` provides clear task breakdown
+- Structured format aids automation
+- Priority-based execution
+- Clear pass/fail tracking
+
+**Current State in Makima:**
+- Plans are free-form text
+- Task status is in database, not file-based
+- No structured PRD format
+
+**Implementation Approach:**
+1. When `--prd-mode` enabled:
+ - Create `prd.json` in worktree:
+ ```json
+ {
+ "project": "Contract Name",
+ "branchName": "makima/feature",
+ "description": "Contract goal",
+ "userStories": [
+ {
+ "id": "US-001",
+ "title": "Story title",
+ "description": "As a...",
+ "acceptanceCriteria": ["Criterion 1"],
+ "priority": 1,
+ "passes": false,
+ "notes": ""
+ }
+ ]
+ }
+ ```
+ - Tasks update `passes` field on completion
+ - Supervisor reads PRD to find next incomplete story
+2. Sync between database and `prd.json`
+
+**CLI Usage:**
+```bash
+makima contract create --prd-mode "Feature X"
+```
+
+**Configuration:**
+```toml
+[prd_mode]
+enabled = false
+auto_generate_from_plan = true
+sync_to_database = true
+```
+
+**Integration Points:**
+- `daemon/task/manager.rs` → PRD sync
+- New file: `daemon/prd/manager.rs`
+- New file: `daemon/prd/models.rs`
+
+---
+
+### 2.6 Learning Mode
+
+**Name:** `--learn`
+**Priority:** LOW
+**Flag:** `--learn` or `-l`
+
+**Description:**
+Enable cross-task learning that extracts patterns and improves future prompts.
+
+**Motivation:**
+- Ralph's AGENTS.md consolidates patterns
+- Learning from success improves future runs
+- Learning from failure prevents repeating mistakes
+
+**Current State in Makima:**
+- No cross-task learning
+- Each task starts fresh
+- Patterns not extracted or reused
+
+**Implementation Approach:**
+1. On task completion, extract:
+ - Files commonly modified together
+ - Commands that succeeded/failed
+ - Error patterns and solutions
+2. Store in `learnings.db` (SQLite) per repository
+3. Inject relevant learnings into future task prompts:
+ ```
+ ## Learned Patterns for this codebase:
+ - Always run `npm run typecheck` before commit
+ - The auth middleware is in src/middleware/auth.ts
+ - Database migrations require `npx prisma generate` after
+ ```
+
+**CLI Usage:**
+```bash
+makima contract create --learn "Feature X"
+```
+
+**Configuration:**
+```toml
+[learning]
+enabled = false
+extract_file_patterns = true
+extract_command_patterns = true
+max_learnings_injected = 10
+```
+
+**Integration Points:**
+- `daemon/task/manager.rs` → Learning extraction
+- `daemon/process/claude.rs` → Learning injection
+- New file: `daemon/learning/extractor.rs`
+- New file: `daemon/learning/database.rs`
+
+---
+
+### 2.7 Browser Verification for UI
+
+**Name:** `--browser-verify`
+**Priority:** LOW
+**Flag:** `--browser-verify` or `--ui`
+
+**Description:**
+For UI-related tasks, require browser verification before completion.
+
+**Motivation:**
+- Ralph includes "Verify in browser" as acceptance criteria
+- Visual verification catches issues that tests miss
+- Ensures UI actually works, not just compiles
+
+**Current State in Makima:**
+- No browser verification
+- UI tasks treated same as backend
+- No visual testing integration
+
+**Implementation Approach:**
+1. Detect UI-related tasks from file patterns:
+ - `*.tsx`, `*.vue`, `*.svelte`
+ - `components/`, `pages/`, `views/`
+2. When completing, prompt for verification:
+ - Launch dev server if needed
+ - Open browser to relevant URL
+ - Wait for user confirmation or screenshot analysis
+3. Alternative: Integrate with Playwright for visual testing
+
+**CLI Usage:**
+```bash
+makima contract create --browser-verify "Add login page"
+```
+
+**Configuration:**
+```toml
+[browser_verify]
+enabled = false
+auto_detect_ui_tasks = true
+dev_server_command = "npm run dev"
+base_url = "http://localhost:3000"
+```
+
+**Integration Points:**
+- `daemon/task/completion_gate.rs` → Browser check
+- New file: `daemon/quality/browser_verify.rs`
+
+---
+
+## Part 3: Implementation Priorities
+
+### Phase 1: Foundation (High Priority)
+1. **Structured Progress Logging** - Core to context management
+2. **Context Recovery Pattern** - Enables stateless iterations
+3. **Commit Discipline** - Ensures quality git history
+4. **Maximum Iterations Limit** - Prevents runaway loops
+
+### Phase 2: Quality (Medium Priority)
+5. **Verifiable Acceptance Criteria** - Improves completion reliability
+6. **Dependency-Ordered Execution** - Prevents out-of-order failures
+7. **Task Sizing Validation** - Catches too-large tasks early
+8. **Require Tests Quality Gate** - Ensures working code
+9. **Single-Story Mode** - Ralph's core pattern
+
+### Phase 3: Advanced (Low Priority)
+10. **PRD Mode** - Full ralph-style workflow
+11. **Learning Mode** - Cross-task intelligence
+12. **Archive Previous** - Run isolation
+13. **Browser Verification** - UI quality
+
+---
+
+## Part 4: Configuration Summary
+
+### New Configuration File Sections
+
+```toml
+# makima-daemon.toml additions
+
+[progress_log]
+enabled = true
+max_entries_injected = 20
+consolidation_threshold = 50
+
+[context_recovery]
+enabled = true
+include_git_status = true
+include_recent_progress = true
+
+[dependency_ordering]
+enabled = true
+auto_detect = true
+warn_on_violation = true
+
+[acceptance_criteria]
+enabled = true
+require_verifiable = true
+auto_append_standard = ["no_uncommitted_changes"]
+
+[task_sizing]
+enabled = true
+max_files_mentioned = 10
+max_plan_words = 500
+warn_on_scope_words = ["entire", "all", "complete", "refactor"]
+
+[commit_discipline]
+enabled = true
+require_tests = false
+require_lint = false
+message_format = "conventional"
+
+[autonomous_loop]
+default_max_iterations = 10
+hard_limit = 50
+
+[execution]
+single_task_mode = false
+
+[quality_gates]
+require_tests = false
+test_command = "" # Auto-detected
+test_timeout_secs = 300
+
+[prd_mode]
+enabled = false
+auto_generate_from_plan = true
+sync_to_database = true
+
+[learning]
+enabled = false
+extract_file_patterns = true
+extract_command_patterns = true
+max_learnings_injected = 10
+
+[browser_verify]
+enabled = false
+auto_detect_ui_tasks = true
+dev_server_command = "npm run dev"
+base_url = "http://localhost:3000"
+```
+
+### CLI Flag Summary
+
+| Flag | Short | Feature | Default |
+|------|-------|---------|---------|
+| `--max-iterations` | `-i` | Iteration limit | 10 |
+| `--single-task` | `-1` | One task per run | false |
+| `--archive-previous` | `--archive` | Archive old runs | false |
+| `--require-tests` | `--tests` | Test quality gate | false |
+| `--prd-mode` | `--prd` | PRD-style workflow | false |
+| `--learn` | `-l` | Cross-task learning | false |
+| `--browser-verify` | `--ui` | UI verification | false |
+
+---
+
+## Part 5: Migration Path
+
+### For Existing Contracts
+
+1. Progress logging starts fresh (no historical data)
+2. Context recovery applies to new tasks only
+3. Existing tasks not affected by new validation
+4. Can opt-in to optional features per-contract
+
+### Backward Compatibility
+
+- All opinionated features have graceful defaults
+- Optional features are off by default
+- No breaking changes to existing CLI/API
+- Configuration is additive
+
+---
+
+## Conclusion
+
+These features address the core challenges mentioned in the contract goal:
+- **Manual steering** → Progress logging, context recovery, learning mode
+- **Context between runs** → Structured persistence, progress.txt pattern
+- **Handholding** → Verifiable criteria, commit discipline, quality gates
+
+The opinionated features make makima more reliable out of the box, while optional features provide ralph-style workflows for users who want them.