# Ralph-Inspired Features for Makima ## Overview This specification outlines features derived from the [ralph](https://github.com/snarktank/ralph) autonomous AI agent loop system that can be implemented in makima to reduce manual steering and improve context management between runs. --- ## Part 1: Opinionated Features (Always Enabled) These features represent best practices that should be core to makima's behavior. ### 1.1 Structured Progress Logging **Name:** `progress-log` **Priority:** HIGH **Description:** Implement an append-only progress log file (`progress.txt` or similar) that persists learnings, patterns, and context across task iterations. **Motivation:** - Ralph's most powerful feature is its dual-file learning system - Captures context that survives Claude's context window limits - Enables pattern discovery over time - Provides debugging history **Current State in Makima:** - `progress_summary` field exists but is per-task, not persistent - Task events stored in database but not summarized - No cross-task learning mechanism **Implementation Approach:** 1. Add `progress.log` file to each task's worktree 2. Append structured entries at task completion: ``` ## [Timestamp] - Task [ID]: [Name] - Status: [done/failed] - Files changed: [list] - **Learnings:** - [Pattern discovered] - [Gotcha encountered] --- ``` 3. Inject progress.log contents into new task prompts 4. Periodic consolidation into `AGENTS.md` equivalent **Configuration:** ```toml [progress_log] enabled = true # Always on max_entries_injected = 20 # Limit for prompt injection consolidation_threshold = 50 # Trigger consolidation ``` **Integration Points:** - `daemon/task/manager.rs` → `on_completion()` hook - `daemon/process/claude.rs` → `inject_system_prompt()` - New file: `daemon/task/progress_log.rs` --- ### 1.2 Context Recovery Pattern **Name:** `context-recovery` **Priority:** HIGH **Description:** Standardize how context is rebuilt when tasks resume or restart, ensuring Claude can quickly orient itself. **Motivation:** - Ralph's stateless model works because context recovery is systematic - Each iteration reads from well-defined artifacts - Reduces confusion and repeated work **Current State in Makima:** - `conversation_state` stored for resumption - `--continue` flag relies on Claude's session state - No structured "where we left off" pattern **Implementation Approach:** 1. Create standard context recovery header for task prompts: ``` ## Context Recovery - Current branch: [branch name] - Git status: [uncommitted changes summary] - Last checkpoint: [timestamp, message] - Progress log (recent): [last 5 entries] - Current phase: [research/specify/plan/execute/review] ``` 2. Auto-generate on task start/resume 3. Include in system prompt before user plan **Integration Points:** - `daemon/task/manager.rs` → `build_context_recovery()` - `daemon/process/claude.rs` → Prepend to injected prompt - New file: `daemon/task/context_recovery.rs` --- ### 1.3 Dependency-Ordered Task Execution **Name:** `dependency-ordering` **Priority:** MEDIUM **Description:** Enforce that tasks execute in dependency order: schema changes → backend → UI. **Motivation:** - Ralph explicitly orders stories: database → server → UI → dashboard - Prevents tasks from failing due to missing dependencies - Creates clean commit boundaries **Current State in Makima:** - Tasks have `priority` field but no dependency inference - Supervisors manually order task creation - No validation of execution order **Implementation Approach:** 1. Add `depends_on: Vec` field to tasks 2. Validate dependencies before marking task as runnable 3. Auto-detect dependency patterns: - Migration files → backend code - Types/models → consumers - APIs → UI components 4. Warn if a task seems out of order based on file patterns **Configuration:** ```toml [dependency_ordering] enabled = true auto_detect = true warn_on_violation = true ``` **Integration Points:** - `db/models.rs` → Task model extension - `daemon/task/manager.rs` → `can_start_task()` validation - New file: `daemon/task/dependency_analysis.rs` --- ### 1.4 Verifiable Acceptance Criteria **Name:** `acceptance-criteria` **Priority:** MEDIUM **Description:** Require that all tasks have verifiable (not vague) acceptance criteria, and automatically validate them. **Motivation:** - Ralph requires criteria like "Typecheck passes", "Tests pass" - Prevents "done" status on incomplete work - Provides clear success definition **Current State in Makima:** - `COMPLETION_GATE` signals readiness - No structured criteria validation - Manual interpretation of "ready" **Implementation Approach:** 1. Parse task plans for acceptance criteria section 2. Identify verifiable vs vague criteria: - **Good:** "All tests pass", "No TypeScript errors" - **Bad:** "Works correctly", "Good UX" 3. Auto-append standard criteria if missing: - "No uncommitted changes remain" - "CI/linting passes" (if configured) 4. Validate criteria satisfaction before marking complete **Configuration:** ```toml [acceptance_criteria] enabled = true require_verifiable = true auto_append_standard = ["no_uncommitted_changes", "tests_pass"] ``` **Integration Points:** - `daemon/task/completion_gate.rs` → Extend validation - `llm/task_output.rs` → Parse criteria from plan - New file: `daemon/task/criteria_validator.rs` --- ### 1.5 Task Sizing Validation **Name:** `task-sizing` **Priority:** MEDIUM **Description:** Warn or prevent tasks that are likely too large to complete in one context window. **Motivation:** - Ralph's story sizing is crucial: "If you can't describe it in 2-3 sentences, it's too big" - Large tasks exhaust context, require handoffs - Smaller tasks = cleaner commits, easier recovery **Current State in Makima:** - No task size estimation - `auto_handoff` exists but reactive - Manual task breakdown by supervisors **Implementation Approach:** 1. Estimate task complexity from plan text: - Number of files mentioned - Scope words ("entire", "all", "refactor") - Estimated token count 2. Warn if task exceeds thresholds 3. Suggest breakdown for large tasks **Thresholds:** - Files mentioned > 10 → Warning - Plan length > 500 words → Warning - Scope words detected → Strong warning **Configuration:** ```toml [task_sizing] enabled = true max_files_mentioned = 10 max_plan_words = 500 warn_on_scope_words = ["entire", "all", "complete", "refactor"] ``` **Integration Points:** - `daemon/task/manager.rs` → `validate_task_size()` - Supervisor prompts → Include sizing guidance - New file: `daemon/task/sizing_validator.rs` --- ### 1.6 Commit Discipline **Name:** `commit-discipline` **Priority:** HIGH **Description:** Enforce structured commit messages and only allow commits when quality checks pass. **Motivation:** - Ralph: "Only commit when tests pass" - Clean git history aids context recovery - Structured messages enable automation **Current State in Makima:** - Checkpoints create commits automatically - No quality gate before commit - Commit messages not standardized **Implementation Approach:** 1. Standardize commit message format: ``` feat/fix/chore: [Task ID] - [Summary] [Optional body] Co-Authored-By: Claude ``` 2. Run quality checks before checkpoint commit: - TypeScript/lint (if configured) - Tests (if configured) 3. Reject commit if checks fail, provide feedback **Configuration:** ```toml [commit_discipline] enabled = true require_tests = false # Optional require_lint = false # Optional message_format = "conventional" # conventional, simple ``` **Integration Points:** - `daemon/worktree/manager.rs` → `create_checkpoint()` - `daemon/task/manager.rs` → Pre-commit hooks - New file: `daemon/task/commit_validator.rs` --- ## Part 2: Optional Features (Flag-Controlled) These features provide advanced control and should be opt-in via CLI flags or configuration. ### 2.1 Maximum Iterations Limit **Name:** `--max-iterations` **Priority:** HIGH **Flag:** `--max-iterations ` or `-i ` **Description:** Limit the number of autonomous loop iterations before stopping. **Motivation:** - Ralph uses `max_iterations` (default 10) - Prevents runaway loops that waste tokens - Provides predictable behavior **Current State in Makima:** - Circuit breaker has `iteration_count` limit (10) - Not configurable at task/contract level - No per-run override **Implementation Approach:** 1. Add `--max-iterations` flag to contract/task creation 2. Store in task metadata 3. Check count in autonomous loop logic 4. Exit cleanly with message when limit reached **CLI Usage:** ```bash makima contract create --max-iterations 5 "Feature X" makima supervisor spawn "Task" "Plan" --max-iterations 3 ``` **Configuration:** ```toml [autonomous_loop] default_max_iterations = 10 hard_limit = 50 # Absolute maximum ``` **Integration Points:** - `daemon/task/manager.rs` → Loop control - `db/models.rs` → Task field - CLI argument parsing --- ### 2.2 Single-Story-Per-Run Mode **Name:** `--single-task` **Priority:** MEDIUM **Flag:** `--single-task` or `-1` **Description:** Execute exactly one task per Claude invocation, then stop (don't auto-continue). **Motivation:** - Ralph's model: one story per iteration - Ensures complete focus - Creates clean boundaries - Simplifies failure recovery **Current State in Makima:** - Tasks can run multiple iterations - Supervisor can spawn multiple concurrent tasks - No single-task mode **Implementation Approach:** 1. When `--single-task` enabled: - Execute one task - Parse completion gate - Stop regardless of `ready` status - Report status and exit 2. User reviews, then manually continues or adjusts **CLI Usage:** ```bash makima contract create --single-task "Feature X" ``` **Configuration:** ```toml [execution] single_task_mode = false # Default ``` **Integration Points:** - `daemon/task/manager.rs` → Execution loop - `server/handlers/contract_daemon.rs` → Contract options - CLI flags --- ### 2.3 Archive Previous Runs **Name:** `--archive-previous` **Priority:** LOW **Flag:** `--archive-previous` or `--archive` **Description:** When starting a new feature/contract, archive the previous run's artifacts. **Motivation:** - Ralph archives to `archive/YYYY-MM-DD-feature-name/` - Clean separation between features - Preserves history for reference - Prevents context pollution **Current State in Makima:** - Worktrees are per-task but ephemeral - No archiving mechanism - Old task data in database but hard to access **Implementation Approach:** 1. On contract creation with `--archive`: - Find previous contract with same name/goal - Copy key artifacts to `archive/` directory: - progress.log - Final checkpoint - Summary document 2. Archive structure: ``` archive/ └── 2026-01-22-feature-name/ ├── progress.log ├── summary.md └── final-diff.patch ``` **CLI Usage:** ```bash makima contract create --archive-previous "Feature X v2" ``` **Integration Points:** - `daemon/worktree/manager.rs` → Archive logic - `server/handlers/contracts.rs` → Archive on create - New file: `daemon/archive/manager.rs` --- ### 2.4 Require Tests Quality Gate **Name:** `--require-tests` **Priority:** MEDIUM **Flag:** `--require-tests` or `--tests` **Description:** Block task completion unless tests pass. **Motivation:** - Ralph: stories require "Tests pass" in acceptance criteria - Ensures quality before merge - Catches regressions early **Current State in Makima:** - Completion gate is self-reported by Claude - No actual test execution - Circuit breaker is reactive, not proactive **Implementation Approach:** 1. Detect test framework from project: - `package.json` scripts - `pytest`, `cargo test`, etc. 2. Run tests before accepting completion 3. Parse test output for pass/fail 4. If failed: - Don't mark complete - Inject failure info into next prompt - Increment failure counter **CLI Usage:** ```bash makima contract create --require-tests "Feature X" ``` **Configuration:** ```toml [quality_gates] require_tests = false test_command = "npm test" # Auto-detected if not set test_timeout_secs = 300 ``` **Integration Points:** - `daemon/task/completion_gate.rs` → Test validation - `daemon/process/` → Test runner - New file: `daemon/quality/test_runner.rs` --- ### 2.5 PRD Mode **Name:** `--prd-mode` **Priority:** MEDIUM **Flag:** `--prd-mode` or `--prd` **Description:** Enable ralph-style PRD workflow with structured JSON task tracking. **Motivation:** - Ralph's `prd.json` provides clear task breakdown - Structured format aids automation - Priority-based execution - Clear pass/fail tracking **Current State in Makima:** - Plans are free-form text - Task status is in database, not file-based - No structured PRD format **Implementation Approach:** 1. When `--prd-mode` enabled: - Create `prd.json` in worktree: ```json { "project": "Contract Name", "branchName": "makima/feature", "description": "Contract goal", "userStories": [ { "id": "US-001", "title": "Story title", "description": "As a...", "acceptanceCriteria": ["Criterion 1"], "priority": 1, "passes": false, "notes": "" } ] } ``` - Tasks update `passes` field on completion - Supervisor reads PRD to find next incomplete story 2. Sync between database and `prd.json` **CLI Usage:** ```bash makima contract create --prd-mode "Feature X" ``` **Configuration:** ```toml [prd_mode] enabled = false auto_generate_from_plan = true sync_to_database = true ``` **Integration Points:** - `daemon/task/manager.rs` → PRD sync - New file: `daemon/prd/manager.rs` - New file: `daemon/prd/models.rs` --- ### 2.6 Learning Mode **Name:** `--learn` **Priority:** LOW **Flag:** `--learn` or `-l` **Description:** Enable cross-task learning that extracts patterns and improves future prompts. **Motivation:** - Ralph's AGENTS.md consolidates patterns - Learning from success improves future runs - Learning from failure prevents repeating mistakes **Current State in Makima:** - No cross-task learning - Each task starts fresh - Patterns not extracted or reused **Implementation Approach:** 1. On task completion, extract: - Files commonly modified together - Commands that succeeded/failed - Error patterns and solutions 2. Store in `learnings.db` (SQLite) per repository 3. Inject relevant learnings into future task prompts: ``` ## Learned Patterns for this codebase: - Always run `npm run typecheck` before commit - The auth middleware is in src/middleware/auth.ts - Database migrations require `npx prisma generate` after ``` **CLI Usage:** ```bash makima contract create --learn "Feature X" ``` **Configuration:** ```toml [learning] enabled = false extract_file_patterns = true extract_command_patterns = true max_learnings_injected = 10 ``` **Integration Points:** - `daemon/task/manager.rs` → Learning extraction - `daemon/process/claude.rs` → Learning injection - New file: `daemon/learning/extractor.rs` - New file: `daemon/learning/database.rs` --- ### 2.7 Browser Verification for UI **Name:** `--browser-verify` **Priority:** LOW **Flag:** `--browser-verify` or `--ui` **Description:** For UI-related tasks, require browser verification before completion. **Motivation:** - Ralph includes "Verify in browser" as acceptance criteria - Visual verification catches issues that tests miss - Ensures UI actually works, not just compiles **Current State in Makima:** - No browser verification - UI tasks treated same as backend - No visual testing integration **Implementation Approach:** 1. Detect UI-related tasks from file patterns: - `*.tsx`, `*.vue`, `*.svelte` - `components/`, `pages/`, `views/` 2. When completing, prompt for verification: - Launch dev server if needed - Open browser to relevant URL - Wait for user confirmation or screenshot analysis 3. Alternative: Integrate with Playwright for visual testing **CLI Usage:** ```bash makima contract create --browser-verify "Add login page" ``` **Configuration:** ```toml [browser_verify] enabled = false auto_detect_ui_tasks = true dev_server_command = "npm run dev" base_url = "http://localhost:3000" ``` **Integration Points:** - `daemon/task/completion_gate.rs` → Browser check - New file: `daemon/quality/browser_verify.rs` --- ## Part 3: Implementation Priorities ### Phase 1: Foundation (High Priority) 1. **Structured Progress Logging** - Core to context management 2. **Context Recovery Pattern** - Enables stateless iterations 3. **Commit Discipline** - Ensures quality git history 4. **Maximum Iterations Limit** - Prevents runaway loops ### Phase 2: Quality (Medium Priority) 5. **Verifiable Acceptance Criteria** - Improves completion reliability 6. **Dependency-Ordered Execution** - Prevents out-of-order failures 7. **Task Sizing Validation** - Catches too-large tasks early 8. **Require Tests Quality Gate** - Ensures working code 9. **Single-Story Mode** - Ralph's core pattern ### Phase 3: Advanced (Low Priority) 10. **PRD Mode** - Full ralph-style workflow 11. **Learning Mode** - Cross-task intelligence 12. **Archive Previous** - Run isolation 13. **Browser Verification** - UI quality --- ## Part 4: Configuration Summary ### New Configuration File Sections ```toml # makima-daemon.toml additions [progress_log] enabled = true max_entries_injected = 20 consolidation_threshold = 50 [context_recovery] enabled = true include_git_status = true include_recent_progress = true [dependency_ordering] enabled = true auto_detect = true warn_on_violation = true [acceptance_criteria] enabled = true require_verifiable = true auto_append_standard = ["no_uncommitted_changes"] [task_sizing] enabled = true max_files_mentioned = 10 max_plan_words = 500 warn_on_scope_words = ["entire", "all", "complete", "refactor"] [commit_discipline] enabled = true require_tests = false require_lint = false message_format = "conventional" [autonomous_loop] default_max_iterations = 10 hard_limit = 50 [execution] single_task_mode = false [quality_gates] require_tests = false test_command = "" # Auto-detected test_timeout_secs = 300 [prd_mode] enabled = false auto_generate_from_plan = true sync_to_database = true [learning] enabled = false extract_file_patterns = true extract_command_patterns = true max_learnings_injected = 10 [browser_verify] enabled = false auto_detect_ui_tasks = true dev_server_command = "npm run dev" base_url = "http://localhost:3000" ``` ### CLI Flag Summary | Flag | Short | Feature | Default | |------|-------|---------|---------| | `--max-iterations` | `-i` | Iteration limit | 10 | | `--single-task` | `-1` | One task per run | false | | `--archive-previous` | `--archive` | Archive old runs | false | | `--require-tests` | `--tests` | Test quality gate | false | | `--prd-mode` | `--prd` | PRD-style workflow | false | | `--learn` | `-l` | Cross-task learning | false | | `--browser-verify` | `--ui` | UI verification | false | --- ## Part 5: Migration Path ### For Existing Contracts 1. Progress logging starts fresh (no historical data) 2. Context recovery applies to new tasks only 3. Existing tasks not affected by new validation 4. Can opt-in to optional features per-contract ### Backward Compatibility - All opinionated features have graceful defaults - Optional features are off by default - No breaking changes to existing CLI/API - Configuration is additive --- ## Conclusion These features address the core challenges mentioned in the contract goal: - **Manual steering** → Progress logging, context recovery, learning mode - **Context between runs** → Structured persistence, progress.txt pattern - **Handholding** → Verifiable criteria, commit discipline, quality gates The opinionated features make makima more reliable out of the box, while optional features provide ralph-style workflows for users who want them.