| author | soryu <soryu@soryu.co> | 2026-01-23 02:57:13 +0000 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2026-01-23 02:57:13 +0000 |
| commit | 5595a2fce2e426fd9f1b6224df467a2300f06238 (patch) | |
| tree | 57b4e20335f6dbab641f1474f34d048960802188 /ralph-features-spec.md | |
| parent | 1ed362424dafec690f919154f5116471951cda9c (diff) | |
docs: Add ralph analysis and feature specification (#22)
- ralph-analysis.md: Comprehensive analysis of ralph repository
  - Stateless AI loop pattern with file-based persistence
  - prd.json for task tracking, progress.txt for learnings
  - AGENTS.md for consolidated patterns
- makima-architecture.md: Analysis of makima's current architecture
  - Existing COMPLETION_GATE, circuit breaker, autonomous loop
  - Extension points for ralph-inspired features
- ralph-features-spec.md: Detailed feature specification
  - 6 opinionated features (always enabled)
  - 7 optional features (flag-controlled)
  - Implementation priorities and migration path

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Diffstat (limited to 'ralph-features-spec.md'):

| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | ralph-features-spec.md | 773 |

1 files changed, 773 insertions, 0 deletions
`diff --git a/ralph-features-spec.md b/ralph-features-spec.md` · new file mode 100644 · index 0000000..f25a8fe · `--- /dev/null` · `+++ b/ralph-features-spec.md` · `@@ -0,0 +1,773 @@`

# Ralph-Inspired Features for Makima

## Overview

This specification outlines features derived from the [ralph](https://github.com/snarktank/ralph) autonomous AI agent loop system. These features can be implemented in makima to reduce manual steering and to improve context management between runs.

---

## Part 1: Opinionated Features (Always Enabled)

These features represent best practices that should be core to makima's behavior.

### 1.1 Structured Progress Logging

**Name:** `progress-log`
**Priority:** HIGH

**Description:**
Implement an append-only progress log file (`progress.txt` or similar) that persists learnings, patterns, and context across task iterations.

**Motivation:**
- Ralph's most powerful feature is its dual-file learning system (`progress.txt` for learnings, `AGENTS.md` for consolidated patterns)
- Captures context that survives Claude's context window limits
- Enables pattern discovery over time
- Provides debugging history

**Current State in Makima:**
- The `progress_summary` field exists but is per-task, not persistent
- Task events are stored in the database but not summarized
- No cross-task learning mechanism

**Implementation Approach:**
1. Add a `progress.log` file to each task's worktree
2. Append structured entries at task completion:
   ```
   ## [Timestamp] - Task [ID]: [Name]
   - Status: [done/failed]
   - Files changed: [list]
   - **Learnings:**
     - [Pattern discovered]
     - [Gotcha encountered]
   ---
   ```
3. Inject `progress.log` contents into new task prompts
4. Periodically consolidate into an `AGENTS.md` equivalent

**Configuration:**
```toml
[progress_log]
enabled = true                # Always on
max_entries_injected = 20     # Limit for prompt injection
consolidation_threshold = 50  # Trigger consolidation
```

**Integration Points:**
- `daemon/task/manager.rs` → `on_completion()` hook
- `daemon/process/claude.rs` → `inject_system_prompt()`
- New file: `daemon/task/progress_log.rs`

---

### 1.2 Context Recovery Pattern

**Name:** `context-recovery`
**Priority:** HIGH

**Description:**
Standardize how context is rebuilt when tasks resume or restart, so that Claude can quickly orient itself.

**Motivation:**
- Ralph's stateless model works because context recovery is systematic
- Each iteration reads from well-defined artifacts
- Reduces confusion and repeated work

**Current State in Makima:**
- `conversation_state` is stored for resumption
- The `--continue` flag relies on Claude's session state
- No structured "where we left off" pattern

**Implementation Approach:**
1. Create a standard context recovery header for task prompts:
   ```
   ## Context Recovery
   - Current branch: [branch name]
   - Git status: [uncommitted changes summary]
   - Last checkpoint: [timestamp, message]
   - Progress log (recent): [last 5 entries]
   - Current phase: [research/specify/plan/execute/review]
   ```
2. Auto-generate it on task start/resume
3. Include it in the system prompt before the user plan

**Integration Points:**
- `daemon/task/manager.rs` → `build_context_recovery()`
- `daemon/process/claude.rs` → Prepend to injected prompt
- New file: `daemon/task/context_recovery.rs`

---

### 1.3 Dependency-Ordered Task Execution

**Name:** `dependency-ordering`
**Priority:** MEDIUM

**Description:**
Enforce that tasks execute in dependency order: schema changes → backend → UI.
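The ordering rule above amounts to running tasks in a topological order of their dependency graph. Below is a minimal sketch using Kahn's algorithm, not makima's implementation. The `Task` struct and the `execution_order` function are hypothetical. String IDs stand in for the `depends_on: Vec<Uuid>` field proposed in this section, which keeps the example free of external crates. A cycle, or a dependency on an unknown task, yields an `Err` that lists the tasks that could not be scheduled. A `can_start_task()` check would need to report exactly that case.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Simplified stand-in for a makima task (hypothetical). The spec proposes
/// `depends_on: Vec<Uuid>`; plain `String` IDs avoid an external crate here.
pub struct Task {
    pub id: String,
    pub depends_on: Vec<String>,
}

/// Orders task IDs so that every task comes after all of its dependencies
/// (Kahn's algorithm). Returns `Err` with the unschedulable task IDs when a
/// dependency cycle or a dependency on an unknown task blocks a full ordering.
pub fn execution_order(tasks: &[Task]) -> Result<Vec<String>, Vec<String>> {
    // Number of dependencies each task is still waiting on.
    let mut pending: BTreeMap<&str, usize> = BTreeMap::new();
    // For each dependency, the tasks that are waiting on it.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for t in tasks {
        pending.insert(t.id.as_str(), t.depends_on.len());
        for dep in &t.depends_on {
            dependents.entry(dep.as_str()).or_default().push(t.id.as_str());
        }
    }

    // Seed with tasks that have no dependencies; BTreeMap iteration keeps this deterministic.
    let mut ready: VecDeque<&str> = pending
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(id, _)| *id)
        .collect();

    let mut order = Vec::with_capacity(tasks.len());
    while let Some(id) = ready.pop_front() {
        order.push(id.to_string());
        // Completing `id` releases one dependency of each task waiting on it.
        if let Some(waiting) = dependents.get(id) {
            for &w in waiting {
                let n = pending.get_mut(w).expect("waiting task is a known task");
                *n -= 1;
                if *n == 0 {
                    ready.push_back(w);
                }
            }
        }
    }

    if order.len() == tasks.len() {
        Ok(order)
    } else {
        // Anything not scheduled is blocked by a cycle or a missing dependency.
        let done: BTreeSet<&str> = order.iter().map(String::as_str).collect();
        Err(tasks
            .iter()
            .map(|t| t.id.clone())
            .filter(|id| !done.contains(id.as_str()))
            .collect())
    }
}
```

Because `ready` is seeded from a `BTreeMap`, independent tasks come out in ID order, which keeps runs reproducible. A real scheduler might instead break ties using the existing `priority` field.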
**Motivation:**
- Ralph explicitly orders stories: database → server → UI → dashboard
- Prevents tasks from failing due to missing dependencies
- Creates clean commit boundaries

**Current State in Makima:**
- Tasks have a `priority` field but no dependency inference
- Supervisors manually order task creation
- No validation of execution order

**Implementation Approach:**
1. Add a `depends_on: Vec<Uuid>` field to tasks
2. Validate dependencies before marking a task as runnable
3. Auto-detect dependency patterns:
   - Migration files → backend code
   - Types/models → consumers
   - APIs → UI components
4. Warn if a task seems out of order based on file patterns

**Configuration:**
```toml
[dependency_ordering]
enabled = true
auto_detect = true
warn_on_violation = true
```

**Integration Points:**
- `db/models.rs` → Task model extension
- `daemon/task/manager.rs` → `can_start_task()` validation
- New file: `daemon/task/dependency_analysis.rs`

---

### 1.4 Verifiable Acceptance Criteria

**Name:** `acceptance-criteria`
**Priority:** MEDIUM

**Description:**
Require that all tasks have verifiable (not vague) acceptance criteria, and validate them automatically.

**Motivation:**
- Ralph requires criteria such as "Typecheck passes" and "Tests pass"
- Prevents a "done" status on incomplete work
- Provides a clear definition of success

**Current State in Makima:**
- `COMPLETION_GATE` signals readiness
- No structured criteria validation
- "Ready" is interpreted manually

**Implementation Approach:**
1. Parse task plans for an acceptance criteria section
2. Distinguish verifiable criteria from vague ones:
   - **Good:** "All tests pass", "No TypeScript errors"
   - **Bad:** "Works correctly", "Good UX"
3. Auto-append standard criteria if missing:
   - "No uncommitted changes remain"
   - "CI/linting passes" (if configured)
4. Validate that the criteria are satisfied before marking the task complete

**Configuration:**
```toml
[acceptance_criteria]
enabled = true
require_verifiable = true
auto_append_standard = ["no_uncommitted_changes", "tests_pass"]
```

**Integration Points:**
- `daemon/task/completion_gate.rs` → Extend validation
- `llm/task_output.rs` → Parse criteria from plan
- New file: `daemon/task/criteria_validator.rs`

---

### 1.5 Task Sizing Validation

**Name:** `task-sizing`
**Priority:** MEDIUM

**Description:**
Warn about, or block, tasks that are likely too large to complete in one context window.

**Motivation:**
- Ralph's story sizing is crucial: "If you can't describe it in 2-3 sentences, it's too big"
- Large tasks exhaust the context window and require handoffs
- Smaller tasks = cleaner commits, easier recovery

**Current State in Makima:**
- No task size estimation
- `auto_handoff` exists but is reactive
- Supervisors break tasks down manually

**Implementation Approach:**
1. Estimate task complexity from the plan text:
   - Number of files mentioned
   - Scope words ("entire", "all", "refactor")
   - Estimated token count
2. Warn if a task exceeds the thresholds
3. Suggest a breakdown for large tasks

**Thresholds:**
- Files mentioned > 10 → Warning
- Plan length > 500 words → Warning
- Scope words detected → Strong warning

**Configuration:**
```toml
[task_sizing]
enabled = true
max_files_mentioned = 10
max_plan_words = 500
warn_on_scope_words = ["entire", "all", "complete", "refactor"]
```

**Integration Points:**
- `daemon/task/manager.rs` → `validate_task_size()`
- Supervisor prompts → Include sizing guidance
- New file: `daemon/task/sizing_validator.rs`

---

### 1.6 Commit Discipline

**Name:** `commit-discipline`
**Priority:** HIGH

**Description:**
Enforce structured commit messages, and allow commits only when quality checks pass.
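A commit gate of this kind can be sketched as a function that returns a formatted message only when every configured check has passed. This is an illustration, not makima's code:

- `CommitKind`, `CheckResult` and `checkpoint_message` are hypothetical names.
- The message layout follows the `feat/fix/chore: [Task ID] - [Summary]` format given in this section's implementation approach.
- Running the checks themselves (lint, tests) is left to the caller.

```rust
/// Commit type prefix for the proposed `feat/fix/chore: [Task ID] - [Summary]` format.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CommitKind {
    Feat,
    Fix,
    Chore,
}

impl CommitKind {
    fn prefix(self) -> &'static str {
        match self {
            CommitKind::Feat => "feat",
            CommitKind::Fix => "fix",
            CommitKind::Chore => "chore",
        }
    }
}

/// Result of one configured quality check such as lint or tests (hypothetical type).
pub struct CheckResult {
    pub name: String,
    pub passed: bool,
}

/// Returns the checkpoint commit message only if every check passed;
/// otherwise returns the names of the failing checks as feedback.
/// With no checks configured, the gate passes trivially.
pub fn checkpoint_message(
    kind: CommitKind,
    task_id: &str,
    summary: &str,
    body: Option<&str>,
    checks: &[CheckResult],
) -> Result<String, Vec<String>> {
    let failed: Vec<String> = checks
        .iter()
        .filter(|c| !c.passed)
        .map(|c| c.name.clone())
        .collect();
    if !failed.is_empty() {
        return Err(failed);
    }

    let mut msg = format!("{}: {} - {}", kind.prefix(), task_id, summary);
    if let Some(body) = body {
        msg.push_str("\n\n");
        msg.push_str(body);
    }
    // Trailer taken from the message format described in this section.
    msg.push_str("\n\nCo-Authored-By: Claude <noreply@anthropic.com>");
    Ok(msg)
}
```

Returning the names of the failing checks, rather than a bare `false`, gives the caller something concrete to report back when it rejects a commit.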
**Motivation:**
- Ralph: "Only commit when tests pass"
- A clean git history aids context recovery
- Structured messages enable automation

**Current State in Makima:**
- Checkpoints create commits automatically
- No quality gate before commit
- Commit messages are not standardized

**Implementation Approach:**
1. Standardize the commit message format:
   ```
   feat/fix/chore: [Task ID] - [Summary]

   [Optional body]

   Co-Authored-By: Claude <noreply@anthropic.com>
   ```
2. Run quality checks before each checkpoint commit:
   - TypeScript/lint (if configured)
   - Tests (if configured)
3. Reject the commit if checks fail, and provide feedback

**Configuration:**
```toml
[commit_discipline]
enabled = true
require_tests = false            # Optional
require_lint = false             # Optional
message_format = "conventional"  # conventional, simple
```

**Integration Points:**
- `daemon/worktree/manager.rs` → `create_checkpoint()`
- `daemon/task/manager.rs` → Pre-commit hooks
- New file: `daemon/task/commit_validator.rs`

---

## Part 2: Optional Features (Flag-Controlled)

These features provide advanced control and should be opt-in via CLI flags or configuration.

### 2.1 Maximum Iterations Limit

**Name:** `--max-iterations`
**Priority:** HIGH
**Flag:** `--max-iterations <N>` or `-i <N>`

**Description:**
Limit the number of autonomous loop iterations before stopping.

**Motivation:**
- Ralph uses `max_iterations` (default 10)
- Prevents runaway loops that waste tokens
- Provides predictable behavior

**Current State in Makima:**
- The circuit breaker has an `iteration_count` limit (10)
- Not configurable at the task/contract level
- No per-run override

**Implementation Approach:**
1. Add a `--max-iterations` flag to contract/task creation
2. Store it in task metadata
3. Check the count in the autonomous loop logic
4. Exit cleanly with a message when the limit is reached

**CLI Usage:**
```bash
makima contract create --max-iterations 5 "Feature X"
makima supervisor spawn "Task" "Plan" --max-iterations 3
```

**Configuration:**
```toml
[autonomous_loop]
default_max_iterations = 10
hard_limit = 50  # Absolute maximum
```

**Integration Points:**
- `daemon/task/manager.rs` → Loop control
- `db/models.rs` → Task field
- CLI argument parsing

---

### 2.2 Single-Story-Per-Run Mode

**Name:** `--single-task`
**Priority:** MEDIUM
**Flag:** `--single-task` or `-1`

**Description:**
Execute exactly one task per Claude invocation, then stop (don't auto-continue).

**Motivation:**
- Ralph's model: one story per iteration
- Ensures complete focus
- Creates clean boundaries
- Simplifies failure recovery

**Current State in Makima:**
- Tasks can run multiple iterations
- A supervisor can spawn multiple concurrent tasks
- No single-task mode

**Implementation Approach:**
1. When `--single-task` is enabled:
   - Execute one task
   - Parse the completion gate
   - Stop regardless of `ready` status
   - Report status and exit
2. The user reviews, then manually continues or adjusts

**CLI Usage:**
```bash
makima contract create --single-task "Feature X"
```

**Configuration:**
```toml
[execution]
single_task_mode = false  # Default
```

**Integration Points:**
- `daemon/task/manager.rs` → Execution loop
- `server/handlers/contract_daemon.rs` → Contract options
- CLI flags

---

### 2.3 Archive Previous Runs

**Name:** `--archive-previous`
**Priority:** LOW
**Flag:** `--archive-previous` or `--archive`

**Description:**
When starting a new feature/contract, archive the previous run's artifacts.
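Ralph's archive layout (`archive/YYYY-MM-DD-feature-name/`) needs a stable directory name derived from the run date and the feature name. This is a minimal sketch with these assumptions:

- The caller supplies the date, since Rust's standard library has no calendar formatting.
- A feature name is slugified by lowercasing it and joining its alphanumeric runs with `-`.
- `slugify` and `archive_dir` are hypothetical helpers, not existing makima functions.
- The archive root is passed in; the spec does not say where `archive/` lives.

```rust
use std::path::{Path, PathBuf};

/// Lowercases the name and turns each run of non-alphanumeric characters
/// into a single '-', with no leading or trailing dash.
pub fn slugify(name: &str) -> String {
    let mut slug = String::new();
    let mut pending_dash = false;
    for ch in name.chars() {
        if ch.is_alphanumeric() {
            // Emit a separator only between words, never at the start.
            if pending_dash && !slug.is_empty() {
                slug.push('-');
            }
            pending_dash = false;
            slug.extend(ch.to_lowercase());
        } else {
            pending_dash = true;
        }
    }
    slug
}

/// Builds `<root>/archive/YYYY-MM-DD-<slug>`. The date is passed in rather
/// than read from the clock so the function stays deterministic.
pub fn archive_dir(root: &Path, year: u32, month: u32, day: u32, feature: &str) -> PathBuf {
    root.join("archive")
        .join(format!("{:04}-{:02}-{:02}-{}", year, month, day, slugify(feature)))
}
```

Keeping the date as an explicit argument also makes it easy to name the archive after the previous contract's creation date instead of today's date, if that turns out to be the better key.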
**Motivation:**
- Ralph archives to `archive/YYYY-MM-DD-feature-name/`
- Clean separation between features
- Preserves history for reference
- Prevents context pollution

**Current State in Makima:**
- Worktrees are per-task but ephemeral
- No archiving mechanism
- Old task data is in the database but hard to access

**Implementation Approach:**
1. On contract creation with `--archive`:
   - Find the previous contract with the same name/goal
   - Copy key artifacts to the `archive/` directory:
     - progress.log
     - Final checkpoint
     - Summary document
2. Archive structure:
   ```
   archive/
   └── 2026-01-22-feature-name/
       ├── progress.log
       ├── summary.md
       └── final-diff.patch
   ```

**CLI Usage:**
```bash
makima contract create --archive-previous "Feature X v2"
```

**Integration Points:**
- `daemon/worktree/manager.rs` → Archive logic
- `server/handlers/contracts.rs` → Archive on create
- New file: `daemon/archive/manager.rs`

---

### 2.4 Require Tests Quality Gate

**Name:** `--require-tests`
**Priority:** MEDIUM
**Flag:** `--require-tests` or `--tests`

**Description:**
Block task completion unless tests pass.

**Motivation:**
- Ralph: stories require "Tests pass" in their acceptance criteria
- Ensures quality before merge
- Catches regressions early

**Current State in Makima:**
- The completion gate is self-reported by Claude
- No actual test execution
- The circuit breaker is reactive, not proactive

**Implementation Approach:**
1. Detect the test framework from the project:
   - `package.json` scripts
   - `pytest`, `cargo test`, etc.
2. Run tests before accepting completion
3. Parse test output for pass/fail
4. If tests fail:
   - Don't mark the task complete
   - Inject failure info into the next prompt
   - Increment the failure counter

**CLI Usage:**
```bash
makima contract create --require-tests "Feature X"
```

**Configuration:**
```toml
[quality_gates]
require_tests = false
test_command = "npm test"  # Auto-detected if not set
test_timeout_secs = 300
```

**Integration Points:**
- `daemon/task/completion_gate.rs` → Test validation
- `daemon/process/` → Test runner
- New file: `daemon/quality/test_runner.rs`

---

### 2.5 PRD Mode

**Name:** `--prd-mode`
**Priority:** MEDIUM
**Flag:** `--prd-mode` or `--prd`

**Description:**
Enable a ralph-style PRD workflow with structured JSON task tracking.

**Motivation:**
- Ralph's `prd.json` provides a clear task breakdown
- Structured format aids automation
- Priority-based execution
- Clear pass/fail tracking

**Current State in Makima:**
- Plans are free-form text
- Task status lives in the database, not in files
- No structured PRD format

**Implementation Approach:**
1. When `--prd-mode` is enabled:
   - Create `prd.json` in the worktree:
     ```json
     {
       "project": "Contract Name",
       "branchName": "makima/feature",
       "description": "Contract goal",
       "userStories": [
         {
           "id": "US-001",
           "title": "Story title",
           "description": "As a...",
           "acceptanceCriteria": ["Criterion 1"],
           "priority": 1,
           "passes": false,
           "notes": ""
         }
       ]
     }
     ```
   - Tasks update the `passes` field on completion
   - The supervisor reads the PRD to find the next incomplete story
2. Sync between the database and `prd.json`

**CLI Usage:**
```bash
makima contract create --prd-mode "Feature X"
```

**Configuration:**
```toml
[prd_mode]
enabled = false
auto_generate_from_plan = true
sync_to_database = true
```

**Integration Points:**
- `daemon/task/manager.rs` → PRD sync
- New file: `daemon/prd/manager.rs`
- New file: `daemon/prd/models.rs`

---

### 2.6 Learning Mode

**Name:** `--learn`
**Priority:** LOW
**Flag:** `--learn` or `-l`

**Description:**
Enable cross-task learning that extracts patterns and improves future prompts.

**Motivation:**
- Ralph's AGENTS.md consolidates patterns
- Learning from success improves future runs
- Learning from failure prevents repeating mistakes

**Current State in Makima:**
- No cross-task learning
- Each task starts fresh
- Patterns are not extracted or reused

**Implementation Approach:**
1. On task completion, extract:
   - Files commonly modified together
   - Commands that succeeded/failed
   - Error patterns and solutions
2. Store them in a per-repository `learnings.db` (SQLite)
3. Inject relevant learnings into future task prompts:
   ```
   ## Learned Patterns for this codebase:
   - Always run `npm run typecheck` before commit
   - The auth middleware is in src/middleware/auth.ts
   - Database migrations require `npx prisma generate` after
   ```

**CLI Usage:**
```bash
makima contract create --learn "Feature X"
```

**Configuration:**
```toml
[learning]
enabled = false
extract_file_patterns = true
extract_command_patterns = true
max_learnings_injected = 10
```

**Integration Points:**
- `daemon/task/manager.rs` → Learning extraction
- `daemon/process/claude.rs` → Learning injection
- New file: `daemon/learning/extractor.rs`
- New file: `daemon/learning/database.rs`

---

### 2.7 Browser Verification for UI

**Name:** `--browser-verify`
**Priority:** LOW
**Flag:** `--browser-verify` or `--ui`

**Description:**
For UI-related tasks, require browser verification before completion.

**Motivation:**
- Ralph includes "Verify in browser" as an acceptance criterion
- Visual verification catches issues that tests miss
- Ensures the UI actually works, not just that it compiles

**Current State in Makima:**
- No browser verification
- UI tasks are treated the same as backend tasks
- No visual testing integration

**Implementation Approach:**
1. Detect UI-related tasks from file patterns:
   - `*.tsx`, `*.vue`, `*.svelte`
   - `components/`, `pages/`, `views/`
2. When completing, prompt for verification:
   - Launch the dev server if needed
   - Open the browser to the relevant URL
   - Wait for user confirmation or screenshot analysis
3. Alternative: integrate with Playwright for visual testing

**CLI Usage:**
```bash
makima contract create --browser-verify "Add login page"
```

**Configuration:**
```toml
[browser_verify]
enabled = false
auto_detect_ui_tasks = true
dev_server_command = "npm run dev"
base_url = "http://localhost:3000"
```

**Integration Points:**
- `daemon/task/completion_gate.rs` → Browser check
- New file: `daemon/quality/browser_verify.rs`

---

## Part 3: Implementation Priorities

### Phase 1: Foundation (High Priority)
1. **Structured Progress Logging** - Core to context management
2. **Context Recovery Pattern** - Enables stateless iterations
3. **Commit Discipline** - Ensures quality git history
4. **Maximum Iterations Limit** - Prevents runaway loops

### Phase 2: Quality (Medium Priority)
5. **Verifiable Acceptance Criteria** - Improves completion reliability
6. **Dependency-Ordered Execution** - Prevents out-of-order failures
7. **Task Sizing Validation** - Catches too-large tasks early
8. **Require Tests Quality Gate** - Ensures working code
9. **Single-Story Mode** - Ralph's core pattern

### Phase 3: Advanced (Low Priority)
10. **PRD Mode** - Full ralph-style workflow
11. **Learning Mode** - Cross-task intelligence
12. **Archive Previous** - Run isolation
13. **Browser Verification** - UI quality

---

## Part 4: Configuration Summary

### New Configuration File Sections

```toml
# makima-daemon.toml additions

[progress_log]
enabled = true
max_entries_injected = 20
consolidation_threshold = 50

[context_recovery]
enabled = true
include_git_status = true
include_recent_progress = true

[dependency_ordering]
enabled = true
auto_detect = true
warn_on_violation = true

[acceptance_criteria]
enabled = true
require_verifiable = true
auto_append_standard = ["no_uncommitted_changes"]

[task_sizing]
enabled = true
max_files_mentioned = 10
max_plan_words = 500
warn_on_scope_words = ["entire", "all", "complete", "refactor"]

[commit_discipline]
enabled = true
require_tests = false
require_lint = false
message_format = "conventional"

[autonomous_loop]
default_max_iterations = 10
hard_limit = 50

[execution]
single_task_mode = false

[quality_gates]
require_tests = false
test_command = ""  # Auto-detected
test_timeout_secs = 300

[prd_mode]
enabled = false
auto_generate_from_plan = true
sync_to_database = true

[learning]
enabled = false
extract_file_patterns = true
extract_command_patterns = true
max_learnings_injected = 10

[browser_verify]
enabled = false
auto_detect_ui_tasks = true
dev_server_command = "npm run dev"
base_url = "http://localhost:3000"
```

### CLI Flag Summary

| Flag | Short | Feature | Default |
|------|-------|---------|---------|
| `--max-iterations` | `-i` | Iteration limit | 10 |
| `--single-task` | `-1` | One task per run | false |
| `--archive-previous` | `--archive` | Archive old runs | false |
| `--require-tests` | `--tests` | Test quality gate | false |
| `--prd-mode` | `--prd` | PRD-style workflow | false |
| `--learn` | `-l` | Cross-task learning | false |
| `--browser-verify` | `--ui` | UI verification | false |

---

## Part 5: Migration Path

### For Existing Contracts

1. Progress logging starts fresh (no historical data)
2. Context recovery applies to new tasks only
3. Existing tasks are not affected by the new validation
4. Optional features can be opted into per contract

### Backward Compatibility

- All opinionated features have graceful defaults
- Optional features are off by default
- No breaking changes to the existing CLI/API
- Configuration is additive

---

## Conclusion

These features address the core challenges stated in the contract goal:
- **Manual steering** → Progress logging, context recovery, learning mode
- **Context between runs** → Structured persistence, the progress.txt pattern
- **Handholding** → Verifiable criteria, commit discipline, quality gates

The opinionated features make makima more reliable out of the box, while the optional features provide ralph-style workflows for users who want them.
