| author | soryu <soryu@soryu.co> | 2026-01-23 02:57:13 +0000 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2026-01-23 02:57:13 +0000 |
| commit | 5595a2fce2e426fd9f1b6224df467a2300f06238 (patch) | |
| tree | 57b4e20335f6dbab641f1474f34d048960802188 | |
| parent | 1ed362424dafec690f919154f5116471951cda9c (diff) | |
docs: Add ralph analysis and feature specification (#22)
- ralph-analysis.md: Comprehensive analysis of ralph repository
  - Stateless AI loop pattern with file-based persistence
  - prd.json for task tracking, progress.txt for learnings
  - AGENTS.md for consolidated patterns
- makima-architecture.md: Analysis of makima's current architecture
  - Existing COMPLETION_GATE, circuit breaker, autonomous loop
  - Extension points for ralph-inspired features
- ralph-features-spec.md: Detailed feature specification
  - 6 opinionated features (always enabled)
  - 7 optional features (flag-controlled)
  - Implementation priorities and migration path
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
| -rw-r--r-- | makima-architecture.md | 503 |
| -rw-r--r-- | ralph-analysis.md | 455 |
| -rw-r--r-- | ralph-features-spec.md | 773 |
3 files changed, 1731 insertions, 0 deletions
diff --git a/makima-architecture.md b/makima-architecture.md new file mode 100644 index 0000000..69ab06f --- /dev/null +++ b/makima-architecture.md @@ -0,0 +1,503 @@ +# Makima Architecture Analysis + +## Executive Summary + +Makima is a distributed task orchestration system for managing AI coding agents (primarily Claude Code instances). It follows a client-server architecture with daemons running on local machines that execute tasks, while a central server coordinates work through contracts. The system already implements several patterns similar to ralph, including completion gates, autonomous loop mode, and circuit breakers. + +--- + +## 1. Current Architecture Overview + +### 1.1 High-Level Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ MAKIMA SERVER (Rust) │ +│ ┌─────────────┐ ┌──────────────┐ ┌─────────────┐ ┌──────────────────┐ │ +│ │ REST API │ │ WebSocket │ │ PostgreSQL │ │ LLM Tools │ │ +│ │ Handlers │ │ Hub │ │ DB │ │ (Chat, Analysis) │ │ +│ └─────────────┘ └──────────────┘ └─────────────┘ └──────────────────┘ │ +└─────────────────────────────────────────────────────────────────────────────┘ + │ + ┌───────────────┼───────────────┐ + ▼ ▼ ▼ + ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ + │ DAEMON 1 │ │ DAEMON 2 │ │ DAEMON N │ + │ (Worker) │ │ (Worker) │ │ (Worker) │ + ├──────────────┤ ├──────────────┤ ├──────────────┤ + │ Task Manager │ │ Task Manager │ │ Task Manager │ + │ Worktree │ │ Worktree │ │ Worktree │ + │ Manager │ │ Manager │ │ Manager │ + │ Process │ │ Process │ │ Process │ + │ Manager │ │ Manager │ │ Manager │ + └──────────────┘ └──────────────┘ └──────────────┘ + │ │ │ + ▼ ▼ ▼ + ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ + │ Claude Code │ │ Claude Code │ │ Claude Code │ + │ Instances │ │ Instances │ │ Instances │ + └──────────────┘ └──────────────┘ └──────────────┘ +``` + +### 1.2 Core Components + +#### Server-Side (`makima/src/server/`) + +| Component | Location | 
Responsibility | +|-----------|----------|----------------| +| **REST API** | `handlers/*.rs` | HTTP endpoints for contracts, tasks, files, mesh operations | +| **WebSocket Hub** | `handlers/mesh_daemon.rs`, `mesh_ws.rs` | Real-time communication with daemons | +| **Database** | `../db/` | PostgreSQL via sqlx for persistent state | +| **Authentication** | `auth.rs` | API key and JWT authentication | +| **LLM Integration** | `../llm/` | Claude/Groq clients, tool execution | + +#### Daemon-Side (`makima/src/daemon/`) + +| Component | Location | Responsibility | +|-----------|----------|----------------| +| **Task Manager** | `task/manager.rs` | Task lifecycle, concurrency control | +| **Task State** | `task/state.rs` | State machine for task progression | +| **Completion Gate** | `task/completion_gate.rs` | Autonomous loop termination logic | +| **Process Manager** | `process/` | Spawns and manages Claude Code subprocesses | +| **Worktree Manager** | `worktree/` | Git worktree isolation for tasks | +| **WebSocket Client** | `ws/` | Bidirectional communication with server | +| **Local DB** | `db/local.rs` | SQLite for crash recovery | +| **TUI** | `tui/` | Interactive terminal interface | + +--- + +## 2. 
Key Components and Responsibilities + +### 2.1 Contract System + +Contracts are the top-level organizational unit representing a body of work: + +```rust +// From db/models.rs +pub struct Contract { + pub id: Uuid, + pub name: String, + pub contract_type: String, // "simple" or "specification" + pub phase: String, // research, specify, plan, execute, review + pub status: String, // active, completed, archived + pub supervisor_task_id: Option<Uuid>, + pub autonomous_loop: bool, // Enable auto-restart on incomplete + pub phase_guard: bool, // Require user approval for phase transitions +} +``` + +**Contract Types:** +- `simple`: Plan → Execute workflow +- `specification`: Research → Specify → Plan → Execute → Review + +### 2.2 Task Orchestration + +Tasks represent individual units of work executed by Claude Code: + +```rust +// Simplified from db/models.rs +pub struct Task { + pub id: Uuid, + pub contract_id: Option<Uuid>, + pub parent_task_id: Option<Uuid>, + pub is_supervisor: bool, + pub status: String, // pending, running, paused, blocked, done, failed + pub plan: String, + pub daemon_id: Option<Uuid>, + pub continue_from_task_id: Option<Uuid>, // For task continuation chains + pub conversation_state: Option<serde_json::Value>, // For resumption +} +``` + +**Task Hierarchy:** +1. **Supervisor Tasks**: Long-running orchestrators that spawn worker tasks +2. **Worker Tasks**: Execute specific implementation work +3. 
**Subtask Chains**: Tasks can continue from other tasks' worktrees + +### 2.3 Task State Machine + +``` + ┌────────────────┐ + │ Initializing │ + └───────┬────────┘ + │ + ┌───────▼────────┐ + │ Starting │ + └───────┬────────┘ + │ + ┌───────────────▼───────────────┐ + │ Running │ + └───────┬───────┬───────┬───────┘ + │ │ │ + ┌───────────▼─┐ ┌───▼───┐ ┌─▼────────┐ + │ Paused │ │Blocked│ │Completed │ + └─────────────┘ └───────┘ └──────────┘ + │ │ + ▼ ▼ + ┌──────────────────────┐ + │ Failed/Interrupted │ + └──────────────────────┘ +``` + +### 2.4 Worktree Isolation + +Each task gets its own git worktree, providing: +- Complete isolation from other tasks +- Ability to merge changes via git +- Checkpointing via git commits + +```rust +// From daemon/worktree/mod.rs +pub struct WorktreeInfo { + pub path: PathBuf, + pub branch: String, + pub task_id: Uuid, +} +``` + +--- + +## 3. Existing Context Management Mechanisms + +### 3.1 Completion Gate (Ralph-Inspired) + +The `CompletionGate` system allows tasks to signal completion status: + +```rust +// From daemon/task/completion_gate.rs +pub struct CompletionGate { + pub ready: bool, + pub reason: Option<String>, + pub progress: Option<String>, + pub blockers: Option<Vec<String>>, +} +``` + +**Format in Claude output:** +```xml +<COMPLETION_GATE> +ready: true +reason: "All tests pass" +progress: "Implemented feature X" +</COMPLETION_GATE> +``` + +### 3.2 Circuit Breaker + +Prevents infinite loops in autonomous mode: + +```rust +pub struct CircuitBreaker { + pub runs_without_changes: u32, // Trips after 3 runs with no changes + pub same_error_count: u32, // Trips after 5 identical errors + pub iteration_count: u32, // Trips after 10 iterations + pub is_open: bool, + pub open_reason: Option<String>, +} +``` + +### 3.3 Autonomous Loop Mode + +When `autonomous_loop: true` on a contract: +1. Task runs to completion +2. Output is parsed for `COMPLETION_GATE` +3. If `ready: false`, task is restarted with `--continue` +4. 
Circuit breaker monitors for stuck states + +### 3.4 Supervisor State Persistence + +```rust +pub struct SupervisorState { + pub conversation_history: serde_json::Value, + pub pending_task_ids: Vec<Uuid>, + pub phase: String, + pub last_activity: DateTime<Utc>, +} +``` + +### 3.5 Conversation Snapshots + +```rust +pub struct ConversationSnapshot { + pub task_id: Uuid, + pub checkpoint_id: Option<Uuid>, + pub snapshot_type: String, // 'auto', 'manual', 'checkpoint' + pub conversation_state: serde_json::Value, +} +``` + +### 3.6 Phase Guidance System + +```rust +// From llm/phase_guidance.rs +pub struct PhaseDeliverables { + pub phase: String, + pub recommended_files: Vec<RecommendedFile>, + pub requires_repository: bool, + pub requires_tasks: bool, + pub guidance: String, +} +``` + +--- + +## 4. Extension Points for Ralph-Inspired Features + +### 4.1 Task Manager Hook Points + +Location: `daemon/task/manager.rs` + +| Hook Point | Current State | Extension Opportunity | +|------------|---------------|----------------------| +| `spawn_task()` | Creates worktree, spawns process | Add pre-flight checks, memory injection | +| `handle_output()` | Streams to server | Enhanced context extraction | +| `on_completion()` | Cleanup, status update | Post-task analysis, learning | +| `restart_with_continue()` | Autonomous loop restart | Context summarization | + +### 4.2 Process Manager Hook Points + +Location: `daemon/process/claude.rs` + +| Hook Point | Current State | Extension Opportunity | +|------------|---------------|----------------------| +| `build_command()` | Constructs CLI args | Dynamic prompt injection | +| `inject_system_prompt()` | Static prompts | Context-aware prompting | +| `parse_output()` | JSON message parsing | Structured output extraction | + +### 4.3 Server API Extension Points + +Location: `server/handlers/` + +| Endpoint Category | Files | Extension Opportunity | +|-------------------|-------|----------------------| +| Contract Daemon API | 
`contract_daemon.rs` | Enhanced progress tracking | +| Supervisor API | `mesh_supervisor.rs` | Smarter task scheduling | +| History API | `history.rs` | Learning from past sessions | + +### 4.4 Configuration Extension Points + +Location: `daemon/config.rs` + +```rust +pub struct ProcessConfig { + pub claude_command: String, + pub claude_args: Vec<String>, + pub env_vars: HashMap<String, String>, + // Extension: Add ralph-style configs + // pub context_window_size: usize, + // pub memory_extraction_enabled: bool, + // pub learning_mode: LearningMode, +} +``` + +--- + +## 5. Current Limitations That Ralph Patterns Could Address + +### 5.1 Context Window Management + +**Current State:** +- No explicit context window tracking +- Conversation history stored but not summarized +- `--continue` flag relies on Claude's session state + +**Ralph Pattern Opportunities:** +- Token counting per message +- Automatic context summarization when approaching limits +- Smart context pruning strategies + +### 5.2 Memory and Learning + +**Current State:** +- Output stored in `task_events` table +- Checkpoints stored with diffs +- No cross-task learning + +**Ralph Pattern Opportunities:** +- Extract patterns from successful completions +- Build knowledge base from task outputs +- Learn from failures to improve future prompts + +### 5.3 Progress Tracking + +**Current State:** +- `progress_summary` field on tasks +- `COMPLETION_GATE` for completion signaling +- Phase checklist for deliverables + +**Ralph Pattern Opportunities:** +- Structured progress metrics extraction +- Confidence scoring +- Automatic milestone detection + +### 5.4 Error Recovery + +**Current State:** +- Circuit breaker stops on repeated errors +- Daemon failover with retry count +- Manual intervention required for stuck tasks + +**Ralph Pattern Opportunities:** +- Intelligent error classification +- Automatic recovery strategies +- Error pattern learning + +### 5.5 Task Planning + +**Current State:** +- Human-written 
plans +- LLM-generated task breakdowns (via `task_output.rs`) +- Static orchestrator prompts + +**Ralph Pattern Opportunities:** +- Dynamic plan refinement +- Dependency inference +- Resource estimation + +--- + +## 6. Data Flow Diagrams + +### 6.1 Task Execution Flow + +``` +┌─────────┐ SpawnTask ┌──────────┐ +│ Server │ ───────────────► │ Daemon │ +└─────────┘ └────┬─────┘ + │ + ┌─────────▼─────────┐ + │ Task Manager │ + │ - Create worktree│ + │ - Setup env vars │ + └─────────┬─────────┘ + │ + ┌─────────▼─────────┐ + │ Process Manager │ + │ - Spawn Claude │ + │ - Inject prompt │ + └─────────┬─────────┘ + │ + ┌─────────▼─────────┐ + │ Claude Code │ + │ - Execute task │ + │ - Stream output │ + └─────────┬─────────┘ + │ + ┌───────────────────┼───────────────────┐ + ▼ ▼ ▼ + ┌────────────┐ ┌──────────────┐ ┌─────────────┐ + │ TaskOutput │ │ Checkpoints │ │ Completion │ + │ Events │ │ (Git) │ │ Gate │ + └────────────┘ └──────────────┘ └─────────────┘ +``` + +### 6.2 Autonomous Loop Flow + +``` +┌─────────────────────────────────────────────────────────┐ +│ AUTONOMOUS LOOP │ +└─────────────────────────────────────────────────────────┘ + │ + ┌───────────▼───────────┐ + │ Execute Task │ + └───────────┬───────────┘ + │ + ┌───────────▼───────────┐ + │ Parse COMPLETION_GATE │ + └───────────┬───────────┘ + │ + ┌───────────────┼───────────────┐ + │ │ │ + ▼ ▼ ▼ + ready: true ready: false No Gate Found + │ │ │ + ▼ ▼ ▼ + Complete ┌───────────┐ ┌───────────┐ + │ Check │ │ Circuit │ + │ Circuit │ │ Breaker │ + │ Breaker │ │ Trip? │ + └─────┬─────┘ └─────┬─────┘ + │ │ + ┌──────┴──────┐ ┌──────┴──────┐ + ▼ ▼ ▼ ▼ + Continue Stop Continue + with Loop with Loop +``` + +--- + +## 7. 
Key Files Reference + +### Daemon Core +- `src/daemon/mod.rs` - Module exports +- `src/daemon/task/manager.rs` - Task lifecycle (~1200 lines) +- `src/daemon/task/state.rs` - State machine +- `src/daemon/task/completion_gate.rs` - Ralph-style completion +- `src/daemon/process/claude.rs` - Claude subprocess management +- `src/daemon/worktree/manager.rs` - Git worktree isolation +- `src/daemon/ws/protocol.rs` - Server-daemon protocol + +### Server Core +- `src/server/mod.rs` - Route configuration +- `src/server/handlers/mesh_supervisor.rs` - Supervisor operations +- `src/server/handlers/contract_daemon.rs` - Contract CLI interface +- `src/server/handlers/history.rs` - Conversation history + +### LLM Integration +- `src/llm/phase_guidance.rs` - Phase deliverables +- `src/llm/contract_tools.rs` - Contract interaction tools +- `src/llm/task_output.rs` - Output analysis + +### CLI +- `src/bin/makima.rs` - CLI entry point +- `src/daemon/cli/mod.rs` - CLI commands + +--- + +## 8. Recommendations for Ralph Integration + +### Priority 1: Enhanced Context Management +- Add token counting to message parsing +- Implement context summarization triggers +- Create memory extraction pipeline + +### Priority 2: Structured Progress Tracking +- Extend `COMPLETION_GATE` format +- Add confidence scoring +- Implement milestone detection + +### Priority 3: Learning System +- Create patterns database +- Implement cross-task knowledge sharing +- Add success/failure analysis + +### Priority 4: Improved Error Recovery +- Classify error types +- Create recovery playbooks +- Implement automatic retry strategies + +--- + +## Appendix: Related Configuration + +### Daemon Configuration (`makima-daemon.toml`) +```toml +[server] +url = "wss://api.makima.jp" +api_key = "..." 
+ +[process] +max_concurrent_tasks = 4 +claude_command = "claude" +heartbeat_commit_interval_secs = 300 + +[worktree] +base_dir = "~/.makima/worktrees" +repos_dir = "~/.makima/repos" +``` + +### Environment Variables +- `MAKIMA_API_KEY` - API authentication +- `MAKIMA_DAEMON_SERVER_URL` - Server WebSocket URL +- `MAKIMA_TASK_ID` - Set in task environment +- `MAKIMA_CONTRACT_ID` - Set in task environment diff --git a/ralph-analysis.md b/ralph-analysis.md new file mode 100644 index 0000000..89df62c --- /dev/null +++ b/ralph-analysis.md @@ -0,0 +1,455 @@ +# Ralph Analysis: Autonomous AI Agent Loop System + +## Executive Summary + +**Ralph** is an autonomous AI agent loop system designed to run AI coding tools (Amp or Claude Code) repeatedly until all Product Requirements Document (PRD) items are complete. Based on [Geoffrey Huntley's Ralph pattern](https://ghuntley.com/ralph/), it represents a paradigm for autonomous software development where each iteration spawns a fresh AI instance with clean context, relying on git history, a progress log, and a structured PRD JSON file for persistence between runs. + +The core philosophy is simple yet powerful: break work into small, independently completable stories, run AI agents in a loop, and let structured persistence mechanisms carry context forward. This approach solves the fundamental problem of AI context limits by treating each iteration as a stateless worker that reads from and writes to well-defined artifacts. + +--- + +## Architecture Overview + +### High-Level Flow + +``` +┌──────────────────────────────────────────────────────────────────┐ +│ SETUP PHASE │ +├──────────────────────────────────────────────────────────────────┤ +│ 1. User writes a PRD (markdown) │ +│ 2. Convert PRD to prd.json (structured user stories) │ +│ 3. 
Run ralph.sh (starts autonomous loop) │ +└──────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌──────────────────────────────────────────────────────────────────┐ +│ EXECUTION LOOP │ +├──────────────────────────────────────────────────────────────────┤ +│ 4. AI picks highest priority story where passes: false │ +│ 5. Implements the story (writes code, runs tests) │ +│ 6. Commits changes (if tests pass) │ +│ 7. Updates prd.json (sets passes: true) │ +│ 8. Logs learnings to progress.txt │ +│ 9. Updates AGENTS.md/CLAUDE.md with reusable patterns │ +│ 10. Check: More stories? → Loop back to step 4 │ +└──────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌──────────────────────────────────────────────────────────────────┐ +│ COMPLETION │ +├──────────────────────────────────────────────────────────────────┤ +│ Output: <promise>COMPLETE</promise> and exit │ +└──────────────────────────────────────────────────────────────────┘ +``` + +### Core Components + +| Component | Purpose | Persistence | +|-----------|---------|-------------| +| `ralph.sh` | Bash loop that spawns fresh AI instances | N/A (orchestrator) | +| `prd.json` | Task list with status tracking | Git-tracked JSON | +| `progress.txt` | Append-only learnings log | Git-tracked text | +| `AGENTS.md` / `CLAUDE.md` | Reusable patterns for future iterations | Git-tracked markdown | +| `prompt.md` | Instructions template for Amp | Static config | +| Skills (`prd`, `ralph`) | PRD generation and conversion helpers | Static config | + +--- + +## Key Features + +### 1. **Stateless Iteration Model** + +Each iteration spawns a completely fresh AI instance with no memory of previous work. Context is rebuilt from: +- Git history (what was committed) +- `progress.txt` (learnings and context) +- `prd.json` (which stories are done) + +**Key insight**: This sidesteps the AI context window limit by treating each run as independent, with structured artifacts serving as the "memory." 
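The rebuild step for the stateless iteration model can be sketched in Rust, assuming each artifact has already been read into a string. The function name `build_iteration_context` and the section titles are hypothetical and do not come from ralph, where the AI tool reads these files itself. The sketch only illustrates that the context for a run is derived entirely from persisted artifacts.

```rust
/// Hypothetical helper: assemble the context for one fresh iteration from the
/// three persisted artifacts. No conversation state from earlier runs is used.
fn build_iteration_context(git_log: &str, progress: &str, prd_json: &str) -> String {
    let mut ctx = String::new();
    for (title, body) in [
        ("Git history", git_log),
        ("progress.txt", progress),
        ("prd.json", prd_json),
    ] {
        // Each artifact becomes its own titled section.
        ctx.push_str("## ");
        ctx.push_str(title);
        ctx.push('\n');
        ctx.push_str(body.trim_end());
        ctx.push_str("\n\n");
    }
    ctx
}
```

Because the function takes only the artifacts as input, two iterations that see the same files get the same context, which is the property the stateless model depends on.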
+ +### 2. **Structured Task Management (prd.json)** + +```json +{ + "project": "MyApp", + "branchName": "ralph/task-priority", + "description": "Task Priority System - Add priority levels to tasks", + "userStories": [ + { + "id": "US-001", + "title": "Add priority field to database", + "description": "As a developer, I need to store task priority...", + "acceptanceCriteria": [ + "Add priority column to tasks table", + "Typecheck passes" + ], + "priority": 1, + "passes": false, + "notes": "" + } + ] +} +``` + +**Design decisions:** +- Priority-based ordering ensures dependencies are handled correctly +- `passes: false/true` provides clear completion tracking +- Acceptance criteria are verifiable (not vague) +- Stories are sized to fit within one context window + +### 3. **Progressive Learning System** + +The dual-file learning system distinguishes between: + +**`progress.txt`** - Append-only chronological log: +``` +## [Date/Time] - [Story ID] +- What was implemented +- Files changed +- **Learnings for future iterations:** + - Patterns discovered + - Gotchas encountered + - Useful context +--- +``` + +**`AGENTS.md` / `CLAUDE.md`** - Consolidated reusable patterns: +``` +## Codebase Patterns +- Use `sql<number>` template for aggregations +- Always use `IF NOT EXISTS` for migrations +- Export types from actions.ts for UI components +``` + +**Key insight**: Chronological learnings for debugging, consolidated patterns for quick reference. + +### 4. **Branch-Based Run Isolation** + +- Each feature uses a dedicated branch (`ralph/feature-name`) +- When starting a new feature, previous runs are archived to `archive/YYYY-MM-DD-feature-name/` +- Clean separation between features prevents context pollution + +### 5. 
**Quality Feedback Loops** + +Ralph requires feedback loops to function: +- Typecheck catches type errors +- Tests verify behavior +- CI must stay green (broken code compounds) + +Stories must include verifiable acceptance criteria like "Typecheck passes" and "Tests pass." + +### 6. **Browser Verification for UI Stories** + +Frontend stories include "Verify in browser using dev-browser skill" as acceptance criteria. This ensures visual verification of UI changes, not just code compilation. + +### 7. **Stop Condition Protocol** + +The loop terminates when all stories have `passes: true`. The AI outputs: +``` +<promise>COMPLETE</promise> +``` + +This magic string is grep'd by `ralph.sh` to detect completion. + +### 8. **Multi-Tool Support** + +Ralph supports both Amp and Claude Code: +```bash +./ralph.sh --tool amp [max_iterations] # Default +./ralph.sh --tool claude [max_iterations] +``` + +Each tool has its own prompt template (`prompt.md` for Amp, `CLAUDE.md` for Claude Code). + +### 9. **Skills System for PRD Workflow** + +Two skills automate PRD creation: + +**`prd` skill**: Generates structured PRDs with clarifying questions +- Asks 3-5 essential questions with lettered options (for quick "1A, 2C, 3B" responses) +- Creates markdown PRD with user stories, functional requirements, non-goals + +**`ralph` skill**: Converts markdown PRDs to JSON +- Enforces story sizing (completable in one iteration) +- Orders by dependencies (schema → backend → UI) +- Adds standard criteria ("Typecheck passes", "Verify in browser") + +--- + +## Notable Patterns and Design Decisions + +### 1. **Single Story Per Iteration** + +**Design**: Each AI run handles exactly ONE user story, never more. + +**Rationale**: +- Ensures complete focus on a single task +- Prevents context exhaustion mid-feature +- Creates clean commit boundaries +- Simplifies failure recovery (retry a single story, not multiple) + +### 2. 
**Append-Only Progress Log** + +**Design**: `progress.txt` is append-only, never overwritten. + +**Rationale**: +- Preserves full history for debugging +- Enables pattern discovery over time +- Prevents accidental loss of learnings +- Supports consolidation into AGENTS.md when patterns emerge + +### 3. **Story Sizing Rules** + +**Design**: Stories must be small enough for one context window. + +**Right-sized examples:** +- Add a database column and migration +- Add a UI component to an existing page +- Update a server action with new logic +- Add a filter dropdown to a list + +**Too big (must split):** +- "Build the entire dashboard" +- "Add authentication" +- "Refactor the API" + +**Rule of thumb**: If you can't describe the change in 2-3 sentences, it's too big. + +### 4. **Dependency-Ordered Execution** + +**Design**: Stories execute in priority order, earlier stories can't depend on later ones. + +**Correct order:** +1. Schema/database changes (migrations) +2. Server actions / backend logic +3. UI components that use the backend +4. Dashboard/summary views that aggregate data + +### 5. **Commit Discipline** + +**Design**: Only commit when tests pass, with structured messages. + +``` +feat: [Story ID] - [Story Title] +``` + +**Rationale**: Clean git history provides context recovery for future iterations. + +### 6. **Verifiable Acceptance Criteria** + +**Design**: Every criterion must be testable, never vague. + +**Good**: "Button shows confirmation dialog before deleting" +**Bad**: "Works correctly", "Good UX", "Handles edge cases" + +### 7. **Archiving Previous Runs** + +**Design**: When `branchName` changes, archive previous `prd.json` and `progress.txt` to `archive/YYYY-MM-DD-feature-name/`. + +**Rationale**: Clean separation between features, preserves history for reference. 
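A minimal Rust sketch of the archiving rule follows. The helper names `archive_dir` and `needs_archive` are hypothetical, and dropping the `ralph/` prefix is an assumption based on the `ralph/feature-name` branch convention. How the previous branch name is remembered between runs is left open here.

```rust
use std::path::PathBuf;

/// Hypothetical helper: derive the archive directory for a finished run.
/// `date` is expected as `YYYY-MM-DD`; the `ralph/` branch prefix is dropped
/// (an assumption, see the lead-in).
fn archive_dir(date: &str, branch_name: &str) -> PathBuf {
    let feature = branch_name.strip_prefix("ralph/").unwrap_or(branch_name);
    PathBuf::from("archive").join(format!("{}-{}", date, feature))
}

/// Archive only when a previous branch is known and differs from the current one.
fn needs_archive(last_branch: Option<&str>, current_branch: &str) -> bool {
    matches!(last_branch, Some(last) if last != current_branch)
}
```

For example, `archive_dir("2026-01-23", "ralph/task-priority")` yields `archive/2026-01-23-task-priority`, and a first run with no recorded branch archives nothing.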
+ +--- + +## Context Management Strategy + +Ralph's context management is its most innovative aspect: + +### Between Runs (Persistence) + +| Mechanism | What It Carries | Format | +|-----------|-----------------|--------| +| Git commits | Code changes, file structure | Versioned files | +| `prd.json` | Task completion status | Structured JSON | +| `progress.txt` | Learnings, gotchas, patterns | Structured text | +| `AGENTS.md` | Consolidated reusable patterns | Markdown | + +### Within a Run (Instructions) + +The AI receives: +1. Instructions from `prompt.md` or `CLAUDE.md` +2. The `prd.json` file content +3. The `progress.txt` file (especially Codebase Patterns section) +4. Access to read any file via AI tool capabilities + +### Context Recovery Pattern + +Each iteration: +1. Reads `progress.txt` Codebase Patterns section first (quick reference) +2. Reads `prd.json` to find next incomplete story +3. Checks git branch matches expected branch +4. Implements story +5. Appends learnings to `progress.txt` +6. Optionally consolidates patterns to AGENTS.md + +--- + +## Agent Orchestration Model + +### Single-Agent Loop (Not Multi-Agent) + +Ralph is NOT a multi-agent system. It's a single-agent loop where: +- One AI instance runs at a time +- Each instance is independent (no inter-agent communication) +- Coordination happens via file-based state (prd.json, progress.txt) + +### Orchestration via Bash Script + +`ralph.sh` is a simple bash loop: +```bash +for i in $(seq 1 $MAX_ITERATIONS); do + OUTPUT=$(cat prompt.md | amp --dangerously-allow-all 2>&1 | tee /dev/stderr) || true + + if echo "$OUTPUT" | grep -q "<promise>COMPLETE</promise>"; then + echo "Ralph completed all tasks!" 
+ exit 0 + fi +done +``` + +**Key points:** +- Uses `--dangerously-allow-all` (Amp) or `--dangerously-skip-permissions` (Claude) for autonomous operation +- Outputs are piped through `tee` for visibility +- Completion detected via grep for magic string +- 2-second sleep between iterations + +--- + +## Error Handling and Recovery + +### Implicit Error Handling + +Ralph has minimal explicit error handling. Instead: +- If tests fail, the story isn't committed +- If the AI can't complete a story, it logs learnings and the next iteration retries +- If max iterations are reached, the script exits with an error +- Human intervention is expected for complex failures + +### Recovery via Progress Log + +Failed attempts are documented in `progress.txt`: +``` +## [Date/Time] - [Story ID] +- Attempted to implement X +- Failed because Y +- **Learnings:** + - Don't do Z + - Instead try W +--- +``` + +The next iteration reads these learnings and avoids the same mistakes. + +--- + +## Configuration and Customization + +### Per-Project Customization + +After copying the prompt template to your project: +- Add project-specific quality check commands +- Include codebase conventions +- Add common gotchas for your stack + +### Amp Auto-Handoff Configuration + +For large stories that approach context limits: +```json +{ + "amp.experimental.autoHandoff": { "context": 90 } +} +``` + +This enables automatic handoff when context fills up. 
+ +### Iteration Limits + +```bash +./ralph.sh [max_iterations] # Default: 10 +``` + +--- + +## Comparison to Typical Orchestration Approaches + +| Aspect | Ralph | Typical Orchestration | +|--------|-------|----------------------| +| **Memory** | File-based (git, JSON, text) | In-memory state, databases | +| **Coordination** | Sequential loop | Often parallel/concurrent | +| **Agent Communication** | Via files | Direct messaging, queues | +| **Complexity** | Simple bash script (~100 LOC) | Often complex frameworks | +| **Failure Recovery** | Retry from last good state | Explicit retry logic, checkpoints | +| **Context Management** | Fresh context per iteration | Persistent context, context windows | +| **Task Decomposition** | Pre-planned user stories | Often dynamic planning | +| **Human Oversight** | Minimal during run | Often requires approval gates | + +### Key Differentiators + +1. **Simplicity**: Ralph is a bash script, not a framework +2. **Statelessness**: Each iteration is independent +3. **Git-Native**: Uses git as the primary state management +4. **AI-Tool Agnostic**: Works with both Amp and Claude Code +5. **Human-Readable Artifacts**: All state is in human-readable files + +--- + +## Implications for Makima + +### Features to Consider Adopting + +1. **Structured PRD-to-JSON workflow** with skills +2. **Append-only progress logging** for context between runs +3. **Story sizing enforcement** (completable in one context window) +4. **Dependency-ordered task execution** +5. **Branch-based run isolation** with archiving +6. **Consolidated patterns file** (AGENTS.md equivalent) +7. **Magic string completion protocol** (`<promise>COMPLETE</promise>`) +8. **Verifiable acceptance criteria** enforcement +9. **Browser verification** for UI stories + +### Optional Features (Flag-Controlled) + +1. `--max-iterations` limit +2. `--auto-handoff` for context management +3. `--archive-previous` for run isolation +4. `--require-tests` for quality gates +5. 
`--single-story-per-run` mode + +### Opinionated Features + +1. Task decomposition must result in context-window-sized stories +2. Progress logs must be append-only +3. All commits must pass quality checks +4. Acceptance criteria must be verifiable +5. Dependencies must be ordered correctly + +--- + +## Appendix: File Structure Reference + +``` +project/ +├── scripts/ralph/ +│ ├── ralph.sh # Main loop script +│ ├── prompt.md # Amp instructions +│ ├── CLAUDE.md # Claude Code instructions +│ ├── prd.json # Active task list +│ ├── progress.txt # Append-only learnings +│ └── archive/ # Previous run archives +│ └── YYYY-MM-DD-feature-name/ +│ ├── prd.json +│ └── progress.txt +├── skills/ +│ ├── prd/ +│ │ └── SKILL.md # PRD generation skill +│ └── ralph/ +│ └── SKILL.md # PRD-to-JSON conversion skill +└── AGENTS.md # Codebase-wide patterns +``` + +--- + +## References + +- [Ralph GitHub Repository](https://github.com/snarktank/ralph) +- [Geoffrey Huntley's Ralph Article](https://ghuntley.com/ralph/) +- [Amp Documentation](https://ampcode.com/manual) +- [Claude Code Documentation](https://docs.anthropic.com/en/docs/claude-code) diff --git a/ralph-features-spec.md b/ralph-features-spec.md new file mode 100644 index 0000000..f25a8fe --- /dev/null +++ b/ralph-features-spec.md @@ -0,0 +1,773 @@ +# Ralph-Inspired Features for Makima + +## Overview + +This specification outlines features derived from the [ralph](https://github.com/snarktank/ralph) autonomous AI agent loop system that can be implemented in makima to reduce manual steering and improve context management between runs. + +--- + +## Part 1: Opinionated Features (Always Enabled) + +These features represent best practices that should be core to makima's behavior. + +### 1.1 Structured Progress Logging + +**Name:** `progress-log` +**Priority:** HIGH + +**Description:** +Implement an append-only progress log file (`progress.txt` or similar) that persists learnings, patterns, and context across task iterations. 
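The append-only property can be sketched with the Rust standard library alone. The function name `append_progress_entry` and the exact entry layout are assumptions that loosely follow ralph's `progress.txt` format; this is not makima's actual implementation.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: append one entry to the progress log.
/// Opening with `append(true)` means existing entries are never rewritten.
fn append_progress_entry(log_path: &Path, heading: &str, learnings: &[&str]) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(log_path)?;
    writeln!(file, "## {}", heading)?;
    writeln!(file, "- **Learnings:**")?;
    for item in learnings {
        writeln!(file, "  - {}", item)?;
    }
    // A separator line closes the entry, as in ralph's log format.
    writeln!(file, "---")
}
```

Calling the function twice on the same path leaves the first entry intact and adds the second after it, so later iterations can read the full history.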
+ +**Motivation:** +- Ralph's most powerful feature is its dual-file learning system +- Captures context that survives Claude's context window limits +- Enables pattern discovery over time +- Provides debugging history + +**Current State in Makima:** +- `progress_summary` field exists but is per-task, not persistent +- Task events stored in database but not summarized +- No cross-task learning mechanism + +**Implementation Approach:** +1. Add `progress.log` file to each task's worktree +2. Append structured entries at task completion: + ``` + ## [Timestamp] - Task [ID]: [Name] + - Status: [done/failed] + - Files changed: [list] + - **Learnings:** + - [Pattern discovered] + - [Gotcha encountered] + --- + ``` +3. Inject progress.log contents into new task prompts +4. Periodic consolidation into `AGENTS.md` equivalent + +**Configuration:** +```toml +[progress_log] +enabled = true # Always on +max_entries_injected = 20 # Limit for prompt injection +consolidation_threshold = 50 # Trigger consolidation +``` + +**Integration Points:** +- `daemon/task/manager.rs` → `on_completion()` hook +- `daemon/process/claude.rs` → `inject_system_prompt()` +- New file: `daemon/task/progress_log.rs` + +--- + +### 1.2 Context Recovery Pattern + +**Name:** `context-recovery` +**Priority:** HIGH + +**Description:** +Standardize how context is rebuilt when tasks resume or restart, ensuring Claude can quickly orient itself. + +**Motivation:** +- Ralph's stateless model works because context recovery is systematic +- Each iteration reads from well-defined artifacts +- Reduces confusion and repeated work + +**Current State in Makima:** +- `conversation_state` stored for resumption +- `--continue` flag relies on Claude's session state +- No structured "where we left off" pattern + +**Implementation Approach:** +1. 
Create standard context recovery header for task prompts: + ``` + ## Context Recovery + - Current branch: [branch name] + - Git status: [uncommitted changes summary] + - Last checkpoint: [timestamp, message] + - Progress log (recent): [last 5 entries] + - Current phase: [research/specify/plan/execute/review] + ``` +2. Auto-generate on task start/resume +3. Include in system prompt before user plan + +**Integration Points:** +- `daemon/task/manager.rs` → `build_context_recovery()` +- `daemon/process/claude.rs` → Prepend to injected prompt +- New file: `daemon/task/context_recovery.rs` + +--- + +### 1.3 Dependency-Ordered Task Execution + +**Name:** `dependency-ordering` +**Priority:** MEDIUM + +**Description:** +Enforce that tasks execute in dependency order: schema changes → backend → UI. + +**Motivation:** +- Ralph explicitly orders stories: database → server → UI → dashboard +- Prevents tasks from failing due to missing dependencies +- Creates clean commit boundaries + +**Current State in Makima:** +- Tasks have `priority` field but no dependency inference +- Supervisors manually order task creation +- No validation of execution order + +**Implementation Approach:** +1. Add `depends_on: Vec<Uuid>` field to tasks +2. Validate dependencies before marking task as runnable +3. Auto-detect dependency patterns: + - Migration files → backend code + - Types/models → consumers + - APIs → UI components +4. 
Warn if a task seems out of order based on file patterns + +**Configuration:** +```toml +[dependency_ordering] +enabled = true +auto_detect = true +warn_on_violation = true +``` + +**Integration Points:** +- `db/models.rs` → Task model extension +- `daemon/task/manager.rs` → `can_start_task()` validation +- New file: `daemon/task/dependency_analysis.rs` + +--- + +### 1.4 Verifiable Acceptance Criteria + +**Name:** `acceptance-criteria` +**Priority:** MEDIUM + +**Description:** +Require that all tasks have verifiable (not vague) acceptance criteria, and automatically validate them. + +**Motivation:** +- Ralph requires criteria like "Typecheck passes", "Tests pass" +- Prevents "done" status on incomplete work +- Provides clear success definition + +**Current State in Makima:** +- `COMPLETION_GATE` signals readiness +- No structured criteria validation +- Manual interpretation of "ready" + +**Implementation Approach:** +1. Parse task plans for acceptance criteria section +2. Identify verifiable vs vague criteria: + - **Good:** "All tests pass", "No TypeScript errors" + - **Bad:** "Works correctly", "Good UX" +3. Auto-append standard criteria if missing: + - "No uncommitted changes remain" + - "CI/linting passes" (if configured) +4. Validate criteria satisfaction before marking complete + +**Configuration:** +```toml +[acceptance_criteria] +enabled = true +require_verifiable = true +auto_append_standard = ["no_uncommitted_changes", "tests_pass"] +``` + +**Integration Points:** +- `daemon/task/completion_gate.rs` → Extend validation +- `llm/task_output.rs` → Parse criteria from plan +- New file: `daemon/task/criteria_validator.rs` + +--- + +### 1.5 Task Sizing Validation + +**Name:** `task-sizing` +**Priority:** MEDIUM + +**Description:** +Warn or prevent tasks that are likely too large to complete in one context window. 
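
A minimal sketch of such a sizing check, written against the thresholds this section proposes (files mentioned, plan word count, scope words). The type names, `check_plan`, and the file-detection heuristic are all assumptions made for illustration; a real implementation would likely need a better notion of "file mentioned".

```rust
use std::collections::HashSet;

/// Limits mirroring the values proposed for `[task_sizing]` (hypothetical struct).
pub struct SizingLimits {
    pub max_files_mentioned: usize,
    pub max_plan_words: usize,
    pub scope_words: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum SizingWarning {
    TooManyFiles(usize),
    PlanTooLong(usize),
    ScopeWord(String),
}

/// Strip surrounding quotes, backticks, and trailing punctuation from a token.
fn clean(token: &str) -> &str {
    token
        .trim_start_matches(|c: char| matches!(c, '`' | '(' | '"' | '\''))
        .trim_end_matches(|c: char| matches!(c, '`' | ')' | '"' | '\'' | ',' | ';' | ':' | '.'))
}

/// Assumed heuristic: a token is a file mention if it ends in a short
/// alphanumeric extension containing at least one letter (e.g. `src/main.rs`).
fn looks_like_file(token: &str) -> bool {
    match token.rsplit_once('.') {
        Some((stem, ext)) => {
            !stem.is_empty()
                && (1..=5).contains(&ext.len())
                && ext.chars().all(|c| c.is_ascii_alphanumeric())
                && ext.chars().any(|c| c.is_ascii_alphabetic())
        }
        None => false,
    }
}

/// Return every sizing warning that applies to the plan text.
pub fn check_plan(plan: &str, limits: &SizingLimits) -> Vec<SizingWarning> {
    let mut warnings = Vec::new();
    let words: Vec<&str> = plan.split_whitespace().collect();

    // Count distinct file-like tokens rather than raw mentions.
    let files: HashSet<&str> = words
        .iter()
        .copied()
        .map(clean)
        .filter(|t| looks_like_file(t))
        .collect();
    if files.len() > limits.max_files_mentioned {
        warnings.push(SizingWarning::TooManyFiles(files.len()));
    }
    if words.len() > limits.max_plan_words {
        warnings.push(SizingWarning::PlanTooLong(words.len()));
    }
    for scope in &limits.scope_words {
        if words.iter().any(|w| clean(w).eq_ignore_ascii_case(scope)) {
            warnings.push(SizingWarning::ScopeWord(scope.clone()));
        }
    }
    warnings
}
```

Returning a list of warnings instead of a single pass/fail leaves the caller free to decide whether to warn or to block the task.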
+ +**Motivation:** +- Ralph's story sizing is crucial: "If you can't describe it in 2-3 sentences, it's too big" +- Large tasks exhaust context, require handoffs +- Smaller tasks = cleaner commits, easier recovery + +**Current State in Makima:** +- No task size estimation +- `auto_handoff` exists but reactive +- Manual task breakdown by supervisors + +**Implementation Approach:** +1. Estimate task complexity from plan text: + - Number of files mentioned + - Scope words ("entire", "all", "refactor") + - Estimated token count +2. Warn if task exceeds thresholds +3. Suggest breakdown for large tasks + +**Thresholds:** +- Files mentioned > 10 → Warning +- Plan length > 500 words → Warning +- Scope words detected → Strong warning + +**Configuration:** +```toml +[task_sizing] +enabled = true +max_files_mentioned = 10 +max_plan_words = 500 +warn_on_scope_words = ["entire", "all", "complete", "refactor"] +``` + +**Integration Points:** +- `daemon/task/manager.rs` → `validate_task_size()` +- Supervisor prompts → Include sizing guidance +- New file: `daemon/task/sizing_validator.rs` + +--- + +### 1.6 Commit Discipline + +**Name:** `commit-discipline` +**Priority:** HIGH + +**Description:** +Enforce structured commit messages and only allow commits when quality checks pass. + +**Motivation:** +- Ralph: "Only commit when tests pass" +- Clean git history aids context recovery +- Structured messages enable automation + +**Current State in Makima:** +- Checkpoints create commits automatically +- No quality gate before commit +- Commit messages not standardized + +**Implementation Approach:** +1. Standardize commit message format: + ``` + feat/fix/chore: [Task ID] - [Summary] + + [Optional body] + + Co-Authored-By: Claude <noreply@anthropic.com> + ``` +2. Run quality checks before checkpoint commit: + - TypeScript/lint (if configured) + - Tests (if configured) +3. 
Reject commit if checks fail, provide feedback + +**Configuration:** +```toml +[commit_discipline] +enabled = true +require_tests = false # Optional +require_lint = false # Optional +message_format = "conventional" # conventional, simple +``` + +**Integration Points:** +- `daemon/worktree/manager.rs` → `create_checkpoint()` +- `daemon/task/manager.rs` → Pre-commit hooks +- New file: `daemon/task/commit_validator.rs` + +--- + +## Part 2: Optional Features (Flag-Controlled) + +These features provide advanced control and should be opt-in via CLI flags or configuration. + +### 2.1 Maximum Iterations Limit + +**Name:** `--max-iterations` +**Priority:** HIGH +**Flag:** `--max-iterations <N>` or `-i <N>` + +**Description:** +Limit the number of autonomous loop iterations before stopping. + +**Motivation:** +- Ralph uses `max_iterations` (default 10) +- Prevents runaway loops that waste tokens +- Provides predictable behavior + +**Current State in Makima:** +- Circuit breaker has `iteration_count` limit (10) +- Not configurable at task/contract level +- No per-run override + +**Implementation Approach:** +1. Add `--max-iterations` flag to contract/task creation +2. Store in task metadata +3. Check count in autonomous loop logic +4. Exit cleanly with message when limit reached + +**CLI Usage:** +```bash +makima contract create --max-iterations 5 "Feature X" +makima supervisor spawn "Task" "Plan" --max-iterations 3 +``` + +**Configuration:** +```toml +[autonomous_loop] +default_max_iterations = 10 +hard_limit = 50 # Absolute maximum +``` + +**Integration Points:** +- `daemon/task/manager.rs` → Loop control +- `db/models.rs` → Task field +- CLI argument parsing + +--- + +### 2.2 Single-Story-Per-Run Mode + +**Name:** `--single-task` +**Priority:** MEDIUM +**Flag:** `--single-task` or `-1` + +**Description:** +Execute exactly one task per Claude invocation, then stop (don't auto-continue). 
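
The iteration cap from 2.1 and the single-task behaviour described here could share one loop-control routine. The sketch below is an assumption about how that might look: `LoopConfig`, `StopReason`, and `run_loop` are hypothetical names, and `run_iteration` stands in for whatever runs one Claude pass and reports whether the completion gate said `ready`.

```rust
/// Settings corresponding to the proposed `[autonomous_loop]` and `[execution]` sections.
pub struct LoopConfig {
    pub default_max_iterations: u32,
    pub hard_limit: u32,
    pub single_task_mode: bool,
}

#[derive(Debug, PartialEq)]
pub enum StopReason {
    /// The completion gate reported ready.
    Ready,
    /// Single-task mode: stopped after one pass regardless of readiness.
    SingleTask,
    /// The iteration cap was reached without a ready signal.
    IterationLimit(u32),
}

/// Resolve the cap: a per-run override if given, otherwise the default,
/// and never more than the hard limit.
pub fn effective_max_iterations(cfg: &LoopConfig, requested: Option<u32>) -> u32 {
    requested
        .unwrap_or(cfg.default_max_iterations)
        .min(cfg.hard_limit)
}

/// Drive the loop. `run_iteration` receives the 1-based iteration number and
/// returns `true` when the completion gate reported ready.
pub fn run_loop<F>(cfg: &LoopConfig, requested: Option<u32>, mut run_iteration: F) -> StopReason
where
    F: FnMut(u32) -> bool,
{
    let max = effective_max_iterations(cfg, requested);
    for i in 1..=max {
        let ready = run_iteration(i);
        if cfg.single_task_mode {
            return StopReason::SingleTask;
        }
        if ready {
            return StopReason::Ready;
        }
    }
    StopReason::IterationLimit(max)
}
```

Returning an explicit `StopReason` lets the caller print a clean exit message when the limit is hit instead of treating it as an error.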
+ +**Motivation:** +- Ralph's model: one story per iteration +- Ensures complete focus +- Creates clean boundaries +- Simplifies failure recovery + +**Current State in Makima:** +- Tasks can run multiple iterations +- Supervisor can spawn multiple concurrent tasks +- No single-task mode + +**Implementation Approach:** +1. When `--single-task` enabled: + - Execute one task + - Parse completion gate + - Stop regardless of `ready` status + - Report status and exit +2. User reviews, then manually continues or adjusts + +**CLI Usage:** +```bash +makima contract create --single-task "Feature X" +``` + +**Configuration:** +```toml +[execution] +single_task_mode = false # Default +``` + +**Integration Points:** +- `daemon/task/manager.rs` → Execution loop +- `server/handlers/contract_daemon.rs` → Contract options +- CLI flags + +--- + +### 2.3 Archive Previous Runs + +**Name:** `--archive-previous` +**Priority:** LOW +**Flag:** `--archive-previous` or `--archive` + +**Description:** +When starting a new feature/contract, archive the previous run's artifacts. + +**Motivation:** +- Ralph archives to `archive/YYYY-MM-DD-feature-name/` +- Clean separation between features +- Preserves history for reference +- Prevents context pollution + +**Current State in Makima:** +- Worktrees are per-task but ephemeral +- No archiving mechanism +- Old task data in database but hard to access + +**Implementation Approach:** +1. On contract creation with `--archive`: + - Find previous contract with same name/goal + - Copy key artifacts to `archive/` directory: + - progress.log + - Final checkpoint + - Summary document +2. 
Archive structure: + ``` + archive/ + └── 2026-01-22-feature-name/ + ├── progress.log + ├── summary.md + └── final-diff.patch + ``` + +**CLI Usage:** +```bash +makima contract create --archive-previous "Feature X v2" +``` + +**Integration Points:** +- `daemon/worktree/manager.rs` → Archive logic +- `server/handlers/contracts.rs` → Archive on create +- New file: `daemon/archive/manager.rs` + +--- + +### 2.4 Require Tests Quality Gate + +**Name:** `--require-tests` +**Priority:** MEDIUM +**Flag:** `--require-tests` or `--tests` + +**Description:** +Block task completion unless tests pass. + +**Motivation:** +- Ralph: stories require "Tests pass" in acceptance criteria +- Ensures quality before merge +- Catches regressions early + +**Current State in Makima:** +- Completion gate is self-reported by Claude +- No actual test execution +- Circuit breaker is reactive, not proactive + +**Implementation Approach:** +1. Detect test framework from project: + - `package.json` scripts + - `pytest`, `cargo test`, etc. +2. Run tests before accepting completion +3. Parse test output for pass/fail +4. If failed: + - Don't mark complete + - Inject failure info into next prompt + - Increment failure counter + +**CLI Usage:** +```bash +makima contract create --require-tests "Feature X" +``` + +**Configuration:** +```toml +[quality_gates] +require_tests = false +test_command = "npm test" # Auto-detected if not set +test_timeout_secs = 300 +``` + +**Integration Points:** +- `daemon/task/completion_gate.rs` → Test validation +- `daemon/process/` → Test runner +- New file: `daemon/quality/test_runner.rs` + +--- + +### 2.5 PRD Mode + +**Name:** `--prd-mode` +**Priority:** MEDIUM +**Flag:** `--prd-mode` or `--prd` + +**Description:** +Enable ralph-style PRD workflow with structured JSON task tracking. 
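
To make the "find the next incomplete story" step concrete, here is a small sketch over an in-memory story list. Field names follow the `prd.json` layout proposed in this section; the `UserStory` struct, `next_story`, and `mark_passed` are hypothetical, JSON parsing is left out, and treating a lower `priority` number as more urgent is an assumption.

```rust
/// In-memory view of one `userStories` entry (subset of fields, hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct UserStory {
    pub id: String,
    pub title: String,
    pub priority: u32,
    pub passes: bool,
}

/// Pick the next story to work on: the incomplete story with the lowest
/// priority number. Ties resolve to the earliest entry in the list.
pub fn next_story(stories: &[UserStory]) -> Option<&UserStory> {
    stories
        .iter()
        .filter(|s| !s.passes)
        .min_by_key(|s| s.priority)
}

/// Mark a story as passing; returns `false` if no story has that id.
pub fn mark_passed(stories: &mut [UserStory], id: &str) -> bool {
    match stories.iter_mut().find(|s| s.id == id) {
        Some(story) => {
            story.passes = true;
            true
        }
        None => false,
    }
}
```

When `next_story` returns `None`, every story passes and the supervisor has nothing left to schedule.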
+ +**Motivation:** +- Ralph's `prd.json` provides clear task breakdown +- Structured format aids automation +- Priority-based execution +- Clear pass/fail tracking + +**Current State in Makima:** +- Plans are free-form text +- Task status is in database, not file-based +- No structured PRD format + +**Implementation Approach:** +1. When `--prd-mode` enabled: + - Create `prd.json` in worktree: + ```json + { + "project": "Contract Name", + "branchName": "makima/feature", + "description": "Contract goal", + "userStories": [ + { + "id": "US-001", + "title": "Story title", + "description": "As a...", + "acceptanceCriteria": ["Criterion 1"], + "priority": 1, + "passes": false, + "notes": "" + } + ] + } + ``` + - Tasks update `passes` field on completion + - Supervisor reads PRD to find next incomplete story +2. Sync between database and `prd.json` + +**CLI Usage:** +```bash +makima contract create --prd-mode "Feature X" +``` + +**Configuration:** +```toml +[prd_mode] +enabled = false +auto_generate_from_plan = true +sync_to_database = true +``` + +**Integration Points:** +- `daemon/task/manager.rs` → PRD sync +- New file: `daemon/prd/manager.rs` +- New file: `daemon/prd/models.rs` + +--- + +### 2.6 Learning Mode + +**Name:** `--learn` +**Priority:** LOW +**Flag:** `--learn` or `-l` + +**Description:** +Enable cross-task learning that extracts patterns and improves future prompts. + +**Motivation:** +- Ralph's AGENTS.md consolidates patterns +- Learning from success improves future runs +- Learning from failure prevents repeating mistakes + +**Current State in Makima:** +- No cross-task learning +- Each task starts fresh +- Patterns not extracted or reused + +**Implementation Approach:** +1. On task completion, extract: + - Files commonly modified together + - Commands that succeeded/failed + - Error patterns and solutions +2. Store in `learnings.db` (SQLite) per repository +3. 
Inject relevant learnings into future task prompts: + ``` + ## Learned Patterns for this codebase: + - Always run `npm run typecheck` before commit + - The auth middleware is in src/middleware/auth.ts + - Database migrations require `npx prisma generate` after + ``` + +**CLI Usage:** +```bash +makima contract create --learn "Feature X" +``` + +**Configuration:** +```toml +[learning] +enabled = false +extract_file_patterns = true +extract_command_patterns = true +max_learnings_injected = 10 +``` + +**Integration Points:** +- `daemon/task/manager.rs` → Learning extraction +- `daemon/process/claude.rs` → Learning injection +- New file: `daemon/learning/extractor.rs` +- New file: `daemon/learning/database.rs` + +--- + +### 2.7 Browser Verification for UI + +**Name:** `--browser-verify` +**Priority:** LOW +**Flag:** `--browser-verify` or `--ui` + +**Description:** +For UI-related tasks, require browser verification before completion. + +**Motivation:** +- Ralph includes "Verify in browser" as acceptance criteria +- Visual verification catches issues that tests miss +- Ensures UI actually works, not just compiles + +**Current State in Makima:** +- No browser verification +- UI tasks treated same as backend +- No visual testing integration + +**Implementation Approach:** +1. Detect UI-related tasks from file patterns: + - `*.tsx`, `*.vue`, `*.svelte` + - `components/`, `pages/`, `views/` +2. When completing, prompt for verification: + - Launch dev server if needed + - Open browser to relevant URL + - Wait for user confirmation or screenshot analysis +3. 
Alternative: Integrate with Playwright for visual testing + +**CLI Usage:** +```bash +makima contract create --browser-verify "Add login page" +``` + +**Configuration:** +```toml +[browser_verify] +enabled = false +auto_detect_ui_tasks = true +dev_server_command = "npm run dev" +base_url = "http://localhost:3000" +``` + +**Integration Points:** +- `daemon/task/completion_gate.rs` → Browser check +- New file: `daemon/quality/browser_verify.rs` + +--- + +## Part 3: Implementation Priorities + +### Phase 1: Foundation (High Priority) +1. **Structured Progress Logging** - Core to context management +2. **Context Recovery Pattern** - Enables stateless iterations +3. **Commit Discipline** - Ensures quality git history +4. **Maximum Iterations Limit** - Prevents runaway loops + +### Phase 2: Quality (Medium Priority) +5. **Verifiable Acceptance Criteria** - Improves completion reliability +6. **Dependency-Ordered Execution** - Prevents out-of-order failures +7. **Task Sizing Validation** - Catches too-large tasks early +8. **Require Tests Quality Gate** - Ensures working code +9. **Single-Story Mode** - Ralph's core pattern + +### Phase 3: Advanced (Low Priority) +10. **PRD Mode** - Full ralph-style workflow +11. **Learning Mode** - Cross-task intelligence +12. **Archive Previous** - Run isolation +13. 
**Browser Verification** - UI quality + +--- + +## Part 4: Configuration Summary + +### New Configuration File Sections + +```toml +# makima-daemon.toml additions + +[progress_log] +enabled = true +max_entries_injected = 20 +consolidation_threshold = 50 + +[context_recovery] +enabled = true +include_git_status = true +include_recent_progress = true + +[dependency_ordering] +enabled = true +auto_detect = true +warn_on_violation = true + +[acceptance_criteria] +enabled = true +require_verifiable = true +auto_append_standard = ["no_uncommitted_changes"] + +[task_sizing] +enabled = true +max_files_mentioned = 10 +max_plan_words = 500 +warn_on_scope_words = ["entire", "all", "complete", "refactor"] + +[commit_discipline] +enabled = true +require_tests = false +require_lint = false +message_format = "conventional" + +[autonomous_loop] +default_max_iterations = 10 +hard_limit = 50 + +[execution] +single_task_mode = false + +[quality_gates] +require_tests = false +test_command = "" # Auto-detected +test_timeout_secs = 300 + +[prd_mode] +enabled = false +auto_generate_from_plan = true +sync_to_database = true + +[learning] +enabled = false +extract_file_patterns = true +extract_command_patterns = true +max_learnings_injected = 10 + +[browser_verify] +enabled = false +auto_detect_ui_tasks = true +dev_server_command = "npm run dev" +base_url = "http://localhost:3000" +``` + +### CLI Flag Summary + +| Flag | Short | Feature | Default | +|------|-------|---------|---------| +| `--max-iterations` | `-i` | Iteration limit | 10 | +| `--single-task` | `-1` | One task per run | false | +| `--archive-previous` | `--archive` | Archive old runs | false | +| `--require-tests` | `--tests` | Test quality gate | false | +| `--prd-mode` | `--prd` | PRD-style workflow | false | +| `--learn` | `-l` | Cross-task learning | false | +| `--browser-verify` | `--ui` | UI verification | false | + +--- + +## Part 5: Migration Path + +### For Existing Contracts + +1. 
Progress logging starts fresh (no historical data) +2. Context recovery applies to new tasks only +3. Existing tasks are not affected by new validation +4. Optional features can be enabled per contract + +### Backward Compatibility + +- All opinionated features have graceful defaults +- Optional features are off by default +- No breaking changes to existing CLI/API +- Configuration is additive + +--- + +## Conclusion + +These features address the core challenges mentioned in the contract goal: +- **Manual steering** → Progress logging, context recovery, learning mode +- **Context between runs** → Structured persistence, progress.txt pattern +- **Handholding** → Verifiable criteria, commit discipline, quality gates + +The opinionated features make makima more reliable out of the box, while optional features provide ralph-style workflows for users who want them.
