docs: Add ralph analysis and feature specification (#22)

- ralph-analysis.md: Comprehensive analysis of ralph repository
  - Stateless AI loop pattern with file-based persistence
  - prd.json for task tracking, progress.txt for learnings
  - AGENTS.md for consolidated patterns
- makima-architecture.md: Analysis of makima's current architecture
  - Existing COMPLETION_GATE, circuit breaker, autonomous loop
  - Extension points for ralph-inspired features
- ralph-features-spec.md: Detailed feature specification
  - 6 opinionated features (always enabled)
  - 7 optional features (flag-controlled)
  - Implementation priorities and migration path

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
author: soryu <soryu@soryu.co> 2026-01-23 02:57:13 +0000
committer: GitHub <noreply@github.com> 2026-01-23 02:57:13 +0000
commit: 5595a2fce2e426fd9f1b6224df467a2300f06238
tree: 57b4e20335f6dbab641f1474f34d048960802188
parent: 1ed362424dafec690f919154f5116471951cda9c
3 files changed, 1731 insertions, 0 deletions
diff --git a/makima-architecture.md b/makima-architecture.md
new file mode 100644
index 0000000..69ab06f
--- /dev/null
+++ b/makima-architecture.md
@@ -0,0 +1,503 @@
+# Makima Architecture Analysis
+
+## Executive Summary
+
+Makima is a distributed task orchestration system for managing AI coding agents (primarily Claude Code instances). It follows a client-server architecture with daemons running on local machines that execute tasks, while a central server coordinates work through contracts. The system already implements several patterns similar to ralph, including completion gates, autonomous loop mode, and circuit breakers.
+
+---
+
+## 1. Current Architecture Overview
+
+### 1.1 High-Level Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                            MAKIMA SERVER (Rust)                              │
+│  ┌─────────────┐  ┌──────────────┐  ┌─────────────┐  ┌──────────────────┐   │
+│  │   REST API  │  │  WebSocket   │  │  PostgreSQL │  │   LLM Tools      │   │
+│  │   Handlers  │  │    Hub       │  │     DB      │  │ (Chat, Analysis) │   │
+│  └─────────────┘  └──────────────┘  └─────────────┘  └──────────────────┘   │
+└─────────────────────────────────────────────────────────────────────────────┘
+                                    │
+                    ┌───────────────┼───────────────┐
+                    ▼               ▼               ▼
+            ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
+            │   DAEMON 1   │ │   DAEMON 2   │ │   DAEMON N   │
+            │  (Worker)    │ │  (Worker)    │ │  (Worker)    │
+            ├──────────────┤ ├──────────────┤ ├──────────────┤
+            │ Task Manager │ │ Task Manager │ │ Task Manager │
+            │   Worktree   │ │   Worktree   │ │   Worktree   │
+            │   Manager    │ │   Manager    │ │   Manager    │
+            │   Process    │ │   Process    │ │   Process    │
+            │   Manager    │ │   Manager    │ │   Manager    │
+            └──────────────┘ └──────────────┘ └──────────────┘
+                    │               │               │
+                    ▼               ▼               ▼
+            ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
+            │ Claude Code  │ │ Claude Code  │ │ Claude Code  │
+            │  Instances   │ │  Instances   │ │  Instances   │
+            └──────────────┘ └──────────────┘ └──────────────┘
+```
+
+### 1.2 Core Components
+
+#### Server-Side (`makima/src/server/`)
+
+| Component | Location | Responsibility |
+|-----------|----------|----------------|
+| **REST API** | `handlers/*.rs` | HTTP endpoints for contracts, tasks, files, mesh operations |
+| **WebSocket Hub** | `handlers/mesh_daemon.rs`, `mesh_ws.rs` | Real-time communication with daemons |
+| **Database** | `../db/` | PostgreSQL via sqlx for persistent state |
+| **Authentication** | `auth.rs` | API key and JWT authentication |
+| **LLM Integration** | `../llm/` | Claude/Groq clients, tool execution |
+
+#### Daemon-Side (`makima/src/daemon/`)
+
+| Component | Location | Responsibility |
+|-----------|----------|----------------|
+| **Task Manager** | `task/manager.rs` | Task lifecycle, concurrency control |
+| **Task State** | `task/state.rs` | State machine for task progression |
+| **Completion Gate** | `task/completion_gate.rs` | Autonomous loop termination logic |
+| **Process Manager** | `process/` | Spawns and manages Claude Code subprocesses |
+| **Worktree Manager** | `worktree/` | Git worktree isolation for tasks |
+| **WebSocket Client** | `ws/` | Bidirectional communication with server |
+| **Local DB** | `db/local.rs` | SQLite for crash recovery |
+| **TUI** | `tui/` | Interactive terminal interface |
+
+---
+
+## 2. Key Components and Responsibilities
+
+### 2.1 Contract System
+
+Contracts are the top-level organizational unit representing a body of work:
+
+```rust
+// From db/models.rs
+pub struct Contract {
+    pub id: Uuid,
+    pub name: String,
+    pub contract_type: String,    // "simple" or "specification"
+    pub phase: String,            // research, specify, plan, execute, review
+    pub status: String,           // active, completed, archived
+    pub supervisor_task_id: Option<Uuid>,
+    pub autonomous_loop: bool,    // Enable auto-restart on incomplete
+    pub phase_guard: bool,        // Require user approval for phase transitions
+}
+```
+
+**Contract Types:**
+- `simple`: Plan → Execute workflow
+- `specification`: Research → Specify → Plan → Execute → Review
+
+### 2.2 Task Orchestration
+
+Tasks represent individual units of work executed by Claude Code:
+
+```rust
+// Simplified from db/models.rs
+pub struct Task {
+    pub id: Uuid,
+    pub contract_id: Option<Uuid>,
+    pub parent_task_id: Option<Uuid>,
+    pub is_supervisor: bool,
+    pub status: String,  // pending, running, paused, blocked, done, failed
+    pub plan: String,
+    pub daemon_id: Option<Uuid>,
+    pub continue_from_task_id: Option<Uuid>,  // For task continuation chains
+    pub conversation_state: Option<serde_json::Value>,  // For resumption
+}
+```
+
+**Task Hierarchy:**
+1. **Supervisor Tasks**: Long-running orchestrators that spawn worker tasks
+2. **Worker Tasks**: Execute specific implementation work
+3. **Subtask Chains**: Tasks can continue from other tasks' worktrees
+
+### 2.3 Task State Machine
+
+```
+                    ┌────────────────┐
+                    │  Initializing  │
+                    └───────┬────────┘
+                            │
+                    ┌───────▼────────┐
+                    │    Starting    │
+                    └───────┬────────┘
+                            │
+            ┌───────────────▼───────────────┐
+            │           Running             │
+            └───────┬───────┬───────┬───────┘
+                    │       │       │
+        ┌───────────▼─┐ ┌───▼───┐ ┌─▼────────┐
+        │   Paused    │ │Blocked│ │Completed │
+        └─────────────┘ └───────┘ └──────────┘
+                    │       │
+                    ▼       ▼
+            ┌──────────────────────┐
+            │  Failed/Interrupted  │
+            └──────────────────────┘
+```
+
+### 2.4 Worktree Isolation
+
+Each task gets its own git worktree, providing:
+- Complete isolation from other tasks
+- Ability to merge changes via git
+- Checkpointing via git commits
+
+```rust
+// From daemon/worktree/mod.rs
+pub struct WorktreeInfo {
+    pub path: PathBuf,
+    pub branch: String,
+    pub task_id: Uuid,
+}
+```
+
+---
+
+## 3. Existing Context Management Mechanisms
+
+### 3.1 Completion Gate (Ralph-Inspired)
+
+The `CompletionGate` system allows tasks to signal completion status:
+
+```rust
+// From daemon/task/completion_gate.rs
+pub struct CompletionGate {
+    pub ready: bool,
+    pub reason: Option<String>,
+    pub progress: Option<String>,
+    pub blockers: Option<Vec<String>>,
+}
+```
+
+**Format in Claude output:**
+```xml
+<COMPLETION_GATE>
+ready: true
+reason: "All tests pass"
+progress: "Implemented feature X"
+</COMPLETION_GATE>
+```
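+
+A minimal sketch of how the block above could be extracted from raw Claude output, using only the standard library. `ParsedGate` and `parse_completion_gate` are hypothetical names for illustration; the actual parser in `completion_gate.rs` is not shown in this analysis and may differ (this sketch ignores `blockers`, for example).
+
+```rust
+/// Parsed form of the gate; a reduced stand-in for `CompletionGate` (assumption).
+#[derive(Debug, Default, PartialEq)]
+pub struct ParsedGate {
+    pub ready: bool,
+    pub reason: Option<String>,
+    pub progress: Option<String>,
+}
+
+/// Extracts the first <COMPLETION_GATE> block from `output`.
+/// Returns None if there is no closed block or no valid `ready:` line.
+pub fn parse_completion_gate(output: &str) -> Option<ParsedGate> {
+    const OPEN: &str = "<COMPLETION_GATE>";
+    const CLOSE: &str = "</COMPLETION_GATE>";
+    let start = output.find(OPEN)? + OPEN.len();
+    let end = start + output[start..].find(CLOSE)?;
+
+    let mut gate = ParsedGate::default();
+    let mut saw_ready = false;
+    for line in output[start..end].lines() {
+        // Each line is `key: value`; values may be wrapped in double quotes.
+        let Some((key, value)) = line.split_once(':') else { continue };
+        let value = value.trim().trim_matches('"').to_string();
+        match key.trim() {
+            "ready" => {
+                gate.ready = value.parse().ok()?;
+                saw_ready = true;
+            }
+            "reason" => gate.reason = Some(value),
+            "progress" => gate.progress = Some(value),
+            _ => {} // unknown keys are ignored in this sketch
+        }
+    }
+    saw_ready.then_some(gate)
+}
+```
+
+Treating a missing or malformed `ready:` line as "no gate" keeps the caller's decision simple: either there is a usable gate or there is not.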
+
+### 3.2 Circuit Breaker
+
+Prevents infinite loops in autonomous mode:
+
+```rust
+pub struct CircuitBreaker {
+    pub runs_without_changes: u32,      // Trips after 3 runs with no changes
+    pub same_error_count: u32,          // Trips after 5 identical errors
+    pub iteration_count: u32,           // Trips after 10 iterations
+    pub is_open: bool,
+    pub open_reason: Option<String>,
+}
+```
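+
+The thresholds in the comments above suggest trip logic along the following lines. This is a sketch under stated assumptions, not makima's implementation: `record_run`, the constants, and the `last_error` helper field are hypothetical, and the sketch assumes that "runs with no changes" and "identical errors" are counted consecutively.
+
+```rust
+#[derive(Debug, Default)]
+pub struct CircuitBreaker {
+    pub runs_without_changes: u32,
+    pub same_error_count: u32,
+    pub iteration_count: u32,
+    pub is_open: bool,
+    pub open_reason: Option<String>,
+    last_error: Option<String>, // hypothetical helper, not in the original struct
+}
+
+// Thresholds taken from the comments on the original struct.
+const MAX_RUNS_WITHOUT_CHANGES: u32 = 3;
+const MAX_SAME_ERRORS: u32 = 5;
+const MAX_ITERATIONS: u32 = 10;
+
+impl CircuitBreaker {
+    /// Records one finished run and returns true if the breaker is now open.
+    pub fn record_run(&mut self, had_changes: bool, error: Option<&str>) -> bool {
+        self.iteration_count += 1;
+        self.runs_without_changes = if had_changes { 0 } else { self.runs_without_changes + 1 };
+        match error {
+            // Same error as the previous run: extend the streak.
+            Some(e) if self.last_error.as_deref() == Some(e) => self.same_error_count += 1,
+            // A different error starts a new streak.
+            Some(e) => {
+                self.last_error = Some(e.to_string());
+                self.same_error_count = 1;
+            }
+            // A clean run resets the streak.
+            None => {
+                self.last_error = None;
+                self.same_error_count = 0;
+            }
+        }
+        let reason = if self.runs_without_changes >= MAX_RUNS_WITHOUT_CHANGES {
+            Some("no changes in consecutive runs")
+        } else if self.same_error_count >= MAX_SAME_ERRORS {
+            Some("same error repeated")
+        } else if self.iteration_count >= MAX_ITERATIONS {
+            Some("iteration limit reached")
+        } else {
+            None
+        };
+        if let Some(r) = reason {
+            self.is_open = true;
+            self.open_reason = Some(r.to_string());
+        }
+        self.is_open
+    }
+}
+```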
+
+### 3.3 Autonomous Loop Mode
+
+When `autonomous_loop: true` on a contract:
+1. Task runs to completion
+2. Output is parsed for `COMPLETION_GATE`
+3. If `ready: false`, task is restarted with `--continue`
+4. Circuit breaker monitors for stuck states
+
+### 3.4 Supervisor State Persistence
+
+```rust
+pub struct SupervisorState {
+    pub conversation_history: serde_json::Value,
+    pub pending_task_ids: Vec<Uuid>,
+    pub phase: String,
+    pub last_activity: DateTime<Utc>,
+}
+```
+
+### 3.5 Conversation Snapshots
+
+```rust
+pub struct ConversationSnapshot {
+    pub task_id: Uuid,
+    pub checkpoint_id: Option<Uuid>,
+    pub snapshot_type: String,  // 'auto', 'manual', 'checkpoint'
+    pub conversation_state: serde_json::Value,
+}
+```
+
+### 3.6 Phase Guidance System
+
+```rust
+// From llm/phase_guidance.rs
+pub struct PhaseDeliverables {
+    pub phase: String,
+    pub recommended_files: Vec<RecommendedFile>,
+    pub requires_repository: bool,
+    pub requires_tasks: bool,
+    pub guidance: String,
+}
+```
+
+---
+
+## 4. Extension Points for Ralph-Inspired Features
+
+### 4.1 Task Manager Hook Points
+
+Location: `daemon/task/manager.rs`
+
+| Hook Point | Current State | Extension Opportunity |
+|------------|---------------|----------------------|
+| `spawn_task()` | Creates worktree, spawns process | Add pre-flight checks, memory injection |
+| `handle_output()` | Streams to server | Enhanced context extraction |
+| `on_completion()` | Cleanup, status update | Post-task analysis, learning |
+| `restart_with_continue()` | Autonomous loop restart | Context summarization |
+
+### 4.2 Process Manager Hook Points
+
+Location: `daemon/process/claude.rs`
+
+| Hook Point | Current State | Extension Opportunity |
+|------------|---------------|----------------------|
+| `build_command()` | Constructs CLI args | Dynamic prompt injection |
+| `inject_system_prompt()` | Static prompts | Context-aware prompting |
+| `parse_output()` | JSON message parsing | Structured output extraction |
+
+### 4.3 Server API Extension Points
+
+Location: `server/handlers/`
+
+| Endpoint Category | Files | Extension Opportunity |
+|-------------------|-------|----------------------|
+| Contract Daemon API | `contract_daemon.rs` | Enhanced progress tracking |
+| Supervisor API | `mesh_supervisor.rs` | Smarter task scheduling |
+| History API | `history.rs` | Learning from past sessions |
+
+### 4.4 Configuration Extension Points
+
+Location: `daemon/config.rs`
+
+```rust
+pub struct ProcessConfig {
+    pub claude_command: String,
+    pub claude_args: Vec<String>,
+    pub env_vars: HashMap<String, String>,
+    // Extension: Add ralph-style configs
+    // pub context_window_size: usize,
+    // pub memory_extraction_enabled: bool,
+    // pub learning_mode: LearningMode,
+}
+```
+
+---
+
+## 5. Current Limitations That Ralph Patterns Could Address
+
+### 5.1 Context Window Management
+
+**Current State:**
+- No explicit context window tracking
+- Conversation history stored but not summarized
+- `--continue` flag relies on Claude's session state
+
+**Ralph Pattern Opportunities:**
+- Token counting per message
+- Automatic context summarization when approaching limits
+- Smart context pruning strategies
+
+### 5.2 Memory and Learning
+
+**Current State:**
+- Output stored in `task_events` table
+- Checkpoints stored with diffs
+- No cross-task learning
+
+**Ralph Pattern Opportunities:**
+- Extract patterns from successful completions
+- Build knowledge base from task outputs
+- Learn from failures to improve future prompts
+
+### 5.3 Progress Tracking
+
+**Current State:**
+- `progress_summary` field on tasks
+- `COMPLETION_GATE` for completion signaling
+- Phase checklist for deliverables
+
+**Ralph Pattern Opportunities:**
+- Structured progress metrics extraction
+- Confidence scoring
+- Automatic milestone detection
+
+### 5.4 Error Recovery
+
+**Current State:**
+- Circuit breaker stops on repeated errors
+- Daemon failover with retry count
+- Manual intervention required for stuck tasks
+
+**Ralph Pattern Opportunities:**
+- Intelligent error classification
+- Automatic recovery strategies
+- Error pattern learning
+
+### 5.5 Task Planning
+
+**Current State:**
+- Human-written plans
+- LLM-generated task breakdowns (via `task_output.rs`)
+- Static orchestrator prompts
+
+**Ralph Pattern Opportunities:**
+- Dynamic plan refinement
+- Dependency inference
+- Resource estimation
+
+---
+
+## 6. Data Flow Diagrams
+
+### 6.1 Task Execution Flow
+
+```
+┌─────────┐    SpawnTask     ┌──────────┐
+│ Server  │ ───────────────► │  Daemon  │
+└─────────┘                  └────┬─────┘
+                                  │
+                        ┌─────────▼─────────┐
+                        │   Task Manager    │
+                        │  - Create worktree│
+                        │  - Setup env vars │
+                        └─────────┬─────────┘
+                                  │
+                        ┌─────────▼─────────┐
+                        │  Process Manager  │
+                        │   - Spawn Claude  │
+                        │   - Inject prompt │
+                        └─────────┬─────────┘
+                                  │
+                        ┌─────────▼─────────┐
+                        │   Claude Code     │
+                        │   - Execute task  │
+                        │   - Stream output │
+                        └─────────┬─────────┘
+                                  │
+              ┌───────────────────┼───────────────────┐
+              ▼                   ▼                   ▼
+       ┌────────────┐    ┌──────────────┐    ┌─────────────┐
+       │ TaskOutput │    │ Checkpoints  │    │ Completion  │
+       │  Events    │    │   (Git)      │    │   Gate      │
+       └────────────┘    └──────────────┘    └─────────────┘
+```
+
+### 6.2 Autonomous Loop Flow
+
+```
+┌─────────────────────────────────────────────────────────┐
+│                    AUTONOMOUS LOOP                       │
+└─────────────────────────────────────────────────────────┘
+                            │
+                ┌───────────▼───────────┐
+                │    Execute Task       │
+                └───────────┬───────────┘
+                            │
+                ┌───────────▼───────────┐
+                │ Parse COMPLETION_GATE │
+                └───────────┬───────────┘
+                            │
+            ┌───────────────┼───────────────┐
+            │               │               │
+            ▼               ▼               ▼
+      ready: true    ready: false    No Gate Found
+            │               │               │
+            ▼               ▼               ▼
+        Complete    ┌───────────┐    ┌───────────┐
+                    │  Check    │    │  Circuit  │
+                    │  Circuit  │    │  Breaker  │
+                    │  Breaker  │    │   Trip?   │
+                    └─────┬─────┘    └─────┬─────┘
+                          │                │
+                   ┌──────┴──────┐  ┌──────┴──────┐
+                   ▼             ▼  ▼             ▼
+               Continue       Stop          Continue
+              with Loop                    with Loop
+```
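+
+Read as a decision table, the flow above reduces to a small pure function. The sketch below is one interpretation of the diagram, not code from the daemon; `LoopAction` and `next_action` are hypothetical names, and `gate_ready` is `None` when no `COMPLETION_GATE` was found.
+
+```rust
+#[derive(Debug, PartialEq)]
+pub enum LoopAction {
+    Complete, // gate reported ready: true
+    Continue, // restart the task for another iteration
+    Stop,     // circuit breaker tripped
+}
+
+/// Decides what the autonomous loop does after one run.
+pub fn next_action(gate_ready: Option<bool>, breaker_open: bool) -> LoopAction {
+    match (gate_ready, breaker_open) {
+        // A ready gate completes the task regardless of breaker state.
+        (Some(true), _) => LoopAction::Complete,
+        // Not ready, or no gate at all: the breaker decides.
+        (_, true) => LoopAction::Stop,
+        (_, false) => LoopAction::Continue,
+    }
+}
+```
+
+In this reading, `ready: false` and a missing gate are handled identically; the diagram draws them as separate branches, so the real implementation may distinguish them.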
+
+---
+
+## 7. Key Files Reference
+
+### Daemon Core
+- `src/daemon/mod.rs` - Module exports
+- `src/daemon/task/manager.rs` - Task lifecycle (~1200 lines)
+- `src/daemon/task/state.rs` - State machine
+- `src/daemon/task/completion_gate.rs` - Ralph-style completion
+- `src/daemon/process/claude.rs` - Claude subprocess management
+- `src/daemon/worktree/manager.rs` - Git worktree isolation
+- `src/daemon/ws/protocol.rs` - Server-daemon protocol
+
+### Server Core
+- `src/server/mod.rs` - Route configuration
+- `src/server/handlers/mesh_supervisor.rs` - Supervisor operations
+- `src/server/handlers/contract_daemon.rs` - Contract CLI interface
+- `src/server/handlers/history.rs` - Conversation history
+
+### LLM Integration
+- `src/llm/phase_guidance.rs` - Phase deliverables
+- `src/llm/contract_tools.rs` - Contract interaction tools
+- `src/llm/task_output.rs` - Output analysis
+
+### CLI
+- `src/bin/makima.rs` - CLI entry point
+- `src/daemon/cli/mod.rs` - CLI commands
+
+---
+
+## 8. Recommendations for Ralph Integration
+
+### Priority 1: Enhanced Context Management
+- Add token counting to message parsing
+- Implement context summarization triggers
+- Create memory extraction pipeline
+
+### Priority 2: Structured Progress Tracking
+- Extend `COMPLETION_GATE` format
+- Add confidence scoring
+- Implement milestone detection
+
+### Priority 3: Learning System
+- Create patterns database
+- Implement cross-task knowledge sharing
+- Add success/failure analysis
+
+### Priority 4: Improved Error Recovery
+- Classify error types
+- Create recovery playbooks
+- Implement automatic retry strategies
+
+---
+
+## Appendix: Related Configuration
+
+### Daemon Configuration (`makima-daemon.toml`)
+```toml
+[server]
+url = "wss://api.makima.jp"
+api_key = "..."
+
+[process]
+max_concurrent_tasks = 4
+claude_command = "claude"
+heartbeat_commit_interval_secs = 300
+
+[worktree]
+base_dir = "~/.makima/worktrees"
+repos_dir = "~/.makima/repos"
+```
+
+### Environment Variables
+- `MAKIMA_API_KEY` - API authentication
+- `MAKIMA_DAEMON_SERVER_URL` - Server WebSocket URL
+- `MAKIMA_TASK_ID` - Set in task environment
+- `MAKIMA_CONTRACT_ID` - Set in task environment
diff --git a/ralph-analysis.md b/ralph-analysis.md
new file mode 100644
index 0000000..89df62c
--- /dev/null
+++ b/ralph-analysis.md
@@ -0,0 +1,455 @@
+# Ralph Analysis: Autonomous AI Agent Loop System
+
+## Executive Summary
+
+**Ralph** is an autonomous AI agent loop system designed to run AI coding tools (Amp or Claude Code) repeatedly until all Product Requirements Document (PRD) items are complete. Based on [Geoffrey Huntley's Ralph pattern](https://ghuntley.com/ralph/), it represents a paradigm for autonomous software development where each iteration spawns a fresh AI instance with clean context, relying on git history, a progress log, and a structured PRD JSON file for persistence between runs.
+
+The core philosophy is simple yet powerful: break work into small, independently completable stories, run AI agents in a loop, and let structured persistence mechanisms carry context forward. This approach solves the fundamental problem of AI context limits by treating each iteration as a stateless worker that reads from and writes to well-defined artifacts.
+
+---
+
+## Architecture Overview
+
+### High-Level Flow
+
+```
+┌──────────────────────────────────────────────────────────────────┐
+│                         SETUP PHASE                              │
+├──────────────────────────────────────────────────────────────────┤
+│  1. User writes a PRD (markdown)                                 │
+│  2. Convert PRD to prd.json (structured user stories)            │
+│  3. Run ralph.sh (starts autonomous loop)                        │
+└──────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌──────────────────────────────────────────────────────────────────┐
+│                         EXECUTION LOOP                           │
+├──────────────────────────────────────────────────────────────────┤
+│  4. AI picks highest priority story where passes: false          │
+│  5. Implements the story (writes code, runs tests)               │
+│  6. Commits changes (if tests pass)                              │
+│  7. Updates prd.json (sets passes: true)                         │
+│  8. Logs learnings to progress.txt                               │
+│  9. Updates AGENTS.md/CLAUDE.md with reusable patterns           │
+│ 10. Check: More stories? → Loop back to step 4                   │
+└──────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌──────────────────────────────────────────────────────────────────┐
+│                         COMPLETION                               │
+├──────────────────────────────────────────────────────────────────┤
+│  Output: <promise>COMPLETE</promise> and exit                    │
+└──────────────────────────────────────────────────────────────────┘
+```
+
+### Core Components
+
+| Component | Purpose | Persistence |
+|-----------|---------|-------------|
+| `ralph.sh` | Bash loop that spawns fresh AI instances | N/A (orchestrator) |
+| `prd.json` | Task list with status tracking | Git-tracked JSON |
+| `progress.txt` | Append-only learnings log | Git-tracked text |
+| `AGENTS.md` / `CLAUDE.md` | Reusable patterns for future iterations | Git-tracked markdown |
+| `prompt.md` | Instructions template for Amp | Static config |
+| Skills (`prd`, `ralph`) | PRD generation and conversion helpers | Static config |
+
+---
+
+## Key Features
+
+### 1. **Stateless Iteration Model**
+
+Each iteration spawns a completely fresh AI instance with no memory of previous work. Context is rebuilt from:
+- Git history (what was committed)
+- `progress.txt` (learnings and context)
+- `prd.json` (which stories are done)
+
+**Key insight**: This sidesteps the AI context window limit by treating each run as independent, with structured artifacts serving as the "memory."
+
+### 2. **Structured Task Management (prd.json)**
+
+```json
+{
+  "project": "MyApp",
+  "branchName": "ralph/task-priority",
+  "description": "Task Priority System - Add priority levels to tasks",
+  "userStories": [
+    {
+      "id": "US-001",
+      "title": "Add priority field to database",
+      "description": "As a developer, I need to store task priority...",
+      "acceptanceCriteria": [
+        "Add priority column to tasks table",
+        "Typecheck passes"
+      ],
+      "priority": 1,
+      "passes": false,
+      "notes": ""
+    }
+  ]
+}
+```
+
+**Design decisions:**
+- Priority-based ordering handles dependencies correctly, provided priorities are assigned in dependency order
+- `passes: false/true` provides clear completion tracking
+- Acceptance criteria are verifiable (not vague)
+- Stories are sized to fit within one context window
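+
+The selection rule ("highest priority story where `passes: false`") can be sketched as follows. In ralph the AI itself reads `prd.json` and picks the story, so this is only an illustration of the rule; `UserStory` is a reduced, hypothetical view of one `userStories` entry, and the sketch assumes that a lower `priority` number means higher priority.
+
+```rust
+/// Reduced view of one entry in `userStories` (only the fields needed here).
+#[derive(Debug, Clone, PartialEq)]
+pub struct UserStory {
+    pub id: String,
+    pub priority: u32,
+    pub passes: bool,
+}
+
+/// Returns the incomplete story with the lowest `priority` number, if any.
+/// Ties are resolved in favor of the story that appears first in the list.
+pub fn next_story(stories: &[UserStory]) -> Option<&UserStory> {
+    stories.iter().filter(|s| !s.passes).min_by_key(|s| s.priority)
+}
+```
+
+A `None` result corresponds to the stop condition: every story has `passes: true`.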
+
+### 3. **Progressive Learning System**
+
+The dual-file learning system distinguishes between:
+
+**`progress.txt`** - Append-only chronological log:
+```
+## [Date/Time] - [Story ID]
+- What was implemented
+- Files changed
+- **Learnings for future iterations:**
+  - Patterns discovered
+  - Gotchas encountered
+  - Useful context
+---
+```
+
+**`AGENTS.md` / `CLAUDE.md`** - Consolidated reusable patterns:
+```
+## Codebase Patterns
+- Use `sql<number>` template for aggregations
+- Always use `IF NOT EXISTS` for migrations
+- Export types from actions.ts for UI components
+```
+
+**Key insight**: Chronological learnings for debugging, consolidated patterns for quick reference.
+
+### 4. **Branch-Based Run Isolation**
+
+- Each feature uses a dedicated branch (`ralph/feature-name`)
+- When starting a new feature, previous runs are archived to `archive/YYYY-MM-DD-feature-name/`
+- Clean separation between features prevents context pollution
+
+### 5. **Quality Feedback Loops**
+
+Ralph requires feedback loops to function:
+- Typecheck catches type errors
+- Tests verify behavior
+- CI must stay green (broken code compounds)
+
+Stories must include verifiable acceptance criteria like "Typecheck passes" and "Tests pass."
+
+### 6. **Browser Verification for UI Stories**
+
+Frontend stories include "Verify in browser using dev-browser skill" as an acceptance criterion. This requires visual verification of UI changes, not just a successful build.
+
+### 7. **Stop Condition Protocol**
+
+The loop terminates when all stories have `passes: true`. The AI outputs:
+```
+<promise>COMPLETE</promise>
+```
+
+This magic string is grep'd by `ralph.sh` to detect completion.
+
+### 8. **Multi-Tool Support**
+
+Ralph supports both Amp and Claude Code:
+```bash
+./ralph.sh --tool amp [max_iterations]   # Default
+./ralph.sh --tool claude [max_iterations]
+```
+
+Each tool has its own prompt template (`prompt.md` for Amp, `CLAUDE.md` for Claude Code).
+
+### 9. **Skills System for PRD Workflow**
+
+Two skills automate PRD creation:
+
+**`prd` skill**: Generates structured PRDs with clarifying questions
+- Asks 3-5 essential questions with lettered options (for quick "1A, 2C, 3B" responses)
+- Creates markdown PRD with user stories, functional requirements, non-goals
+
+**`ralph` skill**: Converts markdown PRDs to JSON
+- Enforces story sizing (completable in one iteration)
+- Orders by dependencies (schema → backend → UI)
+- Adds standard criteria ("Typecheck passes", "Verify in browser")
+
+---
+
+## Notable Patterns and Design Decisions
+
+### 1. **Single Story Per Iteration**
+
+**Design**: Each AI run handles exactly ONE user story, never more.
+
+**Rationale**:
+- Ensures complete focus on a single task
+- Prevents context exhaustion mid-feature
+- Creates clean commit boundaries
+- Simplifies failure recovery (retry a single story, not multiple)
+
+### 2. **Append-Only Progress Log**
+
+**Design**: `progress.txt` is append-only, never overwritten.
+
+**Rationale**:
+- Preserves full history for debugging
+- Enables pattern discovery over time
+- Prevents accidental loss of learnings
+- Supports consolidation into AGENTS.md when patterns emerge
+
+### 3. **Story Sizing Rules**
+
+**Design**: Stories must be small enough for one context window.
+
+**Right-sized examples:**
+- Add a database column and migration
+- Add a UI component to an existing page
+- Update a server action with new logic
+- Add a filter dropdown to a list
+
+**Too big (must split):**
+- "Build the entire dashboard"
+- "Add authentication"
+- "Refactor the API"
+
+**Rule of thumb**: If you can't describe the change in 2-3 sentences, it's too big.
+
+### 4. **Dependency-Ordered Execution**
+
+**Design**: Stories execute in priority order; earlier stories cannot depend on later ones.
+
+**Correct order:**
+1. Schema/database changes (migrations)
+2. Server actions / backend logic
+3. UI components that use the backend
+4. Dashboard/summary views that aggregate data
+
+### 5. **Commit Discipline**
+
+**Design**: Only commit when tests pass, with structured messages.
+
+```
+feat: [Story ID] - [Story Title]
+```
+
+**Rationale**: Clean git history provides context recovery for future iterations.
+
+### 6. **Verifiable Acceptance Criteria**
+
+**Design**: Every criterion must be testable, never vague.
+
+**Good**: "Button shows confirmation dialog before deleting"
+**Bad**: "Works correctly", "Good UX", "Handles edge cases"
+
+### 7. **Archiving Previous Runs**
+
+**Design**: When `branchName` changes, archive previous `prd.json` and `progress.txt` to `archive/YYYY-MM-DD-feature-name/`.
+
+**Rationale**: Clean separation between features, preserves history for reference.
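+
+The archive path follows directly from the branch name and the run date. A small sketch, assuming the `ralph/` prefix is stripped from `branchName` and the date is supplied by the caller already formatted as `YYYY-MM-DD`; `archive_dir` is a hypothetical helper, and ralph itself does this in `ralph.sh`, not in Rust.
+
+```rust
+/// Builds the archive directory for a finished run.
+/// `date` is expected in YYYY-MM-DD form; `branch_name` is the `branchName` from prd.json.
+pub fn archive_dir(date: &str, branch_name: &str) -> String {
+    // Drop the `ralph/` prefix if present; otherwise use the branch name as-is.
+    let feature = branch_name.strip_prefix("ralph/").unwrap_or(branch_name);
+    format!("archive/{date}-{feature}/")
+}
+```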
+
+---
+
+## Context Management Strategy
+
+Ralph's context management is its most innovative aspect:
+
+### Between Runs (Persistence)
+
+| Mechanism | What It Carries | Format |
+|-----------|-----------------|--------|
+| Git commits | Code changes, file structure | Versioned files |
+| `prd.json` | Task completion status | Structured JSON |
+| `progress.txt` | Learnings, gotchas, patterns | Structured text |
+| `AGENTS.md` | Consolidated reusable patterns | Markdown |
+
+### Within a Run (Instructions)
+
+The AI receives:
+1. Instructions from `prompt.md` or `CLAUDE.md`
+2. The `prd.json` file content
+3. The `progress.txt` file (especially Codebase Patterns section)
+4. Access to read any file via AI tool capabilities
+
+### Context Recovery Pattern
+
+Each iteration:
+1. Reads `progress.txt` Codebase Patterns section first (quick reference)
+2. Reads `prd.json` to find next incomplete story
+3. Checks git branch matches expected branch
+4. Implements story
+5. Appends learnings to `progress.txt`
+6. Optionally consolidates patterns to AGENTS.md
+
+---
+
+## Agent Orchestration Model
+
+### Single-Agent Loop (Not Multi-Agent)
+
+Ralph is NOT a multi-agent system. It's a single-agent loop where:
+- One AI instance runs at a time
+- Each instance is independent (no inter-agent communication)
+- Coordination happens via file-based state (prd.json, progress.txt)
+
+### Orchestration via Bash Script
+
+`ralph.sh` is a simple bash loop:
+```bash
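+for i in $(seq 1 "$MAX_ITERATIONS"); do
+    # Feed the prompt to a fresh Amp instance; `|| true` keeps the loop alive if it exits non-zero
+    OUTPUT=$(cat prompt.md | amp --dangerously-allow-all 2>&1 | tee /dev/stderr) || true
+
+    # Stop as soon as the agent prints the completion marker
+    if echo "$OUTPUT" | grep -q "<promise>COMPLETE</promise>"; then
+        echo "Ralph completed all tasks!"
+        exit 0
+    fi
+
+    sleep 2  # brief pause between iterations
+done
+
+exit 1  # max iterations reached without completion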
+```
+
+**Key points:**
+- Uses `--dangerously-allow-all` (Amp) or `--dangerously-skip-permissions` (Claude) for autonomous operation
+- Outputs are piped through `tee` for visibility
+- Completion detected via grep for magic string
+- 2-second sleep between iterations
+
+---
+
+## Error Handling and Recovery
+
+### Implicit Error Handling
+
+Ralph has minimal explicit error handling. Instead:
+- If tests fail, the story isn't committed
+- If the AI can't complete a story, it logs learnings and the next iteration retries
+- If max iterations are reached, the script exits with an error
+- Human intervention is expected for complex failures
+
+### Recovery via Progress Log
+
+Failed attempts are documented in `progress.txt`:
+```
+## [Date/Time] - [Story ID]
+- Attempted to implement X
+- Failed because Y
+- **Learnings:**
+  - Don't do Z
+  - Instead try W
+---
+```
+
+The next iteration reads these learnings and avoids the same mistakes.
+
+---
+
+## Configuration and Customization
+
+### Per-Project Customization
+
+After copying the prompt template to your project:
+- Add project-specific quality check commands
+- Include codebase conventions
+- Add common gotchas for your stack
+
+### Amp Auto-Handoff Configuration
+
+For large stories that approach context limits:
+```json
+{
+  "amp.experimental.autoHandoff": { "context": 90 }
+}
+```
+
+This enables automatic handoff when context fills up.
+
+### Iteration Limits
+
+```bash
+./ralph.sh [max_iterations]  # Default: 10
+```
+
+---
+
+## Comparison to Typical Orchestration Approaches
+
+| Aspect | Ralph | Typical Orchestration |
+|--------|-------|----------------------|
+| **Memory** | File-based (git, JSON, text) | In-memory state, databases |
+| **Coordination** | Sequential loop | Often parallel/concurrent |
+| **Agent Communication** | Via files | Direct messaging, queues |
+| **Complexity** | Simple bash script (~100 LOC) | Often complex frameworks |
+| **Failure Recovery** | Retry from last good state | Explicit retry logic, checkpoints |
+| **Context Management** | Fresh context per iteration | Persistent context, context windows |
+| **Task Decomposition** | Pre-planned user stories | Often dynamic planning |
+| **Human Oversight** | Minimal during run | Often requires approval gates |
+
+### Key Differentiators
+
+1. **Simplicity**: Ralph is a bash script, not a framework
+2. **Statelessness**: Each iteration is independent
+3. **Git-Native**: Uses git as the primary state store
+4. **AI-Tool Agnostic**: Works with both Amp and Claude Code
+5. **Human-Readable Artifacts**: All state is in human-readable files
+
+---
+
+## Implications for Makima
+
+### Features to Consider Adopting
+
+1. **Structured PRD-to-JSON workflow** with skills
+2. **Append-only progress logging** for context between runs
+3. **Story sizing enforcement** (completable in one context window)
+4. **Dependency-ordered task execution**
+5. **Branch-based run isolation** with archiving
+6. **Consolidated patterns file** (AGENTS.md equivalent)
+7. **Magic string completion protocol** (`<promise>COMPLETE</promise>`)
+8. **Verifiable acceptance criteria** enforcement
+9. **Browser verification** for UI stories
+
+### Optional Features (Flag-Controlled)
+
+1. `--max-iterations` limit
+2. `--auto-handoff` for context management
+3. `--archive-previous` for run isolation
+4. `--require-tests` for quality gates
+5. `--single-story-per-run` mode
+
+### Opinionated Features
+
+1. Task decomposition must result in context-window-sized stories
+2. Progress logs must be append-only
+3. All commits must pass quality checks
+4. Acceptance criteria must be verifiable
+5. Dependencies must be ordered correctly
+
+---
+
+## Appendix: File Structure Reference
+
+```
+project/
+├── scripts/ralph/
+│   ├── ralph.sh            # Main loop script
+│   ├── prompt.md           # Amp instructions
+│   ├── CLAUDE.md           # Claude Code instructions
+│   ├── prd.json            # Active task list
+│   ├── progress.txt        # Append-only learnings
+│   └── archive/            # Previous run archives
+│       └── YYYY-MM-DD-feature-name/
+│           ├── prd.json
+│           └── progress.txt
+├── skills/
+│   ├── prd/
+│   │   └── SKILL.md        # PRD generation skill
+│   └── ralph/
+│       └── SKILL.md        # PRD-to-JSON conversion skill
+└── AGENTS.md               # Codebase-wide patterns
+```
+
+---
+
+## References
+
+- [Ralph GitHub Repository](https://github.com/snarktank/ralph)
+- [Geoffrey Huntley's Ralph Article](https://ghuntley.com/ralph/)
+- [Amp Documentation](https://ampcode.com/manual)
+- [Claude Code Documentation](https://docs.anthropic.com/en/docs/claude-code)
diff --git a/ralph-features-spec.md b/ralph-features-spec.md
new file mode 100644
index 0000000..f25a8fe
--- /dev/null
+++ b/ralph-features-spec.md
@@ -0,0 +1,773 @@
+# Ralph-Inspired Features for Makima
+
+## Overview
+
+This specification outlines features derived from the [ralph](https://github.com/snarktank/ralph) autonomous AI agent loop system that can be implemented in makima to reduce manual steering and improve context management between runs.
+
+---
+
+## Part 1: Opinionated Features (Always Enabled)
+
+These features represent best practices that should be core to makima's behavior.
+
+### 1.1 Structured Progress Logging
+
+**Name:** `progress-log`
+**Priority:** HIGH
+
+**Description:**
+Implement an append-only progress log file (`progress.log`, modeled on ralph's `progress.txt`) that persists learnings, patterns, and context across task iterations.
+
+**Motivation:**
+- Ralph's most powerful feature is its dual-file learning system (`progress.txt` for per-run learnings, `AGENTS.md` for consolidated patterns)
+- Captures context that survives Claude's context window limits
+- Enables pattern discovery over time
+- Provides debugging history
+
+**Current State in Makima:**
+- `progress_summary` field exists but is per-task, not persistent
+- Task events stored in database but not summarized
+- No cross-task learning mechanism
+
+**Implementation Approach:**
+1. Add `progress.log` file to each task's worktree
+2. Append structured entries at task completion:
+   ```
+   ## [Timestamp] - Task [ID]: [Name]
+   - Status: [done/failed]
+   - Files changed: [list]
+   - **Learnings:**
+     - [Pattern discovered]
+     - [Gotcha encountered]
+   ---
+   ```
+3. Inject progress.log contents into new task prompts
+4. Periodically consolidate entries into an `AGENTS.md` equivalent
+
+**Configuration:**
+```toml
+[progress_log]
+enabled = true  # Always on
+max_entries_injected = 20  # Limit for prompt injection
+consolidation_threshold = 50  # Trigger consolidation
+```
+
+**Integration Points:**
+- `daemon/task/manager.rs` → `on_completion()` hook
+- `daemon/process/claude.rs` → `inject_system_prompt()`
+- New file: `daemon/task/progress_log.rs`
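
The entry format from step 2 and the `max_entries_injected` limit could be sketched roughly as below. This is a minimal illustration, not makima's actual code: `ProgressEntry`, `render_entry`, and `recent_entries` are hypothetical names, and the fields simply mirror the template above.

```rust
/// Hypothetical entry shape; the fields mirror the log template in this section.
pub struct ProgressEntry {
    pub timestamp: String,
    pub task_id: String,
    pub name: String,
    pub succeeded: bool,
    pub files_changed: Vec<String>,
    pub learnings: Vec<String>,
}

/// Render one entry in the append-only format, terminated by a `---` line.
pub fn render_entry(e: &ProgressEntry) -> String {
    let status = if e.succeeded { "done" } else { "failed" };
    let mut out = format!("## {} - Task {}: {}\n", e.timestamp, e.task_id, e.name);
    out.push_str(&format!("- Status: {}\n", status));
    out.push_str(&format!("- Files changed: {}\n", e.files_changed.join(", ")));
    out.push_str("- **Learnings:**\n");
    for learning in &e.learnings {
        out.push_str(&format!("  - {}\n", learning));
    }
    out.push_str("---\n");
    out
}

/// Return the last `max_entries` complete entries of a log for prompt injection.
/// An entry is considered complete once a line consisting of `---` is seen;
/// a trailing partial entry is ignored.
pub fn recent_entries(log: &str, max_entries: usize) -> Vec<String> {
    let mut entries = Vec::new();
    let mut current = String::new();
    for line in log.lines() {
        current.push_str(line);
        current.push('\n');
        if line.trim() == "---" {
            entries.push(std::mem::take(&mut current));
        }
    }
    let skip = entries.len().saturating_sub(max_entries);
    entries.split_off(skip)
}
```

Appending `render_entry` output to the file keeps the log append-only, while `recent_entries(&log, 20)` would correspond to the `max_entries_injected = 20` setting.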
+
+---
+
+### 1.2 Context Recovery Pattern
+
+**Name:** `context-recovery`
+**Priority:** HIGH
+
+**Description:**
+Standardize how context is rebuilt when tasks resume or restart, ensuring Claude can quickly orient itself.
+
+**Motivation:**
+- Ralph's stateless model works because context recovery is systematic
+- Each iteration reads from well-defined artifacts
+- Reduces confusion and repeated work
+
+**Current State in Makima:**
+- `conversation_state` stored for resumption
+- `--continue` flag relies on Claude's session state
+- No structured "where we left off" pattern
+
+**Implementation Approach:**
+1. Create standard context recovery header for task prompts:
+   ```
+   ## Context Recovery
+   - Current branch: [branch name]
+   - Git status: [uncommitted changes summary]
+   - Last checkpoint: [timestamp, message]
+   - Progress log (recent): [last 5 entries]
+   - Current phase: [research/specify/plan/execute/review]
+   ```
+2. Auto-generate on task start/resume
+3. Include in system prompt before user plan
+
+**Integration Points:**
+- `daemon/task/manager.rs` → `build_context_recovery()`
+- `daemon/process/claude.rs` → Prepend to injected prompt
+- New file: `daemon/task/context_recovery.rs`
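
As a rough sketch of the header described above, the function below renders the recovery block and places it before the user plan. `ContextSnapshot` and its fields are assumptions made for illustration; how makima would actually gather the branch, git status, and checkpoint data is not specified here.

```rust
/// Hypothetical snapshot of the state a resumed task needs to orient itself.
pub struct ContextSnapshot {
    pub branch: String,
    pub git_status: String,
    pub last_checkpoint: String,
    /// Progress log entries, oldest first.
    pub recent_progress: Vec<String>,
    pub phase: String,
}

/// Build the "Context Recovery" header and prepend it to the user plan.
/// Only the five most recent progress entries are included, as in the template.
pub fn build_context_recovery(s: &ContextSnapshot, user_plan: &str) -> String {
    let mut out = String::from("## Context Recovery\n");
    out.push_str(&format!("- Current branch: {}\n", s.branch));
    out.push_str(&format!("- Git status: {}\n", s.git_status));
    out.push_str(&format!("- Last checkpoint: {}\n", s.last_checkpoint));
    out.push_str("- Progress log (recent):\n");
    let start = s.recent_progress.len().saturating_sub(5);
    for entry in &s.recent_progress[start..] {
        out.push_str(&format!("  - {}\n", entry));
    }
    out.push_str(&format!("- Current phase: {}\n\n", s.phase));
    out.push_str(user_plan);
    out
}
```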
+
+---
+
+### 1.3 Dependency-Ordered Task Execution
+
+**Name:** `dependency-ordering`
+**Priority:** MEDIUM
+
+**Description:**
+Enforce that tasks execute in dependency order: schema changes → backend → UI.
+
+**Motivation:**
+- Ralph explicitly orders stories: database → server → UI → dashboard
+- Prevents tasks from failing due to missing dependencies
+- Creates clean commit boundaries
+
+**Current State in Makima:**
+- Tasks have `priority` field but no dependency inference
+- Supervisors manually order task creation
+- No validation of execution order
+
+**Implementation Approach:**
+1. Add `depends_on: Vec<Uuid>` field to tasks
+2. Validate dependencies before marking task as runnable
+3. Auto-detect dependency patterns:
+   - Migration files → backend code
+   - Types/models → consumers
+   - APIs → UI components
+4. Warn if a task seems out of order based on file patterns
+
+**Configuration:**
+```toml
+[dependency_ordering]
+enabled = true
+auto_detect = true
+warn_on_violation = true
+```
+
+**Integration Points:**
+- `db/models.rs` → Task model extension
+- `daemon/task/manager.rs` → `can_start_task()` validation
+- New file: `daemon/task/dependency_analysis.rs`
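
The `depends_on` check in steps 1 and 2 could look something like the sketch below. It uses a plain `u32` as a stand-in for `Uuid` so that it needs no external crates, and it assumes task IDs are unique; `execution_order` is a hypothetical helper (Kahn's topological sort) rather than an existing makima function.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Stand-in for `Uuid` so the sketch is self-contained.
pub type TaskId = u32;

pub struct Task {
    pub id: TaskId,
    pub depends_on: Vec<TaskId>,
}

/// A task is runnable only when every dependency has completed.
pub fn can_start_task(task: &Task, completed: &HashSet<TaskId>) -> bool {
    task.depends_on.iter().all(|dep| completed.contains(dep))
}

/// Topologically sort tasks (Kahn's algorithm).
/// Returns `None` if a dependency is unknown or the graph contains a cycle.
pub fn execution_order(tasks: &[Task]) -> Option<Vec<TaskId>> {
    let known: HashSet<TaskId> = tasks.iter().map(|t| t.id).collect();
    let mut pending: HashMap<TaskId, usize> = HashMap::new();
    let mut dependents: HashMap<TaskId, Vec<TaskId>> = HashMap::new();
    let mut ready: VecDeque<TaskId> = VecDeque::new();

    for task in tasks {
        if task.depends_on.iter().any(|dep| !known.contains(dep)) {
            return None;
        }
        pending.insert(task.id, task.depends_on.len());
        if task.depends_on.is_empty() {
            ready.push_back(task.id);
        }
        for dep in &task.depends_on {
            dependents.entry(*dep).or_default().push(task.id);
        }
    }

    let mut order = Vec::with_capacity(tasks.len());
    while let Some(id) = ready.pop_front() {
        order.push(id);
        for dependent in dependents.get(&id).into_iter().flatten() {
            if let Some(count) = pending.get_mut(dependent) {
                *count -= 1;
                if *count == 0 {
                    ready.push_back(*dependent);
                }
            }
        }
    }
    // Tasks left unordered are part of a cycle.
    if order.len() == tasks.len() { Some(order) } else { None }
}
```

Rejecting cycles at creation time is one possible design choice; it keeps `can_start_task` from waiting forever on a dependency that can never complete.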
+
+---
+
+### 1.4 Verifiable Acceptance Criteria
+
+**Name:** `acceptance-criteria`
+**Priority:** MEDIUM
+
+**Description:**
+Require that all tasks have verifiable (not vague) acceptance criteria, and automatically validate them.
+
+**Motivation:**
+- Ralph requires criteria like "Typecheck passes", "Tests pass"
+- Prevents "done" status on incomplete work
+- Provides clear success definition
+
+**Current State in Makima:**
+- `COMPLETION_GATE` signals readiness
+- No structured criteria validation
+- Manual interpretation of "ready"
+
+**Implementation Approach:**
+1. Parse task plans for acceptance criteria section
+2. Identify verifiable vs vague criteria:
+   - **Good:** "All tests pass", "No TypeScript errors"
+   - **Bad:** "Works correctly", "Good UX"
+3. Auto-append standard criteria if missing:
+   - "No uncommitted changes remain"
+   - "CI/linting passes" (if configured)
+4. Validate criteria satisfaction before marking complete
+
+**Configuration:**
+```toml
+[acceptance_criteria]
+enabled = true
+require_verifiable = true
+auto_append_standard = ["no_uncommitted_changes", "tests_pass"]
+```
+
+**Integration Points:**
+- `daemon/task/completion_gate.rs` → Extend validation
+- `llm/task_output.rs` → Parse criteria from plan
+- New file: `daemon/task/criteria_validator.rs`
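
Steps 2 and 3 might be approximated as follows. The phrase list is purely illustrative (only "works correctly" and "good UX" come from the examples above), and `is_vague`, `partition_criteria`, and `append_standard` are hypothetical names; a real validator would likely need a richer heuristic or an LLM-assisted check.

```rust
/// Phrases treated as vague. Illustrative only: the first two come from the
/// examples in this section, the rest are assumptions.
const VAGUE_PHRASES: [&str; 4] = ["works correctly", "good ux", "looks good", "is clean"];

/// True if the criterion contains a known vague phrase (case-insensitive).
pub fn is_vague(criterion: &str) -> bool {
    let lower = criterion.to_lowercase();
    VAGUE_PHRASES.iter().any(|phrase| lower.contains(*phrase))
}

/// Split criteria into (verifiable, vague), preserving order.
pub fn partition_criteria(criteria: &[String]) -> (Vec<String>, Vec<String>) {
    criteria.iter().cloned().partition(|c| !is_vague(c))
}

/// Append each standard criterion unless an equivalent one is already present.
pub fn append_standard(criteria: &mut Vec<String>, standard: &[&str]) {
    for s in standard {
        if !criteria.iter().any(|c| c.eq_ignore_ascii_case(*s)) {
            criteria.push((*s).to_string());
        }
    }
}
```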
+
+---
+
+### 1.5 Task Sizing Validation
+
+**Name:** `task-sizing`
+**Priority:** MEDIUM
+
+**Description:**
+Warn about, or block, tasks that are likely too large to complete in one context window.
+
+**Motivation:**
+- Ralph's story sizing is crucial: "If you can't describe it in 2-3 sentences, it's too big"
+- Large tasks exhaust context, require handoffs
+- Smaller tasks = cleaner commits, easier recovery
+
+**Current State in Makima:**
+- No task size estimation
+- `auto_handoff` exists but reactive
+- Manual task breakdown by supervisors
+
+**Implementation Approach:**
+1. Estimate task complexity from plan text:
+   - Number of files mentioned
+   - Scope words ("entire", "all", "refactor")
+   - Estimated token count
+2. Warn if task exceeds thresholds
+3. Suggest breakdown for large tasks
+
+**Thresholds:**
+- Files mentioned > 10 → Warning
+- Plan length > 500 words → Warning
+- Scope words detected → Strong warning
+
+**Configuration:**
+```toml
+[task_sizing]
+enabled = true
+max_files_mentioned = 10
+max_plan_words = 500
+warn_on_scope_words = ["entire", "all", "complete", "refactor"]
+```
+
+**Integration Points:**
+- `daemon/task/manager.rs` → `validate_task_size()`
+- Supervisor prompts → Include sizing guidance
+- New file: `daemon/task/sizing_validator.rs`
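
The thresholds above could be checked with something like the sketch below. The file-detection rule is a deliberately crude assumption (a token containing `/` or ending in a short alphabetic extension), and `SizingConfig`, `SizeWarning`, and `looks_like_file` are hypothetical names; only `validate_task_size` and the config keys appear in this spec.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
pub enum SizeWarning {
    TooManyFiles(usize),
    PlanTooLong(usize),
    ScopeWord(String),
}

/// Mirrors the `[task_sizing]` keys shown above.
pub struct SizingConfig {
    pub max_files_mentioned: usize,
    pub max_plan_words: usize,
    pub warn_on_scope_words: Vec<String>,
}

/// Heuristic (an assumption): a token looks like a file if it contains `/`
/// or ends in a 2-5 letter extension such as `.rs` or `.tsx`.
fn looks_like_file(token: &str) -> bool {
    if token.contains('/') {
        return true;
    }
    match token.rsplit_once('.') {
        Some((stem, ext)) => {
            !stem.is_empty()
                && (2..=5).contains(&ext.len())
                && ext.chars().all(|c| c.is_ascii_alphabetic())
        }
        None => false,
    }
}

/// Collect sizing warnings for a plan; an empty result means no threshold was hit.
pub fn validate_task_size(plan: &str, cfg: &SizingConfig) -> Vec<SizeWarning> {
    let mut warnings = Vec::new();
    let tokens: Vec<&str> = plan.split_whitespace().collect();

    // Count distinct file-like tokens, ignoring surrounding punctuation.
    let files: HashSet<&str> = tokens
        .iter()
        .map(|t| t.trim_matches(|c: char| !c.is_alphanumeric()))
        .filter(|t| looks_like_file(t))
        .collect();
    if files.len() > cfg.max_files_mentioned {
        warnings.push(SizeWarning::TooManyFiles(files.len()));
    }
    if tokens.len() > cfg.max_plan_words {
        warnings.push(SizeWarning::PlanTooLong(tokens.len()));
    }
    for word in &cfg.warn_on_scope_words {
        let hit = tokens.iter().any(|t| {
            t.trim_matches(|c: char| !c.is_alphanumeric())
                .eq_ignore_ascii_case(word)
        });
        if hit {
            warnings.push(SizeWarning::ScopeWord(word.clone()));
        }
    }
    warnings
}
```

Whole-word matching is used for scope words so that, for example, "all" does not fire on "install".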
+
+---
+
+### 1.6 Commit Discipline
+
+**Name:** `commit-discipline`
+**Priority:** HIGH
+
+**Description:**
+Enforce structured commit messages and only allow commits when quality checks pass.
+
+**Motivation:**
+- Ralph: "Only commit when tests pass"
+- Clean git history aids context recovery
+- Structured messages enable automation
+
+**Current State in Makima:**
+- Checkpoints create commits automatically
+- No quality gate before commit
+- Commit messages not standardized
+
+**Implementation Approach:**
+1. Standardize commit message format:
+   ```
+   feat/fix/chore: [Task ID] - [Summary]
+
+   [Optional body]
+
+   Co-Authored-By: Claude <noreply@anthropic.com>
+   ```
+2. Run quality checks before checkpoint commit:
+   - Typecheck/lint (if configured)
+   - Tests (if configured)
+3. Reject commit if checks fail, provide feedback
+
+**Configuration:**
+```toml
+[commit_discipline]
+enabled = true
+require_tests = false  # Optional
+require_lint = false   # Optional
+message_format = "conventional"  # conventional, simple
+```
+
+**Integration Points:**
+- `daemon/worktree/manager.rs` → `create_checkpoint()`
+- `daemon/task/manager.rs` → Pre-commit hooks
+- New file: `daemon/task/commit_validator.rs`
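
A minimal sketch of the message format and the pre-commit gate follows. `CommitKind`, `format_commit_message`, and `may_commit` are hypothetical names; the check results are passed in as plain `(name, passed)` pairs because how lint and tests are actually run is outside this sketch.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CommitKind {
    Feat,
    Fix,
    Chore,
}

impl CommitKind {
    fn as_str(self) -> &'static str {
        match self {
            CommitKind::Feat => "feat",
            CommitKind::Fix => "fix",
            CommitKind::Chore => "chore",
        }
    }
}

/// Build a message in the `<kind>: <task id> - <summary>` format shown above,
/// with an optional body and the co-author trailer.
pub fn format_commit_message(
    kind: CommitKind,
    task_id: &str,
    summary: &str,
    body: Option<&str>,
) -> String {
    let mut msg = format!("{}: {} - {}\n", kind.as_str(), task_id, summary.trim());
    if let Some(b) = body {
        if !b.trim().is_empty() {
            msg.push('\n');
            msg.push_str(b.trim());
            msg.push('\n');
        }
    }
    msg.push_str("\nCo-Authored-By: Claude <noreply@anthropic.com>\n");
    msg
}

/// Gate for the checkpoint commit: `Ok` when every check passed,
/// otherwise the names of the failed checks for feedback.
pub fn may_commit(check_results: &[(&str, bool)]) -> Result<(), Vec<String>> {
    let failed: Vec<String> = check_results
        .iter()
        .filter(|(_, passed)| !*passed)
        .map(|(name, _)| (*name).to_string())
        .collect();
    if failed.is_empty() { Ok(()) } else { Err(failed) }
}
```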
+
+---
+
+## Part 2: Optional Features (Flag-Controlled)
+
+These features provide advanced control and should be opt-in via CLI flags or configuration.
+
+### 2.1 Maximum Iterations Limit
+
+**Name:** `--max-iterations`
+**Priority:** HIGH
+**Flag:** `--max-iterations <N>` or `-i <N>`
+
+**Description:**
+Limit the number of autonomous loop iterations before stopping.
+
+**Motivation:**
+- Ralph uses `max_iterations` (default 10)
+- Prevents runaway loops that waste tokens
+- Provides predictable behavior
+
+**Current State in Makima:**
+- Circuit breaker has `iteration_count` limit (10)
+- Not configurable at task/contract level
+- No per-run override
+
+**Implementation Approach:**
+1. Add `--max-iterations` flag to contract/task creation
+2. Store in task metadata
+3. Check count in autonomous loop logic
+4. Exit cleanly with message when limit reached
+
+**CLI Usage:**
+```bash
+makima contract create --max-iterations 5 "Feature X"
+makima supervisor spawn "Task" "Plan" --max-iterations 3
+```
+
+**Configuration:**
+```toml
+[autonomous_loop]
+default_max_iterations = 10
+hard_limit = 50  # Absolute maximum
+```
+
+**Integration Points:**
+- `daemon/task/manager.rs` → Loop control
+- `db/models.rs` → Task field
+- CLI argument parsing
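
The interaction between the per-run flag, `default_max_iterations`, and `hard_limit` could be sketched as below. Clamping to the hard limit (rather than rejecting the request) is an assumption, as are the names `effective_max_iterations`, `LoopOutcome`, and `run_autonomous_loop`; the closure stands in for one Claude invocation and returns `true` when the task reports completion.

```rust
/// Resolve the limit for a run: the flag value if given, else the default,
/// always clamped to the hard limit.
pub fn effective_max_iterations(requested: Option<u32>, default_max: u32, hard_limit: u32) -> u32 {
    requested.unwrap_or(default_max).min(hard_limit)
}

#[derive(Debug, PartialEq)]
pub enum LoopOutcome {
    Completed { iterations: u32 },
    LimitReached { iterations: u32 },
}

/// Run iterations until one reports completion or the limit is hit.
/// `run_iteration` receives the 1-based iteration number.
pub fn run_autonomous_loop<F>(max_iterations: u32, mut run_iteration: F) -> LoopOutcome
where
    F: FnMut(u32) -> bool,
{
    for i in 1..=max_iterations {
        if run_iteration(i) {
            return LoopOutcome::Completed { iterations: i };
        }
    }
    LoopOutcome::LimitReached { iterations: max_iterations }
}
```

Returning `LimitReached` as a normal value, rather than an error, matches the "exit cleanly with message" step above.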
+
+---
+
+### 2.2 Single-Story-Per-Run Mode
+
+**Name:** `--single-task`
+**Priority:** MEDIUM
+**Flag:** `--single-task` or `-1`
+
+**Description:**
+Execute exactly one task per Claude invocation, then stop (don't auto-continue).
+
+**Motivation:**
+- Ralph's model: one story per iteration
+- Ensures complete focus
+- Creates clean boundaries
+- Simplifies failure recovery
+
+**Current State in Makima:**
+- Tasks can run multiple iterations
+- Supervisor can spawn multiple concurrent tasks
+- No single-task mode
+
+**Implementation Approach:**
+1. When `--single-task` enabled:
+   - Execute one task
+   - Parse completion gate
+   - Stop regardless of `ready` status
+   - Report status and exit
+2. User reviews, then manually continues or adjusts
+
+**CLI Usage:**
+```bash
+makima contract create --single-task "Feature X"
+```
+
+**Configuration:**
+```toml
+[execution]
+single_task_mode = false  # Default
+```
+
+**Integration Points:**
+- `daemon/task/manager.rs` → Execution loop
+- `server/handlers/contract_daemon.rs` → Contract options
+- CLI flags
+
+---
+
+### 2.3 Archive Previous Runs
+
+**Name:** `--archive-previous`
+**Priority:** LOW
+**Flag:** `--archive-previous` or `--archive`
+
+**Description:**
+When starting a new feature/contract, archive the previous run's artifacts.
+
+**Motivation:**
+- Ralph archives to `archive/YYYY-MM-DD-feature-name/`
+- Clean separation between features
+- Preserves history for reference
+- Prevents context pollution
+
+**Current State in Makima:**
+- Worktrees are per-task but ephemeral
+- No archiving mechanism
+- Old task data in database but hard to access
+
+**Implementation Approach:**
+1. On contract creation with `--archive`:
+   - Find previous contract with same name/goal
+   - Copy key artifacts to `archive/` directory:
+     - progress.log
+     - Final checkpoint
+     - Summary document
+2. Archive structure:
+   ```
+   archive/
+   └── 2026-01-22-feature-name/
+       ├── progress.log
+       ├── summary.md
+       └── final-diff.patch
+   ```
+
+**CLI Usage:**
+```bash
+makima contract create --archive-previous "Feature X v2"
+```
+
+**Integration Points:**
+- `daemon/worktree/manager.rs` → Archive logic
+- `server/handlers/contracts.rs` → Archive on create
+- New file: `daemon/archive/manager.rs`
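
The archive directory naming could be derived as in the sketch below. The slug rules (lowercase ASCII, runs of other characters collapsed to `-`) are an assumption, and the date is passed in as a preformatted `YYYY-MM-DD` string because the standard library has no calendar formatting; `slugify` and `archive_dir` are hypothetical names.

```rust
use std::path::{Path, PathBuf};

/// Lowercase the name and collapse every run of non-alphanumeric
/// characters into a single `-`, with no leading or trailing dash.
pub fn slugify(name: &str) -> String {
    let mut slug = String::new();
    for c in name.chars() {
        if c.is_ascii_alphanumeric() {
            slug.push(c.to_ascii_lowercase());
        } else if !slug.is_empty() && !slug.ends_with('-') {
            slug.push('-');
        }
    }
    slug.trim_end_matches('-').to_string()
}

/// Build `archive/<date>-<slug>` for a contract; `date` is expected as `YYYY-MM-DD`.
pub fn archive_dir(date: &str, feature: &str) -> PathBuf {
    Path::new("archive").join(format!("{}-{}", date, slugify(feature)))
}
```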
+
+---
+
+### 2.4 Require Tests Quality Gate
+
+**Name:** `--require-tests`
+**Priority:** MEDIUM
+**Flag:** `--require-tests` or `--tests`
+
+**Description:**
+Block task completion unless tests pass.
+
+**Motivation:**
+- Ralph: stories require "Tests pass" in acceptance criteria
+- Ensures quality before merge
+- Catches regressions early
+
+**Current State in Makima:**
+- Completion gate is self-reported by Claude
+- No actual test execution
+- Circuit breaker is reactive, not proactive
+
+**Implementation Approach:**
+1. Detect test framework from project:
+   - `package.json` scripts
+   - `pytest`, `cargo test`, etc.
+2. Run tests before accepting completion
+3. Parse test output for pass/fail
+4. If failed:
+   - Don't mark complete
+   - Inject failure info into next prompt
+   - Increment failure counter
+
+**CLI Usage:**
+```bash
+makima contract create --require-tests "Feature X"
+```
+
+**Configuration:**
+```toml
+[quality_gates]
+require_tests = false
+test_command = "npm test"  # Auto-detected if not set
+test_timeout_secs = 300
+```
+
+**Integration Points:**
+- `daemon/task/completion_gate.rs` → Test validation
+- `daemon/process/` → Test runner
+- New file: `daemon/quality/test_runner.rs`
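
Step 1 (framework detection) and step 2 (running the tests) might be sketched as follows. The marker-file mapping is an assumption, in particular treating `pyproject.toml` or `pytest.ini` as a signal for `pytest`; `detect_test_command` and `tests_pass` are hypothetical names. The runner shells out through `sh -c`, so it assumes a Unix-like host, and it does not enforce `test_timeout_secs`.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Pick the test command: an explicit `test_command` wins, otherwise guess
/// from marker files found in the project root.
pub fn detect_test_command(configured: &str, root_files: &[&str]) -> Option<String> {
    if !configured.trim().is_empty() {
        return Some(configured.trim().to_string());
    }
    let has = |name: &str| root_files.iter().any(|f| *f == name);
    if has("Cargo.toml") {
        Some("cargo test".to_string())
    } else if has("package.json") {
        Some("npm test".to_string())
    } else if has("pytest.ini") || has("pyproject.toml") {
        Some("pytest".to_string())
    } else {
        None
    }
}

/// Run the command in `workdir` and report whether it exited successfully.
/// Note: no timeout handling; a real runner would need to add one.
pub fn tests_pass(command: &str, workdir: &Path) -> io::Result<bool> {
    let status = Command::new("sh")
        .arg("-c")
        .arg(command)
        .current_dir(workdir)
        .status()?;
    Ok(status.success())
}
```

Relying on the exit status avoids parsing framework-specific output, which is a simpler alternative to step 3 where the test tool reports failure through its exit code.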
+
+---
+
+### 2.5 PRD Mode
+
+**Name:** `--prd-mode`
+**Priority:** MEDIUM
+**Flag:** `--prd-mode` or `--prd`
+
+**Description:**
+Enable ralph-style PRD workflow with structured JSON task tracking.
+
+**Motivation:**
+- Ralph's `prd.json` provides clear task breakdown
+- Structured format aids automation
+- Priority-based execution
+- Clear pass/fail tracking
+
+**Current State in Makima:**
+- Plans are free-form text
+- Task status is in database, not file-based
+- No structured PRD format
+
+**Implementation Approach:**
+1. When `--prd-mode` enabled:
+   - Create `prd.json` in worktree:
+     ```json
+     {
+       "project": "Contract Name",
+       "branchName": "makima/feature",
+       "description": "Contract goal",
+       "userStories": [
+         {
+           "id": "US-001",
+           "title": "Story title",
+           "description": "As a...",
+           "acceptanceCriteria": ["Criterion 1"],
+           "priority": 1,
+           "passes": false,
+           "notes": ""
+         }
+       ]
+     }
+     ```
+   - Tasks update `passes` field on completion
+   - Supervisor reads PRD to find next incomplete story
+2. Sync between database and `prd.json`
+
+**CLI Usage:**
+```bash
+makima contract create --prd-mode "Feature X"
+```
+
+**Configuration:**
+```toml
+[prd_mode]
+enabled = false
+auto_generate_from_plan = true
+sync_to_database = true
+```
+
+**Integration Points:**
+- `daemon/task/manager.rs` → PRD sync
+- New file: `daemon/prd/manager.rs`
+- New file: `daemon/prd/models.rs`
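
The supervisor's "find next incomplete story" step could be sketched as below. It assumes that a lower `priority` number means "do first", and it keeps only the `prd.json` fields needed for selection; the JSON parsing itself is omitted. `next_story`, `mark_passed`, and `all_passed` are hypothetical names.

```rust
/// Subset of a `prd.json` user story needed for scheduling.
#[derive(Debug, Clone, PartialEq)]
pub struct UserStory {
    pub id: String,
    pub title: String,
    pub priority: u32,
    pub passes: bool,
}

/// Next story to work on: the not-yet-passing story with the lowest priority
/// number (assumed to mean highest priority). Ties keep file order.
pub fn next_story(stories: &[UserStory]) -> Option<&UserStory> {
    stories.iter().filter(|s| !s.passes).min_by_key(|s| s.priority)
}

/// Set `passes = true` for the given story; returns false if the ID is unknown.
pub fn mark_passed(stories: &mut [UserStory], id: &str) -> bool {
    match stories.iter_mut().find(|s| s.id == id) {
        Some(story) => {
            story.passes = true;
            true
        }
        None => false,
    }
}

/// True once every story passes, i.e. the PRD is complete.
pub fn all_passed(stories: &[UserStory]) -> bool {
    stories.iter().all(|s| s.passes)
}
```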
+
+---
+
+### 2.6 Learning Mode
+
+**Name:** `--learn`
+**Priority:** LOW
+**Flag:** `--learn` or `-l`
+
+**Description:**
+Enable cross-task learning that extracts patterns and improves future prompts.
+
+**Motivation:**
+- Ralph's AGENTS.md consolidates patterns
+- Learning from success improves future runs
+- Learning from failure prevents repeating mistakes
+
+**Current State in Makima:**
+- No cross-task learning
+- Each task starts fresh
+- Patterns not extracted or reused
+
+**Implementation Approach:**
+1. On task completion, extract:
+   - Files commonly modified together
+   - Commands that succeeded/failed
+   - Error patterns and solutions
+2. Store in `learnings.db` (SQLite) per repository
+3. Inject relevant learnings into future task prompts:
+   ```
+   ## Learned Patterns for this codebase:
+   - Always run `npm run typecheck` before commit
+   - The auth middleware is in src/middleware/auth.ts
+   - Database migrations require `npx prisma generate` after
+   ```
+
+**CLI Usage:**
+```bash
+makima contract create --learn "Feature X"
+```
+
+**Configuration:**
+```toml
+[learning]
+enabled = false
+extract_file_patterns = true
+extract_command_patterns = true
+max_learnings_injected = 10
+```
+
+**Integration Points:**
+- `daemon/task/manager.rs` → Learning extraction
+- `daemon/process/claude.rs` → Learning injection
+- New file: `daemon/learning/extractor.rs`
+- New file: `daemon/learning/database.rs`
+
+---
+
+### 2.7 Browser Verification for UI
+
+**Name:** `--browser-verify`
+**Priority:** LOW
+**Flag:** `--browser-verify` or `--ui`
+
+**Description:**
+For UI-related tasks, require browser verification before completion.
+
+**Motivation:**
+- Ralph includes "Verify in browser" as an acceptance criterion for UI stories
+- Visual verification catches issues that tests miss
+- Ensures UI actually works, not just compiles
+
+**Current State in Makima:**
+- No browser verification
+- UI tasks treated same as backend
+- No visual testing integration
+
+**Implementation Approach:**
+1. Detect UI-related tasks from file patterns:
+   - `*.tsx`, `*.vue`, `*.svelte`
+   - `components/`, `pages/`, `views/`
+2. When completing, prompt for verification:
+   - Launch dev server if needed
+   - Open browser to relevant URL
+   - Wait for user confirmation or screenshot analysis
+3. Alternative: Integrate with Playwright for visual testing
+
+**CLI Usage:**
+```bash
+makima contract create --browser-verify "Add login page"
+```
+
+**Configuration:**
+```toml
+[browser_verify]
+enabled = false
+auto_detect_ui_tasks = true
+dev_server_command = "npm run dev"
+base_url = "http://localhost:3000"
+```
+
+**Integration Points:**
+- `daemon/task/completion_gate.rs` → Browser check
+- New file: `daemon/quality/browser_verify.rs`
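
Step 1 (detecting UI-related tasks from file patterns) could be approximated as follows, using the extensions and directory names listed above. `is_ui_path` and `needs_browser_verification` are hypothetical names, and the input is assumed to be the list of files a task changed.

```rust
use std::path::Path;

const UI_EXTENSIONS: [&str; 3] = ["tsx", "vue", "svelte"];
const UI_DIRS: [&str; 3] = ["components", "pages", "views"];

/// True if the path has a UI file extension or lives under a UI directory.
pub fn is_ui_path(path: &str) -> bool {
    let p = Path::new(path);
    let ext_match = p
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| UI_EXTENSIONS.contains(&e))
        .unwrap_or(false);
    let dir_match = p
        .parent()
        .map(|dir| {
            dir.components().any(|c| {
                c.as_os_str()
                    .to_str()
                    .map(|s| UI_DIRS.contains(&s))
                    .unwrap_or(false)
            })
        })
        .unwrap_or(false);
    ext_match || dir_match
}

/// A task needs browser verification if any changed file looks UI-related.
pub fn needs_browser_verification(changed_files: &[&str]) -> bool {
    changed_files.iter().any(|f| is_ui_path(f))
}
```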
+
+---
+
+## Part 3: Implementation Priorities
+
+### Phase 1: Foundation (High Priority)
+1. **Structured Progress Logging** - Core to context management
+2. **Context Recovery Pattern** - Enables stateless iterations
+3. **Commit Discipline** - Ensures quality git history
+4. **Maximum Iterations Limit** - Prevents runaway loops
+
+### Phase 2: Quality (Medium Priority)
+5. **Verifiable Acceptance Criteria** - Improves completion reliability
+6. **Dependency-Ordered Execution** - Prevents out-of-order failures
+7. **Task Sizing Validation** - Catches too-large tasks early
+8. **Require Tests Quality Gate** - Ensures working code
+9. **Single-Story Mode** - Ralph's core pattern
+
+### Phase 3: Advanced (Low Priority)
+10. **PRD Mode** - Full ralph-style workflow
+11. **Learning Mode** - Cross-task intelligence
+12. **Archive Previous** - Run isolation
+13. **Browser Verification** - UI quality
+
+---
+
+## Part 4: Configuration Summary
+
+### New Configuration File Sections
+
+```toml
+# makima-daemon.toml additions
+
+[progress_log]
+enabled = true
+max_entries_injected = 20
+consolidation_threshold = 50
+
+[context_recovery]
+enabled = true
+include_git_status = true
+include_recent_progress = true
+
+[dependency_ordering]
+enabled = true
+auto_detect = true
+warn_on_violation = true
+
+[acceptance_criteria]
+enabled = true
+require_verifiable = true
+auto_append_standard = ["no_uncommitted_changes", "tests_pass"]
+
+[task_sizing]
+enabled = true
+max_files_mentioned = 10
+max_plan_words = 500
+warn_on_scope_words = ["entire", "all", "complete", "refactor"]
+
+[commit_discipline]
+enabled = true
+require_tests = false
+require_lint = false
+message_format = "conventional"
+
+[autonomous_loop]
+default_max_iterations = 10
+hard_limit = 50
+
+[execution]
+single_task_mode = false
+
+[quality_gates]
+require_tests = false
+test_command = ""  # Auto-detected if not set
+test_timeout_secs = 300
+
+[prd_mode]
+enabled = false
+auto_generate_from_plan = true
+sync_to_database = true
+
+[learning]
+enabled = false
+extract_file_patterns = true
+extract_command_patterns = true
+max_learnings_injected = 10
+
+[browser_verify]
+enabled = false
+auto_detect_ui_tasks = true
+dev_server_command = "npm run dev"
+base_url = "http://localhost:3000"
+```
+
+### CLI Flag Summary
+
+| Flag | Short / Alias | Feature | Default |
+|------|-------|---------|---------|
+| `--max-iterations` | `-i` | Iteration limit | 10 |
+| `--single-task` | `-1` | One task per run | false |
+| `--archive-previous` | `--archive` | Archive old runs | false |
+| `--require-tests` | `--tests` | Test quality gate | false |
+| `--prd-mode` | `--prd` | PRD-style workflow | false |
+| `--learn` | `-l` | Cross-task learning | false |
+| `--browser-verify` | `--ui` | UI verification | false |
+
+---
+
+## Part 5: Migration Path
+
+### For Existing Contracts
+
+1. Progress logging starts fresh (no historical data)
+2. Context recovery applies to new tasks only
+3. Existing tasks are not affected by the new validation
+4. Optional features can be enabled per contract
+
+### Backward Compatibility
+
+- All opinionated features have graceful defaults
+- Optional features are off by default
+- No breaking changes to existing CLI/API
+- Configuration is additive
+
+---
+
+## Conclusion
+
+These features address the core challenges mentioned in the contract goal:
+- **Manual steering** → Progress logging, context recovery, learning mode
+- **Context between runs** → Structured persistence, progress.txt pattern
+- **Handholding** → Verifiable criteria, commit discipline, quality gates
+
+The opinionated features make makima more reliable out of the box, while optional features provide ralph-style workflows for users who want them.