From 76bb9da745f6c12c8e7e587a9096677bbf98f395 Mon Sep 17 00:00:00 2001
From: soryu <soryu@soryu.co>
Date: Mon, 9 Feb 2026 16:51:59 +0000
Subject: Add compound engineering feature proposals for makima (#58)

Analyze the compound engineering plugin (https://github.com/EveryInc/compound-engineering-plugin)
and propose 6 features inspired by its patterns for adoption into makima:

- Multi-agent parallel review system (spawn-group/wait-group)
- Knowledge accumulation / compound learning phase
- Parallel plan deepening with research agents
- Workflow presets / pipeline templates (LFG-style one-command pipelines)
- Structured findings tracking with severity and lifecycle
- Reusable task templates with meta-commands

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
---
 docs/proposals/feature-multi-agent-review.md | 448 +++++++++++++++++++++++++++
 1 file changed, 448 insertions(+)
 create mode 100644 docs/proposals/feature-multi-agent-review.md

(limited to 'docs/proposals/feature-multi-agent-review.md')

diff --git a/docs/proposals/feature-multi-agent-review.md b/docs/proposals/feature-multi-agent-review.md
new file mode 100644
index 0000000..d678756
--- /dev/null
+++ b/docs/proposals/feature-multi-agent-review.md
@@ -0,0 +1,448 @@
+# Feature Proposal: Multi-Agent Parallel Review System
+
+> **Priority:** High
+> **Complexity:** Medium
+> **Estimated Effort:** 12-18 days
+> **Status:** Proposal
+> **Date:** 2026-02-09
+> **Dependencies:** [Findings Tracking](feature-findings-tracking.md) (recommended)
+> **Related:** [Overview Analysis](compound-engineering-analysis.md) · [Workflow Presets](feature-workflow-presets.md)
+
+---
+
+## Problem Statement
+
+Makima's contract lifecycle includes a **Review** phase, but it currently has:
+
+- **No automated review mechanism** — the review phase relies entirely on manual user inspection or a single supervisor task
+- **Single-perspective review** — even when a review task is spawned, it examines code from one viewpoint
+- **No structured review output** — findings are captured as unstructured text in task output
+- **No review templates** — each review must be configured from scratch
+- **No synthesis** — when multiple reviewers exist, there's no mechanism to deduplicate and prioritize findings
+
+For complex contracts touching security, performance, and architecture, a single-pass review consistently misses category-specific issues that specialized reviewers would catch.
+
+---
+
+## How Compound Engineering Solves This
+
+The compound engineering plugin spawns **12-15 specialized review agents in parallel**, each examining the code from a unique perspective:
+
+| Agent | Focus Area | Example Findings |
+|-------|-----------|-----------------|
+| Security Sentinel | Auth, injection, secrets, CSRF | SQL injection in user input handler |
+| Performance Oracle | N+1 queries, memory leaks, caching | Unbounded list growth in event handler |
+| Architecture Strategist | Coupling, SOLID, layering | Service directly accessing repository internals |
+| Code Philosopher | Readability, naming, complexity | Cyclomatic complexity > 15 in payment flow |
+| Data Integrity Guardian | Validation, constraints, migrations | Missing NOT NULL constraint on required field |
+| Error Resilience Analyzer | Error handling, retries, fallbacks | Unhandled timeout in external API call |
+| API Contract Validator | Breaking changes, versioning | Removed required field from response |
+| Dependency Health Checker | Vulnerabilities, licensing, freshness | CVE-2025-XXXX in transitive dependency |
+| Test Coverage Analyzer | Coverage gaps, edge cases, mocking | No tests for error path in checkout flow |
+| Documentation Completeness | Docs accuracy, examples, changelog | Public API endpoint undocumented |
+| Concurrency Safety | Race conditions, deadlocks, atomicity | Non-atomic read-modify-write on shared counter |
+
+After all agents complete, a **synthesis agent** deduplicates findings, resolves contradictions, and produces a prioritized report.
+
+```
+┌───────────────────────────────────────────────────────┐
+│                  Review Orchestrator                   │
+│                                                       │
+│  spawn-group "review"                                 │
+│  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐   │
+│  │Security │ │ Perf    │ │  Arch   │ │  Code   │   │
+│  │Sentinel │ │ Oracle  │ │Strategy │ │  Phil   │   │
+│  └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘   │
+│       │           │           │           │         │
+│  ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐   │
+│  │  Data   │ │ Error   │ │  API    │ │  Deps   │   │
+│  │Guardian │ │Resilien.│ │Contract │ │ Health  │   │
+│  └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘   │
+│       │           │           │           │         │
+│  ┌────┴────┐ ┌────┴────┐ ┌────┴────┐               │
+│  │  Test   │ │  Docs   │ │Concurr. │               │
+│  │Coverage │ │Complete │ │ Safety  │               │
+│  └────┬────┘ └────┬────┘ └────┬────┘               │
+│       │           │           │                     │
+│  wait-group "review"                                 │
+│       ▼           ▼           ▼                     │
+│  ┌──────────────────────────────────────────┐       │
+│  │         Synthesis Agent                   │       │
+│  │  - Deduplicate findings                   │       │
+│  │  - Resolve contradictions                 │       │
+│  │  - Prioritize by severity                 │       │
+│  │  - Generate summary report                │       │
+│  └──────────────────────────────────────────┘       │
+│                      │                               │
+│                      ▼                               │
+│             Structured Findings                      │
+│             (P1 / P2 / P3)                           │
+└───────────────────────────────────────────────────────┘
+```
+
+---
+
+## Proposed Makima Implementation
+
+### 1. New Supervisor Commands
+
+#### `makima supervisor spawn-group`
+
+Spawns multiple tasks as a named group and returns immediately:
+
+```bash
+# Spawn a review group with 5 agents
+makima supervisor spawn-group "review" \
+  --tasks '[
+    {"name": "security-review", "plan": "Review for security vulnerabilities..."},
+    {"name": "performance-review", "plan": "Review for performance issues..."},
+    {"name": "architecture-review", "plan": "Review for architecture concerns..."}
+  ]' \
+  --share-worktree \
+  --read-only
+```
+
+**Key parameters:**
+- `--tasks` — JSON array of task definitions
+- `--share-worktree` — All tasks in the group share the supervisor's worktree (read-only access)
+- `--read-only` — Tasks cannot modify files, only produce output
+- `--max-concurrent N` — Limit parallel execution (default: unlimited)
+
+#### `makima supervisor wait-group`
+
+Waits for all tasks in a named group to complete:
+
+```bash
+# Wait for all review tasks, timeout after 10 minutes
+makima supervisor wait-group "review" --timeout 600
+
+# Returns JSON with all task results
+```
+
+**Output format:**
+```json
+{
+  "group": "review",
+  "status": "completed",
+  "tasks": [
+    {"name": "security-review", "status": "done", "output": "..."},
+    {"name": "performance-review", "status": "done", "output": "..."}
+  ],
+  "duration_seconds": 127
+}
+```
+
+#### `makima supervisor review`
+
+High-level command that orchestrates the full review pipeline:
+
+```bash
+# Run review with default agent config
+makima supervisor review
+
+# Run review with custom config
+makima supervisor review --config .makima/review-agents.yaml
+
+# Run only specific review categories
+makima supervisor review --only security,performance,architecture
+```
+
+### 2. Review Agent Configuration
+
+#### Repository-Level Configuration (`.makima/review-agents.yaml`)
+
+```yaml
+# .makima/review-agents.yaml
+version: 1
+review:
+  # Maximum number of concurrent review agents
+  max_concurrent: 8
+
+  # Timeout per agent (seconds)
+  agent_timeout: 300
+
+  # Auto-trigger review when phase transitions to 'review'
+  auto_trigger: true
+
+  # Finding severity that blocks merge
+  merge_blocking_severity: P1
+
+  agents:
+    - name: security-sentinel
+      enabled: true
+      plan: |
+        You are a Security Sentinel reviewing code changes.
+
+        Focus areas:
+        - Authentication and authorization flaws
+        - Injection vulnerabilities (SQL, XSS, command injection)
+        - Secret/credential exposure
+        - CSRF and session management
+        - Input validation gaps
+
+        Output format: One finding per section with severity (P1/P2/P3),
+        affected file/line, description, and suggested fix.
+      priority: critical  # Always runs
+
+    - name: performance-oracle
+      enabled: true
+      plan: |
+        You are a Performance Oracle reviewing code changes.
+
+        Focus areas:
+        - N+1 query patterns
+        - Memory leaks and unbounded growth
+        - Missing caching opportunities
+        - Algorithmic complexity issues
+        - Database index utilization
+
+        Output format: One finding per section with severity (P1/P2/P3),
+        affected file/line, description, and suggested fix.
+      priority: standard
+
+    - name: architecture-strategist
+      enabled: true
+      plan: |
+        You are an Architecture Strategist reviewing code changes.
+
+        Focus areas:
+        - SOLID principle violations
+        - Inappropriate coupling between modules
+        - Layering violations (e.g., handler accessing DB directly)
+        - Missing abstraction boundaries
+        - Inconsistency with existing patterns
+
+        Output format: One finding per section with severity (P1/P2/P3),
+        affected file/line, description, and suggested fix.
+      priority: standard
+
+    - name: test-coverage-analyzer
+      enabled: true
+      plan: |
+        You are a Test Coverage Analyzer reviewing code changes.
+
+        Focus areas:
+        - Missing test coverage for new code paths
+        - Untested error/edge cases
+        - Test quality (meaningful assertions vs superficial)
+        - Integration test gaps
+        - Mock appropriateness
+
+        Output format: One finding per section with severity (P1/P2/P3),
+        affected file/line, description, and suggested fix.
+      priority: standard
+
+    # Users can add custom agents here
+    - name: custom-domain-reviewer
+      enabled: false
+      plan: "Review for domain-specific business logic concerns..."
+      priority: optional
+```
+
+#### Contract-Level Override
+
+```yaml
+# In contract configuration or via CLI
+review:
+  agents:
+    # Disable agents not relevant to this contract
+    - name: concurrency-safety
+      enabled: false
+    # Add contract-specific reviewer
+    - name: migration-safety
+      enabled: true
+      plan: "Review database migrations for data loss risks..."
+```
+
+### 3. Synthesis Step
+
+After all review agents complete, a synthesis task:
+
+1. **Collects** all findings from group task outputs
+2. **Deduplicates** findings about the same issue from different perspectives
+3. **Resolves contradictions** (e.g., one agent says "add caching" while another says "caching adds complexity")
+4. **Prioritizes** by severity and cross-agent agreement
+5. **Produces** a structured review report as a contract file
+
+```bash
+# Synthesis is automatically run after wait-group completes
+makima supervisor synthesize-review "review" \
+  --output-format findings \
+  --create-contract-file
+```
+
+### 4. Auto-Review Trigger
+
+When a contract's phase transitions to `review`:
+
+```rust
+// In phase transition handler
+if new_phase == "review" && contract.review_config.auto_trigger {
+    // Spawn review group automatically
+    spawn_review_group(contract, review_config).await?;
+}
+```
+
+---
+
+## Integration with Existing Makima Features
+
+### Supervisor/Worker Hierarchy
+
+Review agents are spawned as **worker tasks** under the supervisor, using existing `spawn-task` infrastructure. The new `spawn-group`/`wait-group` commands are syntactic sugar over batch `spawn-task` + `wait` calls.
+
+### Git Worktree Isolation
+
+Review agents share the supervisor's worktree in **read-only mode** (a new capability). This avoids creating N separate worktrees for review-only tasks. Implementation:
+- New `supervisor_worktree_task_id` parameter (already exists in SpawnTask)
+- New `read_only: true` flag to prevent file modifications
+- Workers see the same code state that triggered the review
+
+### Contract Files
+
+The synthesized review report is stored as a **contract file** attached to the review phase:
+```rust
+File {
+    contract_id: contract.id,
+    contract_phase: "review",
+    name: "Review Report — 2026-02-09",
+    body: vec![
+        BodyElement::Heading { level: 1, text: "Review Summary" },
+        BodyElement::Paragraph { text: "3 P1 findings, 7 P2 findings, 12 P3 findings" },
+        // ... structured findings
+    ],
+}
+```
+
+### Phase Guards
+
+If `phase_guard` is enabled and P1 findings exist, the phase transition from Review to Execute (or Compound) is blocked until P1s are resolved. This integrates with the existing `advance-phase` confirmation flow.
+
+### Completion Gates
+
+Each review agent uses the existing `<COMPLETION_GATE>` mechanism to signal when its review is complete:
+```xml
+<COMPLETION_GATE>
+ready: true
+reason: "Security review complete. Found 2 P1 and 3 P2 findings."
+progress: "Reviewed 47 files across 12 modules."
+</COMPLETION_GATE>
+```
+
+### Circuit Breaker
+
+The existing CircuitBreaker protects against review agents getting stuck. If a review agent loops without progress for 3 iterations, it's terminated and its partial findings are included in synthesis.
+
+---
+
+## Implementation Plan
+
+### Phase 1: Group Task Infrastructure (5-7 days)
+
+| Task | Effort | Description |
+|------|--------|-------------|
+| `spawn-group` command | 2 days | Batch task spawning with named groups |
+| `wait-group` command | 1 day | Wait for all tasks in group |
+| Group tracking in DB | 1 day | Task group table, membership, status |
+| Shared worktree (read-only) | 1-2 days | Workers share supervisor worktree |
+| Tests | 1 day | Unit + integration tests |
+
+### Phase 2: Review Agent System (4-6 days)
+
+| Task | Effort | Description |
+|------|--------|-------------|
+| Review config YAML parser | 1 day | Parse `.makima/review-agents.yaml` |
+| `supervisor review` command | 2 days | Orchestrate review pipeline |
+| Synthesis agent logic | 1-2 days | Deduplicate, prioritize, format |
+| Review report as contract file | 1 day | Store structured output |
+
+### Phase 3: Automation & Polish (3-5 days)
+
+| Task | Effort | Description |
+|------|--------|-------------|
+| Auto-trigger on phase transition | 1 day | Hook into `advance-phase` |
+| P1 merge blocking | 1 day | Phase guard integration |
+| Default review agent templates | 1-2 days | Ship 8-10 built-in agents |
+| Documentation | 1 day | User guide and config reference |
+
+---
+
+## Configuration Examples
+
+### Minimal Setup (Zero Config)
+
+```bash
+# Uses built-in review agents with default settings
+makima supervisor review
+```
+
+### Custom Review for a Specific Contract
+
+```bash
+# Override for this contract only
+makima supervisor review \
+  --only security,performance \
+  --merge-blocking P1 \
+  --timeout 300
+```
+
+### Full Custom Configuration
+
+```yaml
+# .makima/review-agents.yaml
+version: 1
+review:
+  max_concurrent: 6
+  agent_timeout: 300
+  auto_trigger: true
+  merge_blocking_severity: P1
+
+  synthesis:
+    dedup_threshold: 0.8        # Similarity score for deduplication
+    min_agreement: 2             # Findings flagged by 2+ agents get priority boost
+    output_format: "findings"    # "findings" | "report" | "both"
+    create_contract_file: true
+
+  agents:
+    - name: security-sentinel
+      enabled: true
+      priority: critical
+      plan: |
+        ...
+    - name: performance-oracle
+      enabled: true
+      priority: standard
+      plan: |
+        ...
+    # ... more agents
+```
+
+---
+
+## Open Questions
+
+1. **Shared worktree read-only enforcement**: Should this be enforced at the filesystem level (mount read-only) or via convention (instructions to the agent)?
+2. **Review scope**: Should review agents see all files or only changed files (git diff)?
+3. **Incremental review**: When new commits are added during review, should agents re-review or only review the delta?
+4. **Agent output parsing**: Should agents output structured YAML findings, or should the synthesis step parse natural language?
+5. **Cost control**: With 10+ parallel agents, how do we manage API costs? Should there be a budget ceiling per review?
+6. **Finding deduplication**: What similarity threshold should trigger deduplication? How to handle partial overlaps?
+
+---
+
+## Alternatives Considered
+
+| Alternative | Pros | Cons | Decision |
+|-------------|------|------|----------|
+| Single comprehensive review agent | Simple, no coordination overhead | Misses perspective-specific issues | Rejected — diminishes review quality |
+| Sequential reviews (one after another) | Simpler orchestration | 5-10x slower; later reviews can't benefit from earlier ones | Rejected — latency unacceptable |
+| External review tools integration | Leverage existing static analysis | Limited to tool capabilities; no semantic review | Complement — can integrate alongside agent review |
+| User-configured number of agents | Maximum flexibility | Analysis paralysis for new users | Adopted — sensible defaults + customization |
+
+---
+
+## Priority & Complexity Assessment
+
+- **Priority: HIGH** — Multi-agent review is the highest-impact feature from the compound engineering plugin. It directly improves code quality with no change to developer workflow.
+- **Complexity: MEDIUM** — The core `spawn-group`/`wait-group` pattern is straightforward. The synthesis step requires careful design. Shared worktree read-only mode is a new capability.
+- **Risk: LOW-MEDIUM** — Main risks are resource consumption (manageable with concurrency limits) and synthesis quality (improvable iteratively).
-- 
cgit v1.2.3