# Feature Proposal: Multi-Agent Parallel Review System

> **Priority:** High
> **Complexity:** Medium
> **Estimated Effort:** 12-18 days
> **Status:** Proposal
> **Date:** 2026-02-09
> **Dependencies:** [Findings Tracking](feature-findings-tracking.md) (recommended)
> **Related:** [Overview Analysis](compound-engineering-analysis.md) · [Workflow Presets](feature-workflow-presets.md)

---

## Problem Statement

Makima's contract lifecycle includes a **Review** phase, but it currently has:

- **No automated review mechanism** — the review phase relies entirely on manual user inspection or a single supervisor task
- **Single-perspective review** — even when a review task is spawned, it examines code from one viewpoint
- **No structured review output** — findings are captured as unstructured text in task output
- **No review templates** — each review must be configured from scratch
- **No synthesis** — when multiple reviewers exist, there's no mechanism to deduplicate and prioritize findings

For complex contracts that touch security, performance, and architecture, a single-pass review tends to miss category-specific issues that specialized reviewers would catch.

---

## How Compound Engineering Solves This

The compound engineering plugin spawns **12-15 specialized review agents in parallel**, each examining the code from a distinct perspective. The table below lists eleven representative agents:

| Agent | Focus Area | Example Findings |
|-------|-----------|-----------------|
| Security Sentinel | Auth, injection, secrets, CSRF | SQL injection in user input handler |
| Performance Oracle | N+1 queries, memory leaks, caching | Unbounded list growth in event handler |
| Architecture Strategist | Coupling, SOLID, layering | Service directly accessing repository internals |
| Code Philosopher | Readability, naming, complexity | Cyclomatic complexity > 15 in payment flow |
| Data Integrity Guardian | Validation, constraints, migrations | Missing NOT NULL constraint on required field |
| Error Resilience Analyzer | Error handling, retries, fallbacks | Unhandled timeout in external API call |
| API Contract Validator | Breaking changes, versioning | Removed required field from response |
| Dependency Health Checker | Vulnerabilities, licensing, freshness | CVE-2025-XXXX in transitive dependency |
| Test Coverage Analyzer | Coverage gaps, edge cases, mocking | No tests for error path in checkout flow |
| Documentation Completeness | Docs accuracy, examples, changelog | Public API endpoint undocumented |
| Concurrency Safety | Race conditions, deadlocks, atomicity | Non-atomic read-modify-write on shared counter |

After all agents complete, a **synthesis agent** deduplicates findings, resolves contradictions, and produces a prioritized report.

```
┌───────────────────────────────────────────────────────┐
│                  Review Orchestrator                   │
│                                                       │
│  spawn-group "review"                                 │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐   │
│  │Security │ │ Perf    │ │  Arch   │ │  Code   │   │
│  │Sentinel │ │ Oracle  │ │Strategy │ │  Phil   │   │
│  └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘   │
│       │           │           │           │         │
│  ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐   │
│  │  Data   │ │ Error   │ │  API    │ │  Deps   │   │
│  │Guardian │ │Resilien.│ │Contract │ │ Health  │   │
│  └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘   │
│       │           │           │           │         │
│  ┌────┴────┐ ┌────┴────┐ ┌────┴────┐               │
│  │  Test   │ │  Docs   │ │Concurr. │               │
│  │Coverage │ │Complete │ │ Safety  │               │
│  └────┬────┘ └────┬────┘ └────┬────┘               │
│       │           │           │                     │
│  wait-group "review"                                 │
│       ▼           ▼           ▼                     │
│  ┌──────────────────────────────────────────┐       │
│  │         Synthesis Agent                   │       │
│  │  - Deduplicate findings                   │       │
│  │  - Resolve contradictions                 │       │
│  │  - Prioritize by severity                 │       │
│  │  - Generate summary report                │       │
│  └──────────────────────────────────────────┘       │
│                      │                               │
│                      ▼                               │
│             Structured Findings                      │
│             (P1 / P2 / P3)                           │
└───────────────────────────────────────────────────────┘
```

---

## Proposed Makima Implementation

### 1. New Supervisor Commands

#### `makima supervisor spawn-group`

Spawns multiple tasks as a named group and returns immediately:

```bash
# Spawn a review group with 3 agents
makima supervisor spawn-group "review" \
  --tasks '[
    {"name": "security-review", "plan": "Review for security vulnerabilities..."},
    {"name": "performance-review", "plan": "Review for performance issues..."},
    {"name": "architecture-review", "plan": "Review for architecture concerns..."}
  ]' \
  --share-worktree \
  --read-only
```

**Key parameters:**
- `--tasks` — JSON array of task definitions
- `--share-worktree` — All tasks in the group share the supervisor's worktree (read-only access)
- `--read-only` — Tasks cannot modify files, only produce output
- `--max-concurrent N` — Limit parallel execution (default: unlimited)
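
The effect of `--max-concurrent N` can be illustrated with a small worker-pool sketch. It assumes each task is a closure run on an OS thread; `GroupTask` and `run_group` are hypothetical names for illustration, not existing Makima APIs, and the real implementation would go through the task spawning infrastructure instead.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical task definition: a name plus the work to run.
pub struct GroupTask {
    pub name: String,
    pub work: Box<dyn FnOnce() -> String + Send + 'static>,
}

/// Runs every task with at most `max_concurrent` running at once.
/// Returns (task name, output) pairs in completion order.
pub fn run_group(tasks: Vec<GroupTask>, max_concurrent: usize) -> Vec<(String, String)> {
    // Never start more workers than tasks, and always at least one.
    let workers = max_concurrent.clamp(1, tasks.len().max(1));
    let queue = Arc::new(Mutex::new(VecDeque::from(tasks)));
    let results = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let results = Arc::clone(&results);
            thread::spawn(move || loop {
                // The queue lock is released at the end of this statement,
                // so tasks run without holding it.
                let next = queue.lock().unwrap().pop_front();
                let Some(task) = next else { break };
                let output = (task.work)();
                results.lock().unwrap().push((task.name, output));
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    let collected = results.lock().unwrap().clone();
    collected
}
```

Passing `usize::MAX` as `max_concurrent` models the "unlimited" default: the worker count is capped at the number of tasks.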

#### `makima supervisor wait-group`

Waits for all tasks in a named group to complete:

```bash
# Wait for all review tasks, timeout after 10 minutes
makima supervisor wait-group "review" --timeout 600

# Returns JSON with all task results
```

**Output format:**
```json
{
  "group": "review",
  "status": "completed",
  "tasks": [
    {"name": "security-review", "status": "done", "output": "..."},
    {"name": "performance-review", "status": "done", "output": "..."}
  ],
  "duration_seconds": 127
}
```
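
A minimal sketch of the waiting logic behind `wait-group`, assuming workers report results over a channel. `TaskResult`, `GroupStatus`, and `wait_group` are hypothetical names; in particular the `Incomplete` status is an assumption, since the proposal only shows `"completed"`.

```rust
use std::sync::mpsc::Receiver;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub struct TaskResult {
    pub name: String,
    pub output: String,
}

/// Hypothetical group outcome; the proposal only shows "completed".
#[derive(Debug, PartialEq)]
pub enum GroupStatus {
    Completed,
    Incomplete,
}

/// Collects results until `expected` tasks have reported or the timeout elapses.
pub fn wait_group(
    rx: &Receiver<TaskResult>,
    expected: usize,
    timeout: Duration,
) -> (GroupStatus, Vec<TaskResult>) {
    let deadline = Instant::now() + timeout;
    let mut done = Vec::with_capacity(expected);
    while done.len() < expected {
        // Time left until the overall deadline (zero once it has passed).
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(result) => done.push(result),
            // Deadline passed, or every sender went away: return what we have.
            Err(_) => return (GroupStatus::Incomplete, done),
        }
    }
    (GroupStatus::Completed, done)
}
```

Returning the partial results on timeout keeps the door open for synthesizing whatever findings did arrive.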

#### `makima supervisor review`

High-level command that orchestrates the full review pipeline:

```bash
# Run review with default agent config
makima supervisor review

# Run review with custom config
makima supervisor review --config .makima/review-agents.yaml

# Run only specific review categories
makima supervisor review --only security,performance,architecture
```

### 2. Review Agent Configuration

#### Repository-Level Configuration (`.makima/review-agents.yaml`)

```yaml
# .makima/review-agents.yaml
version: 1
review:
  # Maximum number of concurrent review agents
  max_concurrent: 8

  # Timeout per agent (seconds)
  agent_timeout: 300

  # Auto-trigger review when phase transitions to 'review'
  auto_trigger: true

  # Finding severity that blocks merge
  merge_blocking_severity: P1

  agents:
    - name: security-sentinel
      enabled: true
      plan: |
        You are a Security Sentinel reviewing code changes.

        Focus areas:
        - Authentication and authorization flaws
        - Injection vulnerabilities (SQL, XSS, command injection)
        - Secret/credential exposure
        - CSRF and session management
        - Input validation gaps

        Output format: One finding per section with severity (P1/P2/P3),
        affected file/line, description, and suggested fix.
      priority: critical  # Always runs

    - name: performance-oracle
      enabled: true
      plan: |
        You are a Performance Oracle reviewing code changes.

        Focus areas:
        - N+1 query patterns
        - Memory leaks and unbounded growth
        - Missing caching opportunities
        - Algorithmic complexity issues
        - Database index utilization

        Output format: One finding per section with severity (P1/P2/P3),
        affected file/line, description, and suggested fix.
      priority: standard

    - name: architecture-strategist
      enabled: true
      plan: |
        You are an Architecture Strategist reviewing code changes.

        Focus areas:
        - SOLID principle violations
        - Inappropriate coupling between modules
        - Layering violations (e.g., handler accessing DB directly)
        - Missing abstraction boundaries
        - Inconsistency with existing patterns

        Output format: One finding per section with severity (P1/P2/P3),
        affected file/line, description, and suggested fix.
      priority: standard

    - name: test-coverage-analyzer
      enabled: true
      plan: |
        You are a Test Coverage Analyzer reviewing code changes.

        Focus areas:
        - Missing test coverage for new code paths
        - Untested error/edge cases
        - Test quality (meaningful assertions vs superficial)
        - Integration test gaps
        - Mock appropriateness

        Output format: One finding per section with severity (P1/P2/P3),
        affected file/line, description, and suggested fix.
      priority: standard

    # Users can add custom agents here
    - name: custom-domain-reviewer
      enabled: false
      plan: "Review for domain-specific business logic concerns..."
      priority: optional
```
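
How the `enabled` and `priority` fields might combine with the `--only` filter can be sketched as follows. This is one possible reading, not a settled design: it assumes `--only` matches agent names exactly and that "always runs" means an enabled `critical` agent is kept even when `--only` omits it. `AgentConfig`, `Priority`, and `select_agents` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Priority {
    Critical,
    Standard,
    Optional,
}

#[derive(Debug, Clone)]
pub struct AgentConfig {
    pub name: String,
    pub enabled: bool,
    pub priority: Priority,
}

/// Returns the names of the agents that should run, in config order.
/// Assumption: disabled agents never run; enabled critical agents
/// survive an `--only` filter that does not list them.
pub fn select_agents<'a>(agents: &'a [AgentConfig], only: Option<&[&str]>) -> Vec<&'a str> {
    agents
        .iter()
        .filter(|a| a.enabled)
        .filter(|a| match only {
            None => true,
            Some(names) => {
                a.priority == Priority::Critical || names.contains(&a.name.as_str())
            }
        })
        .map(|a| a.name.as_str())
        .collect()
}
```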

#### Contract-Level Override

```yaml
# In contract configuration or via CLI
review:
  agents:
    # Disable agents not relevant to this contract
    - name: concurrency-safety
      enabled: false
    # Add contract-specific reviewer
    - name: migration-safety
      enabled: true
      plan: "Review database migrations for data loss risks..."
```
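
The override semantics could look like the sketch below, assuming entries are matched by `name`, a matching entry overrides `enabled` (and `plan` only when one is given), and unmatched entries are appended. `AgentEntry` and `merge_agents` are hypothetical names used for illustration.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct AgentEntry {
    pub name: String,
    pub enabled: bool,
    pub plan: Option<String>,
}

/// Applies contract-level entries on top of repository-level ones.
pub fn merge_agents(repo: &[AgentEntry], overrides: &[AgentEntry]) -> Vec<AgentEntry> {
    let mut merged = repo.to_vec();
    for entry in overrides {
        match merged.iter().position(|a| a.name == entry.name) {
            // Known agent: override its settings, keeping the repo plan
            // unless the contract supplies its own.
            Some(i) => {
                merged[i].enabled = entry.enabled;
                if let Some(plan) = &entry.plan {
                    merged[i].plan = Some(plan.clone());
                }
            }
            // Unknown agent: treat it as a contract-specific reviewer.
            None => merged.push(entry.clone()),
        }
    }
    merged
}
```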

### 3. Synthesis Step

After all review agents complete, a synthesis task:

1. **Collects** all findings from group task outputs
2. **Deduplicates** findings about the same issue from different perspectives
3. **Resolves contradictions** (e.g., one agent says "add caching" while another says "caching adds complexity")
4. **Prioritizes** by severity and cross-agent agreement
5. **Produces** a structured review report as a contract file

```bash
# Synthesis is automatically run after wait-group completes
makima supervisor synthesize-review "review" \
  --output-format findings \
  --create-contract-file
```
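
The deduplicate and prioritize steps can be sketched under a deliberately simple assumption: two findings describe the same issue when they point at the same file and line. A real synthesis agent would likely need fuzzier matching. `Finding`, `MergedFinding`, and `synthesize` are hypothetical names, not part of Makima.

```rust
use std::collections::BTreeMap;

// Declaration order gives P1 < P2 < P3, so "smaller" means more severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    P1,
    P2,
    P3,
}

#[derive(Debug, Clone)]
pub struct Finding {
    pub agent: String,
    pub file: String,
    pub line: u32,
    pub severity: Severity,
    pub description: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct MergedFinding {
    pub file: String,
    pub line: u32,
    pub severity: Severity,
    pub agents: Vec<String>,
    pub descriptions: Vec<String>,
}

/// Merges findings that share a location, then orders them by severity
/// and by how many distinct agents reported them.
pub fn synthesize(findings: &[Finding]) -> Vec<MergedFinding> {
    let mut by_location: BTreeMap<(String, u32), MergedFinding> = BTreeMap::new();
    for f in findings {
        let entry = by_location
            .entry((f.file.clone(), f.line))
            .or_insert_with(|| MergedFinding {
                file: f.file.clone(),
                line: f.line,
                severity: f.severity,
                agents: Vec::new(),
                descriptions: Vec::new(),
            });
        // Keep the most severe rating any agent assigned.
        entry.severity = entry.severity.min(f.severity);
        if !entry.agents.contains(&f.agent) {
            entry.agents.push(f.agent.clone());
        }
        entry.descriptions.push(f.description.clone());
    }
    let mut merged: Vec<MergedFinding> = by_location.into_values().collect();
    // Most severe first; within a severity, broader agreement first.
    merged.sort_by(|a, b| {
        a.severity
            .cmp(&b.severity)
            .then(b.agents.len().cmp(&a.agents.len()))
    });
    merged
}
```

Contradiction resolution is left out on purpose: it needs semantic judgment that a location-based merge cannot provide.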

### 4. Auto-Review Trigger

When a contract's phase transitions to `review`:

```rust
// In phase transition handler
if new_phase == "review" && contract.review_config.auto_trigger {
    // Spawn the review group automatically, using the contract's own review config
    spawn_review_group(&contract, &contract.review_config).await?;
}
```

---

## Integration with Existing Makima Features

### Supervisor/Worker Hierarchy

Review agents are spawned as **worker tasks** under the supervisor, using existing `spawn-task` infrastructure. The new `spawn-group`/`wait-group` commands are syntactic sugar over batch `spawn-task` + `wait` calls.

### Git Worktree Isolation

Review agents share the supervisor's worktree in **read-only mode** (a new capability). This avoids creating N separate worktrees for review-only tasks. Implementation:
- Reuse the `supervisor_worktree_task_id` parameter that already exists in SpawnTask
- New `read_only: true` flag to prevent file modifications
- Workers see the same code state that triggered the review

### Contract Files

The synthesized review report is stored as a **contract file** attached to the review phase:
```rust
File {
    contract_id: contract.id,
    contract_phase: "review",
    name: "Review Report — 2026-02-09",
    body: vec![
        BodyElement::Heading { level: 1, text: "Review Summary" },
        BodyElement::Paragraph { text: "3 P1 findings, 7 P2 findings, 12 P3 findings" },
        // ... structured findings
    ],
}
```

### Phase Guards

If `phase_guard` is enabled and P1 findings exist, the phase transition from Review to Execute (or Compound) is blocked until P1s are resolved. This integrates with the existing `advance-phase` confirmation flow.
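
A minimal sketch of the blocking check, assuming each finding carries a `resolved` flag and that a blocking severity of P2 would also block on P1. `check_phase_guard` and the `Finding` shape are hypothetical; the actual `advance-phase` integration is not shown here.

```rust
// Declaration order gives P1 < P2 < P3, so "<=" means "at least as severe".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    P1,
    P2,
    P3,
}

pub struct Finding {
    pub severity: Severity,
    pub resolved: bool,
}

/// Returns `Err(count)` with the number of open blocking findings when the
/// phase transition must be refused, and `Ok(())` otherwise.
pub fn check_phase_guard(
    phase_guard: bool,
    blocking: Severity,
    findings: &[Finding],
) -> Result<(), usize> {
    if !phase_guard {
        return Ok(());
    }
    let open = findings
        .iter()
        .filter(|f| !f.resolved && f.severity <= blocking)
        .count();
    if open == 0 {
        Ok(())
    } else {
        Err(open)
    }
}
```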

### Completion Gates

Each review agent uses the existing `<COMPLETION_GATE>` mechanism to signal when its review is complete:
```xml
<COMPLETION_GATE>
ready: true
reason: "Security review complete. Found 2 P1 and 3 P2 findings."
progress: "Reviewed 47 files across 12 modules."
</COMPLETION_GATE>
```
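
For illustration, the gate block above could be read with a small line-based parser like the one below. This is a sketch based only on the format shown, not the parser Makima actually uses; `CompletionGate` and `parse_completion_gate` are hypothetical names, and it assumes one `key: value` pair per line.

```rust
#[derive(Debug, PartialEq)]
pub struct CompletionGate {
    pub ready: bool,
    pub reason: String,
    pub progress: String,
}

/// Extracts the first completion gate from an agent's output, if any.
/// Returns `None` when the tags or the `ready` field are missing.
pub fn parse_completion_gate(output: &str) -> Option<CompletionGate> {
    const OPEN: &str = "<COMPLETION_GATE>";
    const CLOSE: &str = "</COMPLETION_GATE>";
    let start = output.find(OPEN)? + OPEN.len();
    let end = start + output[start..].find(CLOSE)?;

    let mut gate = CompletionGate {
        ready: false,
        reason: String::new(),
        progress: String::new(),
    };
    let mut saw_ready = false;
    for line in output[start..end].lines() {
        // Split at the first colon only, so values may contain colons.
        let Some((key, value)) = line.split_once(':') else { continue };
        let value = value.trim().trim_matches('"');
        match key.trim() {
            "ready" => {
                gate.ready = value == "true";
                saw_ready = true;
            }
            "reason" => gate.reason = value.to_string(),
            "progress" => gate.progress = value.to_string(),
            _ => {}
        }
    }
    saw_ready.then_some(gate)
}
```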

### Circuit Breaker

The existing CircuitBreaker protects against review agents getting stuck. If a review agent loops without progress for 3 iterations, it's terminated and its partial findings are included in synthesis.
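
The no-progress rule can be sketched as a small counter. This is not the existing CircuitBreaker, whose internals are not described here; `ProgressBreaker` is a hypothetical stand-in that assumes "progress" is a comparable marker string (for example the gate's `progress` field) and counts consecutive iterations where it does not change.

```rust
pub struct ProgressBreaker {
    max_stalled: u32,
    stalled: u32,
    last_progress: Option<String>,
}

impl ProgressBreaker {
    pub fn new(max_stalled: u32) -> Self {
        Self {
            max_stalled,
            stalled: 0,
            last_progress: None,
        }
    }

    /// Records one iteration's progress marker.
    /// Returns true once the marker has stayed unchanged for
    /// `max_stalled` consecutive iterations.
    pub fn record(&mut self, progress: &str) -> bool {
        if self.last_progress.as_deref() == Some(progress) {
            self.stalled += 1;
        } else {
            // Any change counts as progress and resets the counter.
            self.stalled = 0;
            self.last_progress = Some(progress.to_string());
        }
        self.stalled >= self.max_stalled
    }
}
```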

---

## Implementation Plan

### Phase 1: Group Task Infrastructure (5-7 days)

| Task | Effort | Description |
|------|--------|-------------|
| `spawn-group` command | 2 days | Batch task spawning with named groups |
| `wait-group` command | 1 day | Wait for all tasks in group |
| Group tracking in DB | 1 day | Task group table, membership, status |
| Shared worktree (read-only) | 1-2 days | Workers share supervisor worktree |
| Tests | 1 day | Unit + integration tests |

### Phase 2: Review Agent System (4-6 days)

| Task | Effort | Description |
|------|--------|-------------|
| Review config YAML parser | 1 day | Parse `.makima/review-agents.yaml` |
| `supervisor review` command | 2 days | Orchestrate review pipeline |
| Synthesis agent logic | 1-2 days | Deduplicate, prioritize, format |
| Review report as contract file | 1 day | Store structured output |

### Phase 3: Automation & Polish (3-5 days)

| Task | Effort | Description |
|------|--------|-------------|
| Auto-trigger on phase transition | 1 day | Hook into `advance-phase` |
| P1 merge blocking | 1 day | Phase guard integration |
| Default review agent templates | 1-2 days | Ship 8-10 built-in agents |
| Documentation | 1 day | User guide and config reference |

---

## Configuration Examples

### Minimal Setup (Zero Config)

```bash
# Uses built-in review agents with default settings
makima supervisor review
```

### Custom Review for a Specific Contract

```bash
# Override for this contract only
makima supervisor review \
  --only security,performance \
  --merge-blocking P1 \
  --timeout 300
```

### Full Custom Configuration

```yaml
# .makima/review-agents.yaml
version: 1
review:
  max_concurrent: 6
  agent_timeout: 300
  auto_trigger: true
  merge_blocking_severity: P1

  synthesis:
    dedup_threshold: 0.8        # Similarity score for deduplication
    min_agreement: 2             # Findings flagged by 2+ agents get priority boost
    output_format: "findings"    # "findings" | "report" | "both"
    create_contract_file: true

  agents:
    - name: security-sentinel
      enabled: true
      priority: critical
      plan: |
        ...
    - name: performance-oracle
      enabled: true
      priority: standard
      plan: |
        ...
    # ... more agents
```
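
The proposal does not define how the similarity score behind `dedup_threshold` is computed. As one possible interpretation, the sketch below uses Jaccard similarity over lowercase word sets; that metric is an assumption chosen for simplicity, and `jaccard`, `is_duplicate`, and `has_priority_boost` are hypothetical names.

```rust
use std::collections::HashSet;

// Splits a description into a set of lowercase words.
fn words(text: &str) -> HashSet<String> {
    text.split_whitespace().map(|w| w.to_lowercase()).collect()
}

/// Jaccard similarity of two descriptions: shared words / all distinct words.
pub fn jaccard(a: &str, b: &str) -> f64 {
    let (wa, wb) = (words(a), words(b));
    if wa.is_empty() && wb.is_empty() {
        return 1.0;
    }
    let shared = wa.intersection(&wb).count() as f64;
    let total = wa.union(&wb).count() as f64;
    shared / total
}

/// Two findings are merged when their similarity reaches `dedup_threshold`.
pub fn is_duplicate(a: &str, b: &str, dedup_threshold: f64) -> bool {
    jaccard(a, b) >= dedup_threshold
}

/// A finding gets a priority boost when enough distinct agents flagged it.
pub fn has_priority_boost(agent_count: usize, min_agreement: usize) -> bool {
    agent_count >= min_agreement
}
```

Word-set overlap is cheap but blind to paraphrase, so partial overlaps between differently worded findings would need a stronger metric.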

---

## Open Questions

1. **Shared worktree read-only enforcement**: Should this be enforced at the filesystem level (mount read-only) or via convention (instructions to the agent)?
2. **Review scope**: Should review agents see all files or only changed files (git diff)?
3. **Incremental review**: When new commits are added during review, should agents re-review or only review the delta?
4. **Agent output parsing**: Should agents output structured YAML findings, or should the synthesis step parse natural language?
5. **Cost control**: With 10+ parallel agents, how do we manage API costs? Should there be a budget ceiling per review?
6. **Finding deduplication**: What similarity threshold should trigger deduplication? How to handle partial overlaps?

---

## Alternatives Considered

| Alternative | Pros | Cons | Decision |
|-------------|------|------|----------|
| Single comprehensive review agent | Simple, no coordination overhead | Misses perspective-specific issues | Rejected — diminishes review quality |
| Sequential reviews (one after another) | Simpler orchestration | 5-10x slower; later reviews can't benefit from earlier ones | Rejected — latency unacceptable |
| External review tools integration | Leverage existing static analysis | Limited to tool capabilities; no semantic review | Complement — can integrate alongside agent review |
| User-configured number of agents | Maximum flexibility | Analysis paralysis for new users | Adopted — sensible defaults + customization |

---

## Priority & Complexity Assessment

- **Priority: HIGH** — Multi-agent review is the highest-impact feature from the compound engineering plugin. It directly improves code quality with no change to developer workflow.
- **Complexity: MEDIUM** — The core `spawn-group`/`wait-group` pattern is straightforward. The synthesis step requires careful design. Shared worktree read-only mode is a new capability.
- **Risk: LOW-MEDIUM** — Main risks are resource consumption (manageable with concurrency limits) and synthesis quality (improvable iteratively).