# Feature Proposal: Multi-Agent Parallel Review System
> **Priority:** High
> **Complexity:** Medium
> **Estimated Effort:** 12-18 days
> **Status:** Proposal
> **Date:** 2026-02-09
> **Dependencies:** [Findings Tracking](feature-findings-tracking.md) (recommended)
> **Related:** [Overview Analysis](compound-engineering-analysis.md) · [Workflow Presets](feature-workflow-presets.md)
---
## Problem Statement
Makima's contract lifecycle includes a **Review** phase, but it currently has:
- **No automated review mechanism** — the review phase relies entirely on manual user inspection or a single supervisor task
- **Single-perspective review** — even when a review task is spawned, it examines code from one viewpoint
- **No structured review output** — findings are captured as unstructured text in task output
- **No review templates** — each review must be configured from scratch
- **No synthesis** — when multiple reviewers exist, there's no mechanism to deduplicate and prioritize findings
For complex contracts touching security, performance, and architecture, a single-pass review consistently misses category-specific issues that specialized reviewers would catch.
---
## How Compound Engineering Solves This
The compound engineering plugin spawns **12-15 specialized review agents in parallel**, each examining the code from a unique perspective:
| Agent | Focus Area | Example Findings |
|-------|-----------|-----------------|
| Security Sentinel | Auth, injection, secrets, CSRF | SQL injection in user input handler |
| Performance Oracle | N+1 queries, memory leaks, caching | Unbounded list growth in event handler |
| Architecture Strategist | Coupling, SOLID, layering | Service directly accessing repository internals |
| Code Philosopher | Readability, naming, complexity | Cyclomatic complexity > 15 in payment flow |
| Data Integrity Guardian | Validation, constraints, migrations | Missing NOT NULL constraint on required field |
| Error Resilience Analyzer | Error handling, retries, fallbacks | Unhandled timeout in external API call |
| API Contract Validator | Breaking changes, versioning | Removed required field from response |
| Dependency Health Checker | Vulnerabilities, licensing, freshness | CVE-2025-XXXX in transitive dependency |
| Test Coverage Analyzer | Coverage gaps, edge cases, mocking | No tests for error path in checkout flow |
| Documentation Completeness | Docs accuracy, examples, changelog | Public API endpoint undocumented |
| Concurrency Safety | Race conditions, deadlocks, atomicity | Non-atomic read-modify-write on shared counter |
After all agents complete, a **synthesis agent** deduplicates findings, resolves contradictions, and produces a prioritized report.
```
┌───────────────────────────────────────────────────────┐
│ Review Orchestrator │
│ │
│ spawn-group "review" │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │Security │ │ Perf │ │ Arch │ │ Code │ │
│ │Sentinel │ │ Oracle │ │Strategy │ │ Phil │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │ │
│ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ │
│ │ Data │ │ Error │ │ API │ │ Deps │ │
│ │Guardian │ │Resilien.│ │Contract │ │ Health │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │ │
│ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ │
│ │ Test │ │ Docs │ │Concurr. │ │
│ │Coverage │ │Complete │ │ Safety │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │
│ wait-group "review" │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────┐ │
│ │ Synthesis Agent │ │
│ │ - Deduplicate findings │ │
│ │ - Resolve contradictions │ │
│ │ - Prioritize by severity │ │
│ │ - Generate summary report │ │
│ └──────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Structured Findings │
│ (P1 / P2 / P3) │
└───────────────────────────────────────────────────────┘
```
---
## Proposed Makima Implementation
### 1. New Supervisor Commands
#### `makima supervisor spawn-group`
Spawns multiple tasks as a named group and returns immediately:
```bash
# Spawn a review group with 5 agents
makima supervisor spawn-group "review" \
--tasks '[
{"name": "security-review", "plan": "Review for security vulnerabilities..."},
{"name": "performance-review", "plan": "Review for performance issues..."},
{"name": "architecture-review", "plan": "Review for architecture concerns..."}
]' \
--share-worktree \
--read-only
```
**Key parameters:**
- `--tasks` — JSON array of task definitions
- `--share-worktree` — All tasks in the group share the supervisor's worktree (read-only access)
- `--read-only` — Tasks cannot modify files, only produce output
- `--max-concurrent N` — Limit parallel execution (default: unlimited)
#### `makima supervisor wait-group`
Waits for all tasks in a named group to complete:
```bash
# Wait for all review tasks, timeout after 10 minutes
makima supervisor wait-group "review" --timeout 600
# Returns JSON with all task results
```
**Output format:**
```json
{
"group": "review",
"status": "completed",
"tasks": [
{"name": "security-review", "status": "done", "output": "..."},
{"name": "performance-review", "status": "done", "output": "..."}
],
"duration_seconds": 127
}
```
#### `makima supervisor review`
High-level command that orchestrates the full review pipeline:
```bash
# Run review with default agent config
makima supervisor review
# Run review with custom config
makima supervisor review --config .makima/review-agents.yaml
# Run only specific review categories
makima supervisor review --only security,performance,architecture
```
### 2. Review Agent Configuration
#### Repository-Level Configuration (`.makima/review-agents.yaml`)
```yaml
# .makima/review-agents.yaml
version: 1
review:
# Maximum number of concurrent review agents
max_concurrent: 8
# Timeout per agent (seconds)
agent_timeout: 300
# Auto-trigger review when phase transitions to 'review'
auto_trigger: true
# Finding severity that blocks merge
merge_blocking_severity: P1
agents:
- name: security-sentinel
enabled: true
plan: |
You are a Security Sentinel reviewing code changes.
Focus areas:
- Authentication and authorization flaws
- Injection vulnerabilities (SQL, XSS, command injection)
- Secret/credential exposure
- CSRF and session management
- Input validation gaps
Output format: One finding per section with severity (P1/P2/P3),
affected file/line, description, and suggested fix.
priority: critical # Always runs
- name: performance-oracle
enabled: true
plan: |
You are a Performance Oracle reviewing code changes.
Focus areas:
- N+1 query patterns
- Memory leaks and unbounded growth
- Missing caching opportunities
- Algorithmic complexity issues
- Database index utilization
Output format: One finding per section with severity (P1/P2/P3),
affected file/line, description, and suggested fix.
priority: standard
- name: architecture-strategist
enabled: true
plan: |
You are an Architecture Strategist reviewing code changes.
Focus areas:
- SOLID principle violations
- Inappropriate coupling between modules
- Layering violations (e.g., handler accessing DB directly)
- Missing abstraction boundaries
- Inconsistency with existing patterns
Output format: One finding per section with severity (P1/P2/P3),
affected file/line, description, and suggested fix.
priority: standard
- name: test-coverage-analyzer
enabled: true
plan: |
You are a Test Coverage Analyzer reviewing code changes.
Focus areas:
- Missing test coverage for new code paths
- Untested error/edge cases
- Test quality (meaningful assertions vs superficial)
- Integration test gaps
- Mock appropriateness
Output format: One finding per section with severity (P1/P2/P3),
affected file/line, description, and suggested fix.
priority: standard
# Users can add custom agents here
- name: custom-domain-reviewer
enabled: false
plan: "Review for domain-specific business logic concerns..."
priority: optional
```
#### Contract-Level Override
```yaml
# In contract configuration or via CLI
review:
agents:
# Disable agents not relevant to this contract
- name: concurrency-safety
enabled: false
# Add contract-specific reviewer
- name: migration-safety
enabled: true
plan: "Review database migrations for data loss risks..."
```
### 3. Synthesis Step
After all review agents complete, a synthesis task:
1. **Collects** all findings from group task outputs
2. **Deduplicates** findings about the same issue from different perspectives
3. **Resolves contradictions** (e.g., one agent says "add caching" while another says "caching adds complexity")
4. **Prioritizes** by severity and cross-agent agreement
5. **Produces** a structured review report as a contract file
```bash
# Synthesis is automatically run after wait-group completes
makima supervisor synthesize-review "review" \
--output-format findings \
--create-contract-file
```
### 4. Auto-Review Trigger
When a contract's phase transitions to `review`:
```rust
// In phase transition handler
if new_phase == "review" && contract.review_config.auto_trigger {
// Spawn review group automatically
spawn_review_group(contract, review_config).await?;
}
```
---
## Integration with Existing Makima Features
### Supervisor/Worker Hierarchy
Review agents are spawned as **worker tasks** under the supervisor, using existing `spawn-task` infrastructure. The new `spawn-group`/`wait-group` commands are syntactic sugar over batch `spawn-task` + `wait` calls.
### Git Worktree Isolation
Review agents share the supervisor's worktree in **read-only mode** (a new capability). This avoids creating N separate worktrees for review-only tasks. Implementation:
- New `supervisor_worktree_task_id` parameter (already exists in SpawnTask)
- New `read_only: true` flag to prevent file modifications
- Workers see the same code state that triggered the review
### Contract Files
The synthesized review report is stored as a **contract file** attached to the review phase:
```rust
File {
contract_id: contract.id,
contract_phase: "review",
name: "Review Report — 2026-02-09",
body: vec![
BodyElement::Heading { level: 1, text: "Review Summary" },
BodyElement::Paragraph { text: "3 P1 findings, 7 P2 findings, 12 P3 findings" },
// ... structured findings
],
}
```
### Phase Guards
If `phase_guard` is enabled and P1 findings exist, the phase transition from Review to Execute (or Compound) is blocked until P1s are resolved. This integrates with the existing `advance-phase` confirmation flow.
### Completion Gates
Each review agent uses the existing `<COMPLETION_GATE>` mechanism to signal when its review is complete:
```xml
<COMPLETION_GATE>
ready: true
reason: "Security review complete. Found 2 P1 and 3 P2 findings."
progress: "Reviewed 47 files across 12 modules."
</COMPLETION_GATE>
```
### Circuit Breaker
The existing CircuitBreaker protects against review agents getting stuck. If a review agent loops without progress for 3 iterations, it's terminated and its partial findings are included in synthesis.
---
## Implementation Plan
### Phase 1: Group Task Infrastructure (5-7 days)
| Task | Effort | Description |
|------|--------|-------------|
| `spawn-group` command | 2 days | Batch task spawning with named groups |
| `wait-group` command | 1 day | Wait for all tasks in group |
| Group tracking in DB | 1 day | Task group table, membership, status |
| Shared worktree (read-only) | 1-2 days | Workers share supervisor worktree |
| Tests | 1 day | Unit + integration tests |
### Phase 2: Review Agent System (4-6 days)
| Task | Effort | Description |
|------|--------|-------------|
| Review config YAML parser | 1 day | Parse `.makima/review-agents.yaml` |
| `supervisor review` command | 2 days | Orchestrate review pipeline |
| Synthesis agent logic | 1-2 days | Deduplicate, prioritize, format |
| Review report as contract file | 1 day | Store structured output |
### Phase 3: Automation & Polish (3-5 days)
| Task | Effort | Description |
|------|--------|-------------|
| Auto-trigger on phase transition | 1 day | Hook into `advance-phase` |
| P1 merge blocking | 1 day | Phase guard integration |
| Default review agent templates | 1-2 days | Ship 8-10 built-in agents |
| Documentation | 1 day | User guide and config reference |
---
## Configuration Examples
### Minimal Setup (Zero Config)
```bash
# Uses built-in review agents with default settings
makima supervisor review
```
### Custom Review for a Specific Contract
```bash
# Override for this contract only
makima supervisor review \
--only security,performance \
--merge-blocking P1 \
--timeout 300
```
### Full Custom Configuration
```yaml
# .makima/review-agents.yaml
version: 1
review:
max_concurrent: 6
agent_timeout: 300
auto_trigger: true
merge_blocking_severity: P1
synthesis:
dedup_threshold: 0.8 # Similarity score for deduplication
min_agreement: 2 # Findings flagged by 2+ agents get priority boost
output_format: "findings" # "findings" | "report" | "both"
create_contract_file: true
agents:
- name: security-sentinel
enabled: true
priority: critical
plan: |
...
- name: performance-oracle
enabled: true
priority: standard
plan: |
...
# ... more agents
```
---
## Open Questions
1. **Shared worktree read-only enforcement**: Should this be enforced at the filesystem level (mount read-only) or via convention (instructions to the agent)?
2. **Review scope**: Should review agents see all files or only changed files (git diff)?
3. **Incremental review**: When new commits are added during review, should agents re-review or only review the delta?
4. **Agent output parsing**: Should agents output structured YAML findings, or should the synthesis step parse natural language?
5. **Cost control**: With 10+ parallel agents, how do we manage API costs? Should there be a budget ceiling per review?
6. **Finding deduplication**: What similarity threshold should trigger deduplication? How to handle partial overlaps?
---
## Alternatives Considered
| Alternative | Pros | Cons | Decision |
|-------------|------|------|----------|
| Single comprehensive review agent | Simple, no coordination overhead | Misses perspective-specific issues | Rejected — diminishes review quality |
| Sequential reviews (one after another) | Simpler orchestration | 5-10x slower; later reviews can't benefit from earlier ones | Rejected — latency unacceptable |
| External review tools integration | Leverage existing static analysis | Limited to tool capabilities; no semantic review | Complement — can integrate alongside agent review |
| User-configured number of agents | Maximum flexibility | Analysis paralysis for new users | Adopted — sensible defaults + customization |
---
## Priority & Complexity Assessment
- **Priority: HIGH** — Multi-agent review is the highest-impact feature from the compound engineering plugin. It directly improves code quality with no change to developer workflow.
- **Complexity: MEDIUM** — The core `spawn-group`/`wait-group` pattern is straightforward. The synthesis step requires careful design. Shared worktree read-only mode is a new capability.
- **Risk: LOW-MEDIUM** — Main risks are resource consumption (manageable with concurrency limits) and synthesis quality (improvable iteratively).