# Feature Proposal: Multi-Agent Parallel Review System > **Priority:** High > **Complexity:** Medium > **Estimated Effort:** 12-18 days > **Status:** Proposal > **Date:** 2026-02-09 > **Dependencies:** [Findings Tracking](feature-findings-tracking.md) (recommended) > **Related:** [Overview Analysis](compound-engineering-analysis.md) · [Workflow Presets](feature-workflow-presets.md) --- ## Problem Statement Makima's contract lifecycle includes a **Review** phase, but it currently has: - **No automated review mechanism** — the review phase relies entirely on manual user inspection or a single supervisor task - **Single-perspective review** — even when a review task is spawned, it examines code from one viewpoint - **No structured review output** — findings are captured as unstructured text in task output - **No review templates** — each review must be configured from scratch - **No synthesis** — when multiple reviewers exist, there's no mechanism to deduplicate and prioritize findings For complex contracts touching security, performance, and architecture, a single-pass review consistently misses category-specific issues that specialized reviewers would catch. --- ## How Compound Engineering Solves This The compound engineering plugin spawns **12-15 specialized review agents in parallel**, each examining the code from a unique perspective: | Agent | Focus Area | Example Findings | |-------|-----------|-----------------| | Security Sentinel | Auth, injection, secrets, CSRF | SQL injection in user input handler | | Performance Oracle | N+1 queries, memory leaks, caching | Unbounded list growth in event handler | | Architecture Strategist | Coupling, SOLID, layering | Service directly accessing repository internals | | Code Philosopher | Readability, naming, complexity | Cyclomatic complexity > 15 in payment flow | | Data Integrity Guardian | Validation, constraints, migrations | Missing NOT NULL constraint on required field | | Error Resilience Analyzer | Error handling, retries, fallbacks | Unhandled timeout in external API call | | API Contract Validator | Breaking changes, versioning | Removed required field from response | | Dependency Health Checker | Vulnerabilities, licensing, freshness | CVE-2025-XXXX in transitive dependency | | Test Coverage Analyzer | Coverage gaps, edge cases, mocking | No tests for error path in checkout flow | | Documentation Completeness | Docs accuracy, examples, changelog | Public API endpoint undocumented | | Concurrency Safety | Race conditions, deadlocks, atomicity | Non-atomic read-modify-write on shared counter | After all agents complete, a **synthesis agent** deduplicates findings, resolves contradictions, and produces a prioritized report. ``` ┌───────────────────────────────────────────────────────┐ │ Review Orchestrator │ │ │ │ spawn-group "review" │ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │ │Security │ │ Perf │ │ Arch │ │ Code │ │ │ │Sentinel │ │ Oracle │ │Strategy │ │ Phil │ │ │ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │ │ │ │ │ │ │ │ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ │ │ │ Data │ │ Error │ │ API │ │ Deps │ │ │ │Guardian │ │Resilien.│ │Contract │ │ Health │ │ │ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │ │ │ │ │ │ │ │ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ │ │ │ Test │ │ Docs │ │Concurr. │ │ │ │Coverage │ │Complete │ │ Safety │ │ │ └────┬────┘ └────┬────┘ └────┬────┘ │ │ │ │ │ │ │ wait-group "review" │ │ ▼ ▼ ▼ │ │ ┌──────────────────────────────────────────┐ │ │ │ Synthesis Agent │ │ │ │ - Deduplicate findings │ │ │ │ - Resolve contradictions │ │ │ │ - Prioritize by severity │ │ │ │ - Generate summary report │ │ │ └──────────────────────────────────────────┘ │ │ │ │ │ ▼ │ │ Structured Findings │ │ (P1 / P2 / P3) │ └───────────────────────────────────────────────────────┘ ``` --- ## Proposed Makima Implementation ### 1. New Supervisor Commands #### `makima supervisor spawn-group` Spawns multiple tasks as a named group and returns immediately: ```bash # Spawn a review group with 5 agents makima supervisor spawn-group "review" \ --tasks '[ {"name": "security-review", "plan": "Review for security vulnerabilities..."}, {"name": "performance-review", "plan": "Review for performance issues..."}, {"name": "architecture-review", "plan": "Review for architecture concerns..."} ]' \ --share-worktree \ --read-only ``` **Key parameters:** - `--tasks` — JSON array of task definitions - `--share-worktree` — All tasks in the group share the supervisor's worktree (read-only access) - `--read-only` — Tasks cannot modify files, only produce output - `--max-concurrent N` — Limit parallel execution (default: unlimited) #### `makima supervisor wait-group` Waits for all tasks in a named group to complete: ```bash # Wait for all review tasks, timeout after 10 minutes makima supervisor wait-group "review" --timeout 600 # Returns JSON with all task results ``` **Output format:** ```json { "group": "review", "status": "completed", "tasks": [ {"name": "security-review", "status": "done", "output": "..."}, {"name": "performance-review", "status": "done", "output": "..."} ], "duration_seconds": 127 } ``` #### `makima supervisor review` High-level command that orchestrates the full review pipeline: ```bash # Run review with default agent config makima supervisor review # Run review with custom config makima supervisor review --config .makima/review-agents.yaml # Run only specific review categories makima supervisor review --only security,performance,architecture ``` ### 2. Review Agent Configuration #### Repository-Level Configuration (`.makima/review-agents.yaml`) ```yaml # .makima/review-agents.yaml version: 1 review: # Maximum number of concurrent review agents max_concurrent: 8 # Timeout per agent (seconds) agent_timeout: 300 # Auto-trigger review when phase transitions to 'review' auto_trigger: true # Finding severity that blocks merge merge_blocking_severity: P1 agents: - name: security-sentinel enabled: true plan: | You are a Security Sentinel reviewing code changes. Focus areas: - Authentication and authorization flaws - Injection vulnerabilities (SQL, XSS, command injection) - Secret/credential exposure - CSRF and session management - Input validation gaps Output format: One finding per section with severity (P1/P2/P3), affected file/line, description, and suggested fix. priority: critical # Always runs - name: performance-oracle enabled: true plan: | You are a Performance Oracle reviewing code changes. Focus areas: - N+1 query patterns - Memory leaks and unbounded growth - Missing caching opportunities - Algorithmic complexity issues - Database index utilization Output format: One finding per section with severity (P1/P2/P3), affected file/line, description, and suggested fix. priority: standard - name: architecture-strategist enabled: true plan: | You are an Architecture Strategist reviewing code changes. Focus areas: - SOLID principle violations - Inappropriate coupling between modules - Layering violations (e.g., handler accessing DB directly) - Missing abstraction boundaries - Inconsistency with existing patterns Output format: One finding per section with severity (P1/P2/P3), affected file/line, description, and suggested fix. priority: standard - name: test-coverage-analyzer enabled: true plan: | You are a Test Coverage Analyzer reviewing code changes. Focus areas: - Missing test coverage for new code paths - Untested error/edge cases - Test quality (meaningful assertions vs superficial) - Integration test gaps - Mock appropriateness Output format: One finding per section with severity (P1/P2/P3), affected file/line, description, and suggested fix. priority: standard # Users can add custom agents here - name: custom-domain-reviewer enabled: false plan: "Review for domain-specific business logic concerns..." priority: optional ``` #### Contract-Level Override ```yaml # In contract configuration or via CLI review: agents: # Disable agents not relevant to this contract - name: concurrency-safety enabled: false # Add contract-specific reviewer - name: migration-safety enabled: true plan: "Review database migrations for data loss risks..." ``` ### 3. Synthesis Step After all review agents complete, a synthesis task: 1. **Collects** all findings from group task outputs 2. **Deduplicates** findings about the same issue from different perspectives 3. **Resolves contradictions** (e.g., one agent says "add caching" while another says "caching adds complexity") 4. **Prioritizes** by severity and cross-agent agreement 5. **Produces** a structured review report as a contract file ```bash # Synthesis is automatically run after wait-group completes makima supervisor synthesize-review "review" \ --output-format findings \ --create-contract-file ``` ### 4. Auto-Review Trigger When a contract's phase transitions to `review`: ```rust // In phase transition handler if new_phase == "review" && contract.review_config.auto_trigger { // Spawn review group automatically spawn_review_group(contract, review_config).await?; } ``` --- ## Integration with Existing Makima Features ### Supervisor/Worker Hierarchy Review agents are spawned as **worker tasks** under the supervisor, using existing `spawn-task` infrastructure. The new `spawn-group`/`wait-group` commands are syntactic sugar over batch `spawn-task` + `wait` calls. ### Git Worktree Isolation Review agents share the supervisor's worktree in **read-only mode** (a new capability). This avoids creating N separate worktrees for review-only tasks. Implementation: - New `supervisor_worktree_task_id` parameter (already exists in SpawnTask) - New `read_only: true` flag to prevent file modifications - Workers see the same code state that triggered the review ### Contract Files The synthesized review report is stored as a **contract file** attached to the review phase: ```rust File { contract_id: contract.id, contract_phase: "review", name: "Review Report — 2026-02-09", body: vec![ BodyElement::Heading { level: 1, text: "Review Summary" }, BodyElement::Paragraph { text: "3 P1 findings, 7 P2 findings, 12 P3 findings" }, // ... structured findings ], } ``` ### Phase Guards If `phase_guard` is enabled and P1 findings exist, the phase transition from Review to Execute (or Compound) is blocked until P1s are resolved. This integrates with the existing `advance-phase` confirmation flow. ### Completion Gates Each review agent uses the existing `` mechanism to signal when its review is complete: ```xml ready: true reason: "Security review complete. Found 2 P1 and 3 P2 findings." progress: "Reviewed 47 files across 12 modules." ``` ### Circuit Breaker The existing CircuitBreaker protects against review agents getting stuck. If a review agent loops without progress for 3 iterations, it's terminated and its partial findings are included in synthesis. --- ## Implementation Plan ### Phase 1: Group Task Infrastructure (5-7 days) | Task | Effort | Description | |------|--------|-------------| | `spawn-group` command | 2 days | Batch task spawning with named groups | | `wait-group` command | 1 day | Wait for all tasks in group | | Group tracking in DB | 1 day | Task group table, membership, status | | Shared worktree (read-only) | 1-2 days | Workers share supervisor worktree | | Tests | 1 day | Unit + integration tests | ### Phase 2: Review Agent System (4-6 days) | Task | Effort | Description | |------|--------|-------------| | Review config YAML parser | 1 day | Parse `.makima/review-agents.yaml` | | `supervisor review` command | 2 days | Orchestrate review pipeline | | Synthesis agent logic | 1-2 days | Deduplicate, prioritize, format | | Review report as contract file | 1 day | Store structured output | ### Phase 3: Automation & Polish (3-5 days) | Task | Effort | Description | |------|--------|-------------| | Auto-trigger on phase transition | 1 day | Hook into `advance-phase` | | P1 merge blocking | 1 day | Phase guard integration | | Default review agent templates | 1-2 days | Ship 8-10 built-in agents | | Documentation | 1 day | User guide and config reference | --- ## Configuration Examples ### Minimal Setup (Zero Config) ```bash # Uses built-in review agents with default settings makima supervisor review ``` ### Custom Review for a Specific Contract ```bash # Override for this contract only makima supervisor review \ --only security,performance \ --merge-blocking P1 \ --timeout 300 ``` ### Full Custom Configuration ```yaml # .makima/review-agents.yaml version: 1 review: max_concurrent: 6 agent_timeout: 300 auto_trigger: true merge_blocking_severity: P1 synthesis: dedup_threshold: 0.8 # Similarity score for deduplication min_agreement: 2 # Findings flagged by 2+ agents get priority boost output_format: "findings" # "findings" | "report" | "both" create_contract_file: true agents: - name: security-sentinel enabled: true priority: critical plan: | ... - name: performance-oracle enabled: true priority: standard plan: | ... # ... more agents ``` --- ## Open Questions 1. **Shared worktree read-only enforcement**: Should this be enforced at the filesystem level (mount read-only) or via convention (instructions to the agent)? 2. **Review scope**: Should review agents see all files or only changed files (git diff)? 3. **Incremental review**: When new commits are added during review, should agents re-review or only review the delta? 4. **Agent output parsing**: Should agents output structured YAML findings, or should the synthesis step parse natural language? 5. **Cost control**: With 10+ parallel agents, how do we manage API costs? Should there be a budget ceiling per review? 6. **Finding deduplication**: What similarity threshold should trigger deduplication? How to handle partial overlaps? --- ## Alternatives Considered | Alternative | Pros | Cons | Decision | |-------------|------|------|----------| | Single comprehensive review agent | Simple, no coordination overhead | Misses perspective-specific issues | Rejected — diminishes review quality | | Sequential reviews (one after another) | Simpler orchestration | 5-10x slower; later reviews can't benefit from earlier ones | Rejected — latency unacceptable | | External review tools integration | Leverage existing static analysis | Limited to tool capabilities; no semantic review | Complement — can integrate alongside agent review | | User-configured number of agents | Maximum flexibility | Analysis paralysis for new users | Adopted — sensible defaults + customization | --- ## Priority & Complexity Assessment - **Priority: HIGH** — Multi-agent review is the highest-impact feature from the compound engineering plugin. It directly improves code quality with no change to developer workflow. - **Complexity: MEDIUM** — The core `spawn-group`/`wait-group` pattern is straightforward. The synthesis step requires careful design. Shared worktree read-only mode is a new capability. - **Risk: LOW-MEDIUM** — Main risks are resource consumption (manageable with concurrency limits) and synthesis quality (improvable iteratively).