summaryrefslogtreecommitdiff
path: root/docs/proposals/feature-multi-agent-review.md
diff options
context:
space:
mode:
authorsoryu <soryu@soryu.co>2026-02-09 16:39:51 +0000
committersoryu <soryu@soryu.co>2026-02-09 16:39:51 +0000
commit463ece225baa1093b4752b716fe5a17955398d9c (patch)
tree62e1bc3938c765d652e32567ead04d1633540a8e /docs/proposals/feature-multi-agent-review.md
parent339c1769379a851c4126021132573bd4b7994cf2 (diff)
downloadsoryu-makima/task-task-046b952d-046b952d.tar.gz
soryu-makima/task-task-046b952d-046b952d.zip
Add compound engineering feature proposals for makimamakima/task-task-046b952d-046b952d
Analyze the compound engineering plugin (https://github.com/EveryInc/compound-engineering-plugin) and propose 6 features inspired by its patterns for adoption into makima: - Multi-agent parallel review system (spawn-group/wait-group) - Knowledge accumulation / compound learning phase - Parallel plan deepening with research agents - Workflow presets / pipeline templates (LFG-style one-command pipelines) - Structured findings tracking with severity and lifecycle - Reusable task templates with meta-commands Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Diffstat (limited to 'docs/proposals/feature-multi-agent-review.md')
-rw-r--r--docs/proposals/feature-multi-agent-review.md448
1 files changed, 448 insertions, 0 deletions
diff --git a/docs/proposals/feature-multi-agent-review.md b/docs/proposals/feature-multi-agent-review.md
new file mode 100644
index 0000000..d678756
--- /dev/null
+++ b/docs/proposals/feature-multi-agent-review.md
@@ -0,0 +1,448 @@
+# Feature Proposal: Multi-Agent Parallel Review System
+
+> **Priority:** High
+> **Complexity:** Medium
+> **Estimated Effort:** 12-18 days
+> **Status:** Proposal
+> **Date:** 2026-02-09
+> **Dependencies:** [Findings Tracking](feature-findings-tracking.md) (recommended)
+> **Related:** [Overview Analysis](compound-engineering-analysis.md) · [Workflow Presets](feature-workflow-presets.md)
+
+---
+
+## Problem Statement
+
+Makima's contract lifecycle includes a **Review** phase, but it currently has:
+
+- **No automated review mechanism** — the review phase relies entirely on manual user inspection or a single supervisor task
+- **Single-perspective review** — even when a review task is spawned, it examines code from one viewpoint
+- **No structured review output** — findings are captured as unstructured text in task output
+- **No review templates** — each review must be configured from scratch
+- **No synthesis** — when multiple reviewers exist, there's no mechanism to deduplicate and prioritize findings
+
+For complex contracts touching security, performance, and architecture, a single-pass review consistently misses category-specific issues that specialized reviewers would catch.
+
+---
+
+## How Compound Engineering Solves This
+
+The compound engineering plugin spawns **12-15 specialized review agents in parallel**, each examining the code from a unique perspective:
+
+| Agent | Focus Area | Example Findings |
+|-------|-----------|-----------------|
+| Security Sentinel | Auth, injection, secrets, CSRF | SQL injection in user input handler |
+| Performance Oracle | N+1 queries, memory leaks, caching | Unbounded list growth in event handler |
+| Architecture Strategist | Coupling, SOLID, layering | Service directly accessing repository internals |
+| Code Philosopher | Readability, naming, complexity | Cyclomatic complexity > 15 in payment flow |
+| Data Integrity Guardian | Validation, constraints, migrations | Missing NOT NULL constraint on required field |
+| Error Resilience Analyzer | Error handling, retries, fallbacks | Unhandled timeout in external API call |
+| API Contract Validator | Breaking changes, versioning | Removed required field from response |
+| Dependency Health Checker | Vulnerabilities, licensing, freshness | CVE-2025-XXXX in transitive dependency |
+| Test Coverage Analyzer | Coverage gaps, edge cases, mocking | No tests for error path in checkout flow |
+| Documentation Completeness | Docs accuracy, examples, changelog | Public API endpoint undocumented |
+| Concurrency Safety | Race conditions, deadlocks, atomicity | Non-atomic read-modify-write on shared counter |
+
+After all agents complete, a **synthesis agent** deduplicates findings, resolves contradictions, and produces a prioritized report.
+
+```
+┌───────────────────────────────────────────────────────┐
+│ Review Orchestrator │
+│ │
+│ spawn-group "review" │
+│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
+│ │Security │ │ Perf │ │ Arch │ │ Code │ │
+│ │Sentinel │ │ Oracle │ │Strategy │ │ Phil │ │
+│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
+│ │ │ │ │ │
+│ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ │
+│ │ Data │ │ Error │ │ API │ │ Deps │ │
+│ │Guardian │ │Resilien.│ │Contract │ │ Health │ │
+│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
+│ │ │ │ │ │
+│ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ │
+│ │ Test │ │ Docs │ │Concurr. │ │
+│ │Coverage │ │Complete │ │ Safety │ │
+│ └────┬────┘ └────┬────┘ └────┬────┘ │
+│ │ │ │ │
+│ wait-group "review" │
+│ ▼ ▼ ▼ │
+│ ┌──────────────────────────────────────────┐ │
+│ │ Synthesis Agent │ │
+│ │ - Deduplicate findings │ │
+│ │ - Resolve contradictions │ │
+│ │ - Prioritize by severity │ │
+│ │ - Generate summary report │ │
+│ └──────────────────────────────────────────┘ │
+│ │ │
+│ ▼ │
+│ Structured Findings │
+│ (P1 / P2 / P3) │
+└───────────────────────────────────────────────────────┘
+```
+
+---
+
+## Proposed Makima Implementation
+
+### 1. New Supervisor Commands
+
+#### `makima supervisor spawn-group`
+
+Spawns multiple tasks as a named group and returns immediately:
+
+```bash
+# Spawn a review group with 5 agents
+makima supervisor spawn-group "review" \
+ --tasks '[
+ {"name": "security-review", "plan": "Review for security vulnerabilities..."},
+ {"name": "performance-review", "plan": "Review for performance issues..."},
+ {"name": "architecture-review", "plan": "Review for architecture concerns..."}
+ ]' \
+ --share-worktree \
+ --read-only
+```
+
+**Key parameters:**
+- `--tasks` — JSON array of task definitions
+- `--share-worktree` — All tasks in the group share the supervisor's worktree (read-only access)
+- `--read-only` — Tasks cannot modify files, only produce output
+- `--max-concurrent N` — Limit parallel execution (default: unlimited)
+
+#### `makima supervisor wait-group`
+
+Waits for all tasks in a named group to complete:
+
+```bash
+# Wait for all review tasks, timeout after 10 minutes
+makima supervisor wait-group "review" --timeout 600
+
+# Returns JSON with all task results
+```
+
+**Output format:**
+```json
+{
+ "group": "review",
+ "status": "completed",
+ "tasks": [
+ {"name": "security-review", "status": "done", "output": "..."},
+ {"name": "performance-review", "status": "done", "output": "..."}
+ ],
+ "duration_seconds": 127
+}
+```
+
+#### `makima supervisor review`
+
+High-level command that orchestrates the full review pipeline:
+
+```bash
+# Run review with default agent config
+makima supervisor review
+
+# Run review with custom config
+makima supervisor review --config .makima/review-agents.yaml
+
+# Run only specific review categories
+makima supervisor review --only security,performance,architecture
+```
+
+### 2. Review Agent Configuration
+
+#### Repository-Level Configuration (`.makima/review-agents.yaml`)
+
+```yaml
+# .makima/review-agents.yaml
+version: 1
+review:
+ # Maximum number of concurrent review agents
+ max_concurrent: 8
+
+ # Timeout per agent (seconds)
+ agent_timeout: 300
+
+ # Auto-trigger review when phase transitions to 'review'
+ auto_trigger: true
+
+ # Finding severity that blocks merge
+ merge_blocking_severity: P1
+
+ agents:
+ - name: security-sentinel
+ enabled: true
+ plan: |
+ You are a Security Sentinel reviewing code changes.
+
+ Focus areas:
+ - Authentication and authorization flaws
+ - Injection vulnerabilities (SQL, XSS, command injection)
+ - Secret/credential exposure
+ - CSRF and session management
+ - Input validation gaps
+
+ Output format: One finding per section with severity (P1/P2/P3),
+ affected file/line, description, and suggested fix.
+ priority: critical # Always runs
+
+ - name: performance-oracle
+ enabled: true
+ plan: |
+ You are a Performance Oracle reviewing code changes.
+
+ Focus areas:
+ - N+1 query patterns
+ - Memory leaks and unbounded growth
+ - Missing caching opportunities
+ - Algorithmic complexity issues
+ - Database index utilization
+
+ Output format: One finding per section with severity (P1/P2/P3),
+ affected file/line, description, and suggested fix.
+ priority: standard
+
+ - name: architecture-strategist
+ enabled: true
+ plan: |
+ You are an Architecture Strategist reviewing code changes.
+
+ Focus areas:
+ - SOLID principle violations
+ - Inappropriate coupling between modules
+ - Layering violations (e.g., handler accessing DB directly)
+ - Missing abstraction boundaries
+ - Inconsistency with existing patterns
+
+ Output format: One finding per section with severity (P1/P2/P3),
+ affected file/line, description, and suggested fix.
+ priority: standard
+
+ - name: test-coverage-analyzer
+ enabled: true
+ plan: |
+ You are a Test Coverage Analyzer reviewing code changes.
+
+ Focus areas:
+ - Missing test coverage for new code paths
+ - Untested error/edge cases
+ - Test quality (meaningful assertions vs superficial)
+ - Integration test gaps
+ - Mock appropriateness
+
+ Output format: One finding per section with severity (P1/P2/P3),
+ affected file/line, description, and suggested fix.
+ priority: standard
+
+ # Users can add custom agents here
+ - name: custom-domain-reviewer
+ enabled: false
+ plan: "Review for domain-specific business logic concerns..."
+ priority: optional
+```
+
+#### Contract-Level Override
+
+```yaml
+# In contract configuration or via CLI
+review:
+ agents:
+ # Disable agents not relevant to this contract
+ - name: concurrency-safety
+ enabled: false
+ # Add contract-specific reviewer
+ - name: migration-safety
+ enabled: true
+ plan: "Review database migrations for data loss risks..."
+```
+
+### 3. Synthesis Step
+
+After all review agents complete, a synthesis task:
+
+1. **Collects** all findings from group task outputs
+2. **Deduplicates** findings about the same issue from different perspectives
+3. **Resolves contradictions** (e.g., one agent says "add caching" while another says "caching adds complexity")
+4. **Prioritizes** by severity and cross-agent agreement
+5. **Produces** a structured review report as a contract file
+
+```bash
+# Synthesis is automatically run after wait-group completes
+makima supervisor synthesize-review "review" \
+ --output-format findings \
+ --create-contract-file
+```
+
+### 4. Auto-Review Trigger
+
+When a contract's phase transitions to `review`:
+
+```rust
+// In phase transition handler
+if new_phase == "review" && contract.review_config.auto_trigger {
+ // Spawn review group automatically
+ spawn_review_group(contract, review_config).await?;
+}
+```
+
+---
+
+## Integration with Existing Makima Features
+
+### Supervisor/Worker Hierarchy
+
+Review agents are spawned as **worker tasks** under the supervisor, using existing `spawn-task` infrastructure. The new `spawn-group`/`wait-group` commands are syntactic sugar over batch `spawn-task` + `wait` calls.
+
+### Git Worktree Isolation
+
+Review agents share the supervisor's worktree in **read-only mode** (a new capability). This avoids creating N separate worktrees for review-only tasks. Implementation:
+- New `supervisor_worktree_task_id` parameter (already exists in SpawnTask)
+- New `read_only: true` flag to prevent file modifications
+- Workers see the same code state that triggered the review
+
+### Contract Files
+
+The synthesized review report is stored as a **contract file** attached to the review phase:
+```rust
+File {
+ contract_id: contract.id,
+ contract_phase: "review",
+ name: "Review Report — 2026-02-09",
+ body: vec![
+ BodyElement::Heading { level: 1, text: "Review Summary" },
+ BodyElement::Paragraph { text: "3 P1 findings, 7 P2 findings, 12 P3 findings" },
+ // ... structured findings
+ ],
+}
+```
+
+### Phase Guards
+
+If `phase_guard` is enabled and P1 findings exist, the phase transition from Review to Execute (or Compound) is blocked until P1s are resolved. This integrates with the existing `advance-phase` confirmation flow.
+
+### Completion Gates
+
+Each review agent uses the existing `<COMPLETION_GATE>` mechanism to signal when its review is complete:
+```xml
+<COMPLETION_GATE>
+ready: true
+reason: "Security review complete. Found 2 P1 and 3 P2 findings."
+progress: "Reviewed 47 files across 12 modules."
+</COMPLETION_GATE>
+```
+
+### Circuit Breaker
+
+The existing CircuitBreaker protects against review agents getting stuck. If a review agent loops without progress for 3 iterations, it's terminated and its partial findings are included in synthesis.
+
+---
+
+## Implementation Plan
+
+### Phase 1: Group Task Infrastructure (5-7 days)
+
+| Task | Effort | Description |
+|------|--------|-------------|
+| `spawn-group` command | 2 days | Batch task spawning with named groups |
+| `wait-group` command | 1 day | Wait for all tasks in group |
+| Group tracking in DB | 1 day | Task group table, membership, status |
+| Shared worktree (read-only) | 1-2 days | Workers share supervisor worktree |
+| Tests | 1 day | Unit + integration tests |
+
+### Phase 2: Review Agent System (4-6 days)
+
+| Task | Effort | Description |
+|------|--------|-------------|
+| Review config YAML parser | 1 day | Parse `.makima/review-agents.yaml` |
+| `supervisor review` command | 2 days | Orchestrate review pipeline |
+| Synthesis agent logic | 1-2 days | Deduplicate, prioritize, format |
+| Review report as contract file | 1 day | Store structured output |
+
+### Phase 3: Automation & Polish (3-5 days)
+
+| Task | Effort | Description |
+|------|--------|-------------|
+| Auto-trigger on phase transition | 1 day | Hook into `advance-phase` |
+| P1 merge blocking | 1 day | Phase guard integration |
+| Default review agent templates | 1-2 days | Ship 8-10 built-in agents |
+| Documentation | 1 day | User guide and config reference |
+
+---
+
+## Configuration Examples
+
+### Minimal Setup (Zero Config)
+
+```bash
+# Uses built-in review agents with default settings
+makima supervisor review
+```
+
+### Custom Review for a Specific Contract
+
+```bash
+# Override for this contract only
+makima supervisor review \
+ --only security,performance \
+ --merge-blocking P1 \
+ --timeout 300
+```
+
+### Full Custom Configuration
+
+```yaml
+# .makima/review-agents.yaml
+version: 1
+review:
+ max_concurrent: 6
+ agent_timeout: 300
+ auto_trigger: true
+ merge_blocking_severity: P1
+
+ synthesis:
+ dedup_threshold: 0.8 # Similarity score for deduplication
+ min_agreement: 2 # Findings flagged by 2+ agents get priority boost
+ output_format: "findings" # "findings" | "report" | "both"
+ create_contract_file: true
+
+ agents:
+ - name: security-sentinel
+ enabled: true
+ priority: critical
+ plan: |
+ ...
+ - name: performance-oracle
+ enabled: true
+ priority: standard
+ plan: |
+ ...
+ # ... more agents
+```
+
+---
+
+## Open Questions
+
+1. **Shared worktree read-only enforcement**: Should this be enforced at the filesystem level (mount read-only) or via convention (instructions to the agent)?
+2. **Review scope**: Should review agents see all files or only changed files (git diff)?
+3. **Incremental review**: When new commits are added during review, should agents re-review or only review the delta?
+4. **Agent output parsing**: Should agents output structured YAML findings, or should the synthesis step parse natural language?
+5. **Cost control**: With 10+ parallel agents, how do we manage API costs? Should there be a budget ceiling per review?
+6. **Finding deduplication**: What similarity threshold should trigger deduplication? How to handle partial overlaps?
+
+---
+
+## Alternatives Considered
+
+| Alternative | Pros | Cons | Decision |
+|-------------|------|------|----------|
+| Single comprehensive review agent | Simple, no coordination overhead | Misses perspective-specific issues | Rejected — diminishes review quality |
+| Sequential reviews (one after another) | Simpler orchestration | 5-10x slower; later reviews can't benefit from earlier ones | Rejected — latency unacceptable |
+| External review tools integration | Leverage existing static analysis | Limited to tool capabilities; no semantic review | Complement — can integrate alongside agent review |
+| User-configured number of agents | Maximum flexibility | Analysis paralysis for new users | Adopted — sensible defaults + customization |
+
+---
+
+## Priority & Complexity Assessment
+
+- **Priority: HIGH** — Multi-agent review is the highest-impact feature from the compound engineering plugin. It directly improves code quality with no change to developer workflow.
+- **Complexity: MEDIUM** — The core `spawn-group`/`wait-group` pattern is straightforward. The synthesis step requires careful design. Shared worktree read-only mode is a new capability.
+- **Risk: LOW-MEDIUM** — Main risks are resource consumption (manageable with concurrency limits) and synthesis quality (improvable iteratively).