From 76bb9da745f6c12c8e7e587a9096677bbf98f395 Mon Sep 17 00:00:00 2001 From: soryu Date: Mon, 9 Feb 2026 16:51:59 +0000 Subject: Add compound engineering feature proposals for makima (#58) Analyze the compound engineering plugin (https://github.com/EveryInc/compound-engineering-plugin) and propose 6 features inspired by its patterns for adoption into makima: - Multi-agent parallel review system (spawn-group/wait-group) - Knowledge accumulation / compound learning phase - Parallel plan deepening with research agents - Workflow presets / pipeline templates (LFG-style one-command pipelines) - Structured findings tracking with severity and lifecycle - Reusable task templates with meta-commands Co-authored-by: Claude Opus 4.6 --- docs/proposals/feature-multi-agent-review.md | 448 +++++++++++++++++++++++++++ 1 file changed, 448 insertions(+) create mode 100644 docs/proposals/feature-multi-agent-review.md (limited to 'docs/proposals/feature-multi-agent-review.md') diff --git a/docs/proposals/feature-multi-agent-review.md b/docs/proposals/feature-multi-agent-review.md new file mode 100644 index 0000000..d678756 --- /dev/null +++ b/docs/proposals/feature-multi-agent-review.md @@ -0,0 +1,448 @@ +# Feature Proposal: Multi-Agent Parallel Review System + +> **Priority:** High +> **Complexity:** Medium +> **Estimated Effort:** 12-18 days +> **Status:** Proposal +> **Date:** 2026-02-09 +> **Dependencies:** [Findings Tracking](feature-findings-tracking.md) (recommended) +> **Related:** [Overview Analysis](compound-engineering-analysis.md) · [Workflow Presets](feature-workflow-presets.md) + +--- + +## Problem Statement + +Makima's contract lifecycle includes a **Review** phase, but it currently has: + +- **No automated review mechanism** — the review phase relies entirely on manual user inspection or a single supervisor task +- **Single-perspective review** — even when a review task is spawned, it examines code from one viewpoint +- **No structured review output** — findings are captured as unstructured text in task output +- **No review templates** — each review must be configured from scratch +- **No synthesis** — when multiple reviewers exist, there's no mechanism to deduplicate and prioritize findings + +For complex contracts touching security, performance, and architecture, a single-pass review consistently misses category-specific issues that specialized reviewers would catch. + +--- + +## How Compound Engineering Solves This + +The compound engineering plugin spawns **12-15 specialized review agents in parallel**, each examining the code from a unique perspective: + +| Agent | Focus Area | Example Findings | +|-------|-----------|-----------------| +| Security Sentinel | Auth, injection, secrets, CSRF | SQL injection in user input handler | +| Performance Oracle | N+1 queries, memory leaks, caching | Unbounded list growth in event handler | +| Architecture Strategist | Coupling, SOLID, layering | Service directly accessing repository internals | +| Code Philosopher | Readability, naming, complexity | Cyclomatic complexity > 15 in payment flow | +| Data Integrity Guardian | Validation, constraints, migrations | Missing NOT NULL constraint on required field | +| Error Resilience Analyzer | Error handling, retries, fallbacks | Unhandled timeout in external API call | +| API Contract Validator | Breaking changes, versioning | Removed required field from response | +| Dependency Health Checker | Vulnerabilities, licensing, freshness | CVE-2025-XXXX in transitive dependency | +| Test Coverage Analyzer | Coverage gaps, edge cases, mocking | No tests for error path in checkout flow | +| Documentation Completeness | Docs accuracy, examples, changelog | Public API endpoint undocumented | +| Concurrency Safety | Race conditions, deadlocks, atomicity | Non-atomic read-modify-write on shared counter | + +After all agents complete, a **synthesis agent** deduplicates findings, resolves contradictions, and produces a prioritized report. + +``` +┌───────────────────────────────────────────────────────┐ +│ Review Orchestrator │ +│ │ +│ spawn-group "review" │ +│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ +│ │Security │ │ Perf │ │ Arch │ │ Code │ │ +│ │Sentinel │ │ Oracle │ │Strategy │ │ Phil │ │ +│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │ +│ │ │ │ │ │ +│ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ │ +│ │ Data │ │ Error │ │ API │ │ Deps │ │ +│ │Guardian │ │Resilien.│ │Contract │ │ Health │ │ +│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │ +│ │ │ │ │ │ +│ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ │ +│ │ Test │ │ Docs │ │Concurr. │ │ +│ │Coverage │ │Complete │ │ Safety │ │ +│ └────┬────┘ └────┬────┘ └────┬────┘ │ +│ │ │ │ │ +│ wait-group "review" │ +│ ▼ ▼ ▼ │ +│ ┌──────────────────────────────────────────┐ │ +│ │ Synthesis Agent │ │ +│ │ - Deduplicate findings │ │ +│ │ - Resolve contradictions │ │ +│ │ - Prioritize by severity │ │ +│ │ - Generate summary report │ │ +│ └──────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ Structured Findings │ +│ (P1 / P2 / P3) │ +└───────────────────────────────────────────────────────┘ +``` + +--- + +## Proposed Makima Implementation + +### 1. New Supervisor Commands + +#### `makima supervisor spawn-group` + +Spawns multiple tasks as a named group and returns immediately: + +```bash +# Spawn a review group with 5 agents +makima supervisor spawn-group "review" \ + --tasks '[ + {"name": "security-review", "plan": "Review for security vulnerabilities..."}, + {"name": "performance-review", "plan": "Review for performance issues..."}, + {"name": "architecture-review", "plan": "Review for architecture concerns..."} + ]' \ + --share-worktree \ + --read-only +``` + +**Key parameters:** +- `--tasks` — JSON array of task definitions +- `--share-worktree` — All tasks in the group share the supervisor's worktree (read-only access) +- `--read-only` — Tasks cannot modify files, only produce output +- `--max-concurrent N` — Limit parallel execution (default: unlimited) + +#### `makima supervisor wait-group` + +Waits for all tasks in a named group to complete: + +```bash +# Wait for all review tasks, timeout after 10 minutes +makima supervisor wait-group "review" --timeout 600 + +# Returns JSON with all task results +``` + +**Output format:** +```json +{ + "group": "review", + "status": "completed", + "tasks": [ + {"name": "security-review", "status": "done", "output": "..."}, + {"name": "performance-review", "status": "done", "output": "..."} + ], + "duration_seconds": 127 +} +``` + +#### `makima supervisor review` + +High-level command that orchestrates the full review pipeline: + +```bash +# Run review with default agent config +makima supervisor review + +# Run review with custom config +makima supervisor review --config .makima/review-agents.yaml + +# Run only specific review categories +makima supervisor review --only security,performance,architecture +``` + +### 2. Review Agent Configuration + +#### Repository-Level Configuration (`.makima/review-agents.yaml`) + +```yaml +# .makima/review-agents.yaml +version: 1 +review: + # Maximum number of concurrent review agents + max_concurrent: 8 + + # Timeout per agent (seconds) + agent_timeout: 300 + + # Auto-trigger review when phase transitions to 'review' + auto_trigger: true + + # Finding severity that blocks merge + merge_blocking_severity: P1 + + agents: + - name: security-sentinel + enabled: true + plan: | + You are a Security Sentinel reviewing code changes. + + Focus areas: + - Authentication and authorization flaws + - Injection vulnerabilities (SQL, XSS, command injection) + - Secret/credential exposure + - CSRF and session management + - Input validation gaps + + Output format: One finding per section with severity (P1/P2/P3), + affected file/line, description, and suggested fix. + priority: critical # Always runs + + - name: performance-oracle + enabled: true + plan: | + You are a Performance Oracle reviewing code changes. + + Focus areas: + - N+1 query patterns + - Memory leaks and unbounded growth + - Missing caching opportunities + - Algorithmic complexity issues + - Database index utilization + + Output format: One finding per section with severity (P1/P2/P3), + affected file/line, description, and suggested fix. + priority: standard + + - name: architecture-strategist + enabled: true + plan: | + You are an Architecture Strategist reviewing code changes. + + Focus areas: + - SOLID principle violations + - Inappropriate coupling between modules + - Layering violations (e.g., handler accessing DB directly) + - Missing abstraction boundaries + - Inconsistency with existing patterns + + Output format: One finding per section with severity (P1/P2/P3), + affected file/line, description, and suggested fix. + priority: standard + + - name: test-coverage-analyzer + enabled: true + plan: | + You are a Test Coverage Analyzer reviewing code changes. + + Focus areas: + - Missing test coverage for new code paths + - Untested error/edge cases + - Test quality (meaningful assertions vs superficial) + - Integration test gaps + - Mock appropriateness + + Output format: One finding per section with severity (P1/P2/P3), + affected file/line, description, and suggested fix. + priority: standard + + # Users can add custom agents here + - name: custom-domain-reviewer + enabled: false + plan: "Review for domain-specific business logic concerns..." + priority: optional +``` + +#### Contract-Level Override + +```yaml +# In contract configuration or via CLI +review: + agents: + # Disable agents not relevant to this contract + - name: concurrency-safety + enabled: false + # Add contract-specific reviewer + - name: migration-safety + enabled: true + plan: "Review database migrations for data loss risks..." +``` + +### 3. Synthesis Step + +After all review agents complete, a synthesis task: + +1. **Collects** all findings from group task outputs +2. **Deduplicates** findings about the same issue from different perspectives +3. **Resolves contradictions** (e.g., one agent says "add caching" while another says "caching adds complexity") +4. **Prioritizes** by severity and cross-agent agreement +5. **Produces** a structured review report as a contract file + +```bash +# Synthesis is automatically run after wait-group completes +makima supervisor synthesize-review "review" \ + --output-format findings \ + --create-contract-file +``` + +### 4. Auto-Review Trigger + +When a contract's phase transitions to `review`: + +```rust +// In phase transition handler +if new_phase == "review" && contract.review_config.auto_trigger { + // Spawn review group automatically + spawn_review_group(contract, review_config).await?; +} +``` + +--- + +## Integration with Existing Makima Features + +### Supervisor/Worker Hierarchy + +Review agents are spawned as **worker tasks** under the supervisor, using existing `spawn-task` infrastructure. The new `spawn-group`/`wait-group` commands are syntactic sugar over batch `spawn-task` + `wait` calls. + +### Git Worktree Isolation + +Review agents share the supervisor's worktree in **read-only mode** (a new capability). This avoids creating N separate worktrees for review-only tasks. Implementation: +- New `supervisor_worktree_task_id` parameter (already exists in SpawnTask) +- New `read_only: true` flag to prevent file modifications +- Workers see the same code state that triggered the review + +### Contract Files + +The synthesized review report is stored as a **contract file** attached to the review phase: +```rust +File { + contract_id: contract.id, + contract_phase: "review", + name: "Review Report — 2026-02-09", + body: vec![ + BodyElement::Heading { level: 1, text: "Review Summary" }, + BodyElement::Paragraph { text: "3 P1 findings, 7 P2 findings, 12 P3 findings" }, + // ... structured findings + ], +} +``` + +### Phase Guards + +If `phase_guard` is enabled and P1 findings exist, the phase transition from Review to Execute (or Compound) is blocked until P1s are resolved. This integrates with the existing `advance-phase` confirmation flow. + +### Completion Gates + +Each review agent uses the existing `` mechanism to signal when its review is complete: +```xml + +ready: true +reason: "Security review complete. Found 2 P1 and 3 P2 findings." +progress: "Reviewed 47 files across 12 modules." + +``` + +### Circuit Breaker + +The existing CircuitBreaker protects against review agents getting stuck. If a review agent loops without progress for 3 iterations, it's terminated and its partial findings are included in synthesis. + +--- + +## Implementation Plan + +### Phase 1: Group Task Infrastructure (5-7 days) + +| Task | Effort | Description | +|------|--------|-------------| +| `spawn-group` command | 2 days | Batch task spawning with named groups | +| `wait-group` command | 1 day | Wait for all tasks in group | +| Group tracking in DB | 1 day | Task group table, membership, status | +| Shared worktree (read-only) | 1-2 days | Workers share supervisor worktree | +| Tests | 1 day | Unit + integration tests | + +### Phase 2: Review Agent System (4-6 days) + +| Task | Effort | Description | +|------|--------|-------------| +| Review config YAML parser | 1 day | Parse `.makima/review-agents.yaml` | +| `supervisor review` command | 2 days | Orchestrate review pipeline | +| Synthesis agent logic | 1-2 days | Deduplicate, prioritize, format | +| Review report as contract file | 1 day | Store structured output | + +### Phase 3: Automation & Polish (3-5 days) + +| Task | Effort | Description | +|------|--------|-------------| +| Auto-trigger on phase transition | 1 day | Hook into `advance-phase` | +| P1 merge blocking | 1 day | Phase guard integration | +| Default review agent templates | 1-2 days | Ship 8-10 built-in agents | +| Documentation | 1 day | User guide and config reference | + +--- + +## Configuration Examples + +### Minimal Setup (Zero Config) + +```bash +# Uses built-in review agents with default settings +makima supervisor review +``` + +### Custom Review for a Specific Contract + +```bash +# Override for this contract only +makima supervisor review \ + --only security,performance \ + --merge-blocking P1 \ + --timeout 300 +``` + +### Full Custom Configuration + +```yaml +# .makima/review-agents.yaml +version: 1 +review: + max_concurrent: 6 + agent_timeout: 300 + auto_trigger: true + merge_blocking_severity: P1 + + synthesis: + dedup_threshold: 0.8 # Similarity score for deduplication + min_agreement: 2 # Findings flagged by 2+ agents get priority boost + output_format: "findings" # "findings" | "report" | "both" + create_contract_file: true + + agents: + - name: security-sentinel + enabled: true + priority: critical + plan: | + ... + - name: performance-oracle + enabled: true + priority: standard + plan: | + ... + # ... more agents +``` + +--- + +## Open Questions + +1. **Shared worktree read-only enforcement**: Should this be enforced at the filesystem level (mount read-only) or via convention (instructions to the agent)? +2. **Review scope**: Should review agents see all files or only changed files (git diff)? +3. **Incremental review**: When new commits are added during review, should agents re-review or only review the delta? +4. **Agent output parsing**: Should agents output structured YAML findings, or should the synthesis step parse natural language? +5. **Cost control**: With 10+ parallel agents, how do we manage API costs? Should there be a budget ceiling per review? +6. **Finding deduplication**: What similarity threshold should trigger deduplication? How to handle partial overlaps? + +--- + +## Alternatives Considered + +| Alternative | Pros | Cons | Decision | +|-------------|------|------|----------| +| Single comprehensive review agent | Simple, no coordination overhead | Misses perspective-specific issues | Rejected — diminishes review quality | +| Sequential reviews (one after another) | Simpler orchestration | 5-10x slower; later reviews can't benefit from earlier ones | Rejected — latency unacceptable | +| External review tools integration | Leverage existing static analysis | Limited to tool capabilities; no semantic review | Complement — can integrate alongside agent review | +| User-configured number of agents | Maximum flexibility | Analysis paralysis for new users | Adopted — sensible defaults + customization | + +--- + +## Priority & Complexity Assessment + +- **Priority: HIGH** — Multi-agent review is the highest-impact feature from the compound engineering plugin. It directly improves code quality with no change to developer workflow. +- **Complexity: MEDIUM** — The core `spawn-group`/`wait-group` pattern is straightforward. The synthesis step requires careful design. Shared worktree read-only mode is a new capability. +- **Risk: LOW-MEDIUM** — Main risks are resource consumption (manageable with concurrency limits) and synthesis quality (improvable iteratively). -- cgit v1.2.3