| author | soryu <soryu@soryu.co> | 2026-02-09 16:51:59 +0000 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2026-02-09 16:51:59 +0000 |
| commit | 76bb9da745f6c12c8e7e587a9096677bbf98f395 | |
| tree | 5bd856d1018c6fab4700b625e5ffefb344200bf4 /docs | |
| parent | 268cdce19b1e17128cb8806bee7e0ead1afaa95b | |
Add compound engineering feature proposals for makima (#58)
Analyze the compound engineering plugin (https://github.com/EveryInc/compound-engineering-plugin)
and propose 6 features inspired by its patterns for adoption into makima:
- Multi-agent parallel review system (spawn-group/wait-group)
- Knowledge accumulation / compound learning phase
- Parallel plan deepening with research agents
- Workflow presets / pipeline templates (LFG-style one-command pipelines)
- Structured findings tracking with severity and lifecycle
- Reusable task templates with meta-commands
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Diffstat (limited to 'docs')
| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | docs/proposals/compound-engineering-analysis.md | 300 |
| -rw-r--r-- | docs/proposals/feature-findings-tracking.md | 504 |
| -rw-r--r-- | docs/proposals/feature-knowledge-accumulation.md | 539 |
| -rw-r--r-- | docs/proposals/feature-multi-agent-review.md | 448 |
| -rw-r--r-- | docs/proposals/feature-plan-deepening.md | 383 |
| -rw-r--r-- | docs/proposals/feature-task-templates.md | 602 |
| -rw-r--r-- | docs/proposals/feature-workflow-presets.md | 623 |
7 files changed, 3399 insertions, 0 deletions
diff --git a/docs/proposals/compound-engineering-analysis.md b/docs/proposals/compound-engineering-analysis.md new file mode 100644 index 0000000..5a8c6da --- /dev/null +++ b/docs/proposals/compound-engineering-analysis.md @@ -0,0 +1,300 @@ +# Compound Engineering Plugin — Analysis & Makima Feature Mapping + +> **Document Type:** Overview Analysis +> **Status:** Proposal +> **Date:** 2026-02-09 +> **Related Proposals:** [Multi-Agent Review](feature-multi-agent-review.md) · [Knowledge Accumulation](feature-knowledge-accumulation.md) · [Plan Deepening](feature-plan-deepening.md) · [Workflow Presets](feature-workflow-presets.md) · [Findings Tracking](feature-findings-tracking.md) · [Task Templates](feature-task-templates.md) + +--- + +## Executive Summary + +The [Compound Engineering Plugin](https://github.com/EveryInc/compound-engineering-plugin) is a Claude Code plugin comprising **29 agents, 25 commands, 16 skills, and 1 MCP server**. Its core innovation is a self-reinforcing engineering loop where every unit of work makes subsequent work easier—not harder. + +This document analyzes the plugin's architecture, maps its capabilities against makima's existing features, identifies gaps, and proposes a phased adoption strategy. The compound engineering plugin excels at **within-session orchestration** (parallel review agents, plan deepening, knowledge capture), while makima excels at **cross-session orchestration** (contract lifecycle, worktree isolation, DAG-based directives). Combining both creates a uniquely powerful system. 
+ +--- + +## Core Philosophy + +> *"Each unit of engineering work should make subsequent units easier—not harder."* + +The plugin operationalizes this through a four-phase feedback loop where the critical **Compound** step captures learnings that feed back into future planning: + +``` +┌─────────────────────────────────────────────────────────┐ +│ │ +│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ +│ │ │ │ │ │ │ │ +│ │ PLAN │───▶│ WORK │───▶│ REVIEW │ │ +│ │ │ │ │ │ │ │ +│ └──────────┘ └──────────┘ └──────────┘ │ +│ ▲ │ │ +│ │ ┌──────────┐ │ │ +│ │ │ │ │ │ +│ └──────────│ COMPOUND │◀────────┘ │ +│ │ │ │ +│ Learnings fed └──────────┘ Captures solutions, │ +│ back into │ patterns, failures │ +│ future plans ▼ │ +│ docs/solutions/ │ +│ ├── build-errors/ │ +│ ├── test-failures/ │ +│ ├── api-patterns/ │ +│ └── ...9 categories │ +│ │ +└─────────────────────────────────────────────────────────┘ +``` + +This maps directly to makima's contract phases: **Research → Specify → Plan → Execute → Review** with a proposed new **Compound** phase inserted after Review. 
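A minimal Rust sketch of the loop above, under clearly stated assumptions: `ContractPhase`, `next`, and `run_contract` are hypothetical names, not makima's real types, and the knowledge base is an in-memory `Vec<String>` standing in for something like `docs/solutions/`. It illustrates the one property the Compound step adds: a later contract's Plan phase sees what an earlier contract captured.

```rust
/// Hypothetical contract phases, with the proposed Compound phase after Review.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContractPhase {
    Research,
    Specify,
    Plan,
    Execute,
    Review,
    Compound, // proposed: capture learnings once Review is done
}

impl ContractPhase {
    /// The phase that follows `self`, or `None` once Compound has run.
    fn next(self) -> Option<ContractPhase> {
        use ContractPhase::*;
        match self {
            Research => Some(Specify),
            Specify => Some(Plan),
            Plan => Some(Execute),
            Execute => Some(Review),
            Review => Some(Compound),
            Compound => None,
        }
    }
}

/// Walks one contract through every phase. Plan reads the shared knowledge
/// base; Compound writes this contract's learnings back into it. Returns how
/// many learnings were available when planning started.
fn run_contract(name: &str, knowledge: &mut Vec<String>) -> usize {
    let mut available_at_plan = 0;
    let mut phase = Some(ContractPhase::Research);
    while let Some(p) = phase {
        match p {
            ContractPhase::Plan => available_at_plan = knowledge.len(),
            ContractPhase::Compound => knowledge.push(format!("learnings from {name}")),
            _ => {} // Research, Specify, Execute, Review: real work elided
        }
        phase = p.next();
    }
    available_at_plan
}

fn main() {
    let mut knowledge = Vec::new();
    let first = run_contract("contract-a", &mut knowledge);
    let second = run_contract("contract-b", &mut knowledge);
    println!("contract-a planned with {first} prior learning(s)");
    println!("contract-b planned with {second} prior learning(s)");
}
```

Running `main` reports 0 prior learnings for `contract-a` and 1 for `contract-b`. Keeping the phase order in a single `next` match means that, in this sketch, adding Compound touches one arm: Review now leads to Compound instead of ending the contract.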
+ +--- + +## Plugin Architecture Overview + +### Agent Categories (29 Total) + +| Category | Count | Examples | +|----------|-------|---------| +| Review Agents | 12-15 | Security Sentinel, Performance Oracle, Architecture Strategist, Code Philosopher, Data Integrity Guardian, Error Resilience Analyzer, API Contract Validator, Dependency Health Checker, Test Coverage Analyzer, Documentation Completeness, Concurrency Safety | +| Research Agents | 20-40 | Best practices, edge case analysis, dependency research, pattern matching | +| Learning Agents | 5 | Context extractor, solution documenter, prevention strategist, categorizer, doc linker | +| Pipeline Agents | ~5 | LFG orchestrator, SLFG parallelizer, phase coordinators | +| Meta Agents | 2-3 | Agent creator, skill healer, template generator | + +### Command Categories (25 Total) + +| Category | Key Commands | Description | +|----------|-------------|-------------| +| Planning | `/plan`, `/deepen-plan` | Create and enhance implementation plans | +| Execution | `/lfg`, `/slfg` | Full autonomous pipelines (serial/parallel) | +| Review | `/parallel-review`, `/review` | Multi-agent code review | +| Learning | `/compound`, `/search-learnings` | Capture and retrieve knowledge | +| Meta | `/create-agent-skill`, `/heal-skill` | Self-improving tooling | +| Findings | `/create-todo`, `/resolve-todo` | Structured issue tracking | + +### Skill Categories (16 Total) + +Skills provide specialized capabilities including code analysis, pattern detection, security scanning, performance profiling, and documentation generation. + +### MCP Server (1) + +Provides tool access for agents to interact with the file system, git, and external services during parallel execution. + +--- + +## Agent-Native Architecture Concepts + +The compound engineering plugin embraces an **agent-native** design philosophy: + +1. 
**Parallel-First**: Tasks that can be parallelized are always parallelized (review agents, research agents, learning sub-agents) +2. **Structured Output**: All agent outputs use YAML frontmatter + markdown, enabling machine parsing +3. **Swarm Orchestration**: Groups of agents with synchronization gates (spawn N → wait for all → synthesize) +4. **Self-Healing**: Meta-commands detect broken skills and auto-repair them +5. **Progressive Enhancement**: Plans start simple, then are "deepened" with research results + +--- + +## Mapping to Makima's Architecture + +### What Makima Already Has + +| Compound Engineering Feature | Makima Equivalent | Coverage | +|------------------------------|-------------------|----------| +| Plan → Work → Review loop | Contract phases (Research → Specify → Plan → Execute → Review) | ✅ Full | +| Task orchestration | Supervisor/worker hierarchy with `spawn-task` | ✅ Full | +| Parallel task execution | Multiple workers in separate worktrees | ✅ Full | +| Task isolation | Git worktree per task | ✅ Full | +| Phase transitions | `supervisor advance-phase` with phase guards | ✅ Full | +| Pipeline orchestration | Directive system with DAG dependencies | ✅ Full | +| User interaction during execution | `supervisor ask` with timeout/choices | ✅ Full | +| Task continuation | `continue_from_task_id`, `--continue` flag | ✅ Full | +| Branching/forking | `supervisor branch`, `task-fork`, `task-rewind` | ✅ Full | +| Circuit breakers | CircuitBreaker (max iterations, stuck detection) | ✅ Full | +| Completion gates | `<COMPLETION_GATE>` parsing in autonomous loop | ✅ Full | +| Document management | Contract files with versioning, structured body | ✅ Full | + +### What Makima Is Missing (Gaps) + +| Compound Engineering Feature | Makima Gap | Priority | Proposal | +|------------------------------|-----------|----------|----------| +| Multi-agent parallel review | No automated review, no review task templates | **High** | 
[feature-multi-agent-review.md](feature-multi-agent-review.md) | +| Compound learning / knowledge accumulation | No cross-contract knowledge capture | **High** | [feature-knowledge-accumulation.md](feature-knowledge-accumulation.md) | +| Plan deepening with research agents | Single-pass planning, no research integration | **Medium** | [feature-plan-deepening.md](feature-plan-deepening.md) | +| One-command pipelines (LFG/SLFG) | Manual orchestration per contract | **High** | [feature-workflow-presets.md](feature-workflow-presets.md) | +| Structured findings/TODOs | Unstructured review output | **Medium** | [feature-findings-tracking.md](feature-findings-tracking.md) | +| Reusable task/agent templates | Ad-hoc plans, no template reuse | **Medium** | [feature-task-templates.md](feature-task-templates.md) | + +--- + +## Feature Set Summary + +| # | Feature | Priority | Complexity | Effort | Proposal | +|---|---------|----------|------------|--------|----------| +| 1 | Multi-Agent Parallel Review | High | Medium | 12-18 days | [Link](feature-multi-agent-review.md) | +| 2 | Knowledge Accumulation | High | Medium | 10-15 days | [Link](feature-knowledge-accumulation.md) | +| 3 | Plan Deepening | Medium | Low | 5-8 days | [Link](feature-plan-deepening.md) | +| 4 | Workflow Presets | High | Medium | 10-15 days | [Link](feature-workflow-presets.md) | +| 5 | Findings Tracking | Medium | Low | 7-10 days | [Link](feature-findings-tracking.md) | +| 6 | Task Templates | Medium | Medium | 8-12 days | [Link](feature-task-templates.md) | +| | **Total** | | | **52-78 days** | | + +--- + +## Implementation Strategy + +### Recommended Phasing + +``` +Phase 1: Foundations (Weeks 1-4) +├── Workflow Presets ────────── Enables one-command pipelines +└── Findings Tracking ──────── Structured review output format + +Phase 2: Core Loop (Weeks 5-9) +├── Multi-Agent Review ──────── Automated parallel review +└── Knowledge Accumulation ──── Cross-contract learning + +Phase 3: Enhancement (Weeks 
10-13) +├── Plan Deepening ──────────── Research-enhanced planning +└── Task Templates ──────────── Reusable patterns +``` + +**Rationale for ordering:** + +1. **Phase 1** builds infrastructure that Phase 2 depends on: + - Workflow Presets provide the pipeline framework that Review and Learning plug into + - Findings Tracking provides the structured output format that Review agents produce + +2. **Phase 2** implements the core compound loop: + - Multi-Agent Review produces structured findings + - Knowledge Accumulation closes the feedback loop + +3. **Phase 3** optimizes the system: + - Plan Deepening uses the knowledge base to enhance plans + - Task Templates codify proven patterns for reuse + +### Integration Points Between Features + +``` + ┌─────────────────┐ + │ Workflow Presets │ + │ (orchestrator) │ + └────────┬────────┘ + │ triggers phases + ┌──────────────┼──────────────┐ + ▼ ▼ ▼ + ┌────────────┐ ┌──────────────┐ ┌───────────┐ + │ Plan │ │ Multi-Agent │ │ Knowledge │ + │ Deepening │ │ Review │ │ Accum. 
│ + └─────┬──────┘ └──────┬───────┘ └─────┬─────┘ + │ │ │ + │ produces │ │ + │ ▼ │ + │ ┌──────────────┐ │ + │ │ Findings │ │ + │ │ Tracking │ │ + │ └──────────────┘ │ + │ │ + └──────── feeds into ──────────────┘ + │ + ┌────┴─────┐ + │ Task │ + │ Templates│ + └──────────┘ + codifies patterns +``` + +--- + +## Competitive Analysis + +### Compound Engineering Plugin Strengths + +| Strength | Detail | +|----------|--------| +| **Depth of review** | 12-15 specialized reviewers catch issues a single reviewer misses | +| **Knowledge compounding** | Learnings are never lost; they compound over time | +| **One-command pipelines** | `/lfg` runs full plan→work→review→compound cycle | +| **Self-improvement** | Meta-commands create new agents/skills on demand | +| **Swarm patterns** | Sophisticated parallel group management | + +### Makima Strengths + +| Strength | Detail | +|----------|--------| +| **True isolation** | Git worktrees provide real filesystem isolation, not just context isolation | +| **Persistent orchestration** | Contracts survive across sessions; plugin agents are ephemeral | +| **DAG execution** | Directives model complex dependency graphs natively | +| **User interaction** | Rich question/answer system with timeouts and multi-select | +| **Infrastructure** | Server-based architecture with WebSocket real-time communication | +| **Checkpoint/recovery** | Full task rewind, fork, and patch-based recovery | +| **Phase governance** | Phase guards require explicit user approval for transitions | + +### Combined Value Proposition + +| Dimension | Plugin Alone | Makima Alone | Combined | +|-----------|-------------|-------------|----------| +| Review quality | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐⭐ | +| Task isolation | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | +| Knowledge retention | ⭐⭐⭐⭐ | ⭐ | ⭐⭐⭐⭐⭐ | +| Persistent orchestration | ⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | +| Pipeline automation | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐⭐ | +| Self-improvement | ⭐⭐⭐⭐ | ⭐ | ⭐⭐⭐⭐ | + +--- + +## Risk Analysis + +### Technical Risks + +| Risk | Impact | 
Likelihood | Mitigation | +|------|--------|------------|------------| +| Parallel review agents overwhelm system resources | High | Medium | Implement concurrency limits; use makima's existing CircuitBreaker | +| Knowledge base grows unwieldy | Medium | High | Implement relevance decay, deduplication, and quality gates | +| Workflow presets too rigid for diverse use cases | Medium | Medium | Support variable substitution and optional steps | +| Review synthesis produces noisy/contradictory results | Medium | Medium | Weighted deduplication with priority-based conflict resolution | +| Template proliferation creates maintenance burden | Low | Medium | Template versioning and deprecation lifecycle | + +### Organizational Risks + +| Risk | Impact | Likelihood | Mitigation | +|------|--------|------------|------------| +| Scope creep across all 6 features | High | High | Strict phasing; each feature is independently shippable | +| Users don't adopt knowledge accumulation habits | Medium | Medium | Make it automatic (not opt-in); integrate with workflow presets | +| Configuration complexity deters users | Medium | Medium | Sensible defaults; progressive disclosure of configuration | + +--- + +## Success Metrics + +### Per-Feature Metrics + +| Feature | Key Metric | Target | +|---------|-----------|--------| +| Multi-Agent Review | Defects caught before merge | 40% increase vs single review | +| Knowledge Accumulation | Knowledge reuse rate | >30% of new contracts reference existing learnings | +| Plan Deepening | Plan revision rate after execution starts | <15% (down from estimated ~40%) | +| Workflow Presets | Time from contract creation to first commit | 50% reduction | +| Findings Tracking | Finding resolution rate | >85% of P1/P2 findings resolved | +| Task Templates | Template reuse rate | >25% of tasks use templates after 3 months | + +### System-Level Metrics + +- **Cycle time**: Time from contract creation to completion — target 30% reduction +- **Defect escape 
rate**: Issues found post-merge — target 50% reduction +- **Knowledge density**: Learnings per contract — target >2.5 after 6 months +- **User satisfaction**: Survey score — target >4.2/5.0 + +--- + +## Conclusion + +The compound engineering plugin represents a mature implementation of agent-native engineering workflows. Its greatest innovations—parallel multi-perspective review, knowledge compounding, and autonomous pipelines—address real gaps in makima's current capabilities. + +Makima's infrastructure advantages (true worktree isolation, persistent contracts, DAG-based directives, server architecture) provide a superior foundation for implementing these features. The proposed phased approach delivers incremental value while building toward the full compound engineering loop. + +The combined system would offer something neither tool provides alone: **persistent, isolated, knowledge-compounding engineering workflows with multi-agent review and one-command pipeline automation**. + +--- + +*Next steps: Review individual feature proposals for detailed implementation plans.* diff --git a/docs/proposals/feature-findings-tracking.md b/docs/proposals/feature-findings-tracking.md new file mode 100644 index 0000000..bb8a68e --- /dev/null +++ b/docs/proposals/feature-findings-tracking.md @@ -0,0 +1,504 @@ +# Feature Proposal: Structured Findings / Issues Tracking + +> **Priority:** Medium +> **Complexity:** Low +> **Estimated Effort:** 7-10 days +> **Status:** Proposal +> **Date:** 2026-02-09 +> **Dependencies:** None (standalone, but enhances [Multi-Agent Review](feature-multi-agent-review.md)) +> **Related:** [Overview Analysis](compound-engineering-analysis.md) · [Multi-Agent Review](feature-multi-agent-review.md) · [Workflow Presets](feature-workflow-presets.md) + +--- + +## Problem Statement + +Currently, review outputs in makima are **unstructured text** in task conversation history: + +- **No standard format** for reporting issues found during review +- **No severity 
classification** — all findings are treated equally +- **No lifecycle tracking** — findings are either "in the review output" or "hopefully fixed" +- **No verification** — there's no way to confirm a finding was actually resolved +- **No aggregation** — findings from multiple review tasks can't be collected and deduplicated +- **No blocking mechanism** — critical findings can't prevent phase transitions +- **No metrics** — no data on how many findings are produced, resolved, or escaped + +This makes the review phase a documentation exercise rather than a quality gate. + +--- + +## How Compound Engineering Solves This + +The compound engineering plugin uses **structured TODO/finding files** with YAML frontmatter and a defined lifecycle: + +### File Format + +````markdown +--- +id: SEC-001 +status: open +priority: P1 +category: security +title: SQL injection in user search endpoint +file: src/api/users.rs +line: 47 +agent: security-sentinel +created: 2026-02-09T10:30:00Z +updated: 2026-02-09T10:30:00Z +tags: [injection, input-validation, database] +--- + +# SQL Injection in User Search Endpoint + +## Finding +The `search_users` handler directly interpolates the `query` parameter into +a SQL string without parameterization. 
+ +## Evidence +```rust +// src/api/users.rs:47 +let sql = format!("SELECT * FROM users WHERE name LIKE '%{}%'", query); +``` + +## Impact +An attacker can execute arbitrary SQL queries, potentially: +- Exfiltrating all user data +- Modifying or deleting records +- Escalating privileges + +## Recommendation +Use parameterized queries: +```rust +let results = sqlx::query("SELECT * FROM users WHERE name LIKE $1") + .bind(format!("%{}%", query)) + .fetch_all(&pool) + .await?; +``` + +## Resolution +_Not yet resolved_ +```` + +### File Naming Convention + +``` +findings/{issue_id}-{status}-{priority}-{description}.md +``` + +Example: `findings/SEC-001-open-P1-sql-injection-user-search.md` + +### Lifecycle + +``` +open ──▶ in-progress ──▶ resolved ──▶ verified + │ │ + └── wont-fix ◀────────────┘ +``` + +--- + +## Proposed Makima Implementation + +### 1. Finding Record Format + +Findings are stored as **contract files** with structured metadata and body: + +```rust +use chrono::{DateTime, Utc}; +use serde::{Deserialize, Serialize}; +use uuid::Uuid; + +// Finding metadata (stored in file description as structured JSON) +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct FindingMetadata { + pub id: String, // "SEC-001", auto-generated + pub status: FindingStatus, // open, in_progress, resolved, verified, wont_fix + pub severity: FindingSeverity, // P1 (critical), P2 (major), P3 (minor) + pub category: String, // security, performance, architecture, etc. 
+ pub title: String, // Short description + pub file_path: Option<String>, // Affected file + pub line_number: Option<u32>, // Affected line + pub source_agent: Option<String>, // Which review agent found this + pub source_task_id: Option<Uuid>, // Task that produced this finding + pub assigned_to: Option<Uuid>, // Task assigned to resolve this + pub created_at: DateTime<Utc>, + pub updated_at: DateTime<Utc>, + pub resolved_at: Option<DateTime<Utc>>, + pub verified_at: Option<DateTime<Utc>>, + pub tags: Vec<String>, +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] +#[serde(rename_all = "snake_case")] +pub enum FindingStatus { + Open, + InProgress, + Resolved, + Verified, + WontFix, +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] +pub enum FindingSeverity { + P1, // Critical — must fix before merge + P2, // Major — should fix, can defer with justification + P3, // Minor — nice to fix, can defer +} +``` + +### 2. Supervisor Commands + +#### Create a Finding + +```bash +# Create a finding from review output +makima supervisor finding create \ + --severity P1 \ + --category security \ + --title "SQL injection in user search endpoint" \ + --file src/api/users.rs \ + --line 47 \ + --description "Direct string interpolation in SQL query" + +# Output: Created finding SEC-001 (P1/security) +``` + +#### List Findings + +```bash +# List all findings for the current contract +makima supervisor finding list +# Output: +# ID SEVERITY STATUS CATEGORY TITLE +# SEC-001 P1 open security SQL injection in user search +# PERF-001 P2 in-progress performance N+1 query in order listing +# ARCH-001 P3 resolved architecture Handler accessing DB directly + +# Filter by severity +makima supervisor finding list --severity P1 + +# Filter by status +makima supervisor finding list --status open + +# Summary only +makima supervisor finding summary +# Output: +# Total: 12 findings +# P1: 2 open, 1 resolved +# P2: 3 open, 2 in-progress +# P3: 4 resolved +``` + 
+#### Update Finding Status + +```bash +# Mark as in-progress (assigned to a task) +makima supervisor finding update SEC-001 --status in-progress --assigned-to 
<task-id> + +# Mark as resolved +makima supervisor finding update SEC-001 --status resolved \ + --resolution "Replaced with parameterized query in commit abc123" + +# Mark as verified (after re-review) +makima supervisor finding update SEC-001 --status verified + +# Mark as won't fix +makima supervisor finding update SEC-001 --status wont-fix \ + --justification "Endpoint is internal-only, behind auth" +``` + +#### Auto-Create from Review Output + +```bash +# Parse review agent output and create findings automatically +makima supervisor finding parse-output --task-id <review-task-id> +``` + +This parses structured review output and creates individual finding records. + +### 3. Finding Lifecycle + +``` +┌────────────────────────────────────────────────────────────┐ +│ Finding Lifecycle │ +│ │ +│ ┌──────┐ ┌─────────────┐ ┌──────────┐ │ +│ │ │ │ │ │ │ │ +│ │ OPEN │───▶│ IN-PROGRESS │───▶│ RESOLVED │ │ +│ │ │ │ │ │ │ │ +│ └──┬───┘ └─────────────┘ └────┬─────┘ │ +│ │ │ │ +│ │ ┌─────────────┐ ┌────┴─────┐ │ +│ │ │ │ │ │ │ +│ └───────▶│ WONT-FIX │ │ VERIFIED │ │ +│ │ │ │ │ │ +│ └─────────────┘ └──────────┘ │ +│ │ +│ Triggers: │ +│ open ─▶ in_progress : Task assigned to fix │ +│ in_progress ─▶ resolved : Fix committed │ +│ resolved ─▶ verified : Re-review confirms fix │ +│ open ─▶ wont_fix : Explicit decision with justification │ +│ resolved ─▶ wont_fix : Fix deemed unnecessary after review│ +└────────────────────────────────────────────────────────────┘ +``` + +### 4. 
P1/P2/P3 Severity System + +| Severity | Name | Description | Merge Policy | +|----------|------|-------------|--------------| +| **P1** | Critical | Security vulnerabilities, data loss risks, crash bugs | **Blocks merge** — must be resolved before contract completion | +| **P2** | Major | Performance issues, architectural concerns, significant tech debt | **Should fix** — can defer with explicit justification | +| **P3** | Minor | Style issues, minor improvements, documentation gaps | **Nice to fix** — can defer freely | + +### 5. Merge Blocking + +When findings exist, phase transitions and merge operations check for blockers: + +```rust +// In advance-phase handler +async fn check_findings_gate(contract_id: Uuid) -> Result<bool> { + // Load every finding recorded for this contract + let findings = get_findings(contract_id).await?; + // A P1 still awaiting its fix (open or in-progress) is unresolved and blocks the transition + let unresolved_p1s = findings.iter() + .filter(|f| { + f.severity == FindingSeverity::P1 + && matches!(f.status, FindingStatus::Open | FindingStatus::InProgress) + }) + .count(); + + if unresolved_p1s > 0 { + warn!("{} unresolved P1 findings block phase transition", unresolved_p1s); + return Ok(false); + } + Ok(true) +} +``` + +### 6. Auto-Resolution Workflow + +When the Multi-Agent Review feature is available, findings drive an automated resolution cycle: + +``` +┌──────────┐ ┌───────────┐ ┌──────────┐ ┌──────────┐ +│ Review │────▶│ Findings │────▶│ Resolve │────▶│ Verify │ +│ Phase │ │ Created │ │ Tasks │ │ Fixes │ +│ │ │ (P1/P2/P3)│ │ Spawned │ │ Pass? 
│ +└──────────┘ └───────────┘ └──────────┘ └────┬─────┘ + │ + Yes │ No + ┌────┴────┐ + ▼ ▼ + ┌──────────┐ Loop back + │ Findings │ to resolve + │ Verified │ + └──────────┘ +``` + +```bash +# Auto-resolve: spawn tasks to fix each P1/P2 finding +makima supervisor finding auto-resolve --severity P1,P2 + +# This spawns one task per finding: +# - Task plan includes the finding details and recommendation +# - Task is assigned to the finding (finding.assigned_to = task.id) +# - When task completes, finding status → resolved +# - Verification task confirms the fix +``` + +--- + +## Integration with Existing Makima Features + +### Contract Files + +Each finding is stored as a **contract file**: + +```rust +File { + contract_id: Some(contract.id), + contract_phase: Some("review"), + name: "Finding: SEC-001 — SQL injection in user search", + description: Some(serde_json::to_string(&finding_metadata)?), + body: vec![ + BodyElement::Heading { level: 1, text: finding.title }, + BodyElement::Heading { level: 2, text: "Finding" }, + BodyElement::Paragraph { text: finding.description }, + BodyElement::Heading { level: 2, text: "Evidence" }, + BodyElement::Code { language: Some("rust"), content: finding.evidence }, + BodyElement::Heading { level: 2, text: "Recommendation" }, + BodyElement::Paragraph { text: finding.recommendation }, + ], +} +``` + +### Phase Guards + +Findings integrate with existing phase guards: +- Phase guard checks finding gate before allowing transition +- User sees a summary of open findings when reviewing phase transition +- P1 findings produce a warning that requires explicit override + +### Supervisor Questions + +When P1 findings block a transition, the supervisor can ask: + +```bash +makima supervisor ask \ + "2 P1 findings are still open. How would you like to proceed?" 
\ + --choices "Fix findings first,Override and continue,Mark as won't-fix" \ + --context "SEC-001: SQL injection (P1), PERF-001: Memory leak (P1)" +``` + +### Task Assignment + +Findings reference tasks: +- `source_task_id`: The review task that discovered the finding +- `assigned_to`: The task spawned to resolve the finding + +```bash +# Spawn a fix task and assign the finding +makima supervisor spawn "fix-sec-001" \ + --plan "Fix SQL injection vulnerability in src/api/users.rs:47. Use parameterized queries." + +makima supervisor finding update SEC-001 \ + --status in-progress \ + --assigned-to <spawned-task-id> +``` + +### Autonomous Loop + +The autonomous loop can use findings as a completion gate condition: + +```xml +<COMPLETION_GATE> +ready: false +reason: "2 P1 findings still open" +progress: "Resolved 5/7 findings" +blockers: ["SEC-001: SQL injection", "PERF-001: Memory leak"] +</COMPLETION_GATE> +``` + +--- + +## Implementation Plan + +### Phase 1: Core Finding System (3-4 days) + +| Task | Effort | Description | +|------|--------|-------------| +| Finding metadata schema | 0.5 days | FindingMetadata struct, validation | +| `finding create` command | 1 day | Create finding as contract file | +| `finding list/summary` commands | 0.5 days | Query and display findings | +| `finding update` command | 0.5 days | Status transitions, validation | +| Auto-ID generation | 0.5 days | Category-based IDs (SEC-001, PERF-002) | + +### Phase 2: Integration (2-3 days) + +| Task | Effort | Description | +|------|--------|-------------| +| Phase guard integration | 0.5 days | Check P1 findings before transition | +| `finding parse-output` | 1 day | Parse review task output into findings | +| Merge blocking logic | 0.5 days | Block merge with open P1s | +| Finding assignment to tasks | 0.5 days | Track resolution via task ID | + +### Phase 3: Automation & Polish (2-3 days) + +| Task | Effort | Description | +|------|--------|-------------| +| `finding auto-resolve` | 1 day | 
Spawn fix tasks per finding | +| Verification workflow | 0.5 days | Re-review to verify fixes | +| Finding reports | 0.5 days | Summary contract file | +| Documentation | 0.5 days | User guide | +| Tests | 0.5 days | Unit + integration | + +--- + +## Configuration Examples + +### Finding Creation in Review Agent Output + +Review agents produce structured findings in their output: + +````markdown +## FINDING: SQL Injection in User Search + +- **Severity**: P1 +- **Category**: security +- **File**: src/api/users.rs +- **Line**: 47 +- **Tags**: injection, input-validation, database + +### Description +The `search_users` handler directly interpolates the `query` parameter... + +### Evidence +```rust +let sql = format!("SELECT * FROM users WHERE name LIKE '%{}%'", query); +``` + +### Recommendation +Use parameterized queries with `sqlx::query().bind()` +```` + +The synthesis step parses these into formal Finding records. + +### Merge Blocking Configuration + +```yaml +# .makima/review-agents.yaml (or contract config) +review: + findings: + merge_blocking_severity: P1 # P1 blocks merge + require_justification: P2 # P2 needs justification to defer + auto_resolve: true # Spawn fix tasks for P1/P2 + auto_resolve_severity: P1,P2 # Which severities to auto-resolve + verification: + enabled: true # Re-review after resolution + re_review_agents: # Which agents verify fixes + - security-sentinel # Security findings verified by security agent +``` + +### Finding Lifecycle Example + +```bash +# 1. Review creates finding +makima supervisor finding create --severity P1 --category security \ + --title "SQL injection in user search" --file src/api/users.rs --line 47 + +# 2. Auto-resolve spawns fix task +makima supervisor finding auto-resolve --severity P1 +# → Spawns task "fix-SEC-001" with plan based on finding details + +# 3. Fix task completes, finding auto-updated +# finding SEC-001: open → in-progress → resolved + +# 4. 
Verification re-reviews the fix +makima supervisor finding verify SEC-001 +# → Spawns verification task targeting the specific file/line + +# 5. Verification passes +# finding SEC-001: resolved → verified + +# 6. Phase transition allowed +makima supervisor advance-phase compound -y +``` + +--- + +## Open Questions + +1. **Finding storage**: Contract files vs. dedicated findings table in the database? Contract files are simpler but querying is less efficient. +2. **Cross-contract findings**: Should findings persist across contracts? (e.g., a P2 deferred from one contract carries to the next) +3. **Finding templates**: Should common finding types have templates? (e.g., "SQL injection" pre-fills category, severity, recommendation) +4. **External integration**: Should findings be exportable to GitHub Issues, Jira, or other issue trackers? +5. **Metric tracking**: How granular should finding metrics be? Per-contract? Per-repository? Per-category? +6. **False positive handling**: How should agents indicate confidence level? Should low-confidence findings be automatically P3? + +--- + +## Alternatives Considered + +| Alternative | Pros | Cons | Decision | +|-------------|------|------|----------| +| GitHub Issues integration | Rich UI, collaboration | External dependency; not all projects use GitHub | Deferred — consider as export target | +| Plain text findings | Simple | Not queryable, no lifecycle | Rejected — defeats the purpose | +| Dedicated findings DB table | Fast queries, rich indexing | New infrastructure, migration | Recommended for v2 | +| Contract file-based | Uses existing infrastructure | Slower queries for large sets | Adopted for v1 | +| Inline code comments | Close to code | Lost on next commit; hard to track | Rejected — not persistent | + +--- + +## Priority & Complexity Assessment + +- **Priority: MEDIUM** — Structured findings transform the review phase from documentation to a quality gate. 
Essential for the Multi-Agent Review feature to produce actionable output. +- **Complexity: LOW** — Finding records are simple structured data. Lifecycle state machine is straightforward. Main integration point (phase guards) already exists. +- **Risk: LOW** — Purely additive feature. Worst case: findings exist but aren't used (same as today). Can be adopted incrementally. diff --git a/docs/proposals/feature-knowledge-accumulation.md b/docs/proposals/feature-knowledge-accumulation.md new file mode 100644 index 0000000..faef06a --- /dev/null +++ b/docs/proposals/feature-knowledge-accumulation.md @@ -0,0 +1,539 @@ +# Feature Proposal: Knowledge Accumulation / Compound Learning System + +> **Priority:** High +> **Complexity:** Medium +> **Estimated Effort:** 10-15 days +> **Status:** Proposal +> **Date:** 2026-02-09 +> **Dependencies:** Contract Files system (existing) +> **Related:** [Overview Analysis](compound-engineering-analysis.md) · [Plan Deepening](feature-plan-deepening.md) · [Workflow Presets](feature-workflow-presets.md) + +--- + +## Problem Statement + +When a makima contract completes, the **knowledge generated during that contract is effectively lost**: + +- **Solutions to tricky problems** exist only in task conversation history, which is not searchable or surfaceable +- **Patterns discovered** during one contract cannot inform future contracts +- **Mistakes made** in one contract are likely to be repeated in similar future contracts +- **Best practices** established during execution are not codified anywhere retrievable +- **Contract files** capture deliverables but not the *meta-knowledge* about how those deliverables were produced + +This means every new contract starts from zero context, even when the team has solved similar problems before. Engineering effort does not compound. 
+ +--- + +## How Compound Engineering Solves This + +The compound engineering plugin implements a `/compound` command that runs **5 parallel sub-agents** immediately after review: + +``` +┌─────────────────────────────────────────────────────────┐ +│ /compound │ +│ │ +│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ +│ │ Context │ │ Solution │ │ Prevention │ │ +│ │ Extractor │ │ Documenter │ │ Strategist │ │ +│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ +│ │ │ │ │ +│ ┌──────┴──────┐ ┌──────┴──────┐ │ +│ │ Doc │ │ Category │ │ +│ │ Linker │ │ Classifier │ │ +│ └──────┬──────┘ └──────┬──────┘ │ +│ │ │ │ +│ ▼ ▼ │ +│ ┌──────────────────────────────────────┐ │ +│ │ docs/solutions/[category]/file.md │ │ +│ │ │ │ +│ │ --- │ │ +│ │ category: build-errors │ │ +│ │ severity: medium │ │ +│ │ tags: [webpack, esm, cjs] │ │ +│ │ date: 2026-02-09 │ │ +│ │ contract: abc-123 │ │ +│ │ --- │ │ +│ │ │ │ +│ │ # Mixed ESM/CJS Import Resolution │ │ +│ │ │ │ +│ │ ## Problem │ │ +│ │ ... │ │ +│ │ ## Solution │ │ +│ │ ... │ │ +│ │ ## Prevention │ │ +│ │ ... │ │ +│ └──────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────┘ +``` + +### 9 Auto-Detected Categories + +| Category | Description | +|----------|-------------| +| `build-errors` | Compilation, bundling, dependency resolution | +| `test-failures` | Test setup, assertion patterns, mocking | +| `api-patterns` | API design, endpoint structure, versioning | +| `architecture-decisions` | Structural choices, trade-offs, patterns | +| `performance-optimizations` | Speed, memory, caching strategies | +| `security-practices` | Auth, input validation, secrets management | +| `debugging-techniques` | Investigation methods, logging strategies | +| `tooling-configurations` | Tool setup, config patterns, CI/CD | +| `domain-knowledge` | Business logic, domain-specific patterns | + +--- + +## Proposed Makima Implementation + +### 1. 
New "Compound" Phase + +Add an optional **compound** phase to the contract lifecycle, positioned after review: + +``` +Research → Specify → Plan → Execute → Review → Compound + ▲ + (new phase) +``` + +**Phase behavior:** +- **Auto-triggered** after review phase completes (configurable) +- **Short-lived** — typically completes in 1-3 minutes +- Extracts learnings from the contract's execution and review +- Stores them as searchable, categorized learning documents +- Can be skipped via configuration for trivial contracts + +### 2. New Supervisor Command: `makima supervisor compound` + +```bash +# Run compound learning for the current contract +makima supervisor compound + +# Compound with specific focus areas +makima supervisor compound --focus "security,performance" + +# Compound with explicit learnings +makima supervisor compound --learning "The retry logic needed exponential backoff, not fixed delay" +``` + +**Implementation:** + +```bash +# Under the hood, this spawns learning sub-agents +makima supervisor spawn-group "compound" \ + --tasks '[ + { + "name": "context-extractor", + "plan": "Extract the problem context, constraints, and environment details from the contract execution history..." + }, + { + "name": "solution-documenter", + "plan": "Document the solutions that were applied, including code patterns and configuration changes..." + }, + { + "name": "prevention-strategist", + "plan": "Identify what could prevent this class of problem in the future..." + }, + { + "name": "category-classifier", + "plan": "Classify these learnings into the appropriate category..." + }, + { + "name": "doc-linker", + "plan": "Link these learnings to existing documentation and related learnings..." + } + ]' +``` + +### 3. 
Learning Document Schema + +Each learning is stored as a **contract file** with structured content and metadata: + +```yaml +# Learning document metadata (stored in file description/metadata) +learning: + category: "build-errors" # One of 9 categories + severity: "medium" # low, medium, high, critical + tags: ["webpack", "esm", "cjs"] # Free-form tags + source_contract_id: "abc-123" # Contract that produced this learning + source_contract_name: "Fix webpack bundling" + repository: "github.com/org/repo" + date: "2026-02-09" + quality_score: 0.85 # 0-1, set by quality gate + access_count: 0 # Incremented on retrieval + last_accessed: null + relevance_decay: 0.95 # Per-month decay factor +``` + +**Document body structure:** + +````markdown +# Mixed ESM/CJS Import Resolution + +## Problem +When upgrading to webpack 5, mixed ESM and CommonJS imports caused +"Cannot use import statement outside a module" errors in production +but not development. + +## Root Cause +The `"type": "module"` field in package.json applied ESM resolution +globally, but several dependencies only provided CJS exports. + +## Solution +1. Added `resolve.fullySpecified: false` to webpack config +2. Used `@babel/plugin-transform-modules-commonjs` for CJS deps +3. Created explicit `.cjs` extensions for config files + +## Code Pattern +```javascript +// webpack.config.cjs (note: .cjs extension) +module.exports = { + resolve: { + fullySpecified: false, + extensions: ['.js', '.mjs', '.cjs', '.json'] + } +}; +``` + +## Prevention +- Add webpack build check to CI before merging +- Document module system choice in project README +- Use `resolve.fullySpecified: false` by default in webpack 5 projects + +## Related +- docs/solutions/tooling-configurations/webpack-5-migration.md +- Contract: "Initial Webpack 5 Migration" (2026-01-15) +```` + +### 4. Storage Architecture + +Learnings are stored in two complementary locations: + +#### A.
Contract Files (Structured, Persistent) + +```rust +// Each learning becomes a contract file +File { + contract_id: Some(source_contract.id), + contract_phase: Some("compound"), + name: "Learning: Mixed ESM/CJS Import Resolution", + description: Some("category=build-errors; tags=webpack,esm,cjs; severity=medium"), + body: vec![ + BodyElement::Heading { level: 1, text: "Mixed ESM/CJS Import Resolution" }, + BodyElement::Heading { level: 2, text: "Problem" }, + BodyElement::Paragraph { text: "..." }, + // ... structured content + ], + repo_file_path: Some("docs/solutions/build-errors/mixed-esm-cjs-resolution.md"), + repo_sync_status: Some("synced"), +} +``` + +#### B. Repository Files (Searchable, Portable) + +``` +docs/solutions/ +├── build-errors/ +│ ├── mixed-esm-cjs-resolution.md +│ └── docker-multi-stage-cache.md +├── test-failures/ +│ ├── async-test-timeout-patterns.md +│ └── mock-service-worker-setup.md +├── api-patterns/ +│ └── pagination-cursor-vs-offset.md +├── architecture-decisions/ +│ └── event-sourcing-tradeoffs.md +├── performance-optimizations/ +│ └── database-connection-pooling.md +├── security-practices/ +│ └── jwt-refresh-token-rotation.md +├── debugging-techniques/ +│ └── distributed-tracing-setup.md +├── tooling-configurations/ +│ └── github-actions-cache-strategy.md +└── domain-knowledge/ + └── payment-processing-idempotency.md +``` + +### 5. 
Auto-Surface Relevant Learnings + +When a new contract is created, automatically search for relevant learnings: + +```bash +# Supervisor plan template automatically includes: +# "Search existing learnings relevant to this task" + +makima supervisor search-learnings "webpack bundling errors" +makima supervisor search-learnings --category "build-errors" --tags "webpack" +makima supervisor search-learnings --repo "github.com/org/repo" +``` + +**Search algorithm:** + +``` +Relevance Score = + keyword_match_score * 0.4 + + category_match_score * 0.2 + + tag_overlap_score * 0.2 + + recency_score * 0.1 # Decays over time + + quality_score * 0.1 # Higher quality = more relevant +``` + +**Integration with plan phase:** + +``` +┌──────────────┐ ┌───────────────────┐ +│ New Contract │──────▶│ Plan Phase │ +│ Created │ │ │ +└──────────────┘ │ 1. Create plan │ + │ 2. Search for │◀── Learnings DB + │ relevant │ + │ learnings │ + │ 3. Inject context │ + │ into plan │ + └───────────────────┘ +``` + +### 6.
Quality Control + +#### Relevance Decay + +Learnings lose relevance over time unless accessed: + +``` +effective_relevance = quality_score * (decay_factor ^ months_since_creation) + + min(access_bonus * recent_access_count, max_access_bonus) +``` + +- Default decay factor: 0.95/month (0.95^12 ≈ 0.54, so a learning retains about 54% of its relevance after 1 year) +- Access bonus: +0.05 per access (capped at +0.25) +- Learnings below 0.3 effective relevance are archived + +#### Deduplication + +When a new learning is created, check for existing similar learnings: + +``` +similarity = cosine_similarity(new_learning_embedding, existing_learning_embedding) +if similarity > 0.85: + merge_or_update(existing_learning, new_learning) +elif similarity > 0.70: + link_as_related(new_learning, existing_learning) +``` + +#### Quality Gate + +Before storing a learning, validate: + +| Check | Threshold | Action if Failed | +|-------|-----------|------------------| +| Has problem statement | Required | Reject | +| Has solution | Required | Reject | +| Has prevention strategy | Recommended | Warn, store with quality penalty | +| Code examples present | Recommended | Warn, store with quality penalty | +| Category valid | Required | Auto-classify | +| Not duplicate | >0.85 similarity | Merge with existing | +| Minimum length | >200 characters | Reject | + +--- + +## Integration with Existing Makima Features + +### Contract Phases + +The compound phase integrates into the existing phase system: + +```rust +// New phase variant +enum ContractPhase { + Research, + Specify, + Plan, + Execute, + Review, + Compound, // NEW +} +``` + +- Contracts with `contract_type: "specification"` get the full 6-phase cycle +- Contracts with `contract_type: "simple"` can opt in via config +- Phase guard still applies: user must approve transition to compound + +### Contract Files + +Learnings are first-class contract files, leveraging existing: +- Versioning system +- Structured body format (`BodyElement` types) +- Repository file sync (`repo_file_path`,
`repo_sync_status`) +- Phase association (`contract_phase: "compound"`) + +### Directive System + +For directive-based workflows, learnings can be captured per-step: + +```rust +DirectiveStep { + name: "compound-step-3", + description: "Capture learnings from database migration step", + depends_on: [step_3_id, review_step_id], + task_plan: "Extract and document learnings from the completed migration...", +} +``` + +### Supervisor CLI + +New commands integrate with existing CLI infrastructure: + +```bash +# In supervisor context +makima supervisor compound # Run compound phase +makima supervisor search-learnings "query" # Search knowledge base +makima supervisor list-learnings # List all learnings +makima supervisor learning-stats # Knowledge base statistics +``` + +--- + +## Implementation Plan + +### Phase 1: Core Infrastructure (4-5 days) + +| Task | Effort | Description | +|------|--------|-------------| +| Add `compound` phase to contract lifecycle | 1 day | New phase enum, transition rules | +| Learning document schema | 1 day | Metadata structure, validation | +| `supervisor compound` command | 1-2 days | Spawn learning sub-agents | +| Repository file sync for learnings | 1 day | Write to `docs/solutions/` | + +### Phase 2: Search & Retrieval (3-5 days) + +| Task | Effort | Description | +|------|--------|-------------| +| `search-learnings` command | 1-2 days | Keyword + category search | +| Auto-surface in plan phase | 1-2 days | Inject relevant learnings into plans | +| Learning index | 1 day | Category/tag index for fast lookup | + +### Phase 3: Quality & Maintenance (3-5 days) + +| Task | Effort | Description | +|------|--------|-------------| +| Quality gate validation | 1 day | Pre-storage checks | +| Relevance decay system | 1 day | Scheduled decay + access tracking | +| Deduplication check | 1-2 days | Similarity detection and merging | +| Documentation & defaults | 1 day | User guide, default categories | + +--- + +## Configuration Examples + +### 
Enable Compound Phase (Contract-Level) + +```yaml +# Contract configuration +compound: + enabled: true + auto_trigger: true # Auto-run after review completes + categories: # Override default categories + - build-errors + - test-failures + - api-patterns + - architecture-decisions + - performance-optimizations + - security-practices + - debugging-techniques + - tooling-configurations + - domain-knowledge + quality_gate: + min_length: 200 + require_problem: true + require_solution: true + require_prevention: false + storage: + contract_files: true # Store as contract files + repo_files: true # Also write to docs/solutions/ + repo_path: "docs/solutions" +``` + +### Repository-Level Configuration (`.makima/compound.yaml`) + +```yaml +# .makima/compound.yaml +version: 1 +compound: + # Default settings for all contracts in this repo + auto_trigger: true + + # Custom categories for this project + categories: + - build-errors + - test-failures + - api-patterns + - payment-processing # Custom domain category + - compliance-requirements # Custom domain category + + # Search settings + search: + max_results: 10 + min_relevance: 0.3 + include_archived: false + + # Decay settings + decay: + factor: 0.95 # Per month + archive_threshold: 0.3 + access_bonus: 0.05 + max_access_bonus: 0.25 +``` + +### Searching Learnings + +```bash +# Full-text search +makima supervisor search-learnings "webpack ESM import error" + +# Category filter +makima supervisor search-learnings --category build-errors + +# Tag filter +makima supervisor search-learnings --tags webpack,esm + +# Repository filter +makima supervisor search-learnings --repo github.com/org/repo + +# Combined +makima supervisor search-learnings "import error" \ + --category build-errors \ + --tags webpack \ + --min-relevance 0.5 \ + --limit 5 +``` + +--- + +## Open Questions + +1. **Cross-repository knowledge**: Should learnings be scoped to a single repository or shared across all repositories for an owner? +2. 
**Learning ownership**: Who owns a learning — the contract creator, the repository, or the organization? +3. **Privacy**: Are learnings visible to all users, or scoped by access control? +4. **Embedding model**: For similarity-based deduplication and search, which embedding model should be used? Trade-off between quality and cost. +5. **Storage limits**: Should there be a cap on the number of learnings per repository/owner? +6. **Manual curation**: Should users be able to manually create, edit, or delete learnings outside the compound phase? +7. **Export/import**: Should learnings be exportable/importable across makima instances? + +--- + +## Alternatives Considered + +| Alternative | Pros | Cons | Decision | +|-------------|------|------|----------| +| Store learnings only in contract files | Simple, uses existing infrastructure | Not easily searchable across contracts | Rejected — search is critical | +| Store learnings only in repo files | Portable, version-controlled, greppable | Lost if repo deleted; no cross-repo search | Partial — use as secondary storage | +| Use external knowledge base (e.g., vector DB) | Best search quality | Added infrastructure dependency | Deferred — consider for v2 | +| Manual-only knowledge capture | No noise | Knowledge rarely captured | Rejected — must be automatic | +| Full contract history indexing | Most complete | Massive storage, noise, privacy concerns | Rejected — signal-to-noise ratio too low | + +--- + +## Priority & Complexity Assessment + +- **Priority: HIGH** — This is the defining feature of compound engineering. Without knowledge accumulation, every contract starts from scratch; this feature is what creates compounding returns. +- **Complexity: MEDIUM** — Core capture and storage are straightforward using existing contract files and repo sync. Search quality and relevance decay require iterative refinement. +- **Risk: MEDIUM** — Primary risk is low adoption (users skipping the compound phase), mitigated by auto-trigger.
Secondary risk is knowledge-base noise, mitigated by quality gates. diff --git a/docs/proposals/feature-multi-agent-review.md b/docs/proposals/feature-multi-agent-review.md new file mode 100644 index 0000000..d678756 --- /dev/null +++ b/docs/proposals/feature-multi-agent-review.md @@ -0,0 +1,448 @@ +# Feature Proposal: Multi-Agent Parallel Review System + +> **Priority:** High +> **Complexity:** Medium +> **Estimated Effort:** 12-18 days +> **Status:** Proposal +> **Date:** 2026-02-09 +> **Dependencies:** [Findings Tracking](feature-findings-tracking.md) (recommended) +> **Related:** [Overview Analysis](compound-engineering-analysis.md) · [Workflow Presets](feature-workflow-presets.md) + +--- + +## Problem Statement + +Makima's contract lifecycle includes a **Review** phase, but it currently has: + +- **No automated review mechanism** — the review phase relies entirely on manual user inspection or a single supervisor task +- **Single-perspective review** — even when a review task is spawned, it examines code from one viewpoint +- **No structured review output** — findings are captured as unstructured text in task output +- **No review templates** — each review must be configured from scratch +- **No synthesis** — when multiple reviewers exist, there's no mechanism to deduplicate and prioritize findings + +For complex contracts touching security, performance, and architecture, a single-pass review tends to miss category-specific issues that specialized reviewers would catch.
+ +--- + +## How Compound Engineering Solves This + +The compound engineering plugin spawns **12-15 specialized review agents in parallel**, each examining the code from a unique perspective: + +| Agent | Focus Area | Example Findings | +|-------|-----------|-----------------| +| Security Sentinel | Auth, injection, secrets, CSRF | SQL injection in user input handler | +| Performance Oracle | N+1 queries, memory leaks, caching | Unbounded list growth in event handler | +| Architecture Strategist | Coupling, SOLID, layering | Service directly accessing repository internals | +| Code Philosopher | Readability, naming, complexity | Cyclomatic complexity > 15 in payment flow | +| Data Integrity Guardian | Validation, constraints, migrations | Missing NOT NULL constraint on required field | +| Error Resilience Analyzer | Error handling, retries, fallbacks | Unhandled timeout in external API call | +| API Contract Validator | Breaking changes, versioning | Removed required field from response | +| Dependency Health Checker | Vulnerabilities, licensing, freshness | CVE-2025-XXXX in transitive dependency | +| Test Coverage Analyzer | Coverage gaps, edge cases, mocking | No tests for error path in checkout flow | +| Documentation Completeness | Docs accuracy, examples, changelog | Public API endpoint undocumented | +| Concurrency Safety | Race conditions, deadlocks, atomicity | Non-atomic read-modify-write on shared counter | + +After all agents complete, a **synthesis agent** deduplicates findings, resolves contradictions, and produces a prioritized report. 
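A makima synthesis step of the kind described above can be sketched in Rust. This sketch is hypothetical and does not reproduce the plugin's implementation. The `Finding` shape is assumed. Findings are merged on an exact `(file, line, category)` key, a simplified stand-in for similarity-based deduplication. Each merged finding keeps its most severe rating and records every agent that reported it. The merged list is ordered by severity, then by how many agents agreed.

```rust
use std::collections::HashMap;

/// Severity levels from the proposal; P1 is the most severe.
/// Deriving `Ord` orders variants by declaration, so P1 < P2 < P3.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    P1,
    P2,
    P3,
}

/// One finding as emitted by a single review agent (hypothetical shape).
#[derive(Debug, Clone)]
struct Finding {
    agent: String,
    file: String,
    line: u32,
    category: String,
    severity: Severity,
}

/// A finding after synthesis, with every agent that reported it.
#[derive(Debug)]
struct Synthesized {
    file: String,
    line: u32,
    category: String,
    severity: Severity,
    agents: Vec<String>,
}

/// Merge findings on an exact (file, line, category) key, keep the most
/// severe rating, then sort by severity and by cross-agent agreement.
fn synthesize(findings: &[Finding]) -> Vec<Synthesized> {
    let mut merged: HashMap<(String, u32, String), Synthesized> = HashMap::new();
    for f in findings {
        let key = (f.file.clone(), f.line, f.category.clone());
        let entry = merged.entry(key).or_insert_with(|| Synthesized {
            file: f.file.clone(),
            line: f.line,
            category: f.category.clone(),
            severity: f.severity,
            agents: Vec::new(),
        });
        // P1 < P2 < P3, so `min` keeps the most severe rating.
        entry.severity = entry.severity.min(f.severity);
        if !entry.agents.contains(&f.agent) {
            entry.agents.push(f.agent.clone());
        }
    }
    let mut out: Vec<Synthesized> = merged.into_values().collect();
    // Most severe first; ties broken by agreement (more agents first),
    // then by file and line so the report order is deterministic.
    out.sort_by(|a, b| {
        a.severity
            .cmp(&b.severity)
            .then(b.agents.len().cmp(&a.agents.len()))
            .then(a.file.cmp(&b.file))
            .then(a.line.cmp(&b.line))
    });
    out
}

fn main() {
    let mk = |agent: &str, file: &str, line: u32, category: &str, severity: Severity| Finding {
        agent: agent.to_string(),
        file: file.to_string(),
        line,
        category: category.to_string(),
        severity,
    };
    let raw = vec![
        mk("security-sentinel", "src/api/users.rs", 47, "security", Severity::P1),
        mk("architecture-strategist", "src/api/users.rs", 47, "security", Severity::P2),
        mk("performance-oracle", "src/events.rs", 12, "performance", Severity::P2),
    ];
    for s in synthesize(&raw) {
        println!("{:?} {}:{} [{}] agents={:?}", s.severity, s.file, s.line, s.category, s.agents);
    }
}
```

Sorting on agreement after severity means a P2 flagged by several agents still ranks below any P1. A "priority boost" for agreement would need a scoring rule instead of a pure lexicographic sort.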
+ +``` +┌───────────────────────────────────────────────────────┐ +│ Review Orchestrator │ +│ │ +│ spawn-group "review" │ +│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ +│ │Security │ │ Perf │ │ Arch │ │ Code │ │ +│ │Sentinel │ │ Oracle │ │Strategy │ │ Phil │ │ +│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │ +│ │ │ │ │ │ +│ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ │ +│ │ Data │ │ Error │ │ API │ │ Deps │ │ +│ │Guardian │ │Resilien.│ │Contract │ │ Health │ │ +│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │ +│ │ │ │ │ │ +│ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ │ +│ │ Test │ │ Docs │ │Concurr. │ │ +│ │Coverage │ │Complete │ │ Safety │ │ +│ └────┬────┘ └────┬────┘ └────┬────┘ │ +│ │ │ │ │ +│ wait-group "review" │ +│ ▼ ▼ ▼ │ +│ ┌──────────────────────────────────────────┐ │ +│ │ Synthesis Agent │ │ +│ │ - Deduplicate findings │ │ +│ │ - Resolve contradictions │ │ +│ │ - Prioritize by severity │ │ +│ │ - Generate summary report │ │ +│ └──────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ Structured Findings │ +│ (P1 / P2 / P3) │ +└───────────────────────────────────────────────────────┘ +``` + +--- + +## Proposed Makima Implementation + +### 1. 
New Supervisor Commands + +#### `makima supervisor spawn-group` + +Spawns multiple tasks as a named group and returns immediately: + +```bash +# Spawn a review group with 3 agents +makima supervisor spawn-group "review" \ + --tasks '[ + {"name": "security-review", "plan": "Review for security vulnerabilities..."}, + {"name": "performance-review", "plan": "Review for performance issues..."}, + {"name": "architecture-review", "plan": "Review for architecture concerns..."} + ]' \ + --share-worktree \ + --read-only +``` + +**Key parameters:** +- `--tasks` — JSON array of task definitions +- `--share-worktree` — All tasks in the group share the supervisor's worktree (read-only access) +- `--read-only` — Tasks cannot modify files, only produce output +- `--max-concurrent N` — Limit parallel execution (default: unlimited) + +#### `makima supervisor wait-group` + +Waits for all tasks in a named group to complete: + +```bash +# Wait for all review tasks, timeout after 10 minutes +makima supervisor wait-group "review" --timeout 600 + +# Returns JSON with all task results +``` + +**Output format:** +```json +{ + "group": "review", + "status": "completed", + "tasks": [ + {"name": "security-review", "status": "done", "output": "..."}, + {"name": "performance-review", "status": "done", "output": "..."} + ], + "duration_seconds": 127 +} +``` + +#### `makima supervisor review` + +High-level command that orchestrates the full review pipeline: + +```bash +# Run review with default agent config +makima supervisor review + +# Run review with custom config +makima supervisor review --config .makima/review-agents.yaml + +# Run only specific review categories +makima supervisor review --only security,performance,architecture +``` + +### 2.
Review Agent Configuration + +#### Repository-Level Configuration (`.makima/review-agents.yaml`) + +```yaml +# .makima/review-agents.yaml +version: 1 +review: + # Maximum number of concurrent review agents + max_concurrent: 8 + + # Timeout per agent (seconds) + agent_timeout: 300 + + # Auto-trigger review when phase transitions to 'review' + auto_trigger: true + + # Finding severity that blocks merge + merge_blocking_severity: P1 + + agents: + - name: security-sentinel + enabled: true + plan: | + You are a Security Sentinel reviewing code changes. + + Focus areas: + - Authentication and authorization flaws + - Injection vulnerabilities (SQL, XSS, command injection) + - Secret/credential exposure + - CSRF and session management + - Input validation gaps + + Output format: One finding per section with severity (P1/P2/P3), + affected file/line, description, and suggested fix. + priority: critical # Always runs + + - name: performance-oracle + enabled: true + plan: | + You are a Performance Oracle reviewing code changes. + + Focus areas: + - N+1 query patterns + - Memory leaks and unbounded growth + - Missing caching opportunities + - Algorithmic complexity issues + - Database index utilization + + Output format: One finding per section with severity (P1/P2/P3), + affected file/line, description, and suggested fix. + priority: standard + + - name: architecture-strategist + enabled: true + plan: | + You are an Architecture Strategist reviewing code changes. + + Focus areas: + - SOLID principle violations + - Inappropriate coupling between modules + - Layering violations (e.g., handler accessing DB directly) + - Missing abstraction boundaries + - Inconsistency with existing patterns + + Output format: One finding per section with severity (P1/P2/P3), + affected file/line, description, and suggested fix. + priority: standard + + - name: test-coverage-analyzer + enabled: true + plan: | + You are a Test Coverage Analyzer reviewing code changes. 
+ + Focus areas: + - Missing test coverage for new code paths + - Untested error/edge cases + - Test quality (meaningful assertions vs superficial) + - Integration test gaps + - Mock appropriateness + + Output format: One finding per section with severity (P1/P2/P3), + affected file/line, description, and suggested fix. + priority: standard + + # Users can add custom agents here + - name: custom-domain-reviewer + enabled: false + plan: "Review for domain-specific business logic concerns..." + priority: optional +``` + +#### Contract-Level Override + +```yaml +# In contract configuration or via CLI +review: + agents: + # Disable agents not relevant to this contract + - name: concurrency-safety + enabled: false + # Add contract-specific reviewer + - name: migration-safety + enabled: true + plan: "Review database migrations for data loss risks..." +``` + +### 3. Synthesis Step + +After all review agents complete, a synthesis task: + +1. **Collects** all findings from group task outputs +2. **Deduplicates** findings about the same issue from different perspectives +3. **Resolves contradictions** (e.g., one agent says "add caching" while another says "caching adds complexity") +4. **Prioritizes** by severity and cross-agent agreement +5. **Produces** a structured review report as a contract file + +```bash +# Synthesis is automatically run after wait-group completes +makima supervisor synthesize-review "review" \ + --output-format findings \ + --create-contract-file +``` + +### 4. Auto-Review Trigger + +When a contract's phase transitions to `review`: + +```rust +// In phase transition handler +if new_phase == "review" && contract.review_config.auto_trigger { + // Spawn review group automatically + spawn_review_group(&contract, &contract.review_config).await?; +} +``` + + +--- + +## Integration with Existing Makima Features + +### Supervisor/Worker Hierarchy + +Review agents are spawned as **worker tasks** under the supervisor, using existing `spawn-task` infrastructure.
The new `spawn-group`/`wait-group` commands are syntactic sugar over batch `spawn-task` + `wait` calls. + +### Git Worktree Isolation + +Review agents share the supervisor's worktree in **read-only mode** (a new capability). This avoids creating N separate worktrees for review-only tasks. Implementation: +- Reuse the existing `supervisor_worktree_task_id` parameter on `SpawnTask` to point workers at the supervisor's worktree +- New `read_only: true` flag to prevent file modifications +- Workers see the same code state that triggered the review + + +### Contract Files + +The synthesized review report is stored as a **contract file** attached to the review phase: +```rust +File { + contract_id: Some(contract.id), + contract_phase: Some("review"), + name: "Review Report — 2026-02-09", + body: vec![ + BodyElement::Heading { level: 1, text: "Review Summary" }, + BodyElement::Paragraph { text: "3 P1 findings, 7 P2 findings, 12 P3 findings" }, + // ... structured findings + ], +} +``` + +### Phase Guards + +If `phase_guard` is enabled and P1 findings exist, the phase transition from Review to Execute (or Compound) is blocked until P1s are resolved. This integrates with the existing `advance-phase` confirmation flow. + +### Completion Gates + +Each review agent uses the existing `<COMPLETION_GATE>` mechanism to signal when its review is complete: +```xml +<COMPLETION_GATE> +ready: true +reason: "Security review complete. Found 2 P1 and 3 P2 findings." +progress: "Reviewed 47 files across 12 modules." +</COMPLETION_GATE> +``` + +### Circuit Breaker + +The existing CircuitBreaker protects against review agents getting stuck. If a review agent loops without progress for 3 consecutive iterations, it is terminated and its partial findings are included in synthesis.
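The no-progress rule can be modeled as a counter of consecutive stalled iterations. The counter trips at three and keeps whatever findings were produced before the trip. The sketch below is hypothetical: `NoProgressBreaker`, `BreakerState` and `run_review` are illustrative names, not makima's existing `CircuitBreaker` API. It also assumes that each iteration reports a progress flag and at most one finding.

```rust
/// Outcome of recording one iteration.
#[derive(Debug, PartialEq)]
enum BreakerState {
    Running,
    Tripped,
}

/// Hypothetical no-progress breaker: trips after `max_stalled`
/// consecutive iterations without progress (3 in this proposal).
struct NoProgressBreaker {
    max_stalled: u32,
    stalled: u32,
}

impl NoProgressBreaker {
    fn new(max_stalled: u32) -> Self {
        Self { max_stalled, stalled: 0 }
    }

    /// Record one iteration; any progress resets the stall counter.
    fn record(&mut self, made_progress: bool) -> BreakerState {
        if made_progress {
            self.stalled = 0;
        } else {
            self.stalled += 1;
        }
        if self.stalled >= self.max_stalled {
            BreakerState::Tripped
        } else {
            BreakerState::Running
        }
    }
}

/// Drive one reviewer's iterations (progress flag, optional finding).
/// Findings produced before a trip are returned so synthesis can use them.
fn run_review(
    iterations: &[(bool, Option<&str>)],
    breaker: &mut NoProgressBreaker,
) -> (BreakerState, Vec<String>) {
    let mut partial = Vec::new();
    for (made_progress, finding) in iterations {
        if let Some(f) = finding {
            partial.push(f.to_string());
        }
        if breaker.record(*made_progress) == BreakerState::Tripped {
            return (BreakerState::Tripped, partial);
        }
    }
    (BreakerState::Running, partial)
}

fn main() {
    let mut breaker = NoProgressBreaker::new(3);
    let iterations = [
        (true, Some("P2: unbounded list growth in event handler")),
        (false, None),
        (false, None),
        (false, None), // third consecutive stall: the breaker trips here
        (true, Some("never reached")),
    ];
    let (state, partial) = run_review(&iterations, &mut breaker);
    println!("{:?}; partial findings kept: {:?}", state, partial);
}
```

Because any progress resets the counter, this variant never stops an agent that alternates between stalled and productive iterations. Whether makima's breaker counts consecutive or cumulative stalls is an open design choice.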
+ +--- + +## Implementation Plan + +### Phase 1: Group Task Infrastructure (5-7 days) + +| Task | Effort | Description | +|------|--------|-------------| +| `spawn-group` command | 2 days | Batch task spawning with named groups | +| `wait-group` command | 1 day | Wait for all tasks in group | +| Group tracking in DB | 1 day | Task group table, membership, status | +| Shared worktree (read-only) | 1-2 days | Workers share supervisor worktree | +| Tests | 1 day | Unit + integration tests | + +### Phase 2: Review Agent System (4-6 days) + +| Task | Effort | Description | +|------|--------|-------------| +| Review config YAML parser | 1 day | Parse `.makima/review-agents.yaml` | +| `supervisor review` command | 2 days | Orchestrate review pipeline | +| Synthesis agent logic | 1-2 days | Deduplicate, prioritize, format | +| Review report as contract file | 1 day | Store structured output | + +### Phase 3: Automation & Polish (3-5 days) + +| Task | Effort | Description | +|------|--------|-------------| +| Auto-trigger on phase transition | 1 day | Hook into `advance-phase` | +| P1 merge blocking | 1 day | Phase guard integration | +| Default review agent templates | 1-2 days | Ship 8-10 built-in agents | +| Documentation | 1 day | User guide and config reference | + +--- + +## Configuration Examples + +### Minimal Setup (Zero Config) + +```bash +# Uses built-in review agents with default settings +makima supervisor review +``` + +### Custom Review for a Specific Contract + +```bash +# Override for this contract only +makima supervisor review \ + --only security,performance \ + --merge-blocking P1 \ + --timeout 300 +``` + +### Full Custom Configuration + +```yaml +# .makima/review-agents.yaml +version: 1 +review: + max_concurrent: 6 + agent_timeout: 300 + auto_trigger: true + merge_blocking_severity: P1 + + synthesis: + dedup_threshold: 0.8 # Similarity score for deduplication + min_agreement: 2 # Findings flagged by 2+ agents get priority boost + output_format: 
"findings" # "findings" | "report" | "both" + create_contract_file: true + + agents: + - name: security-sentinel + enabled: true + priority: critical + plan: | + ... + - name: performance-oracle + enabled: true + priority: standard + plan: | + ... + # ... more agents +``` + +--- + +## Open Questions + +1. **Shared worktree read-only enforcement**: Should this be enforced at the filesystem level (mount read-only) or via convention (instructions to the agent)? +2. **Review scope**: Should review agents see all files or only changed files (git diff)? +3. **Incremental review**: When new commits are added during review, should agents re-review or only review the delta? +4. **Agent output parsing**: Should agents output structured YAML findings, or should the synthesis step parse natural language? +5. **Cost control**: With 10+ parallel agents, how do we manage API costs? Should there be a budget ceiling per review? +6. **Finding deduplication**: What similarity threshold should trigger deduplication? How to handle partial overlaps? + +--- + +## Alternatives Considered + +| Alternative | Pros | Cons | Decision | +|-------------|------|------|----------| +| Single comprehensive review agent | Simple, no coordination overhead | Misses perspective-specific issues | Rejected — diminishes review quality | +| Sequential reviews (one after another) | Simpler orchestration | 5-10x slower; later reviews can't benefit from earlier ones | Rejected — latency unacceptable | +| External review tools integration | Leverage existing static analysis | Limited to tool capabilities; no semantic review | Complement — can integrate alongside agent review | +| User-configured number of agents | Maximum flexibility | Analysis paralysis for new users | Adopted — sensible defaults + customization | + +--- + +## Priority & Complexity Assessment + +- **Priority: HIGH** — Multi-agent review is the highest-impact feature from the compound engineering plugin. 
It directly improves code quality with no change to developer workflow. +- **Complexity: MEDIUM** — The core `spawn-group`/`wait-group` pattern is straightforward. The synthesis step requires careful design. Shared worktree read-only mode is a new capability. +- **Risk: LOW-MEDIUM** — Main risks are resource consumption (manageable with concurrency limits) and synthesis quality (improvable iteratively). diff --git a/docs/proposals/feature-plan-deepening.md b/docs/proposals/feature-plan-deepening.md new file mode 100644 index 0000000..c2d8aeb --- /dev/null +++ b/docs/proposals/feature-plan-deepening.md @@ -0,0 +1,383 @@ +# Feature Proposal: Parallel Plan Deepening + +> **Priority:** Medium +> **Complexity:** Low +> **Estimated Effort:** 5-8 days +> **Status:** Proposal +> **Date:** 2026-02-09 +> **Dependencies:** [Knowledge Accumulation](feature-knowledge-accumulation.md) (recommended, not required) +> **Related:** [Overview Analysis](compound-engineering-analysis.md) · [Multi-Agent Review](feature-multi-agent-review.md) + +--- + +## Problem Statement + +Makima's planning phase currently suffers from **single-pass planning**: + +- A supervisor creates a plan based on its immediate analysis of the task +- **No systematic research** is conducted before finalizing the plan +- **Edge cases are discovered during execution**, requiring mid-stream plan changes +- **Best practices are not consulted** — the plan relies solely on the model's training knowledge +- **Existing project learnings** (if the knowledge accumulation feature exists) are not surfaced during planning +- **Revision rate is high** — an estimated ~40% of plans require significant changes after execution begins + +The result: plans are shallow, execution discovers problems that planning should have caught, and contracts take longer than necessary. 
+ +--- + +## How Compound Engineering Solves This + +The compound engineering plugin's `/deepen-plan` command takes an existing plan and enhances it by spawning **20-40 parallel research agents**: + +``` +┌──────────────────────────────────────────────────────────────┐ +│ /deepen-plan │ +│ │ +│ Input: Initial plan (from /plan) │ +│ │ +│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ +│ │ Best │ │ Edge │ │ Dep. │ │ Pattern │ │ +│ │ Practice │ │ Case │ │ Research │ │ Matching │ │ +│ │ Agent 1 │ │ Agent 1 │ │ Agent 1 │ │ Agent 1 │ │ +│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ +│ │ │ │ │ │ +│ ┌────┴─────┐ ┌────┴─────┐ ┌────┴─────┐ ┌────┴─────┐ │ +│ │ Best │ │ Edge │ │ Security │ │ Existing │ │ +│ │ Practice │ │ Case │ │ Concerns │ │ Learning │ │ +│ │ Agent 2 │ │ Agent 2 │ │ Agent │ │ Agent │ │ +│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ +│ │ │ │ │ │ +│ ... (20-40 agents per plan item) ... │ +│ │ │ │ │ │ +│ ▼ ▼ ▼ ▼ │ +│ ┌──────────────────────────────────────────────────┐ │ +│ │ Synthesis Agent │ │ +│ │ - Merge research into plan │ │ +│ │ - Add edge case handling │ │ +│ │ - Insert best practice notes │ │ +│ │ - Flag risks and dependencies │ │ +│ └──────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ Enhanced Plan (Deepened) │ +│ - Original steps preserved │ +│ - Edge cases added per step │ +│ - Best practices annotated │ +│ - Risks flagged │ +│ - Dependencies clarified │ +└──────────────────────────────────────────────────────────────┘ +``` + +The key insight: **research is embarrassingly parallel**. Each plan item can be researched independently, and each research dimension (best practices, edge cases, security, etc.) is independent. + +--- + +## Proposed Makima Implementation + +### 1. 
New Supervisor Command: `makima supervisor deepen-plan` + +```bash +# Deepen the current contract's plan +makima supervisor deepen-plan + +# Deepen with specific focus areas +makima supervisor deepen-plan --focus "security,edge-cases,performance" + +# Deepen with explicit plan file reference +makima supervisor deepen-plan --plan-file plan.md + +# Control parallelism +makima supervisor deepen-plan --max-agents 10 + +# Include knowledge base search (requires Knowledge Accumulation feature) +makima supervisor deepen-plan --search-learnings +``` + +### 2. Research Agent Categories + +Each plan item is researched along multiple dimensions: + +| Agent Category | Purpose | Example Output | +|----------------|---------|----------------| +| **Best Practices** | Industry standards for the technology/pattern | "Use parameterized queries for all DB operations" | +| **Edge Cases** | Boundary conditions and error scenarios | "Handle concurrent modification of shared resource" | +| **Dependency Research** | Compatibility, versions, known issues | "Library X v3 has breaking changes from v2" | +| **Security Concerns** | Security implications of the planned approach | "JWT stored in localStorage is vulnerable to XSS" | +| **Performance Implications** | Performance characteristics and bottlenecks | "N+1 query risk with eager loading disabled" | +| **Pattern Matching** | Similar patterns in the existing codebase | "Module Y already implements this pattern; follow its conventions" | +| **Existing Learnings** | Prior solutions from knowledge base | "Similar issue solved in contract Z; see docs/solutions/..." | + +### 3. 
Deepening Flow + +``` +┌─────────────┐ ┌──────────────────┐ ┌────────────────┐ +│ Original │ │ Research Phase │ │ Enhanced Plan │ +│ Plan │────▶│ │────▶│ │ +│ │ │ Per plan item: │ │ Original + │ +│ Step 1 │ │ - Best practices │ │ annotations │ +│ Step 2 │ │ - Edge cases │ │ │ +│ Step 3 │ │ - Dependencies │ │ Step 1 │ +│ Step 4 │ │ - Security │ │ ├ Edge cases │ +│ │ │ - Performance │ │ ├ Best pracs │ +│ │ │ - Patterns │ │ └ Risks │ +│ │ │ - Learnings │ │ Step 2 │ +│ │ │ │ │ ├ Edge cases │ +│ │ │ All in parallel │ │ └ ... │ +└─────────────┘ └──────────────────┘ └────────────────┘ +``` + +**Implementation sketch (assumes the proposed `spawn-group`/`wait-group` primitives and new `get-plan-items`/`synthesize-plan` subcommands):** + +```bash +# Step 1: Parse plan into items +# (assumes get-plan-items prints one JSON object per line: {"id": ..., "description": ...}) +# Step 2: For each item, spawn research agents as a group +makima supervisor get-plan-items | while IFS= read -r item; do + id=$(jq -r '.id' <<< "$item") + desc=$(jq -r '.description' <<< "$item") + # Build the task list with jq so descriptions containing quotes stay valid JSON + tasks=$(jq -n --arg d "$desc" '[ + {name: "best-practices", plan: ("Research best practices for: " + $d)}, + {name: "edge-cases", plan: ("Identify edge cases for: " + $d)}, + {name: "security", plan: ("Analyze security implications of: " + $d)}, + {name: "performance", plan: ("Assess performance implications of: " + $d)} + ]') + makima supervisor spawn-group "deepen-${id}" \ + --tasks "$tasks" \ + --share-worktree \ + --read-only +done + +# Step 3: Wait for all groups +makima supervisor wait-group "deepen-*" --timeout 300 + +# Step 4: Synthesize results into enhanced plan +makima supervisor synthesize-plan +``` + +### 4. Enhanced Plan Format + +The deepened plan augments each step with structured annotations: + +```markdown +## Step 3: Implement JWT Authentication + +### Original Plan +Add JWT-based authentication middleware to the API gateway. +Generate tokens on login, validate on each request. 
+ +### Research Findings + +#### Best Practices +- Use RS256 (asymmetric) for microservices, HS256 for monoliths +- Set short access token TTL (15 min) with refresh token rotation +- Include only essential claims (sub, exp, iat, roles) +- Never store sensitive data in JWT payload (it's base64, not encrypted) + +#### Edge Cases +- Token expiry during long-running requests +- Clock skew between services (use ±30s leeway) +- Concurrent refresh token rotation (race condition) +- Token size exceeding header limits (>8KB with many claims) + +#### Security Concerns +- **P2**: JWT in localStorage is XSS-vulnerable; prefer httpOnly cookies +- **P3**: Missing CSRF protection if using cookies +- **P2**: No token revocation mechanism for compromised tokens + +#### Performance Notes +- JWT validation is CPU-bound (RS256 ~1ms per validation) +- Consider caching decoded tokens for repeated validation +- Refresh token DB lookup adds latency (~5ms) + +#### Existing Learnings +- See: docs/solutions/security-practices/jwt-refresh-token-rotation.md +- Previous contract "Auth Service Refactor" used similar pattern + +### Risks +- [ ] Clock skew handling not in original plan +- [ ] Token revocation strategy needed +- [ ] CSRF protection if using cookie storage +``` + +### 5. Integration with Knowledge Base + +When the Knowledge Accumulation feature is available, `deepen-plan` automatically includes a **learning search agent** for each plan item: + +``` +Research Agent: "Search existing learnings relevant to JWT authentication" + +Results: +- docs/solutions/security-practices/jwt-refresh-token-rotation.md (relevance: 0.92) +- docs/solutions/api-patterns/authentication-middleware-pattern.md (relevance: 0.78) +- docs/solutions/debugging-techniques/token-expiry-debugging.md (relevance: 0.65) +``` + +These results are included in the deepened plan with direct links. 
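The relevance scores above suggest that the learning search agent ranks hits and drops weak matches before they reach the plan. The Rust sketch below shows one way to do that. The `LearningHit` type, the `select_learnings` function, and the cutoff and result-cap parameters are hypothetical: the cutoff plays the role of the `search_min_relevance` option in the configuration example. None of these are an existing makima API.

```rust
/// A knowledge-base search hit (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq)]
struct LearningHit {
    path: String,
    relevance: f64,
}

/// Keep hits scoring at least `min_relevance`, most relevant first,
/// and return at most `max_results` of them.
fn select_learnings(mut hits: Vec<LearningHit>, min_relevance: f64, max_results: usize) -> Vec<LearningHit> {
    // Comparisons with NaN are false, so NaN scores are dropped here too.
    hits.retain(|h| h.relevance >= min_relevance);
    // Safe to unwrap: only non-NaN scores remain after `retain`.
    hits.sort_by(|a, b| b.relevance.partial_cmp(&a.relevance).unwrap());
    hits.truncate(max_results);
    hits
}

fn main() {
    let hits = vec![
        LearningHit { path: "docs/solutions/debugging-techniques/token-expiry-debugging.md".to_string(), relevance: 0.65 },
        LearningHit { path: "docs/solutions/security-practices/jwt-refresh-token-rotation.md".to_string(), relevance: 0.92 },
        LearningHit { path: "docs/solutions/api-patterns/authentication-middleware-pattern.md".to_string(), relevance: 0.78 },
    ];
    for hit in select_learnings(hits, 0.7, 5) {
        println!("- {} (relevance: {:.2})", hit.path, hit.relevance);
    }
}
```

With a cutoff of 0.7, the example keeps the JWT rotation and middleware entries in that order and drops the 0.65 debugging note.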
+ +--- + +## Integration with Existing Makima Features + +### Contract Phases + +Plan deepening occurs during the **Plan phase**, between initial plan creation and phase transition to Execute: + +``` +Plan Phase Timeline: + 1. Supervisor creates initial plan + 2. makima supervisor deepen-plan ← NEW + 3. User reviews deepened plan + 4. makima supervisor advance-phase execute +``` + +### Supervisor/Worker Hierarchy + +Research agents are spawned as **worker tasks** under the supervisor. Uses the existing `spawn-task` infrastructure with the proposed `spawn-group`/`wait-group` from the [Multi-Agent Review](feature-multi-agent-review.md) proposal. + +### Contract Files + +The deepened plan replaces or augments the plan document as a contract file: + +```rust +File { + contract_id: contract.id, + contract_phase: "plan", + name: "Implementation Plan (Deepened)", + body: vec![ + // Enhanced plan content with annotations + ], +} +``` + +### Directive System + +For directive-based workflows, plan deepening can be added as a step: + +```rust +DirectiveStep { + name: "deepen-plan", + description: "Enhance implementation plan with parallel research", + depends_on: [initial_plan_step_id], + task_plan: "Run deepen-plan on the initial plan...", +} +``` + +### Phase Guards + +If `phase_guard` is enabled, the user reviews the deepened plan before approving transition to execute. This is the natural checkpoint for plan quality. 
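Each (plan item, research dimension) pair is independent. The supervisor can therefore treat deepening as a flat job list and cap how many workers run at once, which is what the `--max-agents` flag and the `max_concurrent` setting imply. The following is a minimal sketch under that assumption. A `research` callback stands in for spawning a worker task and collecting its output. `ResearchJob`, `build_jobs`, and `run_bounded` are illustrative names only, and the real system would use worker tasks rather than local threads.

```rust
use std::sync::Mutex;
use std::thread;

/// One unit of research: a plan item paired with a research dimension (illustrative).
#[derive(Debug, Clone)]
struct ResearchJob {
    item_id: usize,
    dimension: &'static str,
}

/// Expand plan items into the full item x dimension job list.
fn build_jobs(item_count: usize, dimensions: &[&'static str]) -> Vec<ResearchJob> {
    (0..item_count)
        .flat_map(|item_id| {
            dimensions
                .iter()
                .map(move |&dimension| ResearchJob { item_id, dimension })
        })
        .collect()
}

/// Run every job with at most `max_agents` concurrent workers.
/// `research` stands in for spawning a worker task and reading its findings.
fn run_bounded<F>(jobs: Vec<ResearchJob>, max_agents: usize, research: F) -> Vec<(ResearchJob, String)>
where
    F: Fn(&ResearchJob) -> String + Sync,
{
    let queue = Mutex::new(jobs.into_iter());
    let results = Mutex::new(Vec::new());
    thread::scope(|s| {
        for _ in 0..max_agents.max(1) {
            s.spawn(|| loop {
                // The queue lock is released at the end of this statement.
                let next = queue.lock().unwrap().next();
                let Some(job) = next else { break };
                let findings = research(&job);
                results.lock().unwrap().push((job, findings));
            });
        }
    });
    // Completion order is nondeterministic; sort so synthesis sees a stable order.
    let mut out = results.into_inner().unwrap();
    out.sort_by(|a, b| (a.0.item_id, a.0.dimension).cmp(&(b.0.item_id, b.0.dimension)));
    out
}

fn main() {
    let dimensions = ["best-practices", "edge-cases", "security", "performance"];
    let jobs = build_jobs(3, &dimensions);
    let results = run_bounded(jobs, 4, |job| {
        format!("notes for item {} ({})", job.item_id, job.dimension)
    });
    for (job, findings) in &results {
        println!("{} / {}: {}", job.item_id, job.dimension, findings);
    }
}
```

A shared work queue keeps all workers busy until the list drains. This avoids the idle time that fixed-size batches would incur when one slow research agent holds up its batch.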
+ +--- + +## Implementation Plan + +### Phase 1: Core Command (2-3 days) + +| Task | Effort | Description | +|------|--------|-------------| +| `deepen-plan` command | 1 day | Parse plan, spawn research groups | +| Research agent templates | 0.5 days | Default prompts for each category | +| Synthesis logic | 1 day | Merge research into annotated plan | +| Plan file update | 0.5 days | Write deepened plan as contract file | + +### Phase 2: Knowledge Integration (1-2 days) + +| Task | Effort | Description | +|------|--------|-------------| +| Learning search agent | 0.5 days | Search knowledge base per plan item | +| Result integration | 0.5 days | Include learning links in plan | +| Fallback when no KB | 0.5 days | Graceful degradation without KB | + +### Phase 3: Configuration & Polish (2-3 days) + +| Task | Effort | Description | +|------|--------|-------------| +| Config file support | 0.5 days | `.makima/deepen.yaml` | +| Focus area filtering | 0.5 days | `--focus` flag implementation | +| Concurrency control | 0.5 days | `--max-agents` limit | +| Documentation | 0.5 days | User guide | +| Tests | 1 day | Unit + integration | + +--- + +## Configuration Examples + +### Repository-Level Configuration + +```yaml +# .makima/deepen.yaml +version: 1 +deepen: + # Auto-deepen when plan is created + auto_trigger: false + + # Maximum agents per plan item + max_agents_per_item: 5 + + # Total maximum concurrent agents + max_concurrent: 20 + + # Timeout per research agent (seconds) + agent_timeout: 120 + + # Research dimensions to include + dimensions: + - best-practices + - edge-cases + - security + - performance + - dependencies + - patterns + - learnings # Requires Knowledge Accumulation + + # Minimum plan items to trigger deepening + min_plan_items: 3 + + # Search learnings (requires Knowledge Accumulation) + search_learnings: true + search_min_relevance: 0.5 +``` + +### Inline Usage + +```bash +# Quick deepen with defaults +makima supervisor deepen-plan + +# Focused 
deepen for security-sensitive work +makima supervisor deepen-plan --focus security,edge-cases + +# Deepen with more agents for complex plans +makima supervisor deepen-plan --max-agents 30 + +# Deepen without knowledge base search +makima supervisor deepen-plan --no-learnings +``` + +--- + +## Open Questions + +1. **Plan format parsing**: How should the system parse existing plans to identify discrete items? Markdown headers? Numbered lists? YAML structure? +2. **Research depth vs. cost**: 20-40 agents per deepening is expensive. Should there be a "lite" mode with fewer agents? +3. **Deepening multiple times**: Can a plan be deepened iteratively? Should subsequent deepenings build on previous research? +4. **User-provided context**: Should users be able to provide additional context (e.g., "this project uses PostgreSQL, not MySQL") to guide research? +5. **Codebase analysis**: Should research agents analyze the existing codebase to find relevant patterns, or only reason from general knowledge? +6. **Conflicting research**: When research agents disagree (e.g., one says "use Redis" and another says "avoid Redis"), how should the synthesis handle it? 
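For question 1, one candidate is to treat each `## Step N: Title` heading as a plan item boundary; this is the form used in the enhanced plan example. The following is a rough Rust sketch of that option, with hypothetical `PlanItem` and `parse_plan_items` names. It ignores numbered lists and YAML plans, which would need their own parsers.

```rust
/// A discrete plan item extracted from a markdown plan (illustrative type).
#[derive(Debug, PartialEq)]
struct PlanItem {
    number: u32,
    title: String,
    body: String,
}

/// Split a markdown plan on headings of the form `## Step N: Title`.
/// Text before the first step heading is ignored; deeper headings stay in the body.
fn parse_plan_items(plan: &str) -> Vec<PlanItem> {
    let mut items: Vec<PlanItem> = Vec::new();
    for line in plan.lines() {
        // Recognize "## Step <number>: <title>" as the start of a new item.
        if let Some(rest) = line.strip_prefix("## Step ") {
            if let Some((num, title)) = rest.split_once(':') {
                if let Ok(number) = num.trim().parse::<u32>() {
                    items.push(PlanItem { number, title: title.trim().to_string(), body: String::new() });
                    continue;
                }
            }
        }
        // Any other line belongs to the current item, if one has started.
        if let Some(current) = items.last_mut() {
            current.body.push_str(line);
            current.body.push('\n');
        }
    }
    // Drop trailing blank lines from each body.
    for item in &mut items {
        let trimmed_len = item.body.trim_end().len();
        item.body.truncate(trimmed_len);
    }
    items
}

fn main() {
    let plan = "# Implementation Plan\n\n## Step 1: Define Route\nAdd the route.\n\n## Step 2: Implement JWT Authentication\n### Original Plan\nValidate tokens on each request.\n";
    for item in parse_plan_items(plan) {
        println!("{}. {}\n{}\n", item.number, item.title, item.body);
    }
}
```

Text before the first step heading is discarded. Deeper headings such as `### Original Plan` stay inside the step's body, so annotations added by a previous deepening pass travel with their step.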
+ +--- + +## Alternatives Considered + +| Alternative | Pros | Cons | Decision | +|-------------|------|------|----------| +| Sequential research (one agent) | Simple, cheaper | Slow; misses multi-perspective insights | Rejected — parallel is core value | +| Automatic deepening (always on) | No manual step | Adds latency to every plan; unnecessary for simple tasks | Optional auto-trigger | +| Web search integration | Real-time information | Inconsistent quality; potential hallucination from web results | Deferred — consider for v2 | +| User-provided research questions | Targeted research | Requires user to know what to ask | Complement — support alongside auto-research | +| LLM-only research (no task spawning) | Simpler, no infrastructure | Limited by single context window; no parallelism | Rejected — defeats the purpose | + +--- + +## Priority & Complexity Assessment + +- **Priority: MEDIUM** — Plan deepening significantly improves plan quality, but it is an enhancement to an already-functional planning workflow. The compound engineering plugin reportedly reduces plan revisions by ~40%. +- **Complexity: LOW** — This feature is largely a composition of existing primitives (task spawning, plan file updates) plus the `spawn-group`/`wait-group` primitives proposed in Multi-Agent Review. The main new work is research agent prompts and synthesis logic. +- **Risk: LOW** — In the worst case, plans improve only marginally. Beyond the group primitives it builds on, no core system changes are required. Can be adopted incrementally. 
diff --git a/docs/proposals/feature-task-templates.md b/docs/proposals/feature-task-templates.md new file mode 100644 index 0000000..98abde9 --- /dev/null +++ b/docs/proposals/feature-task-templates.md @@ -0,0 +1,602 @@ +# Feature Proposal: Reusable Task Templates & Meta-Commands + +> **Priority:** Medium +> **Complexity:** Medium +> **Estimated Effort:** 8-12 days +> **Status:** Proposal +> **Date:** 2026-02-09 +> **Dependencies:** None (standalone, but complements [Workflow Presets](feature-workflow-presets.md)) +> **Related:** [Overview Analysis](compound-engineering-analysis.md) · [Workflow Presets](feature-workflow-presets.md) · [Multi-Agent Review](feature-multi-agent-review.md) + +--- + +## Problem Statement + +Makima tasks are created with **ad-hoc plans** every time: + +- **No plan reuse** — even when spawning the same type of task (e.g., "add API endpoint"), the plan is written from scratch +- **No standardization** — different supervisors produce different quality plans for the same task type +- **No best practices encoding** — hard-won knowledge about how to structure certain tasks isn't captured +- **No variable substitution** — plans can't be parameterized for reuse +- **No validation** — there's no way to verify a plan includes required steps before execution +- **No meta-creation** — the system cannot create its own task templates or improve its own capabilities + +The compound engineering plugin addresses this with meta-commands (`/create-agent-skill`, `/heal-skill`) that allow the system to create and repair its own specialized capabilities. + +--- + +## How Compound Engineering Solves This + +### `/create-agent-skill` + +Creates new specialized agents and skills on demand: + +```bash +/create-agent-skill "database migration reviewer" +``` + +This generates: +1. An agent definition file with specialized prompts +2. A skill file that exposes the agent as a command +3. 
Registration in the agent/skill registry + +### `/heal-skill` + +When a skill breaks (e.g., after a dependency change), this meta-command: +1. Analyzes the error +2. Identifies the root cause +3. Patches the skill definition +4. Tests the fix + +The key insight: **the system should be able to improve and extend itself**. + +--- + +## Proposed Makima Implementation + +### 1. Task Recipe Format + +Task recipes are parameterized plan templates with validation and metadata: + +```yaml +# .makima/recipes/api-endpoint.yaml +name: api-endpoint +description: "Create a new REST API endpoint" +version: 1 +author: "team" +tags: [api, backend, rest] + +# Input variables +variables: + endpoint_name: + required: true + description: "Name of the endpoint (e.g., 'users', 'orders')" + validation: "^[a-z][a-z0-9-]*$" + + http_method: + required: true + description: "HTTP method" + enum: [GET, POST, PUT, PATCH, DELETE] + default: GET + + resource_name: + required: true + description: "Name of the resource/model" + + requires_auth: + required: false + default: true + description: "Whether the endpoint requires authentication" + + database_table: + required: false + description: "Database table name (if applicable)" + +# Plan template with variable substitution +plan: | + ## Task: Create {{ http_method }} /api/{{ endpoint_name }} Endpoint + + ### Step 1: Define Route + Add the `{{ http_method }} /api/{{ endpoint_name }}` route to the router. + {% if requires_auth %} + Apply authentication middleware to this route. + {% endif %} + + ### Step 2: Create Handler + Create the handler function for {{ endpoint_name }}. + {% if database_table %} + The handler should query the `{{ database_table }}` table. + {% endif %} + + ### Step 3: Request/Response Models + Define request and response types for the {{ resource_name }} resource. + Include validation for all input fields. 
+ + ### Step 4: Error Handling + Implement proper error responses: + - 400 for validation errors + {% if requires_auth %} + - 401 for authentication failures + - 403 for authorization failures + {% endif %} + - 404 for not found + - 500 for server errors + + ### Step 5: Tests + Write tests covering: + - Happy path + - Input validation + {% if requires_auth %} + - Authentication required + - Authorization check + {% endif %} + - Error cases + - Edge cases + + ### Step 6: Documentation + Update API documentation with: + - Endpoint URL and method + - Request/response schemas + - Example requests and responses + - Error codes + +# Validation rules — checks that must pass before execution +validation: + - check: "file_exists" + path: "src/api/mod.rs" + message: "API module must exist" + - check: "grep" + pattern: "Router" + path: "src/api/mod.rs" + message: "Router must be defined in API module" + +# Expected outputs +outputs: + files: + - "src/api/{{ endpoint_name }}.rs" + - "src/api/{{ endpoint_name }}_test.rs" + tests: + - "cargo test {{ endpoint_name }}" + +# Metadata for recipe discovery +metadata: + estimated_time: "30-60 minutes" + difficulty: "easy" + example_usage: | + makima recipe run api-endpoint \ + --var endpoint_name=users \ + --var http_method=GET \ + --var resource_name=User \ + --var database_table=users +``` + +### 2. Recipe Registry + +Recipes are discovered from three sources (same hierarchy as workflow presets): + +| Level | Location | Scope | +|-------|----------|-------| +| Built-in | Shipped with makima | All users | +| Repository | `.makima/recipes/` | All users of the repo | +| User | `~/.makima/recipes/` | Single user | + +**Precedence**: User > Repository > Built-in (a recipe defined at a higher level replaces a same-named recipe at a lower level) + +### 3. 
Supervisor Commands + +#### List Available Recipes + +```bash +makima recipe list + +# Output: +# NAME DESCRIPTION SOURCE TAGS +# api-endpoint Create a new REST API endpoint built-in api, backend +# db-migration Create a database migration built-in database +# react-component Create a React component built-in frontend, react +# unit-test Create unit tests for a module built-in testing +# bug-fix Structured bug fix workflow built-in debugging +# custom-validator Create input validation module repo validation +``` + +#### Run a Recipe + +```bash +# Run with explicit variables +makima recipe run api-endpoint \ + --var endpoint_name=users \ + --var http_method=GET \ + --var resource_name=User \ + --var database_table=users + +# Run with interactive variable input +makima recipe run api-endpoint + +# Preview the generated plan (dry run) +makima recipe preview api-endpoint \ + --var endpoint_name=users \ + --var http_method=GET +``` + +#### Create a Recipe + +```bash +# Create recipe from scratch +makima recipe create --name "my-recipe" --edit + +# Generate recipe from a completed task (meta-creation) +makima recipe create --from-task <task-id> --name "my-recipe" + +# Generate recipe from a plan file +makima recipe create --from-plan plan.md --name "my-recipe" +``` + +#### Validate a Recipe + +```bash +# Validate recipe file +makima recipe validate .makima/recipes/my-recipe.yaml + +# Validate recipe variables +makima recipe validate api-endpoint \ + --var endpoint_name=users \ + --var http_method=GET +``` + +### 4. Meta-Commands: Self-Improving Templates + +The most powerful aspect of the compound engineering plugin is its ability to **create its own capabilities**. Makima can implement similar meta-commands: + +#### `makima recipe generate` + +The system analyzes completed tasks and suggests recipe templates: + +```bash +# Analyze recent tasks and suggest recipes +makima recipe generate --analyze-last 20 + +# Output: +# Detected patterns: +# 1. 
"API endpoint creation" — 7 tasks followed similar pattern +# Suggested recipe: api-endpoint (confidence: 0.89) +# Variables: endpoint_name, http_method, resource_name +# +# 2. "Database migration" — 4 tasks followed similar pattern +# Suggested recipe: db-migration (confidence: 0.76) +# Variables: table_name, migration_type +# +# Generate these recipes? [y/N] +``` + +#### `makima recipe heal` + +When a recipe fails repeatedly, the system can analyze and fix it: + +```bash +# Analyze recipe failures and suggest fixes +makima recipe heal api-endpoint + +# Output: +# Analyzed 3 recent failures of 'api-endpoint': +# Root cause: Step 1 references 'src/api/mod.rs' but project uses 'src/routes/mod.rs' +# Suggested fix: Change validation path and plan references +# Apply fix? [y/N] +``` + +#### `makima recipe evolve` + +Improve recipes based on review findings: + +```bash +# Check if review findings suggest recipe improvements +makima recipe evolve api-endpoint --from-findings + +# Output: +# Review findings from tasks using 'api-endpoint' recipe: +# - SEC-001: "Missing rate limiting" (3 occurrences) +# - PERF-001: "Missing pagination" (2 occurrences) +# +# Suggested additions to recipe: +# 1. Add "Rate Limiting" step after Step 1 +# 2. Add pagination to Step 2 for GET endpoints +# Apply improvements? [y/N] +``` + +### 5. Built-In Recipes + +#### `api-endpoint` + +Creates a REST API endpoint with handler, models, validation, tests, and docs. + +#### `db-migration` + +Creates a database migration with up/down scripts, validation, and rollback plan. 
+ +```yaml +name: db-migration +variables: + table_name: { required: true } + migration_type: { required: true, enum: [create-table, alter-table, add-index, seed-data] } +plan: | + ## Create Database Migration: {{ migration_type }} on {{ table_name }} + ### Step 1: Create migration file + ### Step 2: Write up migration + ### Step 3: Write down migration (rollback) + ### Step 4: Test migration on clean database + ### Step 5: Test rollback + ### Step 6: Document migration in changelog +``` + +#### `react-component` + +Creates a React component with props, state, styling, and tests. + +#### `unit-test` + +Generates unit tests for an existing module by analyzing its public API. + +#### `bug-fix` + +Structured bug fix workflow: reproduce → root cause → fix → test → document. + +```yaml +name: bug-fix +variables: + bug_description: { required: true } + reproduction_steps: { required: false } + affected_area: { required: false } +plan: | + ## Bug Fix: {{ bug_description }} + + ### Step 1: Reproduce + {% if reproduction_steps %} + Follow these reproduction steps: {{ reproduction_steps }} + {% else %} + Identify and document reproduction steps. + {% endif %} + + ### Step 2: Root Cause Analysis + Trace the code path to identify the root cause. + {% if affected_area %} + Start in: {{ affected_area }} + {% endif %} + + ### Step 3: Implement Fix + Fix the root cause, not just the symptom. + + ### Step 4: Write Regression Test + Create a test that would have caught this bug. + + ### Step 5: Verify Fix + Run the reproduction steps and confirm the bug is fixed. + Run the full test suite to check for regressions. + + ### Step 6: Document + Document what caused the bug and how it was fixed. 
+``` + +--- + +## Integration with Existing Makima Features + +### Supervisor Task Spawning + +Recipes generate plans that are passed to `spawn-task`: + +```rust +// Recipe execution +let plan = recipe.render_plan(&variables)?; +// `variables` is assumed to be a HashMap<String, String>; "primary_var" is a placeholder key +let task = spawn_task(SpawnTaskRequest { + task_name: format!("{} ({})", recipe.name, variables.get("primary_var").map(String::as_str).unwrap_or("")), + plan, + // ... other params from context +})?; +``` + +### Contract Files + +Recipe definitions can be stored as contract files for versioning: + +```rust +File { + contract_id: None, // Global, not contract-specific + name: "Recipe: api-endpoint", + body: vec![ + BodyElement::Code { language: Some("yaml"), content: recipe_yaml }, + ], +} +``` + +### Workflow Presets + +Recipes and presets are complementary: +- **Presets** define the high-level workflow (which phases, what triggers) +- **Recipes** define the low-level task plans (what each task does) + +A preset can reference recipes: + +```yaml +# In a preset +phases: + execute: + recipe: api-endpoint # Use the api-endpoint recipe for this phase's tasks + recipe_vars: + endpoint_name: "{{ task_description }}" +``` + +### Knowledge Accumulation + +Recipes can be **evolved** based on learnings: +- When compound learning captures a pattern, check if it maps to an existing recipe +- If so, suggest recipe improvements +- If not, suggest creating a new recipe + +### Directive System + +For directive-based workflows, recipes can be used as task plan sources: + +```rust +DirectiveStep { + name: "create-users-endpoint", + task_plan: recipe.render_plan(&variables)?, // Generated from recipe + // ... 
+} +``` + +--- + +## Implementation Plan + +### Phase 1: Core Recipe System (3-4 days) + +| Task | Effort | Description | +|------|--------|-------------| +| Recipe YAML schema | 0.5 days | Define format, validation rules | +| YAML parser with Jinja-like templating | 1 day | Variable substitution, conditionals | +| `recipe list` command | 0.5 days | Discover and list recipes | +| `recipe run` command | 1 day | Parse, validate, render, spawn task | +| `recipe preview` command | 0.5 days | Dry-run display | + +### Phase 2: Recipe Management (2-3 days) + +| Task | Effort | Description | +|------|--------|-------------| +| Multi-level discovery | 0.5 days | Built-in, repo, user resolution | +| `recipe create` command | 1 day | Create from scratch or from task | +| `recipe validate` command | 0.5 days | YAML validation, variable check | +| Built-in recipe definitions | 1 day | Write 5 default recipes | + +### Phase 3: Meta-Commands (3-5 days) + +| Task | Effort | Description | +|------|--------|-------------| +| `recipe generate` | 1.5 days | Pattern detection from task history | +| `recipe heal` | 1 day | Failure analysis and auto-fix | +| `recipe evolve` | 1 day | Improve recipes from findings/learnings | +| Recipe versioning | 0.5 days | Version tracking, deprecation | +| Documentation | 0.5 days | User guide, recipe authoring guide | + +--- + +## Configuration Examples + +### Running a Recipe + +```bash +# Simple usage +makima recipe run api-endpoint \ + --var endpoint_name=orders \ + --var http_method=POST \ + --var resource_name=Order \ + --var requires_auth=true \ + --var database_table=orders + +# This spawns a task with the rendered plan: +# "## Task: Create POST /api/orders Endpoint +# ### Step 1: Define Route +# Add the POST /api/orders route to the router. +# Apply authentication middleware to this route. +# ..." 
+``` + +### Creating a Recipe from a Completed Task + +```bash +# After completing a successful task +makima recipe create --from-task abc-123 --name "graphql-resolver" + +# Analyzes the task's plan and execution to generate: +# .makima/recipes/graphql-resolver.yaml +# with variables extracted from repeated patterns +``` + +### Recipe with Validation + +```yaml +# .makima/recipes/react-component.yaml +name: react-component +variables: + component_name: + required: true + validation: "^[A-Z][a-zA-Z]*$" # PascalCase + use_typescript: + required: false + default: true + include_tests: + required: false + default: true + styling: + required: false + enum: [css-modules, styled-components, tailwind] + default: css-modules + +validation: + - check: "file_exists" + path: "src/components" + message: "Components directory must exist" + - check: "not_exists" + path: "src/components/{{ component_name }}" + message: "Component {{ component_name }} already exists" + +plan: | + ## Create React Component: {{ component_name }} + + ### Step 1: Component File + Create `src/components/{{ component_name }}/{{ component_name }}.{{ 'tsx' if use_typescript else 'jsx' }}` + with the component skeleton. + + ### Step 2: Styling + {% if styling == 'css-modules' %} + Create `{{ component_name }}.module.css` with base styles. + {% elif styling == 'styled-components' %} + Create styled components in the component file. + {% elif styling == 'tailwind' %} + Use Tailwind CSS classes directly in the component. + {% endif %} + + {% if include_tests %} + ### Step 3: Tests + Create `{{ component_name }}.test.{{ 'tsx' if use_typescript else 'jsx' }}` + with tests for rendering, props, and user interactions. + {% endif %} + + ### Step {{ '4' if include_tests else '3' }}: Export + Add {{ component_name }} to the components index file. 
+ +outputs: + files: + - "src/components/{{ component_name }}/{{ component_name }}.{{ 'tsx' if use_typescript else 'jsx' }}" + - "src/components/{{ component_name }}/index.{{ 'ts' if use_typescript else 'js' }}" +``` + +--- + +## Open Questions + +1. **Templating language**: Should we use a full Jinja2-like syntax or a simpler `{{ variable }}` substitution? Jinja adds power but complexity. +2. **Recipe dependencies**: Can recipes depend on other recipes? (e.g., "api-endpoint requires db-migration to have run first") +3. **Recipe testing**: How do you test that a recipe produces valid plans? Should recipes have test cases? +4. **Recipe marketplace**: Should there be a community registry for sharing recipes? +5. **Pattern detection**: How sophisticated should `recipe generate` be? Simple plan comparison, or full semantic analysis? +6. **Recipe scope**: Should recipes generate just plans, or also pre-create file scaffolding (like code generators)? +7. **Backwards compatibility**: When a recipe is updated, what happens to tasks that were created with the old version? + +--- + +## Alternatives Considered + +| Alternative | Pros | Cons | Decision | +|-------------|------|------|----------| +| Plan library (copy-paste) | Simple | No variables, no validation | Rejected — not reusable enough | +| Code generators (scaffolding) | Creates actual files | Over-prescriptive; doesn't handle logic | Complement — recipes can reference generators | +| LLM-only planning | Maximum flexibility | Inconsistent; no standardization | Current state — recipes improve on this | +| Cookiecutter-style templates | Familiar | Wrong level (project-level vs task-level) | Rejected — different abstraction | +| Hardcoded task types | Fast | Not extensible; limited variety | Rejected — need flexibility | + +--- + +## Priority & Complexity Assessment + +- **Priority: MEDIUM** — Task templates improve consistency and speed but aren't required for makima to function. 
They become increasingly valuable as the system is used more (patterns emerge). +- **Complexity: MEDIUM** — YAML parsing and variable substitution are straightforward. Meta-commands (generate, heal, evolve) require sophisticated analysis of task history and are the main complexity drivers. +- **Risk: LOW-MEDIUM** — Core recipe system is low risk. Meta-commands (auto-generation, healing) involve AI-driven analysis that may produce variable quality. Mitigated by requiring human approval before applying changes. diff --git a/docs/proposals/feature-workflow-presets.md b/docs/proposals/feature-workflow-presets.md new file mode 100644 index 0000000..1468a8a --- /dev/null +++ b/docs/proposals/feature-workflow-presets.md @@ -0,0 +1,623 @@ +# Feature Proposal: Workflow Presets / Pipeline Templates + +> **Priority:** High +> **Complexity:** Medium +> **Estimated Effort:** 10-15 days +> **Status:** Proposal +> **Date:** 2026-02-09 +> **Dependencies:** None (foundational feature) +> **Related:** [Overview Analysis](compound-engineering-analysis.md) · [Multi-Agent Review](feature-multi-agent-review.md) · [Knowledge Accumulation](feature-knowledge-accumulation.md) + +--- + +## Problem Statement + +Every makima contract currently requires **manual orchestration**: + +- Users must decide which contract type to use (simple, specification, execute) +- Supervisors must manually spawn tasks, wait for results, advance phases +- There are **no pre-built pipelines** for common workflows (full feature development, quick bug fix, refactoring, investigation) +- The supervisor plan must encode the full orchestration logic every time +- **Repetitive patterns** (plan → execute → test → review) are re-invented for each contract +- New users face a steep learning curve to orchestrate contracts effectively + +The compound engineering plugin's `/lfg` (Let's F***ing Go) and `/slfg` (Super LFG) commands solve this with **one-command full pipelines** that chain all phases automatically. 
+ +--- + +## How Compound Engineering Solves This + +### LFG Pipeline (Serial) + +```bash +/lfg "Implement user authentication" +``` + +Automatically chains: +``` +Plan → Deepen Plan → Work → Review → Resolve Findings → Test → Compound → Done +``` + +### SLFG Pipeline (Parallel) + +```bash +/slfg "Implement user authentication" +``` + +Same as LFG but parallelizes independent steps: +``` +Plan ──▶ Deepen Plan ──▶ Work ──▶ ┌─ Review ─────┐ ──▶ Test ──▶ Compound + │ (parallel) │ + └──────────────┘ +``` + +The key insight: **most engineering workflows follow predictable patterns** that can be templated and reused. + +--- + +## Proposed Makima Implementation + +### 1. Preset Definition Format + +Presets are defined in YAML and describe a complete workflow: + +```yaml +# .makima/presets/full-pipeline.yaml +name: full-pipeline +description: "Complete feature development pipeline with review and learning" +contract_type: specification +version: 1 + +# Variables that can be substituted at runtime +variables: + task_description: + required: true + description: "What to build" + repository: + required: false + description: "Target repository URL" + base_branch: + required: false + default: "main" + description: "Branch to work from" + +# Phase configuration +phases: + research: + enabled: true + deliverables: + - id: research-notes + name: "Research Notes" + priority: required + supervisor_plan: | + Research the requirements for: {{ task_description }} + - Analyze the existing codebase for relevant patterns + - Identify dependencies and constraints + - Document findings as research notes + + plan: + enabled: true + deliverables: + - id: plan-document + name: "Implementation Plan" + priority: required + supervisor_plan: | + Create an implementation plan for: {{ task_description }} + Based on the research findings. 
+ # Auto-deepen plan (requires Plan Deepening feature) + deepen: true + deepen_focus: + - edge-cases + - security + - performance + + execute: + enabled: true + deliverables: + - id: implementation + name: "Implementation" + priority: required + supervisor_plan: | + Execute the plan for: {{ task_description }} + Follow the deepened plan step by step. + # Spawn configuration + max_concurrent_tasks: 3 + completion_action: "branch" + + review: + enabled: true + deliverables: + - id: review-report + name: "Review Report" + priority: required + # Auto-review configuration (requires Multi-Agent Review feature) + auto_review: true + review_agents: + - security-sentinel + - performance-oracle + - architecture-strategist + - test-coverage-analyzer + merge_blocking_severity: P1 + + compound: + enabled: true + # Auto-compound (requires Knowledge Accumulation feature) + auto_compound: true + categories: + - architecture-decisions + - security-practices + - performance-optimizations + +# Hooks +hooks: + on_phase_complete: + execute: + - run: "makima supervisor spawn 'run-tests' --plan 'Run the full test suite'" + - wait_for: "run-tests" + on_contract_complete: + - run: "makima supervisor compound" +``` + +### 2. Built-In Presets + +#### `full-pipeline` — Complete Feature Development + +``` +Research → Plan → Deepen → Execute → Test → Review → Resolve → Compound +``` + +Best for: New features, major changes, complex implementations. + +#### `quick-fix` — Rapid Bug Fix + +``` +Plan → Execute → Test → Done +``` + +Best for: Small bug fixes, typo corrections, config changes. + +```yaml +# .makima/presets/quick-fix.yaml +name: quick-fix +description: "Fast bug fix with minimal ceremony" +contract_type: simple + +phases: + plan: + enabled: true + deliverables: + - id: fix-plan + name: "Fix Plan" + priority: required + supervisor_plan: | + Quick analysis and fix plan for: {{ task_description }} + Keep it brief — identify the bug and the fix.
+ + execute: + enabled: true + deliverables: + - id: fix + name: "Bug Fix" + priority: required + supervisor_plan: | + Fix the bug: {{ task_description }} + Run relevant tests after fixing. + completion_action: "branch" +``` + +#### `refactor` — Code Refactoring + +``` +Research → Plan → Deepen → Execute → Test → Review → Done +``` + +Best for: Code restructuring, pattern changes, dependency updates. + +```yaml +# .makima/presets/refactor.yaml +name: refactor +description: "Systematic refactoring with safety checks" +contract_type: specification + +phases: + research: + enabled: true + supervisor_plan: | + Analyze the codebase to understand the current structure for: {{ task_description }} + Document all files that will be affected. + Identify dependencies and potential breaking changes. + + plan: + enabled: true + deepen: true + deepen_focus: + - edge-cases + - patterns + supervisor_plan: | + Create a step-by-step refactoring plan for: {{ task_description }} + Ensure each step maintains a working state (no big-bang changes). + + execute: + enabled: true + supervisor_plan: | + Execute the refactoring plan for: {{ task_description }} + After each significant change, run tests to verify nothing is broken. + completion_action: "branch" + + review: + enabled: true + auto_review: true + review_agents: + - architecture-strategist + - test-coverage-analyzer + merge_blocking_severity: P1 +``` + +#### `investigation` — Research & Analysis + +``` +Research → Document → Done +``` + +Best for: Bug investigation, feasibility analysis, technology evaluation. + +```yaml +# .makima/presets/investigation.yaml +name: investigation +description: "Research-focused workflow for analysis and documentation" +contract_type: simple + +phases: + plan: + enabled: true + supervisor_plan: | + Plan the investigation for: {{ task_description }} + Define what questions need answering and what to examine. 
+ + execute: + enabled: true + deliverables: + - id: investigation-report + name: "Investigation Report" + priority: required + supervisor_plan: | + Investigate: {{ task_description }} + Document findings thoroughly. + Create actionable recommendations. + completion_action: "none" +``` + +### 3. Preset Discovery & Usage + +#### CLI Commands + +```bash +# List available presets +makima preset list +# Output: +# NAME DESCRIPTION SOURCE +# full-pipeline Complete feature development pipeline built-in +# quick-fix Fast bug fix with minimal ceremony built-in +# refactor Systematic refactoring with safety checks built-in +# investigation Research-focused analysis workflow built-in +# custom-deploy Deployment pipeline with staging .makima/presets/ + +# Run a preset +makima preset run full-pipeline \ + --var task_description="Add user authentication with JWT" \ + --var repository="github.com/org/repo" + +# Run with interactive variable input +makima preset run full-pipeline + +# Preview what a preset will do (dry run) +makima preset preview full-pipeline \ + --var task_description="Add user authentication with JWT" + +# Create a new preset from an existing contract +makima preset create --from-contract <contract-id> --name "my-workflow" + +# Validate a preset file +makima preset validate .makima/presets/my-preset.yaml +``` + +#### Under the Hood + +When `makima preset run full-pipeline` executes: + +``` +1. Parse preset YAML +2. Substitute variables +3. Create contract with specified type +4. Configure phases from preset +5. Create supervisor task with generated plan +6. Supervisor executes phases according to preset configuration +7. Auto-triggers (review, compound) fire at appropriate phase transitions +``` + +``` +┌─────────────────────────────────────────────────────────┐ +│ Preset Engine │ +│ │ +│ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │ +│ │ Parse │───▶│ Variable │───▶│ Create Contract │ │ +│ │ YAML │ │ Subst. 
│ │ + Supervisor │ │ +│ └──────────┘ └──────────┘ └────────┬─────────┘ │ +│ │ │ +│ ┌────────────────────────┐│ │ +│ │ Phase Orchestration ││ │ +│ │ │▼ │ +│ │ research ──▶ plan ──▶ execute │ +│ │ │ │ │ +│ │ deepen-plan │ │ +│ │ (if enabled) │ │ +│ │ ▼ │ +│ │ review ──▶ compound│ +│ │ (auto) (auto) │ +│ └────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────┘ +``` + +### 4. Custom Preset Creation + +Users create presets at three levels: + +| Level | Location | Scope | +|-------|----------|-------| +| Built-in | Shipped with makima | All users | +| Repository | `.makima/presets/` | All users of the repo | +| User | `~/.makima/presets/` | Single user | + +**Precedence**: User > Repository > Built-in (same name overrides) + +#### Creating from Existing Contract + +```bash +# Analyze a successful contract and generate a preset from it +makima preset create --from-contract abc-123 --name "my-api-workflow" + +# This generates: +# ~/.makima/presets/my-api-workflow.yaml +# with phases, timings, and patterns extracted from the contract +``` + +--- + +## Integration with Existing Makima Features + +### Contract System + +Presets create contracts with the appropriate type: +```rust +// Preset specifies contract_type +let contract = create_contract(CreateContractRequest { + name: format!("{} ({})", task_description, preset.name), + contract_type: preset.contract_type.clone(), // "simple", "specification", "execute" + phase: preset.first_enabled_phase(), + autonomous_loop: true, + phase_guard: preset.phase_guard, + // ... +}); +``` + +### Supervisor Plans + +The preset generates a comprehensive supervisor plan by combining phase-specific instructions: + +```rust +let supervisor_plan = preset.generate_supervisor_plan(&variables); +// This produces a plan like: +// "You are orchestrating a full-pipeline workflow. +// Phase 1 (Research): ... +// Phase 2 (Plan): ... +// ..." 
+``` + +### Directive System Integration + +For complex presets, phases can be modeled as directive steps with dependencies: + +```rust +// Each phase becomes a directive step +let steps: Vec<DirectiveStep> = preset.phases.iter().map(|phase| { + DirectiveStep { + name: phase.name.clone(), + description: Some(phase.description.clone()), + task_plan: Some(phase.supervisor_plan.clone()), + depends_on: phase.dependencies(), + // ... + } +}).collect(); +``` + +This allows parallel phases (e.g., independent review agents) to execute concurrently while respecting dependencies. + +### Hooks System + +Presets define hooks that trigger at phase transitions: + +```yaml +hooks: + on_phase_complete: + execute: + - run: "makima supervisor spawn 'tests' --plan 'Run test suite'" + - wait_for: "tests" + on_review_complete: + - condition: "findings.p1_count == 0" + run: "makima supervisor advance-phase compound -y" + - condition: "findings.p1_count > 0" + run: "makima supervisor ask 'P1 findings detected. Continue?' --choices 'Fix first,Continue anyway'" +``` + +### Autonomous Loop + +Presets work with the existing autonomous loop: +- Each phase uses `<COMPLETION_GATE>` to signal completion +- Circuit breaker prevents stuck phases +- `autonomous_loop: true` on the contract enables automatic continuation + +--- + +## Implementation Plan + +### Phase 1: Core Preset Engine (4-5 days) + +| Task | Effort | Description | +|------|--------|-------------| +| Preset YAML schema definition | 0.5 days | Define YAML format, validation rules | +| YAML parser with variable substitution | 1 day | Parse presets, substitute `{{ variables }}` | +| `preset list` command | 0.5 days | Discover and list available presets | +| `preset run` command | 1.5 days | Create contract + supervisor from preset | +| `preset preview` command | 0.5 days | Dry-run display | +| Built-in preset definitions | 1 day | Write 4 default presets | + +### Phase 2: Custom Presets (3-5 days) + +| Task | Effort | Description |
+|------|--------|-------------| +| User/repo preset discovery | 1 day | Multi-level preset resolution | +| `preset create` command | 1.5 days | Generate preset from existing contract | +| `preset validate` command | 0.5 days | Validate preset YAML | +| Preset versioning | 1 day | Version field, migration support | + +### Phase 3: Integration & Polish (3-5 days) + +| Task | Effort | Description | +|------|--------|-------------| +| Hooks system | 1.5 days | Phase transition hooks | +| Auto-trigger integration | 1 day | Wire to review/compound auto-triggers | +| Directive system integration | 1 day | Complex presets as directive DAGs | +| Documentation | 0.5 days | User guide, preset authoring guide | + +--- + +## Configuration Examples + +### Running a Preset + +```bash +# Simplest usage — one command to run a full pipeline +makima preset run full-pipeline --var task_description="Add OAuth2 login" + +# This creates: +# - Contract: "Add OAuth2 login (full-pipeline)" +# - Supervisor task with complete phase orchestration +# - Auto-review enabled +# - Auto-compound enabled +# - All phases configured with deliverables +``` + +### Creating a Custom Preset + +```yaml +# .makima/presets/api-feature.yaml +name: api-feature +description: "API feature development with schema validation" +contract_type: specification +version: 1 + +variables: + feature_name: + required: true + description: "Name of the API feature" + api_version: + required: false + default: "v1" + description: "API version" + +phases: + research: + enabled: true + supervisor_plan: | + Research existing API patterns in the codebase for {{ api_version }}. + Document the current API schema structure. + Identify relevant endpoints and data models for {{ feature_name }}. + + plan: + enabled: true + deepen: true + deepen_focus: + - api-patterns + - security + - edge-cases + supervisor_plan: | + Plan the {{ feature_name }} API feature for {{ api_version }}. 
+ Include: endpoint design, request/response schemas, validation rules, + error handling, and test cases. + + execute: + enabled: true + max_concurrent_tasks: 2 + supervisor_plan: | + Implement the {{ feature_name }} API feature. + Follow the plan. Create endpoints, handlers, validators, and tests. + Run tests after implementation. + completion_action: "branch" + + review: + enabled: true + auto_review: true + review_agents: + - security-sentinel + - api-contract-validator + - test-coverage-analyzer + merge_blocking_severity: P1 + + compound: + enabled: true + auto_compound: true + categories: + - api-patterns + - security-practices +``` + +### Listing Presets + +``` +$ makima preset list + +BUILT-IN PRESETS + full-pipeline Complete feature development pipeline with review and learning + quick-fix Fast bug fix with minimal ceremony + refactor Systematic refactoring with safety checks + investigation Research-focused analysis workflow + +REPOSITORY PRESETS (.makima/presets/) + api-feature API feature development with schema validation + migration Database migration with rollback plan + +USER PRESETS (~/.makima/presets/) + my-workflow Custom workflow for frontend development +``` + +--- + +## Open Questions + +1. **Preset inheritance**: Should presets be able to extend other presets? (e.g., `extends: full-pipeline` with overrides) +2. **Conditional phases**: Should phases be conditionally enabled based on runtime conditions? (e.g., skip review for changes under 50 lines) +3. **Preset parameters validation**: How strict should variable validation be? Allow arbitrary variables or enforce a schema? +4. **Preset sharing**: Should presets be sharable via a registry or marketplace? +5. **Preset analytics**: Should we track which presets are most used and their success rates? +6. **Rollback**: If a preset-driven workflow fails mid-phase, how should recovery work? +7. **Interactive mode**: Should presets support interactive steps where the user provides input mid-pipeline? 
+ +--- + +## Alternatives Considered + +| Alternative | Pros | Cons | Decision | +|-------------|------|------|----------| +| Hardcoded pipelines | Simple, predictable | Not customizable; one-size-fits-all | Rejected — need flexibility | +| Pure CLI scripting | Maximum flexibility | Not portable; error-prone; no validation | Rejected — too fragile | +| GUI workflow builder | Visual, intuitive | High development cost; not scriptable | Deferred — consider for UI | +| Contract type expansion | Minimal new concepts | Doesn't solve orchestration; just adds phase combos | Partial — presets use contract types | +| Makefile-style approach | Familiar to developers | Wrong abstraction level; no variable substitution | Rejected — YAML is better fit | + +--- + +## Priority & Complexity Assessment + +- **Priority: HIGH** — Workflow presets are the **gateway feature** that makes all other features accessible. Without presets, users must manually orchestrate review, deepening, and compounding. With presets, these features are activated with a single command. +- **Complexity: MEDIUM** — YAML parsing and variable substitution are straightforward. Hooks system and directive integration add complexity. Main challenge is designing a preset schema that's flexible enough for diverse workflows without being overwhelming. +- **Risk: LOW** — Presets are purely additive. They don't change existing behavior. Users can always fall back to manual orchestration. |
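The complexity assessment calls variable substitution straightforward. As a rough illustration of step 2 of "Under the Hood" (substitute variables), here is a minimal Rust sketch. It resolves the declared `variables:` block, where `required` and `default` come from the preset format, against user-supplied `--var` values, then replaces `{{ name }}` placeholders. `VarSpec`, `resolve_vars`, and `substitute` are hypothetical names, not makima's actual API. The sketch also assumes YAML parsing has already produced these maps somewhere else.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of one entry in a preset's `variables:` block.
/// Field names follow the YAML (`required`, `default`); the type itself is illustrative.
pub struct VarSpec {
    pub required: bool,
    pub default: Option<String>,
}

/// Resolve every declared variable: a user-supplied `--var` value wins,
/// otherwise the default is used; a missing required variable is an error.
pub fn resolve_vars(
    specs: &HashMap<String, VarSpec>,
    provided: &HashMap<String, String>,
) -> Result<HashMap<String, String>, String> {
    let mut out = HashMap::new();
    for (name, spec) in specs {
        match provided.get(name).or(spec.default.as_ref()) {
            Some(v) => {
                out.insert(name.clone(), v.clone());
            }
            None if spec.required => {
                return Err(format!("missing required variable: {name}"));
            }
            None => {} // optional and no default: leave unset
        }
    }
    Ok(out)
}

/// Replace each `{{ name }}` placeholder (inner whitespace optional).
/// Unknown names are left as-is so a validation pass can report them.
pub fn substitute(template: &str, vars: &HashMap<String, String>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find("}}") {
            Some(end) => {
                let name = after[..end].trim();
                match vars.get(name) {
                    Some(v) => out.push_str(v),
                    // Keep the whole `{{ ... }}` span untouched.
                    None => out.push_str(&rest[start..start + 2 + end + 2]),
                }
                rest = &after[end + 2..];
            }
            None => {
                // Unterminated placeholder: copy the remainder verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

Leaving unknown placeholders intact, instead of failing immediately, gives a `preset validate` pass a chance to report them explicitly. A stricter design could fail fast instead. The sketch handles only plain `{{ name }}` placeholders. It does not cover the Jinja-style expressions and `{% if %}` blocks used in the recipe examples.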
