1 files changed, 504 insertions, 0 deletions
diff --git a/docs/proposals/feature-findings-tracking.md b/docs/proposals/feature-findings-tracking.md
new file mode 100644
index 0000000..bb8a68e
--- /dev/null
+++ b/docs/proposals/feature-findings-tracking.md
@@ -0,0 +1,504 @@
+# Feature Proposal: Structured Findings / Issues Tracking
+
+> **Priority:** Medium
+> **Complexity:** Low
+> **Estimated Effort:** 7-10 days
+> **Status:** Proposal
+> **Date:** 2026-02-09
+> **Dependencies:** None (standalone, but enhances [Multi-Agent Review](feature-multi-agent-review.md))
+> **Related:** [Overview Analysis](compound-engineering-analysis.md) · [Multi-Agent Review](feature-multi-agent-review.md) · [Workflow Presets](feature-workflow-presets.md)
+
+---
+
+## Problem Statement
+
+Currently, review outputs in makima are **unstructured text** in task conversation history:
+
+- **No standard format** for reporting issues found during review
+- **No severity classification** — all findings are treated equally
+- **No lifecycle tracking** — findings are either "in the review output" or "hopefully fixed"
+- **No verification** — there's no way to confirm a finding was actually resolved
+- **No aggregation** — findings from multiple review tasks can't be collected and deduplicated
+- **No blocking mechanism** — critical findings can't prevent phase transitions
+- **No metrics** — no data on how many findings are produced, resolved, or escaped
+
+This makes the review phase a documentation exercise rather than a quality gate.
+
+---
+
+## How Compound Engineering Solves This
+
+The compound engineering plugin uses **structured TODO/finding files** with YAML frontmatter and a defined lifecycle:
+
+### File Format
+
+```markdown
+---
+id: SEC-001
+status: open
+priority: P1
+category: security
+title: SQL injection in user search endpoint
+file: src/api/users.rs
+line: 47
+agent: security-sentinel
+created: 2026-02-09T10:30:00Z
+updated: 2026-02-09T10:30:00Z
+tags: [injection, input-validation, database]
+---
+
+# SQL Injection in User Search Endpoint
+
+## Finding
+The `search_users` handler directly interpolates the `query` parameter into
+a SQL string without parameterization.
+
+## Evidence
+```rust
+// src/api/users.rs:47
+let sql = format!("SELECT * FROM users WHERE name LIKE '%{}%'", query);
+```
+
+## Impact
+An attacker can execute arbitrary SQL queries, potentially:
+- Exfiltrating all user data
+- Modifying or deleting records
+- Escalating privileges
+
+## Recommendation
+Use parameterized queries:
+```rust
+let results = sqlx::query("SELECT * FROM users WHERE name LIKE $1")
+    .bind(format!("%{}%", query))
+    .fetch_all(&pool)
+    .await?;
+```
+
+## Resolution
+_Not yet resolved_
+```
+
+### File Naming Convention
+
+```
+findings/{issue_id}-{status}-{priority}-{description}.md
+```
+
+Example: `findings/SEC-001-open-P1-sql-injection-user-search.md`
+
+### Lifecycle
+
+```
+open ──▶ in-progress ──▶ resolved ──▶ verified
+  │                         │
+  └── wont-fix ◀────────────┘
+```
+
+---
+
+## Proposed Makima Implementation
+
+### 1. Finding Record Format
+
+Findings are stored as **contract files** with structured metadata and body:
+
+```rust
+// Finding metadata (stored in file description as structured JSON)
+#[derive(Serialize, Deserialize)]
+pub struct FindingMetadata {
+    pub id: String,                    // "SEC-001", auto-generated
+    pub status: FindingStatus,         // open, in_progress, resolved, verified, wont_fix
+    pub severity: FindingSeverity,     // P1 (critical), P2 (major), P3 (minor)
+    pub category: String,             // security, performance, architecture, etc.
+    pub title: String,                // Short description
+    pub file_path: Option<String>,    // Affected file
+    pub line_number: Option<u32>,     // Affected line
+    pub source_agent: Option<String>, // Which review agent found this
+    pub source_task_id: Option<Uuid>, // Task that produced this finding
+    pub assigned_to: Option<Uuid>,    // Task assigned to resolve this
+    pub created_at: DateTime<Utc>,
+    pub updated_at: DateTime<Utc>,
+    pub resolved_at: Option<DateTime<Utc>>,
+    pub verified_at: Option<DateTime<Utc>>,
+    pub tags: Vec<String>,
+}
+
+pub enum FindingStatus {
+    Open,
+    InProgress,
+    Resolved,
+    Verified,
+    WontFix,
+}
+
+pub enum FindingSeverity {
+    P1,  // Critical — must fix before merge
+    P2,  // Major — should fix, can defer with justification
+    P3,  // Minor — nice to fix, can defer
+}
+```
+
+### 2. Supervisor Commands
+
+#### Create a Finding
+
+```bash
+# Create a finding from review output
+makima supervisor finding create \
+  --severity P1 \
+  --category security \
+  --title "SQL injection in user search endpoint" \
+  --file src/api/users.rs \
+  --line 47 \
+  --description "Direct string interpolation in SQL query"
+
+# Output: Created finding SEC-001 (P1/security)
+```
+
+#### List Findings
+
+```bash
+# List all findings for the current contract
+makima supervisor finding list
+# Output:
+# ID       SEVERITY  STATUS       CATEGORY      TITLE
+# SEC-001  P1        open         security      SQL injection in user search
+# PERF-001 P2        in-progress  performance   N+1 query in order listing
+# ARCH-001 P3        resolved     architecture  Handler accessing DB directly
+
+# Filter by severity
+makima supervisor finding list --severity P1
+
+# Filter by status
+makima supervisor finding list --status open
+
+# Summary only
+makima supervisor finding summary
+# Output:
+# Total: 12 findings
+# P1: 2 open, 1 resolved
+# P2: 3 open, 2 in-progress
+# P3: 4 resolved
+```
+
+#### Update Finding Status
+
+```bash
+# Mark as in-progress (assigned to a task)
+makima supervisor finding update SEC-001 --status in-progress --assigned-to <task-id>
+
+# Mark as resolved
+makima supervisor finding update SEC-001 --status resolved \
+  --resolution "Replaced with parameterized query in commit abc123"
+
+# Mark as verified (after re-review)
+makima supervisor finding update SEC-001 --status verified
+
+# Mark as won't fix
+makima supervisor finding update SEC-001 --status wont-fix \
+  --justification "Endpoint is internal-only, behind auth"
+```
+
+#### Auto-Create from Review Output
+
+```bash
+# Parse review agent output and create findings automatically
+makima supervisor finding parse-output --task-id <review-task-id>
+```
+
+This parses structured review output and creates individual finding records.
+
+### 3. Finding Lifecycle
+
+```
+┌────────────────────────────────────────────────────────────┐
+│                    Finding Lifecycle                        │
+│                                                            │
+│  ┌──────┐    ┌─────────────┐    ┌──────────┐              │
+│  │      │    │             │    │          │              │
+│  │ OPEN │───▶│ IN-PROGRESS │───▶│ RESOLVED │              │
+│  │      │    │             │    │          │              │
+│  └──┬───┘    └─────────────┘    └────┬─────┘              │
+│     │                                │                     │
+│     │        ┌─────────────┐    ┌────┴─────┐              │
+│     │        │             │    │          │              │
+│     └───────▶│  WONT-FIX   │    │ VERIFIED │              │
+│              │             │    │          │              │
+│              └─────────────┘    └──────────┘              │
+│                                                            │
+│  Triggers:                                                 │
+│  open ─▶ in_progress : Task assigned to fix                │
+│  in_progress ─▶ resolved : Fix committed                   │
+│  resolved ─▶ verified : Re-review confirms fix             │
+│  open ─▶ wont_fix : Explicit decision with justification   │
+│  resolved ─▶ wont_fix : Fix deemed unnecessary after review│
+└────────────────────────────────────────────────────────────┘
+```
+
+### 4. P1/P2/P3 Severity System
+
+| Severity | Name | Description | Merge Policy |
+|----------|------|-------------|--------------|
+| **P1** | Critical | Security vulnerabilities, data loss risks, crash bugs | **Blocks merge** — must be resolved before contract completion |
+| **P2** | Major | Performance issues, architectural concerns, significant tech debt | **Should fix** — can defer with explicit justification |
+| **P3** | Minor | Style issues, minor improvements, documentation gaps | **Nice to fix** — can defer freely |
+
+### 5. Merge Blocking
+
+When findings exist, phase transitions and merge operations check for blockers:
+
+```rust
+// In advance-phase handler
+async fn check_findings_gate(contract_id: Uuid) -> Result<bool> {
+    let findings = get_findings(contract_id).await?;
+    let open_p1s = findings.iter()
+        .filter(|f| f.severity == P1 && f.status == Open)
+        .count();
+
+    if open_p1s > 0 {
+        warn!("{} open P1 findings block phase transition", open_p1s);
+        return Ok(false);
+    }
+    Ok(true)
+}
+```
+
+### 6. Auto-Resolution Workflow
+
+When the Multi-Agent Review feature is available, findings drive an automated resolution cycle:
+
+```
+┌──────────┐     ┌───────────┐     ┌──────────┐     ┌──────────┐
+│  Review  │────▶│ Findings  │────▶│ Resolve  │────▶│ Verify   │
+│  Phase   │     │ Created   │     │ Tasks    │     │ Fixes    │
+│          │     │ (P1/P2/P3)│     │ Spawned  │     │ Pass?    │
+└──────────┘     └───────────┘     └──────────┘     └────┬─────┘
+                                                         │
+                                                    Yes  │  No
+                                                    ┌────┴────┐
+                                                    ▼         ▼
+                                              ┌──────────┐  Loop back
+                                              │ Findings │  to resolve
+                                              │ Verified │
+                                              └──────────┘
+```
+
+```bash
+# Auto-resolve: spawn tasks to fix each P1/P2 finding
+makima supervisor finding auto-resolve --severity P1,P2
+
+# This spawns one task per finding:
+# - Task plan includes the finding details and recommendation
+# - Task is assigned to the finding (finding.assigned_to = task.id)
+# - When task completes, finding status → resolved
+# - Verification task confirms the fix
+```
+
+---
+
+## Integration with Existing Makima Features
+
+### Contract Files
+
+Each finding is stored as a **contract file**:
+
+```rust
+File {
+    contract_id: Some(contract.id),
+    contract_phase: Some("review"),
+    name: "Finding: SEC-001 — SQL injection in user search",
+    description: Some(serde_json::to_string(&finding_metadata)?),
+    body: vec![
+        BodyElement::Heading { level: 1, text: finding.title },
+        BodyElement::Heading { level: 2, text: "Finding" },
+        BodyElement::Paragraph { text: finding.description },
+        BodyElement::Heading { level: 2, text: "Evidence" },
+        BodyElement::Code { language: Some("rust"), content: finding.evidence },
+        BodyElement::Heading { level: 2, text: "Recommendation" },
+        BodyElement::Paragraph { text: finding.recommendation },
+    ],
+}
+```
+
+### Phase Guards
+
+Findings integrate with existing phase guards:
+- Phase guard checks finding gate before allowing transition
+- User sees a summary of open findings when reviewing phase transition
+- P1 findings produce a warning that requires explicit override
+
+### Supervisor Questions
+
+When P1 findings block a transition, the supervisor can ask:
+
+```bash
+makima supervisor ask \
+  "2 P1 findings are still open. How would you like to proceed?" \
+  --choices "Fix findings first,Override and continue,Mark as won't-fix" \
+  --context "SEC-001: SQL injection (P1), PERF-001: Memory leak (P1)"
+```
+
+### Task Assignment
+
+Findings reference tasks:
+- `source_task_id`: The review task that discovered the finding
+- `assigned_to`: The task spawned to resolve the finding
+
+```bash
+# Spawn a fix task and assign the finding
+makima supervisor spawn "fix-sec-001" \
+  --plan "Fix SQL injection vulnerability in src/api/users.rs:47. Use parameterized queries."
+
+makima supervisor finding update SEC-001 \
+  --status in-progress \
+  --assigned-to <spawned-task-id>
+```
+
+### Autonomous Loop
+
+The autonomous loop can use findings as a completion gate condition:
+
+```xml
+<COMPLETION_GATE>
+ready: false
+reason: "2 P1 findings still open"
+progress: "Resolved 5/7 findings"
+blockers: ["SEC-001: SQL injection", "PERF-001: Memory leak"]
+</COMPLETION_GATE>
+```
+
+---
+
+## Implementation Plan
+
+### Phase 1: Core Finding System (3-4 days)
+
+| Task | Effort | Description |
+|------|--------|-------------|
+| Finding metadata schema | 0.5 days | FindingMetadata struct, validation |
+| `finding create` command | 1 day | Create finding as contract file |
+| `finding list/summary` commands | 0.5 days | Query and display findings |
+| `finding update` command | 0.5 days | Status transitions, validation |
+| Auto-ID generation | 0.5 days | Category-based IDs (SEC-001, PERF-002) |
+
+### Phase 2: Integration (2-3 days)
+
+| Task | Effort | Description |
+|------|--------|-------------|
+| Phase guard integration | 0.5 days | Check P1 findings before transition |
+| `finding parse-output` | 1 day | Parse review task output into findings |
+| Merge blocking logic | 0.5 days | Block merge with open P1s |
+| Finding assignment to tasks | 0.5 days | Track resolution via task ID |
+
+### Phase 3: Automation & Polish (2-3 days)
+
+| Task | Effort | Description |
+|------|--------|-------------|
+| `finding auto-resolve` | 1 day | Spawn fix tasks per finding |
+| Verification workflow | 0.5 days | Re-review to verify fixes |
+| Finding reports | 0.5 days | Summary contract file |
+| Documentation | 0.5 days | User guide |
+| Tests | 0.5 days | Unit + integration |
+
+---
+
+## Configuration Examples
+
+### Finding Creation in Review Agent Output
+
+Review agents produce structured findings in their output:
+
+```markdown
+## FINDING: SQL Injection in User Search
+
+- **Severity**: P1
+- **Category**: security
+- **File**: src/api/users.rs
+- **Line**: 47
+- **Tags**: injection, input-validation, database
+
+### Description
+The `search_users` handler directly interpolates the `query` parameter...
+
+### Evidence
+```rust
+let sql = format!("SELECT * FROM users WHERE name LIKE '%{}%'", query);
+```
+
+### Recommendation
+Use parameterized queries with sqlx::query().bind()
+```
+
+The synthesis step parses these into formal Finding records.
+
+### Merge Blocking Configuration
+
+```yaml
+# .makima/review-agents.yaml (or contract config)
+review:
+  findings:
+    merge_blocking_severity: P1     # P1 blocks merge
+    require_justification: P2       # P2 needs justification to defer
+    auto_resolve: true              # Spawn fix tasks for P1/P2
+    auto_resolve_severity: P1,P2    # Which severities to auto-resolve
+    verification:
+      enabled: true                 # Re-review after resolution
+      re_review_agents:             # Which agents verify fixes
+        - security-sentinel         # Security findings verified by security agent
+```
+
+### Finding Lifecycle Example
+
+```bash
+# 1. Review creates finding
+makima supervisor finding create --severity P1 --category security \
+  --title "SQL injection in user search" --file src/api/users.rs --line 47
+
+# 2. Auto-resolve spawns fix task
+makima supervisor finding auto-resolve --severity P1
+# → Spawns task "fix-SEC-001" with plan based on finding details
+
+# 3. Fix task completes, finding auto-updated
+# finding SEC-001: open → in-progress → resolved
+
+# 4. Verification re-reviews the fix
+makima supervisor finding verify SEC-001
+# → Spawns verification task targeting the specific file/line
+
+# 5. Verification passes
+# finding SEC-001: resolved → verified
+
+# 6. Phase transition allowed
+makima supervisor advance-phase compound -y
+```
+
+---
+
+## Open Questions
+
+1. **Finding storage**: Contract files vs. dedicated findings table in the database? Contract files are simpler but querying is less efficient.
+2. **Cross-contract findings**: Should findings persist across contracts? (e.g., a P2 deferred from one contract carries to the next)
+3. **Finding templates**: Should common finding types have templates? (e.g., "SQL injection" pre-fills category, severity, recommendation)
+4. **External integration**: Should findings be exportable to GitHub Issues, Jira, or other issue trackers?
+5. **Metric tracking**: How granular should finding metrics be? Per-contract? Per-repository? Per-category?
+6. **False positive handling**: How should agents indicate confidence level? Should low-confidence findings be automatically P3?
+
+---
+
+## Alternatives Considered
+
+| Alternative | Pros | Cons | Decision |
+|-------------|------|------|----------|
+| GitHub Issues integration | Rich UI, collaboration | External dependency; not all projects use GitHub | Deferred — consider as export target |
+| Plain text findings | Simple | Not queryable, no lifecycle | Rejected — defeats the purpose |
+| Dedicated findings DB table | Fast queries, rich indexing | New infrastructure, migration | Recommended for v2 |
+| Contract file-based | Uses existing infrastructure | Slower queries for large sets | Adopted for v1 |
+| Inline code comments | Close to code | Lost on next commit; hard to track | Rejected — not persistent |
+
+---
+
+## Priority & Complexity Assessment
+
+- **Priority: MEDIUM** — Structured findings transform the review phase from documentation to a quality gate. Essential for the Multi-Agent Review feature to produce actionable output.
+- **Complexity: LOW** — Finding records are simple structured data. Lifecycle state machine is straightforward. Main integration point (phase guards) already exists.
+- **Risk: LOW** — Purely additive feature. Worst case: findings exist but aren't used (same as today). Can be adopted incrementally.