# Feature Proposal: Structured Findings / Issues Tracking

> **Priority:** Medium
> **Complexity:** Low
> **Estimated Effort:** 7-10 days
> **Status:** Proposal
> **Date:** 2026-02-09
> **Dependencies:** None (standalone, but enhances [Multi-Agent Review](feature-multi-agent-review.md))
> **Related:** [Overview Analysis](compound-engineering-analysis.md) · [Multi-Agent Review](feature-multi-agent-review.md) · [Workflow Presets](feature-workflow-presets.md)

---

## Problem Statement

Currently, review outputs in makima are **unstructured text** in task conversation history:

- **No standard format** for reporting issues found during review
- **No severity classification** — all findings are treated equally
- **No lifecycle tracking** — findings are either "in the review output" or "hopefully fixed"
- **No verification** — there's no way to confirm a finding was actually resolved
- **No aggregation** — findings from multiple review tasks can't be collected and deduplicated
- **No blocking mechanism** — critical findings can't prevent phase transitions
- **No metrics** — no data on how many findings are produced, resolved, or escaped

This makes the review phase a documentation exercise rather than a quality gate.

---

## How Compound Engineering Solves This

The compound engineering plugin uses **structured TODO/finding files** with YAML frontmatter and a defined lifecycle:

### File Format

````markdown
---
id: SEC-001
status: open
priority: P1
category: security
title: SQL injection in user search endpoint
file: src/api/users.rs
line: 47
agent: security-sentinel
created: 2026-02-09T10:30:00Z
updated: 2026-02-09T10:30:00Z
tags: [injection, input-validation, database]
---

# SQL Injection in User Search Endpoint

## Finding
The `search_users` handler directly interpolates the `query` parameter into
a SQL string without parameterization.

## Evidence
```rust
// src/api/users.rs:47
let sql = format!("SELECT * FROM users WHERE name LIKE '%{}%'", query);
```

## Impact
An attacker can execute arbitrary SQL queries, potentially:
- Exfiltrating all user data
- Modifying or deleting records
- Escalating privileges

## Recommendation
Use parameterized queries:
```rust
let results = sqlx::query("SELECT * FROM users WHERE name LIKE $1")
    .bind(format!("%{}%", query))
    .fetch_all(&pool)
    .await?;
```

## Resolution
_Not yet resolved_
````

### File Naming Convention

```
findings/{issue_id}-{status}-{priority}-{description}.md
```

Example: `findings/SEC-001-open-P1-sql-injection-user-search.md`
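Because status and priority are part of the name, a directory listing doubles as a status board, at the cost of renaming the file on every status change. A minimal Rust sketch of building such a name, assuming an ASCII slug of the description (`slugify` and `finding_filename` are hypothetical helpers, not part of the plugin):

```rust
/// Lowercase ASCII letters and digits; every run of other characters
/// becomes a single hyphen (non-ASCII letters are treated as separators).
fn slugify(text: &str) -> String {
    let mut slug = String::new();
    let mut pending_hyphen = false;
    for ch in text.chars() {
        if ch.is_ascii_alphanumeric() {
            // Emit a deferred hyphen only between words, never at the start.
            if pending_hyphen && !slug.is_empty() {
                slug.push('-');
            }
            pending_hyphen = false;
            slug.push(ch.to_ascii_lowercase());
        } else {
            pending_hyphen = true;
        }
    }
    slug
}

/// Build `findings/{issue_id}-{status}-{priority}-{description}.md`.
fn finding_filename(issue_id: &str, status: &str, priority: &str, description: &str) -> String {
    format!("findings/{}-{}-{}-{}.md", issue_id, status, priority, slugify(description))
}
```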

### Lifecycle

```
open ──▶ in-progress ──▶ resolved ──▶ verified
  │                         │
  └── wont-fix ◀────────────┘
```

---

## Proposed Makima Implementation

### 1. Finding Record Format

Findings are stored as **contract files** with structured metadata and body:

```rust
// Finding metadata, stored in the file's description as structured JSON.
// Serializing Uuid and DateTime<Utc> requires the `serde` features of uuid and chrono.
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use uuid::Uuid;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FindingMetadata {
    pub id: String,                   // "SEC-001", auto-generated
    pub status: FindingStatus,        // open, in-progress, resolved, verified, wont-fix
    pub severity: FindingSeverity,    // P1 (critical), P2 (major), P3 (minor)
    pub category: String,             // security, performance, architecture, etc.
    pub title: String,                // Short description
    pub file_path: Option<String>,    // Affected file
    pub line_number: Option<u32>,     // Affected line
    pub source_agent: Option<String>, // Which review agent found this
    pub source_task_id: Option<Uuid>, // Task that produced this finding
    pub assigned_to: Option<Uuid>,    // Task assigned to resolve this
    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
    pub resolved_at: Option<DateTime<Utc>>,
    pub verified_at: Option<DateTime<Utc>>,
    pub tags: Vec<String>,
}

// kebab-case keeps the stored values identical to the CLI ("in-progress", "wont-fix").
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
pub enum FindingStatus {
    Open,
    InProgress,
    Resolved,
    Verified,
    WontFix,
}

// Declaration order gives P1 < P2 < P3 for the derived Ord.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Serialize, Deserialize)]
pub enum FindingSeverity {
    P1, // Critical: must fix before merge
    P2, // Major: should fix, can defer with justification
    P3, // Minor: nice to fix, can defer
}
```
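The `id` comment says IDs are auto-generated, and the implementation plan mentions category-based IDs such as SEC-001 and PERF-002. One possible scheme, sketched with std only: a fixed category-to-prefix table (an assumption, as is the four-letter fallback for unknown categories) and a counter one past the highest existing number for that prefix. `category_prefix` and `next_finding_id` are hypothetical names.

```rust
/// Map a category to an ID prefix. The table is illustrative; unknown
/// categories fall back to their first four letters, uppercased.
fn category_prefix(category: &str) -> String {
    match category {
        "security" => "SEC".to_string(),
        "performance" => "PERF".to_string(),
        "architecture" => "ARCH".to_string(),
        other => other.chars().take(4).collect::<String>().to_ascii_uppercase(),
    }
}

/// Next ID for `category`, given the IDs already used in the contract.
fn next_finding_id(category: &str, existing_ids: &[&str]) -> String {
    let prefix = category_prefix(category);
    let highest = existing_ids
        .iter()
        // Keep only IDs shaped like "<PREFIX>-<number>".
        .filter_map(|id| id.strip_prefix(prefix.as_str()))
        .filter_map(|rest| rest.strip_prefix('-'))
        .filter_map(|num| num.parse::<u32>().ok())
        .max()
        .unwrap_or(0);
    format!("{}-{:03}", prefix, highest + 1)
}
```

Allocating IDs this way assumes creations within a contract are serialized; two concurrent `finding create` calls could otherwise pick the same number.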

### 2. Supervisor Commands

#### Create a Finding

```bash
# Create a finding from review output
makima supervisor finding create \
  --severity P1 \
  --category security \
  --title "SQL injection in user search endpoint" \
  --file src/api/users.rs \
  --line 47 \
  --description "Direct string interpolation in SQL query"

# Output: Created finding SEC-001 (P1/security)
```

#### List Findings

```bash
# List all findings for the current contract
makima supervisor finding list
# Output:
# ID       SEVERITY  STATUS       CATEGORY      TITLE
# SEC-001  P1        open         security      SQL injection in user search
# PERF-001 P2        in-progress  performance   N+1 query in order listing
# ARCH-001 P3        resolved     architecture  Handler accessing DB directly

# Filter by severity
makima supervisor finding list --severity P1

# Filter by status
makima supervisor finding list --status open

# Summary only
makima supervisor finding summary
# Output:
# Total: 12 findings
# P1: 2 open, 1 resolved
# P2: 3 open, 2 in-progress
# P3: 4 resolved
```
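`finding summary` is essentially a group-by on severity and then status. A std-only sketch, assuming findings are available as `(severity, status)` string pairs; `summarize` is a hypothetical helper that, like the sample output above, prints only non-zero counts:

```rust
use std::collections::BTreeMap;

// Lifecycle order, used to print statuses consistently.
const STATUS_ORDER: [&str; 5] = ["open", "in-progress", "resolved", "verified", "wont-fix"];

/// Render a `finding summary`-style report from (severity, status) pairs.
/// Statuses outside STATUS_ORDER count toward the total but are not printed.
fn summarize(findings: &[(&str, &str)]) -> String {
    // severity -> status -> count; BTreeMap keeps P1, P2, P3 in order.
    let mut counts: BTreeMap<&str, BTreeMap<&str, usize>> = BTreeMap::new();
    for (severity, status) in findings {
        *counts.entry(*severity).or_default().entry(*status).or_default() += 1;
    }

    let mut lines = vec![format!("Total: {} findings", findings.len())];
    for (severity, by_status) in &counts {
        let parts: Vec<String> = STATUS_ORDER
            .iter()
            .filter_map(|s| by_status.get(s).map(|n| format!("{} {}", n, s)))
            .collect();
        lines.push(format!("{}: {}", severity, parts.join(", ")));
    }
    lines.join("\n")
}
```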

#### Update Finding Status

```bash
# Mark as in-progress (assigned to a task)
makima supervisor finding update SEC-001 --status in-progress --assigned-to <task-id>

# Mark as resolved
makima supervisor finding update SEC-001 --status resolved \
  --resolution "Replaced with parameterized query in commit abc123"

# Mark as verified (after re-review)
makima supervisor finding update SEC-001 --status verified

# Mark as won't fix
makima supervisor finding update SEC-001 --status wont-fix \
  --justification "Endpoint is internal-only, behind auth"
```

#### Auto-Create from Review Output

```bash
# Parse review agent output and create findings automatically
makima supervisor finding parse-output --task-id <review-task-id>
```

The command reads the review task's output, splits it into the `## FINDING:` blocks described under Configuration Examples, and creates one finding record per block.

### 3. Finding Lifecycle

```
┌────────────────────────────────────────────────────────────┐
│                     Finding Lifecycle                      │
│                                                            │
│  ┌──────┐    ┌─────────────┐    ┌──────────┐               │
│  │ OPEN │───▶│ IN-PROGRESS │───▶│ RESOLVED │               │
│  └──┬───┘    └─────────────┘    └─┬──────┬─┘               │
│     │                             │      │                 │
│     │        ┌─────────────┐      │ ┌────▼─────┐           │
│     └───────▶│  WONT-FIX   │◀─────┘ │ VERIFIED │           │
│              └─────────────┘        └──────────┘           │
│                                                            │
│  Triggers:                                                 │
│  open ─▶ in-progress : Task assigned to fix                │
│  in-progress ─▶ resolved : Fix committed                   │
│  resolved ─▶ verified : Re-review confirms fix             │
│  open ─▶ wont-fix : Explicit decision with justification   │
│  resolved ─▶ wont-fix : Fix deemed unnecessary after review│
└────────────────────────────────────────────────────────────┘
```
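The trigger list defines a small state machine that `finding update` could enforce. A sketch under two assumptions: only the five listed edges are legal, and moving to `wont-fix` requires a non-empty justification (mirroring the `--justification` flag). `can_transition` and `validate_update` are illustrative, not an existing makima API.

```rust
/// Lifecycle states, mirroring the proposal's FindingStatus enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FindingStatus {
    Open,
    InProgress,
    Resolved,
    Verified,
    WontFix,
}

/// The edges from the trigger list; anything else is rejected.
fn can_transition(from: FindingStatus, to: FindingStatus) -> bool {
    matches!(
        (from, to),
        (FindingStatus::Open, FindingStatus::InProgress)
            | (FindingStatus::InProgress, FindingStatus::Resolved)
            | (FindingStatus::Resolved, FindingStatus::Verified)
            | (FindingStatus::Open, FindingStatus::WontFix)
            | (FindingStatus::Resolved, FindingStatus::WontFix)
    )
}

/// Validate a `finding update`; `wont-fix` also needs a justification.
fn validate_update(
    from: FindingStatus,
    to: FindingStatus,
    justification: Option<&str>,
) -> Result<(), String> {
    if !can_transition(from, to) {
        return Err(format!("illegal transition {:?} -> {:?}", from, to));
    }
    // Treat a missing or whitespace-only justification as absent.
    let missing = justification.map_or(true, |j| j.trim().is_empty());
    if to == FindingStatus::WontFix && missing {
        return Err("wont-fix requires a justification".to_string());
    }
    Ok(())
}
```

The trigger list has no edge for a failed verification, so the Auto-Resolution loop's "loop back to resolve" step would presumably need one (for example `resolved ─▶ in-progress`) if it reuses the same finding record.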

### 4. P1/P2/P3 Severity System

| Severity | Name | Description | Merge Policy |
|----------|------|-------------|--------------|
| **P1** | Critical | Security vulnerabilities, data loss risks, crash bugs | **Blocks merge** — must be resolved before contract completion |
| **P2** | Major | Performance issues, architectural concerns, significant tech debt | **Should fix** — can defer with explicit justification |
| **P3** | Minor | Style issues, minor improvements, documentation gaps | **Nice to fix** — can defer freely |
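The Merge Policy column maps one-to-one onto the severity enum. A minimal sketch of encoding it, assuming "defer" means leaving a finding unresolved at merge time and that a whitespace-only justification counts as none; `MergePolicy` and `may_defer` are hypothetical names:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FindingSeverity {
    P1,
    P2,
    P3,
}

#[derive(Debug, PartialEq, Eq)]
enum MergePolicy {
    BlocksMerge,            // must be resolved before contract completion
    DeferWithJustification, // should fix; deferral needs a reason
    DeferFreely,            // nice to fix
}

impl FindingSeverity {
    fn merge_policy(self) -> MergePolicy {
        match self {
            FindingSeverity::P1 => MergePolicy::BlocksMerge,
            FindingSeverity::P2 => MergePolicy::DeferWithJustification,
            FindingSeverity::P3 => MergePolicy::DeferFreely,
        }
    }
}

/// Can an unresolved finding of this severity be deferred past the merge?
fn may_defer(severity: FindingSeverity, justification: Option<&str>) -> bool {
    match severity.merge_policy() {
        MergePolicy::BlocksMerge => false,
        MergePolicy::DeferWithJustification => {
            justification.map_or(false, |j| !j.trim().is_empty())
        }
        MergePolicy::DeferFreely => true,
    }
}
```

Marking a P1 as `wont-fix` is a separate path (a lifecycle transition), so `may_defer` returning `false` for P1 does not rule that out.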

### 5. Merge Blocking

When findings exist, phase transitions and merge operations check for blockers:

```rust
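// In the advance-phase handler. `get_findings` loads the contract's
// FindingMetadata records; `Result` and `warn!` assume anyhow and tracing.
use anyhow::Result;
use tracing::warn;
use uuid::Uuid;

async fn check_findings_gate(contract_id: Uuid) -> Result<bool> {
    let findings = get_findings(contract_id).await?;

    // A P1 blocks until it is resolved or explicitly marked won't-fix,
    // so in-progress P1s count as blockers too.
    let unresolved_p1s = findings
        .iter()
        .filter(|f| matches!(f.severity, FindingSeverity::P1))
        .filter(|f| matches!(f.status, FindingStatus::Open | FindingStatus::InProgress))
        .count();

    if unresolved_p1s > 0 {
        warn!("{} unresolved P1 findings block phase transition", unresolved_p1s);
        return Ok(false);
    }
    Ok(true)
}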
```

### 6. Auto-Resolution Workflow

When the Multi-Agent Review feature is available, findings drive an automated resolution cycle:

```
┌──────────┐     ┌───────────┐     ┌──────────┐     ┌──────────┐
│  Review  │────▶│ Findings  │────▶│ Resolve  │────▶│ Verify   │
│  Phase   │     │ Created   │     │ Tasks    │     │ Fixes    │
│          │     │ (P1/P2/P3)│     │ Spawned  │     │ Pass?    │
└──────────┘     └───────────┘     └──────────┘     └────┬─────┘
                                                         │
                                                    Yes  │  No
                                                    ┌────┴────┐
                                                    ▼         ▼
                                              ┌──────────┐  Loop back
                                              │ Findings │  to resolve
                                              │ Verified │
                                              └──────────┘
```

```bash
# Auto-resolve: spawn tasks to fix each P1/P2 finding
makima supervisor finding auto-resolve --severity P1,P2

# This spawns one task per finding:
# - Task plan includes the finding details and recommendation
# - Task is assigned to the finding (finding.assigned_to = task.id)
# - When task completes, finding status → resolved
# - Verification task confirms the fix
```
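The selection half of `auto-resolve` might look like the following sketch: split the `--severity` list, keep findings that are still `open` at those severities, and derive a task name and plan for each. The `Finding` struct, the lowercase `fix-<id>` naming (following the `fix-sec-001` example under Task Assignment) and the plan wording are assumptions; spawning the tasks and setting `assigned_to` are left out.

```rust
#[derive(Debug, Clone)]
struct Finding {
    id: String,
    severity: String, // "P1" | "P2" | "P3"
    status: String,   // "open", "in-progress", ...
    title: String,
    file: Option<String>,
    line: Option<u32>,
    recommendation: String,
}

/// One (task name, plan) pair per open finding whose severity is listed
/// in `severity_arg` (e.g. "P1,P2").
fn plan_fix_tasks(findings: &[Finding], severity_arg: &str) -> Vec<(String, String)> {
    let wanted: Vec<&str> = severity_arg.split(',').map(str::trim).collect();
    findings
        .iter()
        .filter(|f| f.status == "open" && wanted.contains(&f.severity.as_str()))
        .map(|f| {
            // Point the fix task at the exact location when it is known.
            let location = match (&f.file, f.line) {
                (Some(file), Some(line)) => format!(" in {}:{}", file, line),
                (Some(file), None) => format!(" in {}", file),
                _ => String::new(),
            };
            let name = format!("fix-{}", f.id.to_lowercase());
            let plan = format!("Fix {}: {}{}. {}", f.id, f.title, location, f.recommendation);
            (name, plan)
        })
        .collect()
}
```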

---

## Integration with Existing Makima Features

### Contract Files

Each finding is stored as a **contract file**:

```rust
File {
    contract_id: Some(contract.id),
    contract_phase: Some("review"),
    name: "Finding: SEC-001 — SQL injection in user search",
    description: Some(serde_json::to_string(&finding_metadata)?),
    body: vec![
        BodyElement::Heading { level: 1, text: finding.title },
        BodyElement::Heading { level: 2, text: "Finding" },
        BodyElement::Paragraph { text: finding.description },
        BodyElement::Heading { level: 2, text: "Evidence" },
        BodyElement::Code { language: Some("rust"), content: finding.evidence },
        BodyElement::Heading { level: 2, text: "Recommendation" },
        BodyElement::Paragraph { text: finding.recommendation },
    ],
}
```

### Phase Guards

Findings integrate with existing phase guards:
- The phase guard checks the finding gate before allowing a transition
- The user sees a summary of open findings when reviewing a phase transition
- Unresolved P1 findings produce a warning that requires an explicit override

### Supervisor Questions

When P1 findings block a transition, the supervisor can ask:

```bash
makima supervisor ask \
  "2 P1 findings are still open. How would you like to proceed?" \
  --choices "Fix findings first,Override and continue,Mark as won't-fix" \
  --context "SEC-001: SQL injection (P1), PERF-002: Memory leak (P1)"
```

### Task Assignment

Findings reference tasks:
- `source_task_id`: The review task that discovered the finding
- `assigned_to`: The task spawned to resolve the finding

```bash
# Spawn a fix task and assign the finding
makima supervisor spawn "fix-sec-001" \
  --plan "Fix SQL injection vulnerability in src/api/users.rs:47. Use parameterized queries."

makima supervisor finding update SEC-001 \
  --status in-progress \
  --assigned-to <spawned-task-id>
```

### Autonomous Loop

The autonomous loop can use findings as a completion gate condition:

```xml
<COMPLETION_GATE>
ready: false
reason: "2 P1 findings still open"
progress: "Resolved 5/7 findings"
blockers: ["SEC-001: SQL injection", "PERF-002: Memory leak"]
</COMPLETION_GATE>
```
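The gate block could be rendered directly from the finding records. A sketch assuming that only P1s block, that `in-progress` P1s still count as blockers, and that `resolved`, `verified` and `wont-fix` all count as done for the progress line; `render_completion_gate` is hypothetical, and the field names are copied from the example above.

```rust
/// Minimal view of a finding; real records carry more fields.
struct Finding<'a> {
    id: &'a str,
    severity: &'a str,
    status: &'a str,
    title: &'a str,
}

/// Render a COMPLETION_GATE block: not ready while any P1 is unresolved.
fn render_completion_gate(findings: &[Finding<'_>]) -> String {
    // Statuses that count as "done" for the progress line (an assumption).
    let closed = |s: &str| matches!(s, "resolved" | "verified" | "wont-fix");

    let blockers: Vec<String> = findings
        .iter()
        .filter(|f| f.severity == "P1" && !closed(f.status))
        .map(|f| format!("\"{}: {}\"", f.id, f.title))
        .collect();
    let done = findings.iter().filter(|f| closed(f.status)).count();

    let reason = if blockers.is_empty() {
        "No unresolved P1 findings".to_string()
    } else {
        format!("{} P1 findings still unresolved", blockers.len())
    };

    format!(
        "<COMPLETION_GATE>\nready: {}\nreason: \"{}\"\nprogress: \"Resolved {}/{} findings\"\nblockers: [{}]\n</COMPLETION_GATE>",
        blockers.is_empty(),
        reason,
        done,
        findings.len(),
        blockers.join(", ")
    )
}
```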

---

## Implementation Plan

### Phase 1: Core Finding System (3-4 days)

| Task | Effort | Description |
|------|--------|-------------|
| Finding metadata schema | 0.5 days | FindingMetadata struct, validation |
| `finding create` command | 1 day | Create finding as contract file |
| `finding list/summary` commands | 0.5 days | Query and display findings |
| `finding update` command | 0.5 days | Status transitions, validation |
| Auto-ID generation | 0.5 days | Category-based IDs (SEC-001, PERF-002) |

### Phase 2: Integration (2-3 days)

| Task | Effort | Description |
|------|--------|-------------|
| Phase guard integration | 0.5 days | Check P1 findings before transition |
| `finding parse-output` | 1 day | Parse review task output into findings |
| Merge blocking logic | 0.5 days | Block merge with open P1s |
| Finding assignment to tasks | 0.5 days | Track resolution via task ID |

### Phase 3: Automation & Polish (2-3 days)

| Task | Effort | Description |
|------|--------|-------------|
| `finding auto-resolve` | 1 day | Spawn fix tasks per finding |
| Verification workflow | 0.5 days | Re-review to verify fixes |
| Finding reports | 0.5 days | Summary contract file |
| Documentation | 0.5 days | User guide |
| Tests | 0.5 days | Unit + integration |

---

## Configuration Examples

### Finding Creation in Review Agent Output

Review agents produce structured findings in their output:

````markdown
## FINDING: SQL Injection in User Search

- **Severity**: P1
- **Category**: security
- **File**: src/api/users.rs
- **Line**: 47
- **Tags**: injection, input-validation, database

### Description
The `search_users` handler directly interpolates the `query` parameter...

### Evidence
```rust
let sql = format!("SELECT * FROM users WHERE name LIKE '%{}%'", query);
```

### Recommendation
Use parameterized queries with sqlx::query().bind()
````

The synthesis step parses these into formal Finding records.
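A minimal sketch of that parsing step, assuming review agents follow the shape above exactly: each `## FINDING:` heading opens a record and `- **Key**: value` bullets fill in its metadata. `ParsedFinding` and `parse_findings` are hypothetical; this version ignores the Description, Evidence and Recommendation sections and silently skips bullets it does not recognize.

```rust
#[derive(Debug, Default, PartialEq)]
struct ParsedFinding {
    title: String,
    severity: Option<String>,
    category: Option<String>,
    file: Option<String>,
    line: Option<u32>,
    tags: Vec<String>,
}

/// Split review output on `## FINDING:` headings and read the
/// `- **Key**: value` bullets that follow each one.
fn parse_findings(output: &str) -> Vec<ParsedFinding> {
    let mut findings = Vec::new();
    let mut current: Option<ParsedFinding> = None;

    for line in output.lines() {
        let line = line.trim();
        if let Some(title) = line.strip_prefix("## FINDING:") {
            // A new heading closes the previous record.
            findings.extend(current.take());
            current = Some(ParsedFinding { title: title.trim().to_string(), ..Default::default() });
            continue;
        }
        let finding = match current.as_mut() {
            Some(f) => f,
            None => continue, // text before the first heading
        };
        let (key, value) = match line.strip_prefix("- **").and_then(|r| r.split_once("**:")) {
            Some(kv) => kv,
            None => continue, // not a metadata bullet
        };
        let value = value.trim();
        match key {
            "Severity" => finding.severity = Some(value.to_string()),
            "Category" => finding.category = Some(value.to_string()),
            "File" => finding.file = Some(value.to_string()),
            "Line" => finding.line = value.parse().ok(),
            "Tags" => finding.tags = value.split(',').map(|t| t.trim().to_string()).collect(),
            _ => {}
        }
    }
    findings.extend(current);
    findings
}
```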

### Merge Blocking Configuration

```yaml
# .makima/review-agents.yaml (or contract config)
review:
  findings:
    merge_blocking_severity: P1     # P1 blocks merge
    require_justification: P2       # P2 needs justification to defer
    auto_resolve: true              # Spawn fix tasks for P1/P2
    auto_resolve_severity: P1,P2    # Which severities to auto-resolve
    verification:
      enabled: true                 # Re-review after resolution
      re_review_agents:             # Which agents verify fixes
        - security-sentinel         # Security findings verified by security agent
```
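With `merge_blocking_severity` configurable, the hard-coded P1 check in `check_findings_gate` becomes a threshold comparison. A sketch assuming severities order as P1 < P2 < P3 (most to least severe) and that `open` and `in-progress` findings are the unresolved ones; `FindingsConfig` and `blocking_findings` are illustrative names, and loading the YAML is omitted:

```rust
/// Declaration order gives P1 < P2 < P3 for the derived Ord,
/// so "smaller" means more severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    P1,
    P2,
    P3,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Open,
    InProgress,
    Resolved,
    Verified,
    WontFix,
}

/// The subset of `review.findings` this sketch uses.
struct FindingsConfig {
    merge_blocking_severity: Severity,
}

/// IDs of unresolved findings at or above the configured blocking severity.
fn blocking_findings<'a>(
    config: &FindingsConfig,
    findings: &'a [(&'a str, Severity, Status)],
) -> Vec<&'a str> {
    findings
        .iter()
        .filter(|(_, severity, _)| *severity <= config.merge_blocking_severity)
        .filter(|(_, _, status)| matches!(status, Status::Open | Status::InProgress))
        .map(|(id, _, _)| *id)
        .collect()
}
```

Setting the threshold to P2 would make the gate stricter without code changes, which is the point of exposing it in config.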

### Finding Lifecycle Example

```bash
# 1. Review creates finding
makima supervisor finding create --severity P1 --category security \
  --title "SQL injection in user search" --file src/api/users.rs --line 47

# 2. Auto-resolve spawns fix task
makima supervisor finding auto-resolve --severity P1
# → Spawns task "fix-sec-001" with plan based on finding details

# 3. Fix task completes, finding auto-updated
# finding SEC-001: open → in-progress → resolved

# 4. Verification re-reviews the fix
makima supervisor finding verify SEC-001
# → Spawns verification task targeting the specific file/line

# 5. Verification passes
# finding SEC-001: resolved → verified

# 6. Phase transition allowed
makima supervisor advance-phase compound -y
```

---

## Open Questions

1. **Finding storage**: Contract files vs. dedicated findings table in the database? Contract files are simpler but querying is less efficient.
2. **Cross-contract findings**: Should findings persist across contracts? (e.g., a P2 deferred from one contract carries to the next)
3. **Finding templates**: Should common finding types have templates? (e.g., "SQL injection" pre-fills category, severity, recommendation)
4. **External integration**: Should findings be exportable to GitHub Issues, Jira, or other issue trackers?
5. **Metric tracking**: How granular should finding metrics be? Per-contract? Per-repository? Per-category?
6. **False positive handling**: How should agents indicate their confidence in a finding? Should low-confidence findings automatically be classified as P3?

---

## Alternatives Considered

| Alternative | Pros | Cons | Decision |
|-------------|------|------|----------|
| GitHub Issues integration | Rich UI, collaboration | External dependency; not all projects use GitHub | Deferred — consider as export target |
| Plain text findings | Simple | Not queryable, no lifecycle | Rejected — defeats the purpose |
| Dedicated findings DB table | Fast queries, rich indexing | New infrastructure, migration | Recommended for v2 |
| Contract file-based | Uses existing infrastructure | Slower queries for large sets | Adopted for v1 |
| Inline code comments | Close to code | Lost on next commit; hard to track | Rejected — not persistent |

---

## Priority & Complexity Assessment

- **Priority: MEDIUM** — Structured findings transform the review phase from documentation to a quality gate. Essential for the Multi-Agent Review feature to produce actionable output.
- **Complexity: LOW** — Finding records are simple structured data. Lifecycle state machine is straightforward. Main integration point (phase guards) already exists.
- **Risk: LOW** — Purely additive feature. Worst case: findings exist but aren't used (same as today). Can be adopted incrementally.