summaryrefslogblamecommitdiff
path: root/docs/proposals/feature-findings-tracking.md
blob: bb8a68e797da0769f1a545b4ef882189cea4b439 (plain) (tree)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504























































































































































































































































































































































































































































































































                                                                                                                                                                                          
# Feature Proposal: Structured Findings / Issues Tracking

> **Priority:** Medium
> **Complexity:** Low
> **Estimated Effort:** 7-10 days
> **Status:** Proposal
> **Date:** 2026-02-09
> **Dependencies:** None (standalone, but enhances [Multi-Agent Review](feature-multi-agent-review.md))
> **Related:** [Overview Analysis](compound-engineering-analysis.md) · [Multi-Agent Review](feature-multi-agent-review.md) · [Workflow Presets](feature-workflow-presets.md)

---

## Problem Statement

Currently, review outputs in makima are **unstructured text** in task conversation history:

- **No standard format** for reporting issues found during review
- **No severity classification** — all findings are treated equally
- **No lifecycle tracking** — findings are either "in the review output" or "hopefully fixed"
- **No verification** — there's no way to confirm a finding was actually resolved
- **No aggregation** — findings from multiple review tasks can't be collected and deduplicated
- **No blocking mechanism** — critical findings can't prevent phase transitions
- **No metrics** — no data on how many findings are produced, resolved, or escaped

This makes the review phase a documentation exercise rather than a quality gate.

---

## How Compound Engineering Solves This

The compound engineering plugin uses **structured TODO/finding files** with YAML frontmatter and a defined lifecycle:

### File Format

```markdown
---
id: SEC-001
status: open
priority: P1
category: security
title: SQL injection in user search endpoint
file: src/api/users.rs
line: 47
agent: security-sentinel
created: 2026-02-09T10:30:00Z
updated: 2026-02-09T10:30:00Z
tags: [injection, input-validation, database]
---

# SQL Injection in User Search Endpoint

## Finding
The `search_users` handler directly interpolates the `query` parameter into
a SQL string without parameterization.

## Evidence
```rust
// src/api/users.rs:47
let sql = format!("SELECT * FROM users WHERE name LIKE '%{}%'", query);
```

## Impact
An attacker can execute arbitrary SQL queries, potentially:
- Exfiltrating all user data
- Modifying or deleting records
- Escalating privileges

## Recommendation
Use parameterized queries:
```rust
let results = sqlx::query("SELECT * FROM users WHERE name LIKE $1")
    .bind(format!("%{}%", query))
    .fetch_all(&pool)
    .await?;
```

## Resolution
_Not yet resolved_
```

### File Naming Convention

```
findings/{issue_id}-{status}-{priority}-{description}.md
```

Example: `findings/SEC-001-open-P1-sql-injection-user-search.md`

### Lifecycle

```
open ──▶ in-progress ──▶ resolved ──▶ verified
  │                         │
  └── wont-fix ◀────────────┘
```

---

## Proposed Makima Implementation

### 1. Finding Record Format

Findings are stored as **contract files** with structured metadata and body:

```rust
// Finding metadata (stored in file description as structured JSON)
#[derive(Serialize, Deserialize)]
pub struct FindingMetadata {
    pub id: String,                    // "SEC-001", auto-generated
    pub status: FindingStatus,         // open, in_progress, resolved, verified, wont_fix
    pub severity: FindingSeverity,     // P1 (critical), P2 (major), P3 (minor)
    pub category: String,             // security, performance, architecture, etc.
    pub title: String,                // Short description
    pub file_path: Option<String>,    // Affected file
    pub line_number: Option<u32>,     // Affected line
    pub source_agent: Option<String>, // Which review agent found this
    pub source_task_id: Option<Uuid>, // Task that produced this finding
    pub assigned_to: Option<Uuid>,    // Task assigned to resolve this
    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
    pub resolved_at: Option<DateTime<Utc>>,
    pub verified_at: Option<DateTime<Utc>>,
    pub tags: Vec<String>,
}

pub enum FindingStatus {
    Open,
    InProgress,
    Resolved,
    Verified,
    WontFix,
}

pub enum FindingSeverity {
    P1,  // Critical — must fix before merge
    P2,  // Major — should fix, can defer with justification
    P3,  // Minor — nice to fix, can defer
}
```

### 2. Supervisor Commands

#### Create a Finding

```bash
# Create a finding from review output
makima supervisor finding create \
  --severity P1 \
  --category security \
  --title "SQL injection in user search endpoint" \
  --file src/api/users.rs \
  --line 47 \
  --description "Direct string interpolation in SQL query"

# Output: Created finding SEC-001 (P1/security)
```

#### List Findings

```bash
# List all findings for the current contract
makima supervisor finding list
# Output:
# ID       SEVERITY  STATUS       CATEGORY      TITLE
# SEC-001  P1        open         security      SQL injection in user search
# PERF-001 P2        in-progress  performance   N+1 query in order listing
# ARCH-001 P3        resolved     architecture  Handler accessing DB directly

# Filter by severity
makima supervisor finding list --severity P1

# Filter by status
makima supervisor finding list --status open

# Summary only
makima supervisor finding summary
# Output:
# Total: 12 findings
# P1: 2 open, 1 resolved
# P2: 3 open, 2 in-progress
# P3: 4 resolved
```

#### Update Finding Status

```bash
# Mark as in-progress (assigned to a task)
makima supervisor finding update SEC-001 --status in-progress --assigned-to <task-id>

# Mark as resolved
makima supervisor finding update SEC-001 --status resolved \
  --resolution "Replaced with parameterized query in commit abc123"

# Mark as verified (after re-review)
makima supervisor finding update SEC-001 --status verified

# Mark as won't fix
makima supervisor finding update SEC-001 --status wont-fix \
  --justification "Endpoint is internal-only, behind auth"
```

#### Auto-Create from Review Output

```bash
# Parse review agent output and create findings automatically
makima supervisor finding parse-output --task-id <review-task-id>
```

This parses structured review output and creates individual finding records.

### 3. Finding Lifecycle

```
┌────────────────────────────────────────────────────────────┐
│                    Finding Lifecycle                        │
│                                                            │
│  ┌──────┐    ┌─────────────┐    ┌──────────┐              │
│  │      │    │             │    │          │              │
│  │ OPEN │───▶│ IN-PROGRESS │───▶│ RESOLVED │              │
│  │      │    │             │    │          │              │
│  └──┬───┘    └─────────────┘    └────┬─────┘              │
│     │                                │                     │
│     │        ┌─────────────┐    ┌────┴─────┐              │
│     │        │             │    │          │              │
│     └───────▶│  WONT-FIX   │    │ VERIFIED │              │
│              │             │    │          │              │
│              └─────────────┘    └──────────┘              │
│                                                            │
│  Triggers:                                                 │
│  open ─▶ in_progress : Task assigned to fix                │
│  in_progress ─▶ resolved : Fix committed                   │
│  resolved ─▶ verified : Re-review confirms fix             │
│  open ─▶ wont_fix : Explicit decision with justification   │
│  resolved ─▶ wont_fix : Fix deemed unnecessary after review│
└────────────────────────────────────────────────────────────┘
```

### 4. P1/P2/P3 Severity System

| Severity | Name | Description | Merge Policy |
|----------|------|-------------|--------------|
| **P1** | Critical | Security vulnerabilities, data loss risks, crash bugs | **Blocks merge** — must be resolved before contract completion |
| **P2** | Major | Performance issues, architectural concerns, significant tech debt | **Should fix** — can defer with explicit justification |
| **P3** | Minor | Style issues, minor improvements, documentation gaps | **Nice to fix** — can defer freely |

### 5. Merge Blocking

When findings exist, phase transitions and merge operations check for blockers:

```rust
// In advance-phase handler
async fn check_findings_gate(contract_id: Uuid) -> Result<bool> {
    let findings = get_findings(contract_id).await?;
    let open_p1s = findings.iter()
        .filter(|f| f.severity == P1 && f.status == Open)
        .count();

    if open_p1s > 0 {
        warn!("{} open P1 findings block phase transition", open_p1s);
        return Ok(false);
    }
    Ok(true)
}
```

### 6. Auto-Resolution Workflow

When the Multi-Agent Review feature is available, findings drive an automated resolution cycle:

```
┌──────────┐     ┌───────────┐     ┌──────────┐     ┌──────────┐
│  Review  │────▶│ Findings  │────▶│ Resolve  │────▶│ Verify   │
│  Phase   │     │ Created   │     │ Tasks    │     │ Fixes    │
│          │     │ (P1/P2/P3)│     │ Spawned  │     │ Pass?    │
└──────────┘     └───────────┘     └──────────┘     └────┬─────┘
                                                         │
                                                    Yes  │  No
                                                    ┌────┴────┐
                                                    ▼         ▼
                                              ┌──────────┐  Loop back
                                              │ Findings │  to resolve
                                              │ Verified │
                                              └──────────┘
```

```bash
# Auto-resolve: spawn tasks to fix each P1/P2 finding
makima supervisor finding auto-resolve --severity P1,P2

# This spawns one task per finding:
# - Task plan includes the finding details and recommendation
# - Task is assigned to the finding (finding.assigned_to = task.id)
# - When task completes, finding status → resolved
# - Verification task confirms the fix
```

---

## Integration with Existing Makima Features

### Contract Files

Each finding is stored as a **contract file**:

```rust
File {
    contract_id: Some(contract.id),
    contract_phase: Some("review"),
    name: "Finding: SEC-001 — SQL injection in user search",
    description: Some(serde_json::to_string(&finding_metadata)?),
    body: vec![
        BodyElement::Heading { level: 1, text: finding.title },
        BodyElement::Heading { level: 2, text: "Finding" },
        BodyElement::Paragraph { text: finding.description },
        BodyElement::Heading { level: 2, text: "Evidence" },
        BodyElement::Code { language: Some("rust"), content: finding.evidence },
        BodyElement::Heading { level: 2, text: "Recommendation" },
        BodyElement::Paragraph { text: finding.recommendation },
    ],
}
```

### Phase Guards

Findings integrate with existing phase guards:
- Phase guard checks finding gate before allowing transition
- User sees a summary of open findings when reviewing phase transition
- P1 findings produce a warning that requires explicit override

### Supervisor Questions

When P1 findings block a transition, the supervisor can ask:

```bash
makima supervisor ask \
  "2 P1 findings are still open. How would you like to proceed?" \
  --choices "Fix findings first,Override and continue,Mark as won't-fix" \
  --context "SEC-001: SQL injection (P1), PERF-001: Memory leak (P1)"
```

### Task Assignment

Findings reference tasks:
- `source_task_id`: The review task that discovered the finding
- `assigned_to`: The task spawned to resolve the finding

```bash
# Spawn a fix task and assign the finding
makima supervisor spawn "fix-sec-001" \
  --plan "Fix SQL injection vulnerability in src/api/users.rs:47. Use parameterized queries."

makima supervisor finding update SEC-001 \
  --status in-progress \
  --assigned-to <spawned-task-id>
```

### Autonomous Loop

The autonomous loop can use findings as a completion gate condition:

```xml
<COMPLETION_GATE>
ready: false
reason: "2 P1 findings still open"
progress: "Resolved 5/7 findings"
blockers: ["SEC-001: SQL injection", "PERF-001: Memory leak"]
</COMPLETION_GATE>
```

---

## Implementation Plan

### Phase 1: Core Finding System (3-4 days)

| Task | Effort | Description |
|------|--------|-------------|
| Finding metadata schema | 0.5 days | FindingMetadata struct, validation |
| `finding create` command | 1 day | Create finding as contract file |
| `finding list/summary` commands | 0.5 days | Query and display findings |
| `finding update` command | 0.5 days | Status transitions, validation |
| Auto-ID generation | 0.5 days | Category-based IDs (SEC-001, PERF-002) |

### Phase 2: Integration (2-3 days)

| Task | Effort | Description |
|------|--------|-------------|
| Phase guard integration | 0.5 days | Check P1 findings before transition |
| `finding parse-output` | 1 day | Parse review task output into findings |
| Merge blocking logic | 0.5 days | Block merge with open P1s |
| Finding assignment to tasks | 0.5 days | Track resolution via task ID |

### Phase 3: Automation & Polish (2-3 days)

| Task | Effort | Description |
|------|--------|-------------|
| `finding auto-resolve` | 1 day | Spawn fix tasks per finding |
| Verification workflow | 0.5 days | Re-review to verify fixes |
| Finding reports | 0.5 days | Summary contract file |
| Documentation | 0.5 days | User guide |
| Tests | 0.5 days | Unit + integration |

---

## Configuration Examples

### Finding Creation in Review Agent Output

Review agents produce structured findings in their output:

```markdown
## FINDING: SQL Injection in User Search

- **Severity**: P1
- **Category**: security
- **File**: src/api/users.rs
- **Line**: 47
- **Tags**: injection, input-validation, database

### Description
The `search_users` handler directly interpolates the `query` parameter...

### Evidence
```rust
let sql = format!("SELECT * FROM users WHERE name LIKE '%{}%'", query);
```

### Recommendation
Use parameterized queries with sqlx::query().bind()
```

The synthesis step parses these into formal Finding records.

### Merge Blocking Configuration

```yaml
# .makima/review-agents.yaml (or contract config)
review:
  findings:
    merge_blocking_severity: P1     # P1 blocks merge
    require_justification: P2       # P2 needs justification to defer
    auto_resolve: true              # Spawn fix tasks for P1/P2
    auto_resolve_severity: P1,P2    # Which severities to auto-resolve
    verification:
      enabled: true                 # Re-review after resolution
      re_review_agents:             # Which agents verify fixes
        - security-sentinel         # Security findings verified by security agent
```

### Finding Lifecycle Example

```bash
# 1. Review creates finding
makima supervisor finding create --severity P1 --category security \
  --title "SQL injection in user search" --file src/api/users.rs --line 47

# 2. Auto-resolve spawns fix task
makima supervisor finding auto-resolve --severity P1
# → Spawns task "fix-SEC-001" with plan based on finding details

# 3. Fix task completes, finding auto-updated
# finding SEC-001: open → in-progress → resolved

# 4. Verification re-reviews the fix
makima supervisor finding verify SEC-001
# → Spawns verification task targeting the specific file/line

# 5. Verification passes
# finding SEC-001: resolved → verified

# 6. Phase transition allowed
makima supervisor advance-phase compound -y
```

---

## Open Questions

1. **Finding storage**: Contract files vs. dedicated findings table in the database? Contract files are simpler but querying is less efficient.
2. **Cross-contract findings**: Should findings persist across contracts? (e.g., a P2 deferred from one contract carries to the next)
3. **Finding templates**: Should common finding types have templates? (e.g., "SQL injection" pre-fills category, severity, recommendation)
4. **External integration**: Should findings be exportable to GitHub Issues, Jira, or other issue trackers?
5. **Metric tracking**: How granular should finding metrics be? Per-contract? Per-repository? Per-category?
6. **False positive handling**: How should agents indicate confidence level? Should low-confidence findings be automatically P3?

---

## Alternatives Considered

| Alternative | Pros | Cons | Decision |
|-------------|------|------|----------|
| GitHub Issues integration | Rich UI, collaboration | External dependency; not all projects use GitHub | Deferred — consider as export target |
| Plain text findings | Simple | Not queryable, no lifecycle | Rejected — defeats the purpose |
| Dedicated findings DB table | Fast queries, rich indexing | New infrastructure, migration | Recommended for v2 |
| Contract file-based | Uses existing infrastructure | Slower queries for large sets | Adopted for v1 |
| Inline code comments | Close to code | Lost on next commit; hard to track | Rejected — not persistent |

---

## Priority & Complexity Assessment

- **Priority: MEDIUM** — Structured findings transform the review phase from documentation to a quality gate. Essential for the Multi-Agent Review feature to produce actionable output.
- **Complexity: LOW** — Finding records are simple structured data. Lifecycle state machine is straightforward. Main integration point (phase guards) already exists.
- **Risk: LOW** — Purely additive feature. Worst case: findings exist but aren't used (same as today). Can be adopted incrementally.