diff options
Diffstat (limited to 'docs/proposals/feature-findings-tracking.md')
| -rw-r--r-- | docs/proposals/feature-findings-tracking.md | 504 |
1 files changed, 504 insertions, 0 deletions
diff --git a/docs/proposals/feature-findings-tracking.md b/docs/proposals/feature-findings-tracking.md new file mode 100644 index 0000000..bb8a68e --- /dev/null +++ b/docs/proposals/feature-findings-tracking.md @@ -0,0 +1,504 @@ +# Feature Proposal: Structured Findings / Issues Tracking + +> **Priority:** Medium +> **Complexity:** Low +> **Estimated Effort:** 7-10 days +> **Status:** Proposal +> **Date:** 2026-02-09 +> **Dependencies:** None (standalone, but enhances [Multi-Agent Review](feature-multi-agent-review.md)) +> **Related:** [Overview Analysis](compound-engineering-analysis.md) · [Multi-Agent Review](feature-multi-agent-review.md) · [Workflow Presets](feature-workflow-presets.md) + +--- + +## Problem Statement + +Currently, review outputs in makima are **unstructured text** in task conversation history: + +- **No standard format** for reporting issues found during review +- **No severity classification** — all findings are treated equally +- **No lifecycle tracking** — findings are either "in the review output" or "hopefully fixed" +- **No verification** — there's no way to confirm a finding was actually resolved +- **No aggregation** — findings from multiple review tasks can't be collected and deduplicated +- **No blocking mechanism** — critical findings can't prevent phase transitions +- **No metrics** — no data on how many findings are produced, resolved, or escaped + +This makes the review phase a documentation exercise rather than a quality gate. + +--- + +## How Compound Engineering Solves This + +The compound engineering plugin uses **structured TODO/finding files** with YAML frontmatter and a defined lifecycle: + +### File Format + +```markdown +--- +id: SEC-001 +status: open +priority: P1 +category: security +title: SQL injection in user search endpoint +file: src/api/users.rs +line: 47 +agent: security-sentinel +created: 2026-02-09T10:30:00Z +updated: 2026-02-09T10:30:00Z +tags: [injection, input-validation, database] +--- + +# SQL Injection in User Search Endpoint + +## Finding +The `search_users` handler directly interpolates the `query` parameter into +a SQL string without parameterization. + +## Evidence +```rust +// src/api/users.rs:47 +let sql = format!("SELECT * FROM users WHERE name LIKE '%{}%'", query); +``` + +## Impact +An attacker can execute arbitrary SQL queries, potentially: +- Exfiltrating all user data +- Modifying or deleting records +- Escalating privileges + +## Recommendation +Use parameterized queries: +```rust +let results = sqlx::query("SELECT * FROM users WHERE name LIKE $1") + .bind(format!("%{}%", query)) + .fetch_all(&pool) + .await?; +``` + +## Resolution +_Not yet resolved_ +``` + +### File Naming Convention + +``` +findings/{issue_id}-{status}-{priority}-{description}.md +``` + +Example: `findings/SEC-001-open-P1-sql-injection-user-search.md` + +### Lifecycle + +``` +open ──▶ in-progress ──▶ resolved ──▶ verified + │ │ + └── wont-fix ◀────────────┘ +``` + +--- + +## Proposed Makima Implementation + +### 1. Finding Record Format + +Findings are stored as **contract files** with structured metadata and body: + +```rust +// Finding metadata (stored in file description as structured JSON) +#[derive(Serialize, Deserialize)] +pub struct FindingMetadata { + pub id: String, // "SEC-001", auto-generated + pub status: FindingStatus, // open, in_progress, resolved, verified, wont_fix + pub severity: FindingSeverity, // P1 (critical), P2 (major), P3 (minor) + pub category: String, // security, performance, architecture, etc. + pub title: String, // Short description + pub file_path: Option<String>, // Affected file + pub line_number: Option<u32>, // Affected line + pub source_agent: Option<String>, // Which review agent found this + pub source_task_id: Option<Uuid>, // Task that produced this finding + pub assigned_to: Option<Uuid>, // Task assigned to resolve this + pub created_at: DateTime<Utc>, + pub updated_at: DateTime<Utc>, + pub resolved_at: Option<DateTime<Utc>>, + pub verified_at: Option<DateTime<Utc>>, + pub tags: Vec<String>, +} + +pub enum FindingStatus { + Open, + InProgress, + Resolved, + Verified, + WontFix, +} + +pub enum FindingSeverity { + P1, // Critical — must fix before merge + P2, // Major — should fix, can defer with justification + P3, // Minor — nice to fix, can defer +} +``` + +### 2. Supervisor Commands + +#### Create a Finding + +```bash +# Create a finding from review output +makima supervisor finding create \ + --severity P1 \ + --category security \ + --title "SQL injection in user search endpoint" \ + --file src/api/users.rs \ + --line 47 \ + --description "Direct string interpolation in SQL query" + +# Output: Created finding SEC-001 (P1/security) +``` + +#### List Findings + +```bash +# List all findings for the current contract +makima supervisor finding list +# Output: +# ID SEVERITY STATUS CATEGORY TITLE +# SEC-001 P1 open security SQL injection in user search +# PERF-001 P2 in-progress performance N+1 query in order listing +# ARCH-001 P3 resolved architecture Handler accessing DB directly + +# Filter by severity +makima supervisor finding list --severity P1 + +# Filter by status +makima supervisor finding list --status open + +# Summary only +makima supervisor finding summary +# Output: +# Total: 12 findings +# P1: 2 open, 1 resolved +# P2: 3 open, 2 in-progress +# P3: 4 resolved +``` + +#### Update Finding Status + +```bash +# Mark as in-progress (assigned to a task) +makima supervisor finding update SEC-001 --status in-progress --assigned-to <task-id> + +# Mark as resolved +makima supervisor finding update SEC-001 --status resolved \ + --resolution "Replaced with parameterized query in commit abc123" + +# Mark as verified (after re-review) +makima supervisor finding update SEC-001 --status verified + +# Mark as won't fix +makima supervisor finding update SEC-001 --status wont-fix \ + --justification "Endpoint is internal-only, behind auth" +``` + +#### Auto-Create from Review Output + +```bash +# Parse review agent output and create findings automatically +makima supervisor finding parse-output --task-id <review-task-id> +``` + +This parses structured review output and creates individual finding records. + +### 3. Finding Lifecycle + +``` +┌────────────────────────────────────────────────────────────┐ +│ Finding Lifecycle │ +│ │ +│ ┌──────┐ ┌─────────────┐ ┌──────────┐ │ +│ │ │ │ │ │ │ │ +│ │ OPEN │───▶│ IN-PROGRESS │───▶│ RESOLVED │ │ +│ │ │ │ │ │ │ │ +│ └──┬───┘ └─────────────┘ └────┬─────┘ │ +│ │ │ │ +│ │ ┌─────────────┐ ┌────┴─────┐ │ +│ │ │ │ │ │ │ +│ └───────▶│ WONT-FIX │ │ VERIFIED │ │ +│ │ │ │ │ │ +│ └─────────────┘ └──────────┘ │ +│ │ +│ Triggers: │ +│ open ─▶ in_progress : Task assigned to fix │ +│ in_progress ─▶ resolved : Fix committed │ +│ resolved ─▶ verified : Re-review confirms fix │ +│ open ─▶ wont_fix : Explicit decision with justification │ +│ resolved ─▶ wont_fix : Fix deemed unnecessary after review│ +└────────────────────────────────────────────────────────────┘ +``` + +### 4. P1/P2/P3 Severity System + +| Severity | Name | Description | Merge Policy | +|----------|------|-------------|--------------| +| **P1** | Critical | Security vulnerabilities, data loss risks, crash bugs | **Blocks merge** — must be resolved before contract completion | +| **P2** | Major | Performance issues, architectural concerns, significant tech debt | **Should fix** — can defer with explicit justification | +| **P3** | Minor | Style issues, minor improvements, documentation gaps | **Nice to fix** — can defer freely | + +### 5. Merge Blocking + +When findings exist, phase transitions and merge operations check for blockers: + +```rust +// In advance-phase handler +async fn check_findings_gate(contract_id: Uuid) -> Result<bool> { + let findings = get_findings(contract_id).await?; + let open_p1s = findings.iter() + .filter(|f| f.severity == P1 && f.status == Open) + .count(); + + if open_p1s > 0 { + warn!("{} open P1 findings block phase transition", open_p1s); + return Ok(false); + } + Ok(true) +} +``` + +### 6. Auto-Resolution Workflow + +When the Multi-Agent Review feature is available, findings drive an automated resolution cycle: + +``` +┌──────────┐ ┌───────────┐ ┌──────────┐ ┌──────────┐ +│ Review │────▶│ Findings │────▶│ Resolve │────▶│ Verify │ +│ Phase │ │ Created │ │ Tasks │ │ Fixes │ +│ │ │ (P1/P2/P3)│ │ Spawned │ │ Pass? │ +└──────────┘ └───────────┘ └──────────┘ └────┬─────┘ + │ + Yes │ No + ┌────┴────┐ + ▼ ▼ + ┌──────────┐ Loop back + │ Findings │ to resolve + │ Verified │ + └──────────┘ +``` + +```bash +# Auto-resolve: spawn tasks to fix each P1/P2 finding +makima supervisor finding auto-resolve --severity P1,P2 + +# This spawns one task per finding: +# - Task plan includes the finding details and recommendation +# - Task is assigned to the finding (finding.assigned_to = task.id) +# - When task completes, finding status → resolved +# - Verification task confirms the fix +``` + +--- + +## Integration with Existing Makima Features + +### Contract Files + +Each finding is stored as a **contract file**: + +```rust +File { + contract_id: Some(contract.id), + contract_phase: Some("review"), + name: "Finding: SEC-001 — SQL injection in user search", + description: Some(serde_json::to_string(&finding_metadata)?), + body: vec![ + BodyElement::Heading { level: 1, text: finding.title }, + BodyElement::Heading { level: 2, text: "Finding" }, + BodyElement::Paragraph { text: finding.description }, + BodyElement::Heading { level: 2, text: "Evidence" }, + BodyElement::Code { language: Some("rust"), content: finding.evidence }, + BodyElement::Heading { level: 2, text: "Recommendation" }, + BodyElement::Paragraph { text: finding.recommendation }, + ], +} +``` + +### Phase Guards + +Findings integrate with existing phase guards: +- Phase guard checks finding gate before allowing transition +- User sees a summary of open findings when reviewing phase transition +- P1 findings produce a warning that requires explicit override + +### Supervisor Questions + +When P1 findings block a transition, the supervisor can ask: + +```bash +makima supervisor ask \ + "2 P1 findings are still open. How would you like to proceed?" \ + --choices "Fix findings first,Override and continue,Mark as won't-fix" \ + --context "SEC-001: SQL injection (P1), PERF-001: Memory leak (P1)" +``` + +### Task Assignment + +Findings reference tasks: +- `source_task_id`: The review task that discovered the finding +- `assigned_to`: The task spawned to resolve the finding + +```bash +# Spawn a fix task and assign the finding +makima supervisor spawn "fix-sec-001" \ + --plan "Fix SQL injection vulnerability in src/api/users.rs:47. Use parameterized queries." + +makima supervisor finding update SEC-001 \ + --status in-progress \ + --assigned-to <spawned-task-id> +``` + +### Autonomous Loop + +The autonomous loop can use findings as a completion gate condition: + +```xml +<COMPLETION_GATE> +ready: false +reason: "2 P1 findings still open" +progress: "Resolved 5/7 findings" +blockers: ["SEC-001: SQL injection", "PERF-001: Memory leak"] +</COMPLETION_GATE> +``` + +--- + +## Implementation Plan + +### Phase 1: Core Finding System (3-4 days) + +| Task | Effort | Description | +|------|--------|-------------| +| Finding metadata schema | 0.5 days | FindingMetadata struct, validation | +| `finding create` command | 1 day | Create finding as contract file | +| `finding list/summary` commands | 0.5 days | Query and display findings | +| `finding update` command | 0.5 days | Status transitions, validation | +| Auto-ID generation | 0.5 days | Category-based IDs (SEC-001, PERF-002) | + +### Phase 2: Integration (2-3 days) + +| Task | Effort | Description | +|------|--------|-------------| +| Phase guard integration | 0.5 days | Check P1 findings before transition | +| `finding parse-output` | 1 day | Parse review task output into findings | +| Merge blocking logic | 0.5 days | Block merge with open P1s | +| Finding assignment to tasks | 0.5 days | Track resolution via task ID | + +### Phase 3: Automation & Polish (2-3 days) + +| Task | Effort | Description | +|------|--------|-------------| +| `finding auto-resolve` | 1 day | Spawn fix tasks per finding | +| Verification workflow | 0.5 days | Re-review to verify fixes | +| Finding reports | 0.5 days | Summary contract file | +| Documentation | 0.5 days | User guide | +| Tests | 0.5 days | Unit + integration | + +--- + +## Configuration Examples + +### Finding Creation in Review Agent Output + +Review agents produce structured findings in their output: + +```markdown +## FINDING: SQL Injection in User Search + +- **Severity**: P1 +- **Category**: security +- **File**: src/api/users.rs +- **Line**: 47 +- **Tags**: injection, input-validation, database + +### Description +The `search_users` handler directly interpolates the `query` parameter... + +### Evidence +```rust +let sql = format!("SELECT * FROM users WHERE name LIKE '%{}%'", query); +``` + +### Recommendation +Use parameterized queries with sqlx::query().bind() +``` + +The synthesis step parses these into formal Finding records. + +### Merge Blocking Configuration + +```yaml +# .makima/review-agents.yaml (or contract config) +review: + findings: + merge_blocking_severity: P1 # P1 blocks merge + require_justification: P2 # P2 needs justification to defer + auto_resolve: true # Spawn fix tasks for P1/P2 + auto_resolve_severity: P1,P2 # Which severities to auto-resolve + verification: + enabled: true # Re-review after resolution + re_review_agents: # Which agents verify fixes + - security-sentinel # Security findings verified by security agent +``` + +### Finding Lifecycle Example + +```bash +# 1. Review creates finding +makima supervisor finding create --severity P1 --category security \ + --title "SQL injection in user search" --file src/api/users.rs --line 47 + +# 2. Auto-resolve spawns fix task +makima supervisor finding auto-resolve --severity P1 +# → Spawns task "fix-SEC-001" with plan based on finding details + +# 3. Fix task completes, finding auto-updated +# finding SEC-001: open → in-progress → resolved + +# 4. Verification re-reviews the fix +makima supervisor finding verify SEC-001 +# → Spawns verification task targeting the specific file/line + +# 5. Verification passes +# finding SEC-001: resolved → verified + +# 6. Phase transition allowed +makima supervisor advance-phase compound -y +``` + +--- + +## Open Questions + +1. **Finding storage**: Contract files vs. dedicated findings table in the database? Contract files are simpler but querying is less efficient. +2. **Cross-contract findings**: Should findings persist across contracts? (e.g., a P2 deferred from one contract carries to the next) +3. **Finding templates**: Should common finding types have templates? (e.g., "SQL injection" pre-fills category, severity, recommendation) +4. **External integration**: Should findings be exportable to GitHub Issues, Jira, or other issue trackers? +5. **Metric tracking**: How granular should finding metrics be? Per-contract? Per-repository? Per-category? +6. **False positive handling**: How should agents indicate confidence level? Should low-confidence findings be automatically P3? + +--- + +## Alternatives Considered + +| Alternative | Pros | Cons | Decision | +|-------------|------|------|----------| +| GitHub Issues integration | Rich UI, collaboration | External dependency; not all projects use GitHub | Deferred — consider as export target | +| Plain text findings | Simple | Not queryable, no lifecycle | Rejected — defeats the purpose | +| Dedicated findings DB table | Fast queries, rich indexing | New infrastructure, migration | Recommended for v2 | +| Contract file-based | Uses existing infrastructure | Slower queries for large sets | Adopted for v1 | +| Inline code comments | Close to code | Lost on next commit; hard to track | Rejected — not persistent | + +--- + +## Priority & Complexity Assessment + +- **Priority: MEDIUM** — Structured findings transform the review phase from documentation to a quality gate. Essential for the Multi-Agent Review feature to produce actionable output. +- **Complexity: LOW** — Finding records are simple structured data. Lifecycle state machine is straightforward. Main integration point (phase guards) already exists. +- **Risk: LOW** — Purely additive feature. Worst case: findings exist but aren't used (same as today). Can be adopted incrementally. |
