# Feature Proposal: Structured Findings / Issues Tracking

> **Priority:** Medium
> **Complexity:** Low
> **Estimated Effort:** 7-10 days
> **Status:** Proposal
> **Date:** 2026-02-09
> **Dependencies:** None (standalone, but enhances [Multi-Agent Review](feature-multi-agent-review.md))
> **Related:** [Overview Analysis](compound-engineering-analysis.md) · [Multi-Agent Review](feature-multi-agent-review.md) · [Workflow Presets](feature-workflow-presets.md)

---

## Problem Statement

Currently, review outputs in makima are **unstructured text** in task conversation history:

- **No standard format** for reporting issues found during review
- **No severity classification**: all findings are treated equally
- **No lifecycle tracking**: findings are either "in the review output" or "hopefully fixed"
- **No verification**: there's no way to confirm a finding was actually resolved
- **No aggregation**: findings from multiple review tasks can't be collected and deduplicated
- **No blocking mechanism**: critical findings can't prevent phase transitions
- **No metrics**: no data on how many findings are produced, resolved, or escaped

This makes the review phase a documentation exercise rather than a quality gate.

---

## How Compound Engineering Solves This

The compound engineering plugin uses **structured TODO/finding files** with YAML frontmatter and a defined lifecycle:

### File Format

````markdown
---
id: SEC-001
status: open
priority: P1
category: security
title: SQL injection in user search endpoint
file: src/api/users.rs
line: 47
agent: security-sentinel
created: 2026-02-09T10:30:00Z
updated: 2026-02-09T10:30:00Z
tags: [injection, input-validation, database]
---

# SQL Injection in User Search Endpoint

## Finding
The `search_users` handler directly interpolates the `query` parameter into a SQL string without parameterization.

## Evidence
```rust
// src/api/users.rs:47
let sql = format!("SELECT * FROM users WHERE name LIKE '%{}%'", query);
```

## Impact
An attacker can execute arbitrary SQL queries, potentially:
- Exfiltrating all user data
- Modifying or deleting records
- Escalating privileges

## Recommendation
Use parameterized queries:
```rust
let results = sqlx::query("SELECT * FROM users WHERE name LIKE $1")
    .bind(format!("%{}%", query))
    .fetch_all(&pool)
    .await?;
```

## Resolution
_Not yet resolved_
````

### File Naming Convention

```
findings/{issue_id}-{status}-{priority}-{description}.md
```

Example: `findings/SEC-001-open-P1-sql-injection-user-search.md`

### Lifecycle

```
open ──▶ in-progress ──▶ resolved ──▶ verified
  │                         │
  └── wont-fix ◀────────────┘
```

---

## Proposed Makima Implementation

### 1. Finding Record Format

Findings are stored as **contract files** with structured metadata and body:

```rust
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use uuid::Uuid;

// Finding metadata (stored in file description as structured JSON).
// Concrete field types (String, u32, Uuid, Utc) are illustrative; adjust to the actual schema.
#[derive(Serialize, Deserialize)]
pub struct FindingMetadata {
    pub id: String,                          // "SEC-001", auto-generated
    pub status: FindingStatus,               // open, in_progress, resolved, verified, wont_fix
    pub severity: FindingSeverity,           // P1 (critical), P2 (major), P3 (minor)
    pub category: String,                    // security, performance, architecture, etc.
    pub title: String,                       // Short description
    pub file_path: Option<String>,           // Affected file
    pub line_number: Option<u32>,            // Affected line
    pub source_agent: Option<String>,        // Which review agent found this
    pub source_task_id: Option<Uuid>,        // Task that produced this finding
    pub assigned_to: Option<Uuid>,           // Task assigned to resolve this
    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
    pub resolved_at: Option<DateTime<Utc>>,
    pub verified_at: Option<DateTime<Utc>>,
    pub tags: Vec<String>,
}

// The derives let the enums be embedded in FindingMetadata and compared with `==`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub enum FindingStatus {
    Open,
    InProgress,
    Resolved,
    Verified,
    WontFix,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub enum FindingSeverity {
    P1, // Critical: must fix before merge
    P2, // Major: should fix, can defer with justification
    P3, // Minor: nice to fix, can defer
}
```

### 2. Supervisor Commands

#### Create a Finding

```bash
# Create a finding from review output
makima supervisor finding create \
  --severity P1 \
  --category security \
  --title "SQL injection in user search endpoint" \
  --file src/api/users.rs \
  --line 47 \
  --description "Direct string interpolation in SQL query"

# Output: Created finding SEC-001 (P1/security)
```

#### List Findings

```bash
# List all findings for the current contract
makima supervisor finding list

# Output:
# ID        SEVERITY  STATUS       CATEGORY      TITLE
# SEC-001   P1        open         security      SQL injection in user search
# PERF-001  P2        in-progress  performance   N+1 query in order listing
# ARCH-001  P3        resolved     architecture  Handler accessing DB directly

# Filter by severity
makima supervisor finding list --severity P1

# Filter by status
makima supervisor finding list --status open

# Summary only
makima supervisor finding summary

# Output:
# Total: 12 findings
#   P1: 2 open, 1 resolved
#   P2: 3 open, 2 in-progress
#   P3: 4 resolved
```

#### Update Finding Status

```bash
# Mark as in-progress (assigned to a task)
makima supervisor finding update SEC-001 --status in-progress --assigned-to <task-id>

# Mark as resolved
makima supervisor finding update SEC-001 --status resolved \
  --resolution "Replaced with parameterized query in commit abc123"

# Mark as verified (after re-review)
makima supervisor finding update SEC-001 --status verified

# Mark as won't fix
makima supervisor finding update SEC-001 --status wont-fix \
  --justification "Endpoint is internal-only, behind auth"
```

#### Auto-Create from Review Output

```bash
# Parse review agent output and create findings automatically
makima supervisor finding parse-output --task-id <task-id>
```

This parses structured review output and creates individual finding records.

### 3. Finding Lifecycle

```
┌────────────────────────────────────────────────────────────┐
│                     Finding Lifecycle                      │
│                                                            │
│  ┌──────┐    ┌─────────────┐    ┌──────────┐               │
│  │      │    │             │    │          │               │
│  │ OPEN │───▶│ IN-PROGRESS │───▶│ RESOLVED │               │
│  │      │    │             │    │          │               │
│  └──┬───┘    └─────────────┘    └────┬─────┘               │
│     │                                │                     │
│     │        ┌─────────────┐    ┌────┴─────┐               │
│     │        │             │    │          │               │
│     └───────▶│  WONT-FIX   │    │ VERIFIED │               │
│              │             │    │          │               │
│              └─────────────┘    └──────────┘               │
│                                                            │
│  Triggers:                                                 │
│  open ─▶ in_progress : Task assigned to fix                │
│  in_progress ─▶ resolved : Fix committed                   │
│  resolved ─▶ verified : Re-review confirms fix             │
│  open ─▶ wont_fix : Explicit decision with justification   │
│  resolved ─▶ wont_fix : Fix deemed unnecessary after review│
└────────────────────────────────────────────────────────────┘
```

### 4. P1/P2/P3 Severity System

| Severity | Name | Description | Merge Policy |
|----------|------|-------------|--------------|
| **P1** | Critical | Security vulnerabilities, data loss risks, crash bugs | **Blocks merge**: must be resolved before contract completion |
| **P2** | Major | Performance issues, architectural concerns, significant tech debt | **Should fix**: can defer with explicit justification |
| **P3** | Minor | Style issues, minor improvements, documentation gaps | **Nice to fix**: can defer freely |

### 5. Merge Blocking

When findings exist, phase transitions and merge operations check for blockers:

```rust
// In advance-phase handler.
// Returns Ok(true) when the transition may proceed, Ok(false) when P1 findings block it.
async fn check_findings_gate(contract_id: Uuid) -> Result<bool> {
    let findings = get_findings(contract_id).await?;

    // A P1 must be resolved before completion, so in-progress P1s block as well as open ones.
    let unresolved_p1s = findings
        .iter()
        .filter(|f| {
            f.severity == FindingSeverity::P1
                && matches!(f.status, FindingStatus::Open | FindingStatus::InProgress)
        })
        .count();

    if unresolved_p1s > 0 {
        warn!("{} unresolved P1 findings block phase transition", unresolved_p1s);
        return Ok(false);
    }
    Ok(true)
}
```

### 6. Auto-Resolution Workflow

When the Multi-Agent Review feature is available, findings drive an automated resolution cycle:

```
┌──────────┐     ┌───────────┐     ┌──────────┐     ┌──────────┐
│  Review  │────▶│ Findings  │────▶│ Resolve  │────▶│  Verify  │
│  Phase   │     │ Created   │     │  Tasks   │     │  Fixes   │
│          │     │ (P1/P2/P3)│     │ Spawned  │     │  Pass?   │
└──────────┘     └───────────┘     └──────────┘     └────┬─────┘
                                                         │
                                                  Yes    │   No
                                                    ┌────┴────┐
                                                    ▼         ▼
                                              ┌──────────┐  Loop back
                                              │ Findings │  to resolve
                                              │ Verified │
                                              └──────────┘
```

```bash
# Auto-resolve: spawn tasks to fix each P1/P2 finding
makima supervisor finding auto-resolve --severity P1,P2

# This spawns one task per finding:
# - Task plan includes the finding details and recommendation
# - Task is assigned to the finding (finding.assigned_to = task.id)
# - When task completes, finding status → resolved
# - Verification task confirms the fix
```

---

## Integration with Existing Makima Features

### Contract Files

Each finding is stored as a **contract file**:

```rust
// Sketch of the contract file created for a finding
File {
    contract_id: Some(contract.id),
    contract_phase: Some("review"),
    name: "Finding: SEC-001 - SQL injection in user search",
    description: Some(serde_json::to_string(&finding_metadata)?),
    body: vec![
        BodyElement::Heading { level: 1, text: finding.title },
        BodyElement::Heading { level: 2, text: "Finding" },
        BodyElement::Paragraph { text: finding.description },
        BodyElement::Heading { level: 2, text: "Evidence" },
        BodyElement::Code { language: Some("rust"), content: finding.evidence },
        BodyElement::Heading { level: 2, text: "Recommendation" },
        BodyElement::Paragraph { text: finding.recommendation },
    ],
}
```

### Phase Guards

Findings integrate with existing phase guards:

- Phase guard checks finding gate before allowing transition
- User sees a summary of open findings when reviewing phase transition
- P1 findings produce a warning that requires explicit override

### Supervisor Questions

When P1 findings block a transition, the supervisor can ask:

```bash
makima supervisor ask \
  "2 P1 findings are still open. How would you like to proceed?" \
  --choices "Fix findings first,Override and continue,Mark as won't-fix" \
  --context "SEC-001: SQL injection (P1), PERF-001: Memory leak (P1)"
```

### Task Assignment

Findings reference tasks:

- `source_task_id`: The review task that discovered the finding
- `assigned_to`: The task spawned to resolve the finding

```bash
# Spawn a fix task and assign the finding
makima supervisor spawn "fix-sec-001" \
  --plan "Fix SQL injection vulnerability in src/api/users.rs:47. Use parameterized queries."

makima supervisor finding update SEC-001 \
  --status in-progress \
  --assigned-to <task-id>
```

### Autonomous Loop

The autonomous loop can use findings as a completion gate condition:

```xml
ready: false
reason: "2 P1 findings still open"
progress: "Resolved 5/7 findings"
blockers: ["SEC-001: SQL injection", "PERF-001: Memory leak"]
```

---

## Implementation Plan

### Phase 1: Core Finding System (3-4 days)

| Task | Effort | Description |
|------|--------|-------------|
| Finding metadata schema | 0.5 days | FindingMetadata struct, validation |
| `finding create` command | 1 day | Create finding as contract file |
| `finding list/summary` commands | 0.5 days | Query and display findings |
| `finding update` command | 0.5 days | Status transitions, validation |
| Auto-ID generation | 0.5 days | Category-based IDs (SEC-001, PERF-002) |

### Phase 2: Integration (2-3 days)

| Task | Effort | Description |
|------|--------|-------------|
| Phase guard integration | 0.5 days | Check P1 findings before transition |
| `finding parse-output` | 1 day | Parse review task output into findings |
| Merge blocking logic | 0.5 days | Block merge with open P1s |
| Finding assignment to tasks | 0.5 days | Track resolution via task ID |

### Phase 3: Automation & Polish (2-3 days)

| Task | Effort | Description |
|------|--------|-------------|
| `finding auto-resolve` | 1 day | Spawn fix tasks per finding |
| Verification workflow | 0.5 days | Re-review to verify fixes |
| Finding reports | 0.5 days | Summary contract file |
| Documentation | 0.5 days | User guide |
| Tests | 0.5 days | Unit + integration |

---

## Configuration Examples

### Finding Creation in Review Agent Output

Review agents produce structured findings in their output:

````markdown
## FINDING: SQL Injection in User Search

- **Severity**: P1
- **Category**: security
- **File**: src/api/users.rs
- **Line**: 47
- **Tags**: injection, input-validation, database

### Description
The `search_users` handler directly interpolates the `query` parameter...

### Evidence
```rust
let sql = format!("SELECT * FROM users WHERE name LIKE '%{}%'", query);
```

### Recommendation
Use parameterized queries with sqlx::query().bind()
````

The synthesis step parses these into formal Finding records.

### Merge Blocking Configuration

```yaml
# .makima/review-agents.yaml (or contract config)
review:
  findings:
    merge_blocking_severity: P1     # P1 blocks merge
    require_justification: P2       # P2 needs justification to defer
    auto_resolve: true              # Spawn fix tasks for P1/P2
    auto_resolve_severity: P1,P2    # Which severities to auto-resolve
    verification:
      enabled: true                 # Re-review after resolution
      re_review_agents:             # Which agents verify fixes
        - security-sentinel         # Security findings verified by security agent
```

### Finding Lifecycle Example

```bash
# 1. Review creates finding
makima supervisor finding create --severity P1 --category security \
  --title "SQL injection in user search" --file src/api/users.rs --line 47

# 2. Auto-resolve spawns fix task
makima supervisor finding auto-resolve --severity P1
# → Spawns task "fix-SEC-001" with plan based on finding details

# 3. Fix task completes, finding auto-updated
# finding SEC-001: open → in-progress → resolved

# 4. Verification re-reviews the fix
makima supervisor finding verify SEC-001
# → Spawns verification task targeting the specific file/line

# 5. Verification passes
# finding SEC-001: resolved → verified

# 6. Phase transition allowed
makima supervisor advance-phase compound -y
```

---

## Open Questions

1. **Finding storage**: Contract files vs. dedicated findings table in the database? Contract files are simpler but querying is less efficient.
2. **Cross-contract findings**: Should findings persist across contracts? (e.g., a P2 deferred from one contract carries to the next)
3. **Finding templates**: Should common finding types have templates? (e.g., "SQL injection" pre-fills category, severity, recommendation)
4. **External integration**: Should findings be exportable to GitHub Issues, Jira, or other issue trackers?
5. **Metric tracking**: How granular should finding metrics be? Per-contract? Per-repository? Per-category?
6. **False positive handling**: How should agents indicate confidence level? Should low-confidence findings be automatically P3?

---

## Alternatives Considered

| Alternative | Pros | Cons | Decision |
|-------------|------|------|----------|
| GitHub Issues integration | Rich UI, collaboration | External dependency; not all projects use GitHub | Deferred: consider as export target |
| Plain text findings | Simple | Not queryable, no lifecycle | Rejected: defeats the purpose |
| Dedicated findings DB table | Fast queries, rich indexing | New infrastructure, migration | Recommended for v2 |
| Contract file-based | Uses existing infrastructure | Slower queries for large sets | Adopted for v1 |
| Inline code comments | Close to code | Lost on next commit; hard to track | Rejected: not persistent |

---

## Priority & Complexity Assessment

- **Priority: MEDIUM**. Structured findings transform the review phase from documentation to a quality gate. Essential for the Multi-Agent Review feature to produce actionable output.
- **Complexity: LOW**. Finding records are simple structured data. Lifecycle state machine is straightforward. Main integration point (phase guards) already exists.
- **Risk: LOW**. Purely additive feature. Worst case: findings exist but aren't used (same as today). Can be adopted incrementally.
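
To make the "lifecycle state machine is straightforward" claim concrete, the transition check could look roughly like the sketch below. This is only an illustration under stated assumptions, not the planned implementation: `can_transition_to` is a hypothetical helper name, the enum is redeclared here so the snippet stands alone, and the allowed transitions are exactly the five triggers listed in the Finding Lifecycle diagram.

```rust
/// Lifecycle states, mirroring the `FindingStatus` enum proposed in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FindingStatus {
    Open,
    InProgress,
    Resolved,
    Verified,
    WontFix,
}

impl FindingStatus {
    /// Hypothetical helper (not part of the proposal): returns true only if
    /// `self -> next` is one of the five triggers from the lifecycle diagram.
    pub fn can_transition_to(self, next: FindingStatus) -> bool {
        matches!(
            (self, next),
            (Self::Open, Self::InProgress)            // task assigned to fix
                | (Self::InProgress, Self::Resolved)  // fix committed
                | (Self::Resolved, Self::Verified)    // re-review confirms fix
                | (Self::Open, Self::WontFix)         // explicit decision with justification
                | (Self::Resolved, Self::WontFix)     // fix deemed unnecessary after review
        )
    }
}
```

A `finding update` handler could call `current.can_transition_to(requested)` and reject the update when it returns `false`. The sketch deliberately leaves out the "loop back to resolve" path after a failed verification, because the trigger list does not define which status a finding returns to in that case.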