summaryrefslogtreecommitdiff
path: root/docs/proposals/feature-findings-tracking.md
diff options
context:
space:
mode:
Diffstat (limited to 'docs/proposals/feature-findings-tracking.md')
-rw-r--r--docs/proposals/feature-findings-tracking.md504
1 files changed, 504 insertions, 0 deletions
diff --git a/docs/proposals/feature-findings-tracking.md b/docs/proposals/feature-findings-tracking.md
new file mode 100644
index 0000000..bb8a68e
--- /dev/null
+++ b/docs/proposals/feature-findings-tracking.md
@@ -0,0 +1,504 @@
+# Feature Proposal: Structured Findings / Issues Tracking
+
+> **Priority:** Medium
+> **Complexity:** Low
+> **Estimated Effort:** 7-10 days
+> **Status:** Proposal
+> **Date:** 2026-02-09
+> **Dependencies:** None (standalone, but enhances [Multi-Agent Review](feature-multi-agent-review.md))
+> **Related:** [Overview Analysis](compound-engineering-analysis.md) · [Multi-Agent Review](feature-multi-agent-review.md) · [Workflow Presets](feature-workflow-presets.md)
+
+---
+
+## Problem Statement
+
+Currently, review outputs in makima are **unstructured text** in task conversation history:
+
+- **No standard format** for reporting issues found during review
+- **No severity classification** — all findings are treated equally
+- **No lifecycle tracking** — findings are either "in the review output" or "hopefully fixed"
+- **No verification** — there's no way to confirm a finding was actually resolved
+- **No aggregation** — findings from multiple review tasks can't be collected and deduplicated
+- **No blocking mechanism** — critical findings can't prevent phase transitions
+- **No metrics** — no data on how many findings are produced, resolved, or escaped
+
+This makes the review phase a documentation exercise rather than a quality gate.
+
+---
+
+## How Compound Engineering Solves This
+
+The compound engineering plugin uses **structured TODO/finding files** with YAML frontmatter and a defined lifecycle:
+
+### File Format
+
+```markdown
+---
+id: SEC-001
+status: open
+priority: P1
+category: security
+title: SQL injection in user search endpoint
+file: src/api/users.rs
+line: 47
+agent: security-sentinel
+created: 2026-02-09T10:30:00Z
+updated: 2026-02-09T10:30:00Z
+tags: [injection, input-validation, database]
+---
+
+# SQL Injection in User Search Endpoint
+
+## Finding
+The `search_users` handler directly interpolates the `query` parameter into
+a SQL string without parameterization.
+
+## Evidence
+```rust
+// src/api/users.rs:47
+let sql = format!("SELECT * FROM users WHERE name LIKE '%{}%'", query);
+```
+
+## Impact
+An attacker can execute arbitrary SQL queries, potentially:
+- Exfiltrating all user data
+- Modifying or deleting records
+- Escalating privileges
+
+## Recommendation
+Use parameterized queries:
+```rust
+let results = sqlx::query("SELECT * FROM users WHERE name LIKE $1")
+ .bind(format!("%{}%", query))
+ .fetch_all(&pool)
+ .await?;
+```
+
+## Resolution
+_Not yet resolved_
+```
+
+### File Naming Convention
+
+```
+findings/{issue_id}-{status}-{priority}-{description}.md
+```
+
+Example: `findings/SEC-001-open-P1-sql-injection-user-search.md`
+
+### Lifecycle
+
+```
+open ──▶ in-progress ──▶ resolved ──▶ verified
+ │ │
+ └── wont-fix ◀────────────┘
+```
+
+---
+
+## Proposed Makima Implementation
+
+### 1. Finding Record Format
+
+Findings are stored as **contract files** with structured metadata and body:
+
+```rust
+// Finding metadata (stored in file description as structured JSON)
+#[derive(Serialize, Deserialize)]
+pub struct FindingMetadata {
+ pub id: String, // "SEC-001", auto-generated
+ pub status: FindingStatus, // open, in_progress, resolved, verified, wont_fix
+ pub severity: FindingSeverity, // P1 (critical), P2 (major), P3 (minor)
+ pub category: String, // security, performance, architecture, etc.
+ pub title: String, // Short description
+ pub file_path: Option<String>, // Affected file
+ pub line_number: Option<u32>, // Affected line
+ pub source_agent: Option<String>, // Which review agent found this
+ pub source_task_id: Option<Uuid>, // Task that produced this finding
+ pub assigned_to: Option<Uuid>, // Task assigned to resolve this
+ pub created_at: DateTime<Utc>,
+ pub updated_at: DateTime<Utc>,
+ pub resolved_at: Option<DateTime<Utc>>,
+ pub verified_at: Option<DateTime<Utc>>,
+ pub tags: Vec<String>,
+}
+
+pub enum FindingStatus {
+ Open,
+ InProgress,
+ Resolved,
+ Verified,
+ WontFix,
+}
+
+pub enum FindingSeverity {
+ P1, // Critical — must fix before merge
+ P2, // Major — should fix, can defer with justification
+ P3, // Minor — nice to fix, can defer
+}
+```
+
+### 2. Supervisor Commands
+
+#### Create a Finding
+
+```bash
+# Create a finding from review output
+makima supervisor finding create \
+ --severity P1 \
+ --category security \
+ --title "SQL injection in user search endpoint" \
+ --file src/api/users.rs \
+ --line 47 \
+ --description "Direct string interpolation in SQL query"
+
+# Output: Created finding SEC-001 (P1/security)
+```
+
+#### List Findings
+
+```bash
+# List all findings for the current contract
+makima supervisor finding list
+# Output:
+# ID SEVERITY STATUS CATEGORY TITLE
+# SEC-001 P1 open security SQL injection in user search
+# PERF-001 P2 in-progress performance N+1 query in order listing
+# ARCH-001 P3 resolved architecture Handler accessing DB directly
+
+# Filter by severity
+makima supervisor finding list --severity P1
+
+# Filter by status
+makima supervisor finding list --status open
+
+# Summary only
+makima supervisor finding summary
+# Output:
+# Total: 12 findings
+# P1: 2 open, 1 resolved
+# P2: 3 open, 2 in-progress
+# P3: 4 resolved
+```
+
+#### Update Finding Status
+
+```bash
+# Mark as in-progress (assigned to a task)
+makima supervisor finding update SEC-001 --status in-progress --assigned-to <task-id>
+
+# Mark as resolved
+makima supervisor finding update SEC-001 --status resolved \
+ --resolution "Replaced with parameterized query in commit abc123"
+
+# Mark as verified (after re-review)
+makima supervisor finding update SEC-001 --status verified
+
+# Mark as won't fix
+makima supervisor finding update SEC-001 --status wont-fix \
+ --justification "Endpoint is internal-only, behind auth"
+```
+
+#### Auto-Create from Review Output
+
+```bash
+# Parse review agent output and create findings automatically
+makima supervisor finding parse-output --task-id <review-task-id>
+```
+
+This parses structured review output and creates individual finding records.
+
+### 3. Finding Lifecycle
+
+```
+┌────────────────────────────────────────────────────────────┐
+│ Finding Lifecycle │
+│ │
+│ ┌──────┐ ┌─────────────┐ ┌──────────┐ │
+│ │ │ │ │ │ │ │
+│ │ OPEN │───▶│ IN-PROGRESS │───▶│ RESOLVED │ │
+│ │ │ │ │ │ │ │
+│ └──┬───┘ └─────────────┘ └────┬─────┘ │
+│ │ │ │
+│ │ ┌─────────────┐ ┌────┴─────┐ │
+│ │ │ │ │ │ │
+│ └───────▶│ WONT-FIX │ │ VERIFIED │ │
+│ │ │ │ │ │
+│ └─────────────┘ └──────────┘ │
+│ │
+│ Triggers: │
+│ open ─▶ in_progress : Task assigned to fix │
+│ in_progress ─▶ resolved : Fix committed │
+│ resolved ─▶ verified : Re-review confirms fix │
+│ open ─▶ wont_fix : Explicit decision with justification │
+│ resolved ─▶ wont_fix : Fix deemed unnecessary after review│
+└────────────────────────────────────────────────────────────┘
+```
+
+### 4. P1/P2/P3 Severity System
+
+| Severity | Name | Description | Merge Policy |
+|----------|------|-------------|--------------|
+| **P1** | Critical | Security vulnerabilities, data loss risks, crash bugs | **Blocks merge** — must be resolved before contract completion |
+| **P2** | Major | Performance issues, architectural concerns, significant tech debt | **Should fix** — can defer with explicit justification |
+| **P3** | Minor | Style issues, minor improvements, documentation gaps | **Nice to fix** — can defer freely |
+
+### 5. Merge Blocking
+
+When findings exist, phase transitions and merge operations check for blockers:
+
+```rust
+// In advance-phase handler
+async fn check_findings_gate(contract_id: Uuid) -> Result<bool> {
+ let findings = get_findings(contract_id).await?;
+ let open_p1s = findings.iter()
+ .filter(|f| f.severity == P1 && f.status == Open)
+ .count();
+
+ if open_p1s > 0 {
+ warn!("{} open P1 findings block phase transition", open_p1s);
+ return Ok(false);
+ }
+ Ok(true)
+}
+```
+
+### 6. Auto-Resolution Workflow
+
+When the Multi-Agent Review feature is available, findings drive an automated resolution cycle:
+
+```
+┌──────────┐ ┌───────────┐ ┌──────────┐ ┌──────────┐
+│ Review │────▶│ Findings │────▶│ Resolve │────▶│ Verify │
+│ Phase │ │ Created │ │ Tasks │ │ Fixes │
+│ │ │ (P1/P2/P3)│ │ Spawned │ │ Pass? │
+└──────────┘ └───────────┘ └──────────┘ └────┬─────┘
+ │
+ Yes │ No
+ ┌────┴────┐
+ ▼ ▼
+ ┌──────────┐ Loop back
+ │ Findings │ to resolve
+ │ Verified │
+ └──────────┘
+```
+
+```bash
+# Auto-resolve: spawn tasks to fix each P1/P2 finding
+makima supervisor finding auto-resolve --severity P1,P2
+
+# This spawns one task per finding:
+# - Task plan includes the finding details and recommendation
+# - Task is assigned to the finding (finding.assigned_to = task.id)
+# - When task completes, finding status → resolved
+# - Verification task confirms the fix
+```
+
+---
+
+## Integration with Existing Makima Features
+
+### Contract Files
+
+Each finding is stored as a **contract file**:
+
+```rust
+File {
+ contract_id: Some(contract.id),
+ contract_phase: Some("review"),
+ name: "Finding: SEC-001 — SQL injection in user search",
+ description: Some(serde_json::to_string(&finding_metadata)?),
+ body: vec![
+ BodyElement::Heading { level: 1, text: finding.title },
+ BodyElement::Heading { level: 2, text: "Finding" },
+ BodyElement::Paragraph { text: finding.description },
+ BodyElement::Heading { level: 2, text: "Evidence" },
+ BodyElement::Code { language: Some("rust"), content: finding.evidence },
+ BodyElement::Heading { level: 2, text: "Recommendation" },
+ BodyElement::Paragraph { text: finding.recommendation },
+ ],
+}
+```
+
+### Phase Guards
+
+Findings integrate with existing phase guards:
+- Phase guard checks finding gate before allowing transition
+- User sees a summary of open findings when reviewing phase transition
+- P1 findings produce a warning that requires explicit override
+
+### Supervisor Questions
+
+When P1 findings block a transition, the supervisor can ask:
+
+```bash
+makima supervisor ask \
+ "2 P1 findings are still open. How would you like to proceed?" \
+ --choices "Fix findings first,Override and continue,Mark as won't-fix" \
+ --context "SEC-001: SQL injection (P1), PERF-001: Memory leak (P1)"
+```
+
+### Task Assignment
+
+Findings reference tasks:
+- `source_task_id`: The review task that discovered the finding
+- `assigned_to`: The task spawned to resolve the finding
+
+```bash
+# Spawn a fix task and assign the finding
+makima supervisor spawn "fix-sec-001" \
+ --plan "Fix SQL injection vulnerability in src/api/users.rs:47. Use parameterized queries."
+
+makima supervisor finding update SEC-001 \
+ --status in-progress \
+ --assigned-to <spawned-task-id>
+```
+
+### Autonomous Loop
+
+The autonomous loop can use findings as a completion gate condition:
+
+```xml
+<COMPLETION_GATE>
+ready: false
+reason: "2 P1 findings still open"
+progress: "Resolved 5/7 findings"
+blockers: ["SEC-001: SQL injection", "PERF-001: Memory leak"]
+</COMPLETION_GATE>
+```
+
+---
+
+## Implementation Plan
+
+### Phase 1: Core Finding System (3-4 days)
+
+| Task | Effort | Description |
+|------|--------|-------------|
+| Finding metadata schema | 0.5 days | FindingMetadata struct, validation |
+| `finding create` command | 1 day | Create finding as contract file |
+| `finding list/summary` commands | 0.5 days | Query and display findings |
+| `finding update` command | 0.5 days | Status transitions, validation |
+| Auto-ID generation | 0.5 days | Category-based IDs (SEC-001, PERF-002) |
+
+### Phase 2: Integration (2-3 days)
+
+| Task | Effort | Description |
+|------|--------|-------------|
+| Phase guard integration | 0.5 days | Check P1 findings before transition |
+| `finding parse-output` | 1 day | Parse review task output into findings |
+| Merge blocking logic | 0.5 days | Block merge with open P1s |
+| Finding assignment to tasks | 0.5 days | Track resolution via task ID |
+
+### Phase 3: Automation & Polish (2-3 days)
+
+| Task | Effort | Description |
+|------|--------|-------------|
+| `finding auto-resolve` | 1 day | Spawn fix tasks per finding |
+| Verification workflow | 0.5 days | Re-review to verify fixes |
+| Finding reports | 0.5 days | Summary contract file |
+| Documentation | 0.5 days | User guide |
+| Tests | 0.5 days | Unit + integration |
+
+---
+
+## Configuration Examples
+
+### Finding Creation in Review Agent Output
+
+Review agents produce structured findings in their output:
+
+```markdown
+## FINDING: SQL Injection in User Search
+
+- **Severity**: P1
+- **Category**: security
+- **File**: src/api/users.rs
+- **Line**: 47
+- **Tags**: injection, input-validation, database
+
+### Description
+The `search_users` handler directly interpolates the `query` parameter...
+
+### Evidence
+```rust
+let sql = format!("SELECT * FROM users WHERE name LIKE '%{}%'", query);
+```
+
+### Recommendation
+Use parameterized queries with sqlx::query().bind()
+```
+
+The synthesis step parses these into formal Finding records.
+
+### Merge Blocking Configuration
+
+```yaml
+# .makima/review-agents.yaml (or contract config)
+review:
+ findings:
+ merge_blocking_severity: P1 # P1 blocks merge
+ require_justification: P2 # P2 needs justification to defer
+ auto_resolve: true # Spawn fix tasks for P1/P2
+ auto_resolve_severity: P1,P2 # Which severities to auto-resolve
+ verification:
+ enabled: true # Re-review after resolution
+ re_review_agents: # Which agents verify fixes
+ - security-sentinel # Security findings verified by security agent
+```
+
+### Finding Lifecycle Example
+
+```bash
+# 1. Review creates finding
+makima supervisor finding create --severity P1 --category security \
+ --title "SQL injection in user search" --file src/api/users.rs --line 47
+
+# 2. Auto-resolve spawns fix task
+makima supervisor finding auto-resolve --severity P1
+# → Spawns task "fix-SEC-001" with plan based on finding details
+
+# 3. Fix task completes, finding auto-updated
+# finding SEC-001: open → in-progress → resolved
+
+# 4. Verification re-reviews the fix
+makima supervisor finding verify SEC-001
+# → Spawns verification task targeting the specific file/line
+
+# 5. Verification passes
+# finding SEC-001: resolved → verified
+
+# 6. Phase transition allowed
+makima supervisor advance-phase compound -y
+```
+
+---
+
+## Open Questions
+
+1. **Finding storage**: Contract files vs. dedicated findings table in the database? Contract files are simpler but querying is less efficient.
+2. **Cross-contract findings**: Should findings persist across contracts? (e.g., a P2 deferred from one contract carries to the next)
+3. **Finding templates**: Should common finding types have templates? (e.g., "SQL injection" pre-fills category, severity, recommendation)
+4. **External integration**: Should findings be exportable to GitHub Issues, Jira, or other issue trackers?
+5. **Metric tracking**: How granular should finding metrics be? Per-contract? Per-repository? Per-category?
+6. **False positive handling**: How should agents indicate confidence level? Should low-confidence findings be automatically P3?
+
+---
+
+## Alternatives Considered
+
+| Alternative | Pros | Cons | Decision |
+|-------------|------|------|----------|
+| GitHub Issues integration | Rich UI, collaboration | External dependency; not all projects use GitHub | Deferred — consider as export target |
+| Plain text findings | Simple | Not queryable, no lifecycle | Rejected — defeats the purpose |
+| Dedicated findings DB table | Fast queries, rich indexing | New infrastructure, migration | Recommended for v2 |
+| Contract file-based | Uses existing infrastructure | Slower queries for large sets | Adopted for v1 |
+| Inline code comments | Close to code | Lost on next commit; hard to track | Rejected — not persistent |
+
+---
+
+## Priority & Complexity Assessment
+
+- **Priority: MEDIUM** — Structured findings transform the review phase from documentation to a quality gate. Essential for the Multi-Agent Review feature to produce actionable output.
+- **Complexity: LOW** — Finding records are simple structured data. Lifecycle state machine is straightforward. Main integration point (phase guards) already exists.
+- **Risk: LOW** — Purely additive feature. Worst case: findings exist but aren't used (same as today). Can be adopted incrementally.