Diffstat (limited to 'docs/proposals/feature-knowledge-accumulation.md')
| -rw-r--r-- | docs/proposals/feature-knowledge-accumulation.md | 539 |
1 files changed, 539 insertions, 0 deletions
diff --git a/docs/proposals/feature-knowledge-accumulation.md b/docs/proposals/feature-knowledge-accumulation.md new file mode 100644 index 0000000..faef06a --- /dev/null +++ b/docs/proposals/feature-knowledge-accumulation.md @@ -0,0 +1,539 @@ +# Feature Proposal: Knowledge Accumulation / Compound Learning System + +> **Priority:** High +> **Complexity:** Medium +> **Estimated Effort:** 10-15 days +> **Status:** Proposal +> **Date:** 2026-02-09 +> **Dependencies:** Contract Files system (existing) +> **Related:** [Overview Analysis](compound-engineering-analysis.md) · [Plan Deepening](feature-plan-deepening.md) · [Workflow Presets](feature-workflow-presets.md) + +--- + +## Problem Statement + +When a makima contract completes, the **knowledge generated during that contract is effectively lost**: + +- **Solutions to tricky problems** exist only in task conversation history, which is not searchable or surfaceable +- **Patterns discovered** during one contract cannot inform future contracts +- **Mistakes made** in one contract are likely to be repeated in similar future contracts +- **Best practices** established during execution are not codified anywhere retrievable +- **Contract files** capture deliverables but not the *meta-knowledge* about how those deliverables were produced + +This means every new contract starts from zero context, even when the team has solved similar problems before. Engineering effort does not compound. 
+ +--- + +## How Compound Engineering Solves This + +The compound engineering plugin implements a `/compound` command that runs **5 parallel sub-agents** immediately after review: + +``` +┌─────────────────────────────────────────────────────────┐ +│ /compound │ +│ │ +│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ +│ │ Context │ │ Solution │ │ Prevention │ │ +│ │ Extractor │ │ Documenter │ │ Strategist │ │ +│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ +│ │ │ │ │ +│ ┌──────┴──────┐ ┌──────┴──────┐ │ +│ │ Doc │ │ Category │ │ +│ │ Linker │ │ Classifier │ │ +│ └──────┬──────┘ └──────┬──────┘ │ +│ │ │ │ +│ ▼ ▼ │ +│ ┌──────────────────────────────────────┐ │ +│ │ docs/solutions/[category]/file.md │ │ +│ │ │ │ +│ │ --- │ │ +│ │ category: build-errors │ │ +│ │ severity: medium │ │ +│ │ tags: [webpack, esm, cjs] │ │ +│ │ date: 2026-02-09 │ │ +│ │ contract: abc-123 │ │ +│ │ --- │ │ +│ │ │ │ +│ │ # Mixed ESM/CJS Import Resolution │ │ +│ │ │ │ +│ │ ## Problem │ │ +│ │ ... │ │ +│ │ ## Solution │ │ +│ │ ... │ │ +│ │ ## Prevention │ │ +│ │ ... │ │ +│ └──────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────┘ +``` + +### 9 Auto-Detected Categories + +| Category | Description | +|----------|-------------| +| `build-errors` | Compilation, bundling, dependency resolution | +| `test-failures` | Test setup, assertion patterns, mocking | +| `api-patterns` | API design, endpoint structure, versioning | +| `architecture-decisions` | Structural choices, trade-offs, patterns | +| `performance-optimizations` | Speed, memory, caching strategies | +| `security-practices` | Auth, input validation, secrets management | +| `debugging-techniques` | Investigation methods, logging strategies | +| `tooling-configurations` | Tool setup, config patterns, CI/CD | +| `domain-knowledge` | Business logic, domain-specific patterns | + +--- + +## Proposed Makima Implementation + +### 1. 
New "Compound" Phase + +Add an optional **compound** phase to the contract lifecycle, positioned after review: + +``` +Research → Specify → Plan → Execute → Review → Compound + ▲ + (new phase) +``` + +**Phase behavior:** +- **Auto-triggered** after review phase completes (configurable) +- **Short-lived** — typically completes in 1-3 minutes +- Extracts learnings from the contract's execution and review +- Stores them as searchable, categorized learning documents +- Can be skipped via configuration for trivial contracts + +### 2. New Supervisor Command: `makima supervisor compound` + +```bash +# Run compound learning for the current contract +makima supervisor compound + +# Compound with specific focus areas +makima supervisor compound --focus "security,performance" + +# Compound with explicit learnings +makima supervisor compound --learning "The retry logic needed exponential backoff, not fixed delay" +``` + +**Implementation:** + +```bash +# Under the hood, this spawns learning sub-agents +makima supervisor spawn-group "compound" \ + --tasks '[ + { + "name": "context-extractor", + "plan": "Extract the problem context, constraints, and environment details from the contract execution history..." + }, + { + "name": "solution-documenter", + "plan": "Document the solutions that were applied, including code patterns and configuration changes..." + }, + { + "name": "prevention-strategist", + "plan": "Identify what could prevent this class of problem in the future..." + }, + { + "name": "category-classifier", + "plan": "Classify these learnings into the appropriate category..." + }, + { + "name": "doc-linker", + "plan": "Link these learnings to existing documentation and related learnings..." + } + ]' +``` + +### 3. 
Learning Document Schema + +Each learning is stored as a **contract file** with structured content and metadata: + +```yaml +# Learning document metadata (stored in file description/metadata) +learning: + category: "build-errors" # One of 9 categories + severity: "medium" # low, medium, high, critical + tags: ["webpack", "esm", "cjs"] # Free-form tags + source_contract_id: "abc-123" # Contract that produced this learning + source_contract_name: "Fix webpack bundling" + repository: "github.com/org/repo" + date: "2026-02-09" + quality_score: 0.85 # 0-1, set by quality gate + access_count: 0 # Incremented on retrieval + last_accessed: null + relevance_decay: 0.95 # Per-month decay factor +``` + +**Document body structure:** + +```markdown +# Mixed ESM/CJS Import Resolution + +## Problem +When upgrading to webpack 5, mixed ESM and CommonJS imports caused +"Cannot use import statement outside a module" errors in production +but not development. + +## Root Cause +The `type: "module"` field in package.json applied ESM resolution +globally, but several dependencies only provided CJS exports. + +## Solution +1. Added `resolve.fullySpecified: false` to webpack config +2. Used `@babel/plugin-transform-modules-commonjs` for CJS deps +3. Created explicit `.cjs` extensions for config files + +## Code Pattern +```javascript +// webpack.config.cjs (note: .cjs extension) +module.exports = { + resolve: { + fullySpecified: false, + extensions: ['.js', '.mjs', '.cjs', '.json'] + } +}; +``` + +## Prevention +- Add webpack build check to CI before merging +- Document module system choice in project README +- Use `resolve.fullySpecified: false` by default in webpack 5 projects + +## Related +- docs/solutions/tooling-configurations/webpack-5-migration.md +- Contract: "Initial Webpack 5 Migration" (2026-01-15) +``` + +### 4. Storage Architecture + +Learnings are stored in two complementary locations: + +#### A. 
Contract Files (Structured, Persistent) + +```rust +// Each learning becomes a contract file +File { + contract_id: Some(source_contract.id), + contract_phase: Some("compound"), + name: "Learning: Mixed ESM/CJS Import Resolution", + description: Some("category=build-errors; tags=webpack,esm,cjs; severity=medium"), + body: vec![ + BodyElement::Heading { level: 1, text: "Mixed ESM/CJS Import Resolution" }, + BodyElement::Heading { level: 2, text: "Problem" }, + BodyElement::Paragraph { text: "..." }, + // ... structured content + ], + repo_file_path: Some("docs/solutions/build-errors/mixed-esm-cjs-resolution.md"), + repo_sync_status: Some("synced"), +} +``` + +#### B. Repository Files (Searchable, Portable) + +``` +docs/solutions/ +├── build-errors/ +│ ├── mixed-esm-cjs-resolution.md +│ └── docker-multi-stage-cache.md +├── test-failures/ +│ ├── async-test-timeout-patterns.md +│ └── mock-service-worker-setup.md +├── api-patterns/ +│ └── pagination-cursor-vs-offset.md +├── architecture-decisions/ +│ └── event-sourcing-tradeoffs.md +├── performance-optimizations/ +│ └── database-connection-pooling.md +├── security-practices/ +│ └── jwt-refresh-token-rotation.md +├── debugging-techniques/ +│ └── distributed-tracing-setup.md +├── tooling-configurations/ +│ └── github-actions-cache-strategy.md +└── domain-knowledge/ + └── payment-processing-idempotency.md +``` + +### 5. 
Auto-Surface Relevant Learnings + +When a new contract is created, automatically search for relevant learnings: + +```bash +# Supervisor plan template automatically includes: +# "Search existing learnings relevant to this task" + +makima supervisor search-learnings --query "webpack bundling errors" +makima supervisor search-learnings --category "build-errors" --tags "webpack" +makima supervisor search-learnings --repository "github.com/org/repo" +``` + +**Search algorithm:** + +``` +Relevance Score = + keyword_match_score * 0.4 + + category_match_score * 0.2 + + tag_overlap_score * 0.2 + + recency_score * 0.1 # Decays over time + + quality_score * 0.1 # Higher quality = more relevant +``` + +**Integration with plan phase:** + +``` +┌──────────────┐ ┌───────────────────┐ +│ New Contract │──────▶│ Plan Phase │ +│ Created │ │ │ +└──────────────┘ │ 1. Create plan │ + │ 2. Search for │◀── Learnings DB + │ relevant │ + │ learnings │ + │ 3. Inject context │ + │ into plan │ + └───────────────────┘ +``` + +### 6. 
Quality Control + +#### Relevance Decay + +Learnings lose relevance over time unless accessed: + +``` +effective_relevance = quality_score * (decay_factor ^ months_since_creation) + + access_bonus * recent_access_count +``` + +- Default decay factor: 0.95/month (a learning retains about 54% of its quality score after 1 year, since 0.95^12 ≈ 0.54) +- Access bonus: +0.05 per access (caps at +0.25) +- Learnings below 0.3 effective relevance are archived + +#### Deduplication + +When a new learning is created, check for existing similar learnings: + +``` +similarity = cosine_similarity(new_learning_embedding, existing_learning_embedding) +if similarity > 0.85: + merge_or_update(existing_learning, new_learning) +elif similarity > 0.70: + link_as_related(new_learning, existing_learning) +``` + +#### Quality Gate + +Before storing a learning, validate: + +| Check | Threshold | Action if Failed | +|-------|-----------|------------------| +| Has problem statement | Required | Reject | +| Has solution | Required | Reject | +| Has prevention strategy | Recommended | Warn, store with quality penalty | +| Code examples present | Recommended | Warn, store with quality penalty | +| Category valid | Required | Auto-classify | +| Not duplicate | >0.85 similarity | Merge with existing | +| Minimum length | >200 characters | Reject | + +--- + +## Integration with Existing Makima Features + +### Contract Phases + +The compound phase integrates into the existing phase system: + +```rust +// New phase variant +enum ContractPhase { + Research, + Specify, + Plan, + Execute, + Review, + Compound, // NEW +} +``` + +- Contracts with `contract_type: "specification"` get the full 6-phase cycle +- Contracts with `contract_type: "simple"` can opt in via config +- Phase guard still applies: user must approve transition to compound + +### Contract Files + +Learnings are first-class contract files, leveraging existing: +- Versioning system +- Structured body format (`BodyElement` types) +- Repository file sync (`repo_file_path`, 
`repo_sync_status`) +- Phase association (`contract_phase: "compound"`) + +### Directive System + +For directive-based workflows, learnings can be captured per-step: + +```rust +DirectiveStep { + name: "compound-step-3", + description: "Capture learnings from database migration step", + depends_on: [step_3_id, review_step_id], + task_plan: "Extract and document learnings from the completed migration...", +} +``` + +### Supervisor CLI + +New commands integrate with existing CLI infrastructure: + +```bash +# In supervisor context +makima supervisor compound # Run compound phase +makima supervisor search-learnings "query" # Search knowledge base +makima supervisor list-learnings # List all learnings +makima supervisor learning-stats # Knowledge base statistics +``` + +--- + +## Implementation Plan + +### Phase 1: Core Infrastructure (4-5 days) + +| Task | Effort | Description | +|------|--------|-------------| +| Add `compound` phase to contract lifecycle | 1 day | New phase enum, transition rules | +| Learning document schema | 1 day | Metadata structure, validation | +| `supervisor compound` command | 1-2 days | Spawn learning sub-agents | +| Repository file sync for learnings | 1 day | Write to `docs/solutions/` | + +### Phase 2: Search & Retrieval (3-5 days) + +| Task | Effort | Description | +|------|--------|-------------| +| `search-learnings` command | 1-2 days | Keyword + category search | +| Auto-surface in plan phase | 1-2 days | Inject relevant learnings into plans | +| Learning index | 1 day | Category/tag index for fast lookup | + +### Phase 3: Quality & Maintenance (3-5 days) + +| Task | Effort | Description | +|------|--------|-------------| +| Quality gate validation | 1 day | Pre-storage checks | +| Relevance decay system | 1 day | Scheduled decay + access tracking | +| Deduplication check | 1-2 days | Similarity detection and merging | +| Documentation & defaults | 1 day | User guide, default categories | + +--- + +## Configuration Examples + +### 
Enable Compound Phase (Contract-Level) + +```yaml +# Contract configuration +compound: + enabled: true + auto_trigger: true # Auto-run after review completes + categories: # Override default categories + - build-errors + - test-failures + - api-patterns + - architecture-decisions + - performance-optimizations + - security-practices + - debugging-techniques + - tooling-configurations + - domain-knowledge + quality_gate: + min_length: 200 + require_problem: true + require_solution: true + require_prevention: false + storage: + contract_files: true # Store as contract files + repo_files: true # Also write to docs/solutions/ + repo_path: "docs/solutions" +``` + +### Repository-Level Configuration (`.makima/compound.yaml`) + +```yaml +# .makima/compound.yaml +version: 1 +compound: + # Default settings for all contracts in this repo + auto_trigger: true + + # Custom categories for this project + categories: + - build-errors + - test-failures + - api-patterns + - payment-processing # Custom domain category + - compliance-requirements # Custom domain category + + # Search settings + search: + max_results: 10 + min_relevance: 0.3 + include_archived: false + + # Decay settings + decay: + factor: 0.95 # Per month + archive_threshold: 0.3 + access_bonus: 0.05 + max_access_bonus: 0.25 +``` + +### Searching Learnings + +```bash +# Full-text search +makima supervisor search-learnings "webpack ESM import error" + +# Category filter +makima supervisor search-learnings --category build-errors + +# Tag filter +makima supervisor search-learnings --tags webpack,esm + +# Repository filter +makima supervisor search-learnings --repo github.com/org/repo + +# Combined +makima supervisor search-learnings "import error" \ + --category build-errors \ + --tags webpack \ + --min-relevance 0.5 \ + --limit 5 +``` + +--- + +## Open Questions + +1. **Cross-repository knowledge**: Should learnings be scoped to a single repository or shared across all repositories for an owner? +2. 
**Learning ownership**: Who owns a learning — the contract creator, the repository, or the organization? +3. **Privacy**: Are learnings visible to all users, or scoped by access control? +4. **Embedding model**: For similarity-based deduplication and search, which embedding model should be used? Trade-off between quality and cost. +5. **Storage limits**: Should there be a cap on the number of learnings per repository/owner? +6. **Manual curation**: Should users be able to manually create, edit, or delete learnings outside the compound phase? +7. **Export/import**: Should learnings be exportable/importable across makima instances? + +--- + +## Alternatives Considered + +| Alternative | Pros | Cons | Decision | +|-------------|------|------|----------| +| Store learnings only in contract files | Simple, uses existing infrastructure | Not easily searchable across contracts | Rejected — search is critical | +| Store learnings only in repo files | Portable, version-controlled, greppable | Lost if repo deleted; no cross-repo search | Partial — use as secondary storage | +| Use external knowledge base (e.g., vector DB) | Best search quality | Added infrastructure dependency | Deferred — consider for v2 | +| Manual-only knowledge capture | No noise | Knowledge rarely captured | Rejected — must be automatic | +| Full contract history indexing | Most complete | Massive storage, noise, privacy concerns | Rejected — too much noise relative to signal | + +--- + +## Priority & Complexity Assessment + +- **Priority: HIGH** — This is the defining feature of compound engineering. Without knowledge accumulation, every contract starts from scratch. This is the feature that creates compounding returns. +- **Complexity: MEDIUM** — Core capture and storage are straightforward using existing contract files and repo sync. Search quality and relevance decay require iterative refinement. +- **Risk: MEDIUM** — Primary risk is low adoption (users skip compound phase), mitigated by auto-trigger. 
Secondary risk is knowledge base noise, mitigated by quality gates.
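
Reviewer sketch (not part of the diff): the relevance-decay rule in the proposal's Quality Control section can be made concrete as below. This is a minimal illustration that assumes the defaults listed in `.makima/compound.yaml` (`factor`, `archive_threshold`, `access_bonus`, `max_access_bonus`). The names `DecayConfig`, `effective_relevance`, and `should_archive` are hypothetical and do not come from the makima codebase.

```rust
/// Decay settings mirroring the `decay:` block of the proposed
/// `.makima/compound.yaml`. The struct name is hypothetical.
#[derive(Debug, Clone, Copy)]
pub struct DecayConfig {
    pub factor: f64,            // per-month decay factor
    pub archive_threshold: f64, // archive below this effective relevance
    pub access_bonus: f64,      // bonus added per recent access
    pub max_access_bonus: f64,  // cap on the total access bonus
}

impl Default for DecayConfig {
    fn default() -> Self {
        // Default values as stated in the proposal.
        Self {
            factor: 0.95,
            archive_threshold: 0.3,
            access_bonus: 0.05,
            max_access_bonus: 0.25,
        }
    }
}

/// effective_relevance = quality_score * factor^months
///                     + min(access_bonus * recent_access_count, max_access_bonus)
pub fn effective_relevance(
    cfg: &DecayConfig,
    quality_score: f64,
    months_since_creation: u32,
    recent_access_count: u32,
) -> f64 {
    let decayed = quality_score * cfg.factor.powi(months_since_creation as i32);
    let bonus = (cfg.access_bonus * f64::from(recent_access_count)).min(cfg.max_access_bonus);
    decayed + bonus
}

/// A learning is archived once its effective relevance drops below the threshold.
pub fn should_archive(cfg: &DecayConfig, relevance: f64) -> bool {
    relevance < cfg.archive_threshold
}
```

With these defaults, a learning with `quality_score` 1.0 and no accesses sits at roughly 0.54 after 12 months, and one with `quality_score` 0.85 falls below the 0.3 archive threshold within two years unless it is accessed. Whether the access bonus should itself decay is left open by the proposal; this sketch treats it as a flat, capped addition.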
