# Makima Improvement Ideas from claude-flow & ruvector Research

> Research conducted 2026-02-24
> Sources: https://github.com/ruvnet/claude-flow, https://github.com/ruvnet/ruvector

## Summary of Top Improvement Ideas

### 1. Intelligent Task Routing with Q-Learning
**Source**: claude-flow Q-Learning Router + MoE
**Priority**: High
**Type**: Spike

**Current State**: Makima uses static task assignment: the planning step determines task distribution upfront.

**Improvement**: Add a Q-Learning-based router that learns which types of tasks succeed best with which configurations. Track execution metrics (time, token usage, success rate, retry count) per task type and use these metrics to inform future planning.

**Implementation Sketch**:
- Record task execution telemetry (duration, tokens, outcome, complexity indicators)
- Build a task-type classifier based on step descriptions
- Train Q-values for (task_type, configuration) → expected_outcome
- Use learned routing to suggest optimal task configurations during directive planning
- Start simple: just track success rates by task description keywords

**Expected Benefit**: 20-40% improvement in task completion rates, reduced wasted retries.
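
The routing idea in the sketch above can be illustrated with a small tabular learner. This is a minimal sketch, not claude-flow's router: it treats each task as a single-step decision (a contextual bandit, the simplest special case of Q-learning), and the class name `TaskRouter`, the configuration labels, and the reward shaping in `reward_from_telemetry` are all hypothetical.

```python
import random
from collections import defaultdict


class TaskRouter:
    """Single-step tabular Q-learning over (task_type, configuration) pairs."""

    def __init__(self, configs, alpha=0.2, epsilon=0.1, seed=None):
        self.configs = list(configs)   # candidate configurations (hypothetical labels)
        self.alpha = alpha             # learning rate
        self.epsilon = epsilon         # exploration probability
        self.q = defaultdict(float)    # (task_type, config) -> estimated reward
        self.rng = random.Random(seed)

    def choose(self, task_type):
        # Explore occasionally, otherwise pick the best-known configuration.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.configs)
        return max(self.configs, key=lambda c: self.q[(task_type, c)])

    def record(self, task_type, config, reward):
        # Move the estimate a fraction alpha toward the observed reward.
        key = (task_type, config)
        self.q[key] += self.alpha * (reward - self.q[key])


def reward_from_telemetry(succeeded, retries, retry_penalty=0.1):
    """Assumed reward shaping: 1 for success, minus a small penalty per retry."""
    base = 1.0 if succeeded else 0.0
    return max(0.0, base - retry_penalty * retries)
```

A planner could call `choose(task_type)` when configuring a task and `record(...)` once telemetry for that task is available. The "start simple" variant in the sketch corresponds to using keyword-derived task types and a plain success or failure reward.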

---

### 2. Self-Learning from Execution Trajectories (ReasoningBank)
**Source**: claude-flow ReasoningBank, ruvector trajectory learning
**Priority**: High
**Type**: Spike

**Current State**: Makima doesn't learn from past executions. Each new directive starts from scratch.

**Improvement**: Implement a pattern bank that stores successful task plans, common failure patterns, and effective prompt strategies. When planning new directives, retrieve similar past successes to inform the plan.

**Implementation Sketch**:
- After each directive completes, extract and store: step descriptions, task plans, success/failure, duration, key decisions
- Use text similarity (simple TF-IDF vectors or external embeddings) to match new steps against past patterns
- Inject relevant past patterns into the planning prompt as examples
- Track which patterns led to better outcomes (RETRIEVE → JUDGE → CONSOLIDATE cycle)

**Expected Benefit**: 30%+ faster planning, fewer failed first attempts, accumulated organizational knowledge.
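
The retrieval step of the sketch above can be prototyped with the standard library alone. The following is a minimal sketch using TF-IDF with cosine similarity. The `PatternBank` class and its method names are hypothetical, and a real system might swap in external embeddings instead.

```python
import math
import re
from collections import Counter


def _tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def _weight(tf, idf):
    # Terms unknown to the bank get weight 0 and so do not affect the score.
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}


def _cosine(a, b):
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


class PatternBank:
    """Stores past step descriptions with their plans; retrieves similar ones."""

    def __init__(self):
        self.patterns = []  # list of (description, plan, term counts)

    def add(self, description, plan):
        self.patterns.append((description, plan, Counter(_tokenize(description))))

    def _idf(self):
        n = len(self.patterns)
        doc_freq = Counter()
        for _, _, tf in self.patterns:
            doc_freq.update(tf.keys())
        # Smoothed inverse document frequency.
        return {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

    def retrieve(self, query, top_k=3):
        idf = self._idf()
        query_vec = _weight(Counter(_tokenize(query)), idf)
        scored = []
        for description, plan, tf in self.patterns:
            score = _cosine(query_vec, _weight(tf, idf))
            if score > 0:
                scored.append((score, description, plan))
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]
```

The retrieved `(score, description, plan)` tuples could then be formatted as examples in the planning prompt. Tracking which retrieved patterns led to good outcomes (the JUDGE and CONSOLIDATE stages) is left out of this sketch.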

---

### 3. Drift Control & Checkpoint System
**Source**: claude-flow anti-drift mechanisms
**Priority**: High
**Type**: Spike

**Current State**: Makima relies on Claude Code to stay on-task. No systematic drift detection.

**Improvement**: Add checkpoint validation between task steps. The coordinator can review task outputs against original goals and catch drift early.

**Implementation Sketch**:
- Add post-task hooks that validate output against step description
- Implement a lightweight "alignment check" prompt that evaluates: "Does this output match the intended goal?"
- If drift is detected, flag the task for human review (reconcile mode) or auto-correct it
- Track drift frequency per task type to improve future task plans

**Expected Benefit**: Catch 80%+ of drift before it compounds across steps. Reduce wasted work.
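
A post-task alignment check along the lines of the sketch above might look like the following. This is a sketch under stated assumptions: the prompt wording, the `ALIGNED`/`DRIFTED` reply convention, and the `judge` callable (which stands in for whatever model call Makima would actually make) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical prompt template; the reply convention is an assumption.
ALIGNMENT_PROMPT = (
    "Intended goal:\n{goal}\n\n"
    "Task output:\n{output}\n\n"
    "Does this output match the intended goal? "
    "Reply with ALIGNED or DRIFTED on the first line, then a short reason."
)


@dataclass
class DriftVerdict:
    aligned: bool
    reason: str


def check_alignment(step_description: str, task_output: str,
                    judge: Callable[[str], str]) -> DriftVerdict:
    """Ask the judge whether the output matches the step's goal."""
    prompt = ALIGNMENT_PROMPT.format(goal=step_description, output=task_output)
    reply = judge(prompt).strip()
    first_line, _, rest = reply.partition("\n")
    aligned = first_line.strip().upper().startswith("ALIGNED")
    return DriftVerdict(aligned=aligned, reason=rest.strip() or first_line.strip())


def post_task_hook(step_description: str, task_output: str,
                   judge: Callable[[str], str], reconcile_mode: bool) -> str:
    """Return the next action: continue, flag for a human, or auto-correct."""
    verdict = check_alignment(step_description, task_output, judge)
    if verdict.aligned:
        return "continue"
    return "flag_for_review" if reconcile_mode else "auto_correct"
```

Passing the judge in as a callable keeps the check testable with a stub and leaves the choice of model open.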

---

### 4. Cost-Aware Model Routing
**Source**: claude-flow three-tier routing, Agent Booster
**Priority**: Medium
**Type**: Spike

**Current State**: Makima always uses whatever Claude Code instance is configured. No cost optimization.

**Improvement**: Classify task complexity and route to appropriate model tier. Simple tasks (documentation updates, config changes) could use cheaper models or cached patterns. Complex tasks (architecture, debugging) get full Opus.

**Implementation Sketch**:
- Add task complexity classifier based on description keywords and historical data
- For simple tasks: use Haiku-class model or pre-cached patterns
- For medium tasks: use Sonnet
- For complex tasks: use Opus with extended context
- Track cost per task and optimize routing over time

**Expected Benefit**: 40-60% cost reduction for typical directive workloads.
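
The first version of the classifier described above could be a keyword lookup. The sketch below is illustrative only: the keyword sets are hypothetical, and the tier labels simply mirror the Haiku/Sonnet/Opus classes named in the sketch rather than concrete model identifiers.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists; real ones would come from historical data.
SIMPLE_KEYWORDS = {"documentation", "docs", "readme", "config", "typo", "rename"}
COMPLEX_KEYWORDS = {"architecture", "debug", "debugging", "refactor", "concurrency"}

TIER_FOR_COMPLEXITY = {"simple": "haiku", "medium": "sonnet", "complex": "opus"}


def classify_complexity(description):
    """Classify a task description; complex keywords win over simple ones."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    if words & COMPLEX_KEYWORDS:
        return "complex"
    if words & SIMPLE_KEYWORDS:
        return "simple"
    return "medium"


def route_model(description):
    return TIER_FOR_COMPLEXITY[classify_complexity(description)]


class CostLedger:
    """Accumulates cost per tier so routing can be reviewed over time."""

    def __init__(self):
        self._totals = defaultdict(float)
        self._counts = defaultdict(int)

    def record(self, tier, cost):
        self._totals[tier] += cost
        self._counts[tier] += 1

    def average(self, tier):
        count = self._counts[tier]
        return self._totals[tier] / count if count else 0.0
```

Defaulting to the medium tier when no keyword matches is a design choice in this sketch: it avoids sending unknown work to the cheapest model.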

---

### 5. Hook/Event System for Task Lifecycle
**Source**: claude-flow 33+ hook system
**Priority**: Medium
**Type**: Spike

**Current State**: Makima has limited lifecycle events. Steps go from pending → running → completed/failed.

**Improvement**: Add a hook/event system for extensible task lifecycle management. Events like pre-task, post-task, on-error, on-retry enable plugins for logging, metrics, drift detection, and custom workflows.

**Implementation Sketch**:
- Define event types: directive.start, step.start, step.complete, step.fail, step.retry, task.spawn, task.complete
- Add webhook/callback support for each event
- Enable custom handlers (e.g., Slack notifications, metrics collection, auto-retry policies)
- Use events to feed the self-learning system

**Expected Benefit**: Extensibility, better observability, foundation for self-learning and drift control.
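
An in-process version of the event system sketched above could be as small as the following. The event names are taken from the list above, while the `EventBus` class and its `on`/`emit` methods are hypothetical. Webhook delivery is not shown.

```python
from collections import defaultdict

# Event names as listed in the implementation sketch.
EVENT_TYPES = frozenset({
    "directive.start", "step.start", "step.complete", "step.fail",
    "step.retry", "task.spawn", "task.complete",
})


class EventBus:
    """Minimal synchronous publish/subscribe bus for lifecycle events."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_type, handler):
        if event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {event_type}")
        self._handlers[event_type].append(handler)

    def emit(self, event_type, payload):
        """Call every handler; collect handler errors instead of aborting."""
        if event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {event_type}")
        errors = []
        for handler in self._handlers.get(event_type, []):
            try:
                handler(payload)
            except Exception as exc:  # one failing plugin must not block others
                errors.append(exc)
        return errors
```

A metrics collector, for example, could subscribe with `bus.on("step.complete", record_metrics)`, where `record_metrics` is any callable that accepts the payload dictionary.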

---

### 6. Swarm Topologies for Complex Directives
**Source**: claude-flow swarm coordination (hierarchical, mesh, ring, star)
**Priority**: Medium
**Type**: Spike

**Current State**: Makima uses a simple supervisor → worker model.

**Improvement**: Support different coordination topologies for different directive types. Large features could use hierarchical (coordinator reviews all outputs). Research tasks could use mesh (agents share findings). Sequential migrations could use ring (output chains).

**Implementation Sketch**:
- Define topology types in directive configuration
- Hierarchical: one coordinator task validates all worker outputs before proceeding
- Mesh: tasks can share intermediate results through shared files/context
- Ring: strict sequential with output-as-input chaining
- Auto-select topology based on directive type

**Expected Benefit**: Better coordination for complex multi-task directives. Reduced duplication and conflict.
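
One way to read the three topologies in the sketch above is as different dependency shapes over the same set of tasks. The following sketch makes that concrete. The function `plan_topology`, the `CoordinationPlan` type, and the default coordinator name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CoordinationPlan:
    depends_on: Dict[str, List[str]]  # task -> tasks it must wait for
    shared_context: bool = False      # whether tasks share intermediate results


def plan_topology(topology: str, tasks: List[str],
                  coordinator: str = "coordinator-review") -> CoordinationPlan:
    if topology == "ring":
        # Strict sequence: each task consumes the previous task's output.
        deps = {t: ([tasks[i - 1]] if i else []) for i, t in enumerate(tasks)}
        return CoordinationPlan(deps)
    if topology == "hierarchical":
        # Workers run independently; one coordinator task validates them all.
        deps = {t: [] for t in tasks}
        deps[coordinator] = list(tasks)
        return CoordinationPlan(deps)
    if topology == "mesh":
        # No ordering constraints; tasks exchange findings via shared context.
        return CoordinationPlan({t: [] for t in tasks}, shared_context=True)
    raise ValueError(f"unknown topology: {topology}")
```

Expressing topologies as dependency maps would let the existing DAG executor run them without a separate scheduler, though whether that fits Makima's internals is an assumption.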

---

### 7. Self-Learning DAG Optimization
**Source**: ruvector self-learning DAG execution
**Priority**: Medium
**Type**: Spike

**Current State**: Makima DAG execution follows static dependency ordering.

**Improvement**: Learn optimal step ordering and parallelization from execution history. Identify which steps benefit from parallelization vs sequential execution. Automatically adjust DAG scheduling based on learned patterns.

**Implementation Sketch**:
- Track actual execution times and dependencies between steps
- Identify critical path and bottleneck steps
- Learn which steps can safely run in parallel (based on file overlap analysis)
- Apply MinCut-style optimization to identify steps that could be split for parallelism
- Suggest DAG modifications during planning based on historical data

**Expected Benefit**: 20-40% faster directive completion through optimized scheduling.
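
Two of the steps above, finding the critical path and checking file overlap, are easy to prototype. The sketch below uses `graphlib.TopologicalSorter` from the standard library (Python 3.9+). It assumes every step appears in `durations` and that there is at least one step. The function names are hypothetical, and the MinCut-style splitting is not shown.

```python
from graphlib import TopologicalSorter


def critical_path(durations, deps):
    """Return (total_time, path) for the longest chain through the DAG.

    durations: step -> measured duration
    deps:      step -> list of steps it depends on
    """
    graph = {step: list(deps.get(step, [])) for step in durations}
    finish = {}      # earliest finish time of each step
    best_pred = {}   # predecessor on the longest chain into each step
    for step in TopologicalSorter(graph).static_order():
        pred = max(graph[step], key=lambda p: finish[p], default=None)
        start = finish[pred] if pred is not None else 0.0
        finish[step] = start + durations[step]
        best_pred[step] = pred
    end = max(finish, key=finish.get)
    path, node = [], end
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return finish[end], path[::-1]


def can_run_parallel(files_a, files_b):
    """Steps touching disjoint file sets are treated as safe to parallelize."""
    return set(files_a).isdisjoint(files_b)
```

Steps on the critical path are the ones worth splitting or speeding up. Steps off it with disjoint file sets are candidates for parallel execution.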

---

### 8. Memory/Context Sharing Between Tasks
**Source**: claude-flow 3-scope memory, ruvector COW branching
**Priority**: Medium
**Type**: Spike

**Current State**: Each Makima task operates in isolation with its own worktree. Limited context sharing.

**Improvement**: Implement scoped shared memory for cross-task context sharing. Tasks within the same directive should be able to share findings, decisions, and intermediate results.

**Implementation Sketch**:
- Add directive-scoped key-value store (the `memory-set`/`memory-get` commands from the task plan)
- Tasks can read/write to shared directive memory
- Support structured data (JSON) for machine-readable sharing
- Add project-scoped memory for cross-directive learnings
- Consider copy-on-write semantics for large shared contexts

**Expected Benefit**: Better coordination between parallel tasks. Reduced redundant work.
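
The scoped store described above can be sketched as an in-memory map keyed by scope. The method names below echo the planned `memory-set`/`memory-get` commands, but the `ScopedMemory` class, its signatures, and the directive-then-project fallback in `lookup` are hypothetical. Persistence and copy-on-write are not shown.

```python
import json


class ScopedMemory:
    """Key-value store with directive and project scopes; values are JSON."""

    def __init__(self):
        self._store = {}  # (scope, scope_id, key) -> JSON string

    def memory_set(self, scope, scope_id, key, value):
        # Serializing on write rejects non-JSON values early.
        self._store[(scope, scope_id, key)] = json.dumps(value)

    def memory_get(self, scope, scope_id, key, default=None):
        raw = self._store.get((scope, scope_id, key))
        # Deserializing on read gives each reader an independent copy.
        return default if raw is None else json.loads(raw)

    def lookup(self, directive_id, project_id, key, default=None):
        """Prefer directive-scoped values, then fall back to project scope."""
        for scope, scope_id in (("directive", directive_id), ("project", project_id)):
            if (scope, scope_id, key) in self._store:
                return self.memory_get(scope, scope_id, key)
        return default
```

Storing serialized JSON means a task that mutates a value it read cannot silently change what other tasks see. It must write the value back explicitly.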

---

### 9. Claims System for Human-Agent Coordination
**Source**: claude-flow claims system
**Priority**: Low
**Type**: Chore

**Current State**: Makima reconcile mode allows human review but lacks formal work ownership tracking.

**Improvement**: Add a claims system where tasks can be claimed by agents or humans. Enables smooth handoff when an agent gets stuck and a human needs to take over (or vice versa).

**Implementation Sketch**:
- Add claim/release/handoff operations to task lifecycle
- Track who (agent or human) currently owns each task
- Allow partial completion with handoff notes
- Integrate with reconcile mode for approval workflows

**Expected Benefit**: Smoother human-agent collaboration. Clearer responsibility tracking.
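
The claim, release, and handoff operations above amount to a small ownership state machine. A minimal sketch follows. The `TaskClaim` class, its method names, and the owner labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskClaim:
    """Tracks which agent or human currently owns a task."""

    task_id: str
    owner: Optional[str] = None
    history: List[str] = field(default_factory=list)

    def claim(self, who: str) -> None:
        if self.owner is not None:
            raise RuntimeError(f"{self.task_id} is already owned by {self.owner}")
        self.owner = who
        self.history.append(f"claimed by {who}")

    def release(self, who: str) -> None:
        self._require_owner(who)
        self.owner = None
        self.history.append(f"released by {who}")

    def handoff(self, from_who: str, to_who: str, notes: str) -> None:
        # Handoff notes carry the partial-completion context to the new owner.
        self._require_owner(from_who)
        self.owner = to_who
        self.history.append(f"handoff {from_who} -> {to_who}: {notes}")

    def _require_owner(self, who: str) -> None:
        if self.owner != who:
            raise RuntimeError(f"{who} does not own {self.task_id}")
```

The `history` list doubles as an audit trail that reconcile mode could show to a reviewer.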

---

### 10. Fault Tolerance with Provider Failover
**Source**: claude-flow 6-provider failover, ruvector Raft consensus
**Priority**: Low
**Type**: Spike

**Current State**: Makima depends on a single Claude Code provider. Failures require manual intervention.

**Improvement**: Add retry policies with exponential backoff, provider failover (if multiple API keys available), and graceful degradation.

**Implementation Sketch**:
- Add configurable retry policies per task type
- Support rotation across multiple API keys
- Implement circuit breaker pattern for persistent failures
- Auto-reassign failed tasks to fresh Claude Code instances
- Track failure patterns to avoid repeating known-bad configurations

**Expected Benefit**: Higher reliability. Reduced manual intervention for transient failures.
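
Retry with exponential backoff and a simple circuit breaker, as listed above, can be sketched in a few lines. The function and class names, the default values, and the consecutive-failure threshold rule are hypothetical. Provider failover and key rotation are not shown.

```python
import time


def backoff_delays(base, factor, max_retries, cap):
    """Exponential delays base * factor**i, each capped at `cap` seconds."""
    return [min(cap, base * factor ** i) for i in range(max_retries)]


def run_with_retry(task, max_retries=3, base=1.0, factor=2.0, cap=30.0,
                   sleep=time.sleep):
    """Run `task`, retrying on any exception with exponential backoff."""
    delays = backoff_delays(base, factor, max_retries, cap)
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception as exc:
            last_error = exc
            if attempt < max_retries:
                sleep(delays[attempt])
    raise last_error


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; any success resets it."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def record(self, ok):
        self.failures = 0 if ok else self.failures + 1

    @property
    def is_open(self):
        return self.failures >= self.threshold
```

The `sleep` parameter is injectable so the policy can be tested without waiting. A caller would check `is_open` before dispatching a task type and skip or reroute while the breaker is open.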

---

## Priority Matrix

| # | Improvement | Impact | Effort | Priority |
|---|-------------|--------|--------|----------|
| 1 | Q-Learning Task Routing | High | High | High |
| 2 | ReasoningBank (Learn from History) | High | Medium | High |
| 3 | Drift Control & Checkpoints | High | Medium | High |
| 4 | Cost-Aware Model Routing | Medium | Medium | Medium |
| 5 | Hook/Event System | Medium | Medium | Medium |
| 6 | Swarm Topologies | Medium | High | Medium |
| 7 | Self-Learning DAG Optimization | Medium | High | Medium |
| 8 | Memory/Context Sharing | Medium | Medium | Medium |
| 9 | Claims System | Low | Low | Low |
| 10 | Fault Tolerance/Failover | Low | Medium | Low |

## Quick Wins (Implement First)

1. **Execution telemetry** - Start recording task metrics now (foundation for everything)
2. **Directive memory** (`memory-set`/`memory-get`) - Already planned, enables context sharing
3. **Post-task validation** - Simple alignment check on completed tasks
4. **Retry policies** - Configurable auto-retry with backoff

## Long-Term Vision

Combine items 1, 2, and 7 into a **Self-Improving Orchestration Engine**: Makima learns from every directive execution, optimizes task routing and scheduling, and continuously improves planning quality. This creates a competitive moat where the system gets better with every use.