author soryu <soryu@soryu.co> 2026-02-24 20:33:54 +0000
committer soryu <soryu@soryu.co> 2026-02-24 20:33:54 +0000
commit 76255bae0c19b3a49bd1e36be8cd135202e85372
tree 48f66f56c2557b5101a49775e0d964ccd94b516a
parent 1266715f2e0da01dba92339584b3284d8039e2b3
parent 16850953ccc2ff098000125d94f902774659e917
Merge remote-tracking branch 'origin/makima/soryu-co-soryu---makima--research-claude-flow-and--de0c996c' into makima/directive-soryu-co-soryu---makima-19fd3e1d-v1771965216
-rw-r--r-- docs/research/claude-flow-research.md 169
-rw-r--r-- docs/research/makima-improvement-ideas.md 228
-rw-r--r-- docs/research/ruvector-research.md 113
3 files changed, 510 insertions, 0 deletions
diff --git a/docs/research/claude-flow-research.md b/docs/research/claude-flow-research.md
new file mode 100644
index 0000000..479691a
--- /dev/null
+++ b/docs/research/claude-flow-research.md
@@ -0,0 +1,169 @@
+# Claude-Flow (Ruflo v3) Research Summary
+
+> Research conducted 2026-02-24 for makima improvement evaluation
+
+## Overview
+
+claude-flow (marketed as Ruflo v3) is an enterprise AI orchestration system built around Claude Code. It provides 175+ MCP tools, manages 60+ specialized agents, and has accumulated 5,923+ commits. Key performance claims: 84.8% SWE-Bench solve rate, 2.8-4.4x faster task completion vs baseline Claude Code.
+
+**Repository**: https://github.com/ruvnet/claude-flow
+
+## Architecture
+
+### Layered Design
+```
+User Layer: CLI + Claude Code interfaces
+Entry Layer: MCP Server with AIDefence security validation
+Routing Layer: Q-Learning router + MoE (8 experts) + 42 skills + 17 hooks
+Swarm Layer: Topologies (mesh/hierarchical/ring/star) + consensus
+Agent Layer: 60+ specialized agents
+Resources: Memory systems, LLM providers, 12 background workers
+Intelligence: RuVector with 10+ optimization components
+```
+
+### MCP Integration
+- Runs as a stdio process providing 175+ tools
+- Claims full compliance with the MCP 2025-11-25 specification
+- Supports tools, resources, prompts, and tasks
+- Multiple transports: stdio, HTTP, WebSocket, in-process
+
+## Multi-Agent Coordination (Hive Mind)
+
+### Queen Types (Coordinators)
+| Type | Role |
+|------|------|
+| Strategic | Planning and goal decomposition |
+| Tactical | Execution coordination |
+| Adaptive | Optimization and learning |
+
+### Worker Types (8 Specialized Roles)
+1. **Researcher** - Information gathering and analysis
+2. **Coder** - Implementation
+3. **Analyst** - Data analysis and insights
+4. **Tester** - Quality assurance
+5. **Architect** - System design
+6. **Reviewer** - Code review and quality gates
+7. **Optimizer** - Performance tuning
+8. **Documenter** - Documentation generation
+
+### Consensus Algorithms
+- **Byzantine** (f < n/3): 2/3 majority for decisions
+- **Weighted Voting**: The Queen's vote carries 3x weight
+- **Majority Voting**: Simple democratic decisions
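The three voting rules above can be sketched as small predicates. This is a minimal illustration, not claude-flow's implementation: the function names are hypothetical, "2/3 majority" is assumed to mean strictly more than two thirds of all agents, and weighted voting is assumed to pass on more than half of the total weight.

```python
def max_faulty(n: int) -> int:
    """Largest number of faulty agents f that still satisfies f < n/3."""
    return (n - 1) // 3


def byzantine_accepts(yes_votes: int, n: int) -> bool:
    """Assumption: a decision needs strictly more than 2n/3 yes votes."""
    return 3 * yes_votes > 2 * n


def weighted_accepts(votes: dict, queen: str, queen_weight: float = 3.0) -> bool:
    """Weighted vote: the queen counts queen_weight times, workers once.

    Assumption: the decision passes on more than half of the total weight.
    """
    total = 0.0
    yes = 0.0
    for agent, vote in votes.items():
        weight = queen_weight if agent == queen else 1.0
        total += weight
        if vote:
            yes += weight
    return yes > total / 2


def majority_accepts(yes_votes: int, n: int) -> bool:
    """Simple majority: more than half of the agents vote yes."""
    return 2 * yes_votes > n
```

With four agents, `max_faulty(4)` is 1 and three yes votes are needed under the Byzantine rule, which matches the usual 3f+1 sizing.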
+
+## Task Routing & Scheduling
+
+### Q-Learning Router
+- Combined with MoE (8 experts)
+- 89% routing accuracy
+- 34,798 routes/s throughput
+- Learns which agents perform best per task type through execution trajectories
+
+### Three-Tier Routing Strategy
+| Tier | Handler | Latency | Cost |
+|------|---------|---------|------|
+| Simple | Agent Booster WASM | <1ms | $0 |
+| Medium | Haiku/Sonnet | ~500ms | Low |
+| Complex | Opus + multi-agent swarms | 2-5s | Standard |
+
+### Task Templates (Agent Combinations)
+| Task Type | Recommended Agents |
+|-----------|--------------------|
+| Bug Fix | Coordinator, Researcher, Coder, Tester |
+| Feature | Coordinator, Architect, Coder, Tester, Reviewer |
+| Refactor | Coordinator, Architect, Coder, Reviewer |
+| Performance | Coordinator, Perf-Engineer, Coder |
+| Security | Coordinator, Security-Architect, Auditor |
+
+## Self-Learning Mechanisms
+
+### SONA (Self-Optimizing Neural Architecture)
+- <0.05ms adaptation time
+- Rapid behavior adjustment at runtime
+- Two-tier LoRA + EWC++ + ReasoningBank integration
+
+### EWC++ (Elastic Weight Consolidation)
+- Preserves 95%+ knowledge across tasks
+- Prevents catastrophic forgetting
+
+### ReasoningBank
+- Pattern caching with RETRIEVE → JUDGE → DISTILL → CONSOLIDATE → ROUTE cycle
+- 32% token savings through pattern retrieval instead of full context
+- Stores successful execution trajectories for reuse
+
+### MicroLoRA
+- 128x compressed fine-tuning
+- No full retraining required
+- Lightweight runtime adaptation
+
+## Memory & Context Sharing
+
+### 3-Scope Architecture
+| Scope | Purpose |
+|-------|---------|
+| Project | Task-specific context |
+| Local | Machine/user patterns |
+| User | Cross-project learnings |
+
+### Storage Stack
+- **HNSW Vector Search**: 150x-12,500x faster retrieval, 16,400 QPS
+- **AgentDB**: SQLite with WAL for persistence
+- **LRU Cache**: Sub-millisecond access for hot data
+- **Knowledge Graph**: PageRank + community detection for insight ranking
+
+### 8 Memory Types
+Attention, episodic, procedural, and semantic memory, plus four additional types for comprehensive knowledge representation.
+
+## Drift Control
+
+Critical for multi-agent alignment:
+1. **Hierarchical Coordinator** validates all outputs against goals
+2. **Small Teams** (6-8 agents) reduce coordination overhead
+3. **Frequent Checkpoints** via post-task hooks verify compliance
+4. **Raft Consensus** maintains authoritative state
+5. **Specialized Roles** enforce clear task boundaries
+
+## Cost Optimization
+
+### Multi-Layer Strategy
+| Layer | Mechanism | Savings |
+|-------|-----------|---------|
+| 1 | Agent Booster WASM | Eliminates tokens entirely |
+| 2 | Haiku/Sonnet routing | 75% lower cost than Opus |
+| 3 | ReasoningBank | 32% token savings |
+| 4 | Token compression | 30-50% reduction |
+| 5 | Caching | 95% hit rate |
+| **Combined** | **All layers** | **Extends Claude Max usage by 250%** |
+
+## Hook System
+
+33+ hooks across 7 categories:
+- **Session**: start, end
+- **Agent**: pre-spawn, post-spawn, pre-terminate
+- **Task**: pre-execute, post-complete, error
+- **Tool**: pre-call, post-call
+- **Memory**: store/retrieve operations
+- **Swarm**: coordination events
+- **File**: read/write operations
+
+Self-Learning Hooks feed execution insights back into the Q-Learning router.
+
+## Claims System (Human-Agent Coordination)
+- **Claim**: Agent requests task ownership
+- **Release**: Agent returns uncompleted work
+- **Handoff**: Human reassigns to different agent
+- Prevents duplicate effort and maintains clear responsibility
+
+## Fault Tolerance
+- Byzantine fault-tolerant (f < n/3, 2/3 majority)
+- Failover across 6 LLM providers (Claude, GPT, Gemini, etc.)
+- Checkpoint system prevents cascading failures
+- Persist/Restore/Export session management
+
+## Swarm Topologies
+| Topology | Structure | Best For |
+|----------|-----------|----------|
+| Hierarchical | Coordinator + workers | Structured coding tasks (0.20s, 256MB/agent) |
+| Mesh | Peer-to-peer | Collaborative, high redundancy |
+| Ring | Sequential chain | Pipeline processing |
+| Star | Hub-and-spoke | Centralized control |
diff --git a/docs/research/makima-improvement-ideas.md b/docs/research/makima-improvement-ideas.md
new file mode 100644
index 0000000..36e3be2
--- /dev/null
+++ b/docs/research/makima-improvement-ideas.md
@@ -0,0 +1,228 @@
+# Makima Improvement Ideas from claude-flow & ruvector Research
+
+> Research conducted 2026-02-24
+> Sources: https://github.com/ruvnet/claude-flow, https://github.com/ruvnet/ruvector
+
+## Summary of Top Improvement Ideas
+
+### 1. Intelligent Task Routing with Q-Learning
+**Source**: claude-flow Q-Learning Router + MoE
+**Priority**: High
+**Type**: Spike
+
+**Current State**: Makima uses static task assignment: the planning step determines task distribution upfront.
+
+**Improvement**: Add a Q-Learning-based router that learns which types of tasks succeed best with which configurations. Track execution metrics (time, token usage, success rate, retry count) per task type and use this to inform future planning.
+
+**Implementation Sketch**:
+- Record task execution telemetry (duration, tokens, outcome, complexity indicators)
+- Build a task-type classifier based on step descriptions
+- Train Q-values for (task_type, configuration) → expected_outcome
+- Use learned routing to suggest optimal task configurations during directive planning
+- Start simple: just track success rates by task description keywords
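The sketch above could start as a one-step value table. The following is a minimal sketch under stated assumptions, not makima code: `TaskRouter`, the configuration labels, and the reward scale are all hypothetical. Because each task is a single decision with no follow-up state, the Q-learning update reduces to `Q += alpha * (reward - Q)`.

```python
import random
from collections import defaultdict


class TaskRouter:
    """Epsilon-greedy router over (task_type, configuration) pairs."""

    def __init__(self, configs, alpha=0.1, epsilon=0.1, seed=None):
        self.configs = list(configs)
        self.alpha = alpha        # learning rate
        self.epsilon = epsilon    # exploration probability
        self.q = defaultdict(float)  # (task_type, config) -> expected reward
        self.rng = random.Random(seed)

    def choose(self, task_type):
        """Pick a configuration: explore with probability epsilon, else exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.configs)
        return max(self.configs, key=lambda c: self.q[(task_type, c)])

    def record(self, task_type, config, reward):
        """Move the estimate toward the observed reward (for example 1.0 on success)."""
        key = (task_type, config)
        self.q[key] += self.alpha * (reward - self.q[key])


router = TaskRouter(["small-context", "large-context"], alpha=0.5, epsilon=0.0)
router.record("bugfix", "large-context", 1.0)
router.record("bugfix", "large-context", 1.0)
print(router.choose("bugfix"))
```

Starting with success rate as the only reward keeps the table interpretable; duration and token usage can be folded into the reward later.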
+
+**Expected Benefit**: 20-40% improvement in task completion rates, reduced wasted retries.
+
+---
+
+### 2. Self-Learning from Execution Trajectories (ReasoningBank)
+**Source**: claude-flow ReasoningBank, ruvector trajectory learning
+**Priority**: High
+**Type**: Spike
+
+**Current State**: Makima doesn't learn from past executions. Each new directive starts from scratch.
+
+**Improvement**: Implement a pattern bank that stores successful task plans, common failure patterns, and effective prompt strategies. When planning new directives, retrieve similar past successes to inform the plan.
+
+**Implementation Sketch**:
+- After each directive completes, extract and store: step descriptions, task plans, success/failure, duration, key decisions
+- Use text similarity (simple TF-IDF vectors or external embeddings) to match new steps against past patterns
+- Inject relevant past patterns into the planning prompt as examples
+- Track which patterns led to better outcomes (RETRIEVE → JUDGE → CONSOLIDATE cycle)
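The retrieval step can be prototyped with TF-IDF and cosine similarity using only the standard library. This is a sketch under assumptions: `PatternBank` and its methods are hypothetical names, the stored plan is an opaque value, and the smoothed IDF formula is one common choice rather than anything prescribed by claude-flow.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class PatternBank:
    """Stores (description, plan) pairs and retrieves the most similar ones."""

    def __init__(self):
        self.patterns = []  # (description, plan, term counts)

    def add(self, description, plan):
        self.patterns.append((description, plan, Counter(tokenize(description))))

    def _idf(self):
        n = len(self.patterns)
        doc_freq = Counter()
        for _, _, counts in self.patterns:
            doc_freq.update(counts.keys())
        # Smoothed IDF keeps a small positive weight for terms in every document.
        return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in doc_freq.items()}

    def retrieve(self, query, k=3):
        """Return up to k stored patterns with non-zero similarity to the query."""
        idf = self._idf()
        q_vec = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
        scored = []
        for description, plan, counts in self.patterns:
            d_vec = {t: c * idf[t] for t, c in counts.items()}
            scored.append((cosine(q_vec, d_vec), description, plan))
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(d, p) for score, d, p in scored[:k] if score > 0]
```

Retrieved patterns would then be injected into the planning prompt as examples; swapping in external embeddings only changes how the vectors are built.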
+
+**Expected Benefit**: 30%+ faster planning, fewer failed first attempts, accumulated organizational knowledge.
+
+---
+
+### 3. Drift Control & Checkpoint System
+**Source**: claude-flow anti-drift mechanisms
+**Priority**: High
+**Type**: Spike
+
+**Current State**: Makima relies on Claude Code to stay on-task. No systematic drift detection.
+
+**Improvement**: Add checkpoint validation between task steps. The coordinator can review task outputs against original goals and catch drift early.
+
+**Implementation Sketch**:
+- Add post-task hooks that validate output against step description
+- Implement a lightweight "alignment check" prompt that evaluates: "Does this output match the intended goal?"
+- If drift detected, flag for human review (reconcile mode) or auto-correct
+- Track drift frequency per task type to improve future task plans
+
+**Expected Benefit**: Catch 80%+ of drift before it compounds across steps. Reduce wasted work.
+
+---
+
+### 4. Cost-Aware Model Routing
+**Source**: claude-flow three-tier routing, Agent Booster
+**Priority**: Medium
+**Type**: Spike
+
+**Current State**: Makima always uses whatever Claude Code instance is configured. No cost optimization.
+
+**Improvement**: Classify task complexity and route to appropriate model tier. Simple tasks (documentation updates, config changes) could use cheaper models or cached patterns. Complex tasks (architecture, debugging) get full Opus.
+
+**Implementation Sketch**:
+- Add task complexity classifier based on description keywords and historical data
+- For simple tasks: use Haiku-class model or pre-cached patterns
+- For medium tasks: use Sonnet
+- For complex tasks: use Opus with extended context
+- Track cost per task and optimize routing over time
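A keyword-only first pass at the classifier might look like the following. It is a sketch, not a recommendation of specific keywords: the keyword lists and the tier labels `haiku`, `sonnet`, and `opus` are illustrative assumptions, and historical data would replace the static lists over time.

```python
# Hypothetical keyword lists; tune these against real directive history.
TIER_KEYWORDS = {
    "complex": ("architecture", "debug", "refactor", "migration"),
    "simple": ("docs", "documentation", "config", "typo", "rename"),
}

# Tier labels only; mapping them to concrete model IDs is left to configuration.
TIER_MODEL = {"simple": "haiku", "medium": "sonnet", "complex": "opus"}


def classify(description: str) -> str:
    """Return 'simple', 'medium', or 'complex' from description keywords.

    Complex keywords win over simple ones so that risky work is never
    downgraded to the cheapest tier.
    """
    text = description.lower()
    if any(keyword in text for keyword in TIER_KEYWORDS["complex"]):
        return "complex"
    if any(keyword in text for keyword in TIER_KEYWORDS["simple"]):
        return "simple"
    return "medium"


def route(description: str) -> str:
    """Map a task description to a model tier label."""
    return TIER_MODEL[classify(description)]


print(route("Update documentation for the CLI"))
```

Defaulting unknown tasks to the middle tier limits the cost of misclassification in either direction.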
+
+**Expected Benefit**: 40-60% cost reduction for typical directive workloads.
+
+---
+
+### 5. Hook/Event System for Task Lifecycle
+**Source**: claude-flow 33+ hook system
+**Priority**: Medium
+**Type**: Spike
+
+**Current State**: Makima has limited lifecycle events. Steps go from pending → running → completed/failed.
+
+**Improvement**: Add a hook/event system for extensible task lifecycle management. Events like pre-task, post-task, on-error, on-retry enable plugins for logging, metrics, drift detection, and custom workflows.
+
+**Implementation Sketch**:
+- Define event types: directive.start, step.start, step.complete, step.fail, step.retry, task.spawn, task.complete
+- Add webhook/callback support for each event
+- Enable custom handlers (e.g., Slack notifications, metrics collection, auto-retry policies)
+- Use events to feed the self-learning system
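An in-process version of this event system fits in a few lines. The sketch below assumes a hypothetical `EventBus` class; the event names are the ones listed above, and webhook delivery would be one more handler rather than a separate mechanism.

```python
from collections import defaultdict


class EventBus:
    """Minimal publish/subscribe bus for task lifecycle events."""

    def __init__(self):
        self._handlers = defaultdict(list)  # event name -> list of callables

    def on(self, event, handler):
        """Register handler(event, payload) for an event name."""
        self._handlers[event].append(handler)

    def emit(self, event, payload):
        """Call every handler for the event and return any exceptions raised.

        Handler failures are collected instead of propagated, so a broken
        plugin cannot interrupt the task lifecycle.
        """
        errors = []
        for handler in list(self._handlers.get(event, [])):
            try:
                handler(event, payload)
            except Exception as exc:
                errors.append(exc)
        return errors


bus = EventBus()
bus.on("step.complete", lambda event, payload: print(event, payload["step"]))
bus.emit("step.complete", {"step": "build"})
```

Returning the collected errors lets the orchestrator log them without deciding here whether a failed handler should fail the step.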
+
+**Expected Benefit**: Extensibility, better observability, foundation for self-learning and drift control.
+
+---
+
+### 6. Swarm Topologies for Complex Directives
+**Source**: claude-flow swarm coordination (hierarchical, mesh, ring, star)
+**Priority**: Medium
+**Type**: Spike
+
+**Current State**: Makima uses a simple supervisor → worker model.
+
+**Improvement**: Support different coordination topologies for different directive types. Large features could use hierarchical (coordinator reviews all outputs). Research tasks could use mesh (agents share findings). Sequential migrations could use ring (output chains).
+
+**Implementation Sketch**:
+- Define topology types in directive configuration
+- Hierarchical: one coordinator task validates all worker outputs before proceeding
+- Mesh: tasks can share intermediate results through shared files/context
+- Ring: strict sequential with output-as-input chaining
+- Auto-select topology based on directive type
+
+**Expected Benefit**: Better coordination for complex multi-task directives. Reduced duplication and conflict.
+
+---
+
+### 7. Self-Learning DAG Optimization
+**Source**: ruvector self-learning DAG execution
+**Priority**: Medium
+**Type**: Spike
+
+**Current State**: Makima DAG execution follows static dependency ordering.
+
+**Improvement**: Learn optimal step ordering and parallelization from execution history. Identify which steps benefit from parallelization vs sequential execution. Automatically adjust DAG scheduling based on learned patterns.
+
+**Implementation Sketch**:
+- Track actual execution times and dependencies between steps
+- Identify critical path and bottleneck steps
+- Learn which steps can safely run in parallel (based on file overlap analysis)
+- Apply MinCut-style optimization to identify steps that could be split for parallelism
+- Suggest DAG modifications during planning based on historical data
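Critical-path identification from recorded durations is the most mechanical part of this sketch. The example below uses the standard-library `graphlib` module (Python 3.9+); the function name and input shapes are assumptions, and it expects a non-empty DAG in which every step appears in `durations`.

```python
from graphlib import TopologicalSorter


def critical_path(durations, deps):
    """Return (total_time, path) for the longest dependency chain.

    durations: step -> measured duration
    deps: step -> set of steps that must finish first
    """
    graph = {step: set(deps.get(step, ())) for step in durations}
    finish = {}     # earliest finish time per step
    best_pred = {}  # predecessor that determines each step's start
    for step in TopologicalSorter(graph).static_order():
        preds = graph[step]
        start = max((finish[p] for p in preds), default=0.0)
        best_pred[step] = max(preds, key=lambda p: finish[p], default=None)
        finish[step] = start + durations[step]

    # Walk back from the step that finishes last.
    node = max(finish, key=finish.get)
    total = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = best_pred[node]
    path.reverse()
    return total, path


durations = {"plan": 1, "build": 5, "docs": 2, "test": 3}
deps = {"build": {"plan"}, "docs": {"plan"}, "test": {"build"}}
print(critical_path(durations, deps))
```

Steps off the critical path (here `docs`) are the candidates for parallel execution, since delaying them does not extend the directive.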
+
+**Expected Benefit**: 20-40% faster directive completion through optimized scheduling.
+
+---
+
+### 8. Memory/Context Sharing Between Tasks
+**Source**: claude-flow 3-scope memory, ruvector COW branching
+**Priority**: Medium
+**Type**: Spike
+
+**Current State**: Each makima task operates in isolation with its own worktree. Limited context sharing.
+
+**Improvement**: Implement scoped shared memory for cross-task context sharing. Tasks within the same directive should be able to share findings, decisions, and intermediate results.
+
+**Implementation Sketch**:
+- Add directive-scoped key-value store (the `memory-set`/`memory-get` commands from the task plan)
+- Tasks can read/write to shared directive memory
+- Support structured data (JSON) for machine-readable sharing
+- Add project-scoped memory for cross-directive learnings
+- Consider copy-on-write semantics for large shared contexts
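A directive-scoped store can be prototyped in memory before choosing a backend. In this sketch the class name and method signatures are hypothetical; only the `memory-set`/`memory-get` command names come from the task plan. Values are stored as JSON text so that non-serializable data fails at write time and readers always receive an independent copy.

```python
import json


class DirectiveMemory:
    """In-memory key-value store scoped by directive ID."""

    def __init__(self):
        self._store = {}  # directive_id -> {key: JSON string}

    def memory_set(self, directive_id, key, value):
        """Store a JSON-serializable value; raises TypeError otherwise."""
        self._store.setdefault(directive_id, {})[key] = json.dumps(value)

    def memory_get(self, directive_id, key, default=None):
        """Return a fresh copy of the stored value, or default if missing."""
        raw = self._store.get(directive_id, {}).get(key)
        return default if raw is None else json.loads(raw)


memory = DirectiveMemory()
memory.memory_set("directive-1", "api_shape", {"endpoints": ["/tasks"]})
print(memory.memory_get("directive-1", "api_shape"))
```

Returning copies means one task cannot silently mutate what another task reads, which is a cheap stand-in for the copy-on-write idea until contexts grow large.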
+
+**Expected Benefit**: Better coordination between parallel tasks. Reduced redundant work.
+
+---
+
+### 9. Claims System for Human-Agent Coordination
+**Source**: claude-flow claims system
+**Priority**: Low
+**Type**: Chore
+
+**Current State**: Makima reconcile mode allows human review but lacks formal work ownership tracking.
+
+**Improvement**: Add a claims system where tasks can be claimed by agents or humans. Enables smooth handoff when an agent gets stuck and a human needs to take over (or vice versa).
+
+**Implementation Sketch**:
+- Add claim/release/handoff operations to task lifecycle
+- Track who (agent or human) currently owns each task
+- Allow partial completion with handoff notes
+- Integrate with reconcile mode for approval workflows
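The claim, release, and handoff operations amount to a small ownership state machine. The following sketch assumes a hypothetical `TaskClaim` record and treats owners as plain strings, whether agent or human; persistence and reconcile-mode approval are out of scope.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskClaim:
    """Tracks who currently owns a task, plus handoff notes."""

    task_id: str
    owner: Optional[str] = None
    notes: list = field(default_factory=list)

    def _require_owner(self, who):
        if self.owner != who:
            raise ValueError(f"{who} does not own {self.task_id}")

    def claim(self, who):
        """Take ownership of an unowned task; re-claiming by the owner is a no-op."""
        if self.owner is not None and self.owner != who:
            raise ValueError(f"{self.task_id} is already claimed by {self.owner}")
        self.owner = who

    def release(self, who, note=""):
        """Return uncompleted work to the pool, optionally leaving a note."""
        self._require_owner(who)
        if note:
            self.notes.append((who, note))
        self.owner = None

    def handoff(self, who, to, note=""):
        """Transfer ownership directly, for example from a stuck agent to a human."""
        self._require_owner(who)
        if note:
            self.notes.append((who, note))
        self.owner = to
```

Rejecting a second claim is what prevents duplicate effort; the notes list carries partial-completion context across the handoff.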
+
+**Expected Benefit**: Smoother human-agent collaboration. Clearer responsibility tracking.
+
+---
+
+### 10. Fault Tolerance with Provider Failover
+**Source**: claude-flow 6-provider failover, ruvector Raft consensus
+**Priority**: Low
+**Type**: Spike
+
+**Current State**: Makima depends on a single Claude Code provider. Failures require manual intervention.
+
+**Improvement**: Add retry policies with exponential backoff, provider failover (if multiple API keys available), and graceful degradation.
+
+**Implementation Sketch**:
+- Add configurable retry policies per task type
+- Support rotation across multiple API keys
+- Implement circuit breaker pattern for persistent failures
+- Auto-reassign failed tasks to fresh Claude Code instances
+- Track failure patterns to avoid repeating known-bad configurations
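Retry with exponential backoff and a circuit breaker can be combined in one helper. This is a sketch under assumptions: the names, default delays, and the consecutive-failure threshold are hypothetical, `attempts` is expected to be at least 1, and the `sleep` parameter exists so that tests can skip real waiting.

```python
import time


class CircuitOpen(Exception):
    """Raised when the breaker has seen too many consecutive failures."""


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; any success resets it."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.threshold

    def record(self, ok):
        self.failures = 0 if ok else self.failures + 1


def run_with_retry(task, *, attempts=4, base_delay=1.0, factor=2.0,
                   breaker=None, sleep=time.sleep):
    """Call task() until it succeeds, waiting base_delay * factor**attempt between tries."""
    last_error = None
    for attempt in range(attempts):
        if breaker is not None and breaker.open:
            raise CircuitOpen("too many consecutive failures")
        try:
            result = task()
        except Exception as exc:
            last_error = exc
            if breaker is not None:
                breaker.record(False)
            if attempt < attempts - 1:  # no wait after the final attempt
                sleep(base_delay * factor ** attempt)
        else:
            if breaker is not None:
                breaker.record(True)
            return result
    raise last_error
```

Sharing one breaker across tasks that use the same provider lets persistent outages fail fast instead of burning every task's retry budget.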
+
+**Expected Benefit**: Higher reliability. Reduced manual intervention for transient failures.
+
+---
+
+## Priority Matrix
+
+| # | Improvement | Impact | Effort | Priority |
+|---|-------------|--------|--------|----------|
+| 1 | Q-Learning Task Routing | High | High | High |
+| 2 | ReasoningBank (Learn from History) | High | Medium | High |
+| 3 | Drift Control & Checkpoints | High | Medium | High |
+| 4 | Cost-Aware Model Routing | Medium | Medium | Medium |
+| 5 | Hook/Event System | Medium | Medium | Medium |
+| 6 | Swarm Topologies | Medium | High | Medium |
+| 7 | Self-Learning DAG Optimization | Medium | High | Medium |
+| 8 | Memory/Context Sharing | Medium | Medium | Medium |
+| 9 | Claims System | Low | Low | Low |
+| 10 | Fault Tolerance/Failover | Low | Medium | Low |
+
+## Quick Wins (Implement First)
+
+1. **Execution telemetry** - Start recording task metrics now (foundation for everything)
+2. **Directive memory** (`memory-set`/`memory-get`) - Already planned, enables context sharing
+3. **Post-task validation** - Simple alignment check on completed tasks
+4. **Retry policies** - Configurable auto-retry with backoff
+
+## Long-Term Vision
+
+Combine items 1, 2, 7 into a **Self-Improving Orchestration Engine**: makima learns from every directive execution, optimizes task routing and scheduling, and continuously improves planning quality. This creates a competitive moat where the system gets better with every use.
diff --git a/docs/research/ruvector-research.md b/docs/research/ruvector-research.md
new file mode 100644
index 0000000..faf00d0
--- /dev/null
+++ b/docs/research/ruvector-research.md
@@ -0,0 +1,113 @@
+# RuVector Research Summary
+
+> Research conducted 2026-02-24 for makima improvement evaluation
+
+## Overview
+
+RuVector is a high-performance vector database with graph intelligence capabilities, self-learning DAG execution, and distributed consensus. It combines vector search, graph neural networks, and reinforcement learning into a unified platform.
+
+**Repository**: https://github.com/ruvnet/ruvector
+
+## Architecture
+
+### Core Components
+- **HNSW Indexing**: Hierarchical Navigable Small World for approximate nearest neighbor search
+- **GNN Layers**: Graph Neural Networks for query reranking and path optimization
+- **ReasoningBank**: Trajectory learning with verdict judgment
+- **SONA**: Self-Optimizing Neural Architecture for runtime adaptation
+- **Raft Consensus**: Distributed coordination and fault tolerance
+
+### Key Performance
+- 61µs p50 latency for vector queries
+- 2-32x memory reduction via adaptive compression
+- SIMD acceleration (AVX2/AVX-512/NEON)
+- 10-50x burst scaling for traffic spikes
+
+## Graph Database Capabilities
+
+### Cypher Query Support
+- Neo4j-compatible Cypher syntax: `MATCH (a)-[:SIMILAR]->(b)`
+- Hyperedge support connecting 3+ nodes simultaneously
+- Combines relationship-based search with vector similarity
+- Enables "semantic + structured search"
+
+### Graph Intelligence
+- PageRank computation
+- Spectral clustering
+- Community detection
+- Multi-head attention for neighbor importance weighting
+
+## Self-Learning DAG Execution
+
+This is the most relevant capability for makima:
+
+### Architecture
+- **Automatic query optimization** through continuous learning
+- **7 attention mechanisms** dynamically select optimal execution strategies
+- **50-80% latency reduction** over time as patterns are learned
+- **MinCut control** triggers automatic strategy switching above tension thresholds
+
+### Learning Cycle
+1. **Topological analysis** of DAG structure
+2. **Causal cone evaluation** for dependency impact
+3. **Critical path identification** and optimization
+4. **Trajectory learning** via ReasoningBank
+5. **Adaptive routing** based on learned patterns
+
+## Self-Learning Mechanisms
+
+### SONA (Self-Optimizing Neural Architecture)
+- Two-tier LoRA + EWC++ + ReasoningBank
+- <1ms adaptive learning per request
+- 55% quality improvement over baseline
+- Per-request micro-LoRA adaptation (rank 1-2)
+
+### GNN-Based Optimization
+- Multi-head attention weights neighbor importance
+- Updates representations based on graph structure
+- Reinforces frequently-accessed paths
+- "The more you search, the better results get"
+
+### Q-Learning Integration
+- Neural patterns learn optimal routing
+- HNSW memory integration for fast retrieval
+- Dynamic index topology optimization
+
+## Memory Management
+
+### Efficiency Mechanisms
+- **Adaptive tiered compression**: 2-32x memory reduction
+- **SIMD-optimized SpMV**: Sparse matrix-vector multiplication
+- **Arena allocators**: Bounds-check elimination
+- **COW branching**: Cluster-level copy-on-write (1M vectors, ~2.5MB per edit)
+
+### RVF Cognitive Container
+- Git-like branching of vector datasets
+- DNA-style lineage tracking parent/child derivation
+- Cryptographic verification of data integrity
+- Progressive indexing through layered construction
+
+## Fault Tolerance
+
+- **Raft consensus**: Leader election and log replication
+- **Multi-master replication**: Vector clock conflict resolution
+- **Geo-distributed sync**: Cross-region high availability
+- **Snapshot/backup**: Point-in-time recovery
+- **Segment signatures**: ML-DSA-65 (post-quantum) and Ed25519 (classical) on every segment
+
+## Advanced Features
+
+### 46 Attention Mechanisms
+Categories: standard (dot-product, multi-head, Flash), graph-specific (RoPE, edge-featured, dual-space), efficiency (sparse, cross-attention, neighborhood), specialized (mincut-gated transformer with 50% compute reduction).
+
+### Adaptive Routing
+- "Tiny Dancer" FastGRNN neural inference
+- 90% semantic routing accuracy
+- Hybrid keyword-first + embedding fallback strategy
+
+### Additional Capabilities
+- Local LLM execution (ruvllm with GGUF models)
+- Topological data analysis via persistent homology
+- Post-quantum cryptography
+- eBPF kernel acceleration
+- 5.5KB WASM runtime for browser deployment