# Feature Proposal: Parallel Plan Deepening

> **Priority:** Medium
> **Complexity:** Low
> **Estimated Effort:** 5-8 days
> **Status:** Proposal
> **Date:** 2026-02-09
> **Dependencies:** [Knowledge Accumulation](feature-knowledge-accumulation.md) (recommended, not required)
> **Related:** [Overview Analysis](compound-engineering-analysis.md) · [Multi-Agent Review](feature-multi-agent-review.md)

---

## Problem Statement

Makima's planning phase currently suffers from **single-pass planning**:

- A supervisor creates a plan based on its immediate analysis of the task
- **No systematic research** is conducted before finalizing the plan
- **Edge cases are discovered during execution**, requiring mid-stream plan changes
- **Best practices are not consulted** — the plan relies solely on the model's training knowledge
- **Existing project learnings** (if the knowledge accumulation feature exists) are not surfaced during planning
- **Revision rate is high**: an estimated 40% of plans require significant changes after execution begins

The result: plans are shallow, execution discovers problems that planning should have caught, and contracts take longer than necessary.

---

## How Compound Engineering Solves This

The compound engineering plugin's `/deepen-plan` command takes an existing plan and enhances it by spawning **20-40 parallel research agents**:

```
┌──────────────────────────────────────────────────────────────┐
│                      /deepen-plan                             │
│                                                              │
│  Input: Initial plan (from /plan)                            │
│                                                              │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐       │
│  │ Best     │ │ Edge     │ │ Dep.     │ │ Pattern  │       │
│  │ Practice │ │ Case     │ │ Research │ │ Matching │       │
│  │ Agent 1  │ │ Agent 1  │ │ Agent 1  │ │ Agent 1  │       │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘       │
│       │            │            │            │              │
│  ┌────┴─────┐ ┌────┴─────┐ ┌────┴─────┐ ┌────┴─────┐       │
│  │ Best     │ │ Edge     │ │ Security │ │ Existing │       │
│  │ Practice │ │ Case     │ │ Concerns │ │ Learning │       │
│  │ Agent 2  │ │ Agent 2  │ │ Agent    │ │ Agent    │       │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘       │
│       │            │            │            │              │
│  ... (20-40 agents per plan item) ...                        │
│       │            │            │            │              │
│       ▼            ▼            ▼            ▼              │
│  ┌──────────────────────────────────────────────────┐       │
│  │              Synthesis Agent                      │       │
│  │  - Merge research into plan                       │       │
│  │  - Add edge case handling                         │       │
│  │  - Insert best practice notes                     │       │
│  │  - Flag risks and dependencies                    │       │
│  └──────────────────────────────────────────────────┘       │
│                        │                                     │
│                        ▼                                     │
│              Enhanced Plan (Deepened)                         │
│              - Original steps preserved                      │
│              - Edge cases added per step                      │
│              - Best practices annotated                       │
│              - Risks flagged                                  │
│              - Dependencies clarified                         │
└──────────────────────────────────────────────────────────────┘
```

The key insight: **research is embarrassingly parallel**. Each plan item can be researched independently, and each research dimension (best practices, edge cases, security, etc.) is independent.
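The independence claim can be made concrete with a minimal sketch, assuming each research job is simply a (plan item, dimension) pair. The names `ResearchJob`, `build_jobs`, `run_research`, and `run_all` are hypothetical, and `run_research` is a stand-in for a real agent call. Because no job reads another job's output, the fan-out needs no locks or shared state:

```rust
use std::thread;

/// One unit of research: a plan item paired with a research dimension.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, PartialEq)]
struct ResearchJob {
    item: String,
    dimension: String,
}

/// Build the full cross product of plan items and research dimensions.
fn build_jobs(items: &[&str], dimensions: &[&str]) -> Vec<ResearchJob> {
    items
        .iter()
        .flat_map(|item| {
            dimensions.iter().map(move |dim| ResearchJob {
                item: item.to_string(),
                dimension: dim.to_string(),
            })
        })
        .collect()
}

/// Placeholder for a real research agent; here it only formats a label.
fn run_research(job: &ResearchJob) -> String {
    format!("[{}] findings for '{}'", job.dimension, job.item)
}

/// Run every job on its own scoped thread and collect results in job order.
fn run_all(jobs: &[ResearchJob]) -> Vec<String> {
    thread::scope(|scope| {
        let handles: Vec<_> = jobs
            .iter()
            .map(|job| scope.spawn(move || run_research(job)))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("research thread panicked"))
            .collect()
    })
}
```

With two plan items and three dimensions, `build_jobs` yields six jobs, and `run_all` returns six results in the same order. A real implementation would likely cap the number of concurrent agents rather than spawn one thread per job.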

---

## Proposed Makima Implementation

### 1. New Supervisor Command: `makima supervisor deepen-plan`

```bash
# Deepen the current contract's plan
makima supervisor deepen-plan

# Deepen with specific focus areas
makima supervisor deepen-plan --focus "security,edge-cases,performance"

# Deepen with explicit plan file reference
makima supervisor deepen-plan --plan-file plan.md

# Control parallelism
makima supervisor deepen-plan --max-agents 10

# Include knowledge base search (requires Knowledge Accumulation feature)
makima supervisor deepen-plan --search-learnings
```
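As a rough sketch of how the `--focus` value might be validated, the function below splits the comma-separated list and checks each entry against the dimension names used in the configuration example later in this document. The function name `parse_focus`, the error messages, and the tolerance for whitespace and duplicates are assumptions, not part of the proposal:

```rust
/// Dimension names taken from the `dimensions` list in the configuration example.
const KNOWN_DIMENSIONS: [&str; 7] = [
    "best-practices",
    "edge-cases",
    "security",
    "performance",
    "dependencies",
    "patterns",
    "learnings",
];

/// Parse a `--focus` value such as "security,edge-cases,performance".
/// Returns the selected dimensions in the order given, without duplicates,
/// or an error naming the first unrecognized entry.
fn parse_focus(raw: &str) -> Result<Vec<String>, String> {
    let mut selected: Vec<String> = Vec::new();
    for part in raw.split(',') {
        let name = part.trim().to_lowercase();
        if name.is_empty() {
            continue; // tolerate stray or trailing commas
        }
        if !KNOWN_DIMENSIONS.contains(&name.as_str()) {
            return Err(format!("unknown focus area: {name}"));
        }
        if !selected.contains(&name) {
            selected.push(name);
        }
    }
    if selected.is_empty() {
        return Err("no focus areas given".to_string());
    }
    Ok(selected)
}
```

Rejecting unknown names up front, instead of silently ignoring them, keeps a typo such as `--focus secruity` from producing a deepening run that skips the dimension the user cared about.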

### 2. Research Agent Categories

Each plan item is researched along multiple dimensions:

| Agent Category | Purpose | Example Output |
|----------------|---------|----------------|
| **Best Practices** | Industry standards for the technology/pattern | "Use parameterized queries for all DB operations" |
| **Edge Cases** | Boundary conditions and error scenarios | "Handle concurrent modification of shared resource" |
| **Dependency Research** | Compatibility, versions, known issues | "Library X v3 has breaking changes from v2" |
| **Security Concerns** | Security implications of the planned approach | "JWT stored in localStorage is vulnerable to XSS" |
| **Performance Implications** | Performance characteristics and bottlenecks | "N+1 query risk with eager loading disabled" |
| **Pattern Matching** | Similar patterns in the existing codebase | "Module Y already implements this pattern; follow its conventions" |
| **Existing Learnings** | Prior solutions from knowledge base | "Similar issue solved in contract Z; see docs/solutions/..." |
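One way to represent these categories is a small enum that maps each category to a task name and a default prompt. This is a sketch under assumptions: the type `Category` and its methods are hypothetical, the task names follow the configuration example in this document, and the prompt wording for dependencies and pattern matching is invented for illustration (the other prompts follow the ones shown elsewhere in this proposal):

```rust
/// Research categories from the table above. (Hypothetical type.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    BestPractices,
    EdgeCases,
    DependencyResearch,
    SecurityConcerns,
    PerformanceImplications,
    PatternMatching,
    ExistingLearnings,
}

impl Category {
    /// Every category, in table order.
    const ALL: [Category; 7] = [
        Category::BestPractices,
        Category::EdgeCases,
        Category::DependencyResearch,
        Category::SecurityConcerns,
        Category::PerformanceImplications,
        Category::PatternMatching,
        Category::ExistingLearnings,
    ];

    /// Short task name, matching the names in the configuration example.
    fn task_name(self) -> &'static str {
        match self {
            Category::BestPractices => "best-practices",
            Category::EdgeCases => "edge-cases",
            Category::DependencyResearch => "dependencies",
            Category::SecurityConcerns => "security",
            Category::PerformanceImplications => "performance",
            Category::PatternMatching => "patterns",
            Category::ExistingLearnings => "learnings",
        }
    }

    /// Default research prompt for one plan item.
    fn prompt(self, description: &str) -> String {
        let lead = match self {
            Category::BestPractices => "Research best practices for",
            Category::EdgeCases => "Identify edge cases for",
            // Assumed wording, not from the proposal.
            Category::DependencyResearch => "Research dependency compatibility and known issues for",
            Category::SecurityConcerns => "Analyze security implications of",
            Category::PerformanceImplications => "Assess performance implications of",
            // Assumed wording, not from the proposal.
            Category::PatternMatching => "Find similar patterns in the existing codebase for",
            Category::ExistingLearnings => "Search existing learnings relevant to",
        };
        format!("{lead}: {description}")
    }
}
```

Keeping the prompt templates next to the category definition means adding a new research dimension is a single, compiler-checked change: a new variant forces both `match` expressions to be updated.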

### 3. Deepening Flow

```
┌─────────────┐     ┌──────────────────┐     ┌────────────────┐
│ Original    │     │ Research Phase    │     │ Enhanced Plan  │
│ Plan        │────▶│                  │────▶│                │
│             │     │ Per plan item:    │     │ Original +     │
│ Step 1      │     │ - Best practices │     │ annotations    │
│ Step 2      │     │ - Edge cases     │     │               │
│ Step 3      │     │ - Dependencies   │     │ Step 1         │
│ Step 4      │     │ - Security       │     │  ├ Edge cases  │
│             │     │ - Performance    │     │  ├ Best pracs  │
│             │     │ - Patterns       │     │  └ Risks       │
│             │     │ - Learnings      │     │ Step 2         │
│             │     │                  │     │  ├ Edge cases  │
│             │     │ All in parallel  │     │  └ ...         │
└─────────────┘     └──────────────────┘     └────────────────┘
```

**Implementation using existing infrastructure:**

```bash
# Sketch only: the supervisor subcommands below are proposed, not yet implemented.
# Assumes get-plan-items prints a JSON array of objects with "id" and
# "description" fields, and that jq is installed.

# Step 1: Parse plan into items (one compact JSON object per line)
plan_items=$(makima supervisor get-plan-items | jq -c '.[]')

# Step 2: For each item, spawn research agents as a group
while IFS= read -r item; do
  [ -n "$item" ] || continue
  id=$(jq -r '.id' <<<"$item")
  desc=$(jq -r '.description' <<<"$item")
  # Build the task list with jq so quotes in the description are escaped correctly
  tasks=$(jq -n --arg d "$desc" '[
    {name: "best-practices", plan: ("Research best practices for: " + $d)},
    {name: "edge-cases",     plan: ("Identify edge cases for: " + $d)},
    {name: "security",       plan: ("Analyze security implications of: " + $d)},
    {name: "performance",    plan: ("Assess performance implications of: " + $d)}
  ]')
  makima supervisor spawn-group "deepen-${id}" \
    --tasks "$tasks" \
    --share-worktree \
    --read-only
done <<<"$plan_items"

# Step 3: Wait for all groups
makima supervisor wait-group "deepen-*" --timeout 300

# Step 4: Synthesize results into enhanced plan
makima supervisor synthesize-plan
```

### 4. Enhanced Plan Format

The deepened plan augments each step with structured annotations:

```markdown
## Step 3: Implement JWT Authentication

### Original Plan
Add JWT-based authentication middleware to the API gateway.
Generate tokens on login, validate on each request.

### Research Findings

#### Best Practices
- Use RS256 (asymmetric) for microservices, HS256 for monoliths
- Set short access token TTL (15 min) with refresh token rotation
- Include only essential claims (sub, exp, iat, roles)
- Never store sensitive data in JWT payload (it's base64, not encrypted)

#### Edge Cases
- Token expiry during long-running requests
- Clock skew between services (use ±30s leeway)
- Concurrent refresh token rotation (race condition)
- Token size exceeding header limits (>8KB with many claims)

#### Security Concerns
- **P2**: JWT in localStorage is XSS-vulnerable; prefer httpOnly cookies
- **P3**: Missing CSRF protection if using cookies
- **P2**: No token revocation mechanism for compromised tokens

#### Performance Notes
- JWT validation is CPU-bound (RS256 ~1ms per validation)
- Consider caching decoded tokens for repeated validation
- Refresh token DB lookup adds latency (~5ms)

#### Existing Learnings
- See: docs/solutions/security-practices/jwt-refresh-token-rotation.md
- Previous contract "Auth Service Refactor" used similar pattern

### Risks
- [ ] Clock skew handling not in original plan
- [ ] Token revocation strategy needed
- [ ] CSRF protection if using cookie storage
```
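The format above could be produced by a straightforward renderer. The following is a minimal sketch, assuming findings arrive already grouped by section; the types `Finding` and `DeepenedStep` and the function `render_step` are hypothetical names, and skipping empty sections is a design assumption rather than something the proposal specifies:

```rust
/// Notes collected for one research section, e.g. "Edge Cases". (Hypothetical type.)
struct Finding {
    section: String,
    notes: Vec<String>,
}

/// One plan step together with its research output. (Hypothetical type.)
struct DeepenedStep {
    number: u32,
    title: String,
    original: String,
    findings: Vec<Finding>,
    risks: Vec<String>,
}

/// Render a step in the annotated markdown layout shown above.
/// The original plan text is copied verbatim; sections with no notes are omitted.
fn render_step(step: &DeepenedStep) -> String {
    let mut out = String::new();
    out.push_str(&format!("## Step {}: {}\n\n", step.number, step.title));
    out.push_str("### Original Plan\n");
    out.push_str(&step.original);
    out.push_str("\n\n### Research Findings\n");
    for finding in &step.findings {
        if finding.notes.is_empty() {
            continue;
        }
        out.push_str(&format!("\n#### {}\n", finding.section));
        for note in &finding.notes {
            out.push_str(&format!("- {note}\n"));
        }
    }
    out.push_str("\n### Risks\n");
    for risk in &step.risks {
        // Risks are rendered as unchecked task-list items for the reviewer.
        out.push_str(&format!("- [ ] {risk}\n"));
    }
    out
}
```

Copying the original plan text unchanged keeps the "original steps preserved" property visible in the output: a reviewer can diff the deepened plan against the initial one and see only additions.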

### 5. Integration with Knowledge Base

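When the Knowledge Accumulation feature is available and learning search is enabled (`search_learnings: true` in the configuration, or `--search-learnings` on the command line), `deepen-plan` includes a **learning search agent** for each plan item: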

```
Research Agent: "Search existing learnings relevant to JWT authentication"

Results:
- docs/solutions/security-practices/jwt-refresh-token-rotation.md (relevance: 0.92)
- docs/solutions/api-patterns/authentication-middleware-pattern.md (relevance: 0.78)
- docs/solutions/debugging-techniques/token-expiry-debugging.md (relevance: 0.65)
```

These results are included in the deepened plan with direct links.
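How search results are narrowed before being linked is not specified here. A plausible sketch, assuming a relevance threshold like the `search_min_relevance` setting in the configuration example, is to drop low-scoring hits and list the rest from most to least relevant. The type `LearningHit`, the function `select_learnings`, and the `limit` parameter are hypothetical:

```rust
/// One knowledge-base search result. (Hypothetical type.)
#[derive(Debug, Clone, PartialEq)]
struct LearningHit {
    path: String,
    relevance: f64,
}

/// Keep hits at or above `min_relevance`, sort them from most to least
/// relevant, and return at most `limit` of them.
fn select_learnings(mut hits: Vec<LearningHit>, min_relevance: f64, limit: usize) -> Vec<LearningHit> {
    hits.retain(|hit| hit.relevance >= min_relevance);
    // total_cmp gives a total order on f64, so sorting cannot panic on NaN.
    hits.sort_by(|a, b| b.relevance.total_cmp(&a.relevance));
    hits.truncate(limit);
    hits
}
```

With the three example results above, a threshold of 0.5 keeps all three, while a threshold of 0.7 keeps only the 0.92 and 0.78 entries.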

---

## Integration with Existing Makima Features

### Contract Phases

Plan deepening occurs during the **Plan phase**, between initial plan creation and phase transition to Execute:

```
Plan Phase Timeline:
  1. Supervisor creates initial plan
  2. makima supervisor deepen-plan    ← NEW
  3. User reviews deepened plan
  4. makima supervisor advance-phase execute
```

### Supervisor/Worker Hierarchy

Research agents are spawned as **worker tasks** under the supervisor. This builds on the existing `spawn-task` infrastructure, together with the `spawn-group`/`wait-group` commands proposed in the [Multi-Agent Review](feature-multi-agent-review.md) proposal.

### Contract Files

The deepened plan replaces or augments the plan document as a contract file:

```rust
File {
    contract_id: contract.id,
    contract_phase: "plan",
    name: "Implementation Plan (Deepened)",
    body: vec![
        // Enhanced plan content with annotations
    ],
}
```

### Directive System

For directive-based workflows, plan deepening can be added as a step:

```rust
DirectiveStep {
    name: "deepen-plan",
    description: "Enhance implementation plan with parallel research",
    depends_on: [initial_plan_step_id],
    task_plan: "Run deepen-plan on the initial plan...",
}
```

### Phase Guards

If `phase_guard` is enabled, the user reviews the deepened plan before approving transition to execute. This is the natural checkpoint for plan quality.

---

## Implementation Plan

### Phase 1: Core Command (2-3 days)

| Task | Effort | Description |
|------|--------|-------------|
| `deepen-plan` command | 1 day | Parse plan, spawn research groups |
| Research agent templates | 0.5 days | Default prompts for each category |
| Synthesis logic | 1 day | Merge research into annotated plan |
| Plan file update | 0.5 days | Write deepened plan as contract file |

### Phase 2: Knowledge Integration (1-2 days)

| Task | Effort | Description |
|------|--------|-------------|
| Learning search agent | 0.5 days | Search knowledge base per plan item |
| Result integration | 0.5 days | Include learning links in plan |
| Fallback when no KB | 0.5 days | Graceful degradation without KB |

### Phase 3: Configuration & Polish (2-3 days)

| Task | Effort | Description |
|------|--------|-------------|
| Config file support | 0.5 days | `.makima/deepen.yaml` |
| Focus area filtering | 0.5 days | `--focus` flag implementation |
| Concurrency control | 0.5 days | `--max-agents` limit |
| Documentation | 0.5 days | User guide |
| Tests | 1 day | Unit + integration |

---

## Configuration Examples

### Repository-Level Configuration

```yaml
# .makima/deepen.yaml
version: 1
deepen:
  # Auto-deepen when plan is created
  auto_trigger: false

  # Maximum agents per plan item
  max_agents_per_item: 5

  # Total maximum concurrent agents
  max_concurrent: 20

  # Timeout per research agent (seconds)
  agent_timeout: 120

  # Research dimensions to include
  dimensions:
    - best-practices
    - edge-cases
    - security
    - performance
    - dependencies
    - patterns
    - learnings          # Requires Knowledge Accumulation

  # Minimum plan items to trigger deepening
  min_plan_items: 3

  # Search learnings (requires Knowledge Accumulation)
  search_learnings: true
  search_min_relevance: 0.5
```
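The configuration does not spell out how `max_agents_per_item`, `max_concurrent`, and `min_plan_items` interact. The sketch below shows one possible interpretation, stated as an assumption: dimensions are taken in listed order up to the per-item cap, and agents run in waves of at most `max_concurrent`. The struct `DeepenConfig` and the helper functions are hypothetical; the default values mirror the YAML above:

```rust
/// In-memory form of the relevant `.makima/deepen.yaml` settings. (Hypothetical type.)
struct DeepenConfig {
    max_agents_per_item: usize,
    max_concurrent: usize,
    min_plan_items: usize,
    dimensions: Vec<String>,
}

impl Default for DeepenConfig {
    /// Defaults copied from the example configuration above.
    fn default() -> Self {
        let names = [
            "best-practices", "edge-cases", "security", "performance",
            "dependencies", "patterns", "learnings",
        ];
        DeepenConfig {
            max_agents_per_item: 5,
            max_concurrent: 20,
            min_plan_items: 3,
            dimensions: names.iter().map(|name| name.to_string()).collect(),
        }
    }
}

/// Deepening is skipped for plans with fewer items than `min_plan_items`.
fn should_deepen(cfg: &DeepenConfig, plan_items: usize) -> bool {
    plan_items >= cfg.min_plan_items
}

/// Assumption: one agent per dimension, capped by `max_agents_per_item`.
fn agents_per_item(cfg: &DeepenConfig) -> usize {
    cfg.dimensions.len().min(cfg.max_agents_per_item)
}

/// Total agents needed for the whole plan.
fn total_agents(cfg: &DeepenConfig, plan_items: usize) -> usize {
    plan_items * agents_per_item(cfg)
}

/// Number of sequential waves when at most `max_concurrent` agents run at once.
fn waves(cfg: &DeepenConfig, plan_items: usize) -> usize {
    let cap = cfg.max_concurrent.max(1); // guard against a zero cap
    (total_agents(cfg, plan_items) + cap - 1) / cap
}
```

Under this interpretation and the default values, a four-item plan needs 20 agents and fits in a single wave, while a nine-item plan needs 45 agents and runs in three waves. Note that the default configuration lists seven dimensions but allows only five agents per item, so the rule for choosing which dimensions to drop would need to be decided.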

### Inline Usage

```bash
# Quick deepen with defaults
makima supervisor deepen-plan

# Focused deepen for security-sensitive work
makima supervisor deepen-plan --focus security,edge-cases

# Deepen with more agents for complex plans
makima supervisor deepen-plan --max-agents 30

# Deepen without knowledge base search
makima supervisor deepen-plan --no-learnings
```

---

## Open Questions

1. **Plan format parsing**: How should the system parse existing plans to identify discrete items? Markdown headers? Numbered lists? YAML structure?
2. **Research depth vs. cost**: 20-40 agents per deepening is expensive. Should there be a "lite" mode with fewer agents?
3. **Deepening multiple times**: Can a plan be deepened iteratively? Should subsequent deepenings build on previous research?
4. **User-provided context**: Should users be able to provide additional context (e.g., "this project uses PostgreSQL, not MySQL") to guide research?
5. **Codebase analysis**: Should research agents analyze the existing codebase to find relevant patterns, or only reason from general knowledge?
6. **Conflicting research**: When research agents disagree (e.g., one says "use Redis" and another says "avoid Redis"), how should the synthesis handle it?
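To make question 1 more concrete, here is a sketch of the markdown-header option only, assuming plans use headings of the form `## Step N: Title` as in the enhanced plan example. The type `PlanItem` and the function `parse_plan_items` are hypothetical, and numbered-list or YAML plans would need a different parser:

```rust
/// One discrete plan item extracted from a markdown plan. (Hypothetical type.)
#[derive(Debug, PartialEq)]
struct PlanItem {
    id: u32,
    title: String,
    body: String,
}

/// Split a markdown plan into items at headings of the form "## Step N: Title".
/// Text before the first step heading is ignored; other lines are attached
/// to the most recent step.
fn parse_plan_items(markdown: &str) -> Vec<PlanItem> {
    let mut items: Vec<PlanItem> = Vec::new();
    for line in markdown.lines() {
        if let Some(rest) = line.strip_prefix("## Step ") {
            if let Some((number, title)) = rest.split_once(':') {
                if let Ok(id) = number.trim().parse::<u32>() {
                    items.push(PlanItem {
                        id,
                        title: title.trim().to_string(),
                        body: String::new(),
                    });
                    continue;
                }
            }
        }
        if let Some(current) = items.last_mut() {
            current.body.push_str(line);
            current.body.push('\n');
        }
    }
    // Drop leading and trailing blank lines from each body.
    for item in &mut items {
        item.body = item.body.trim().to_string();
    }
    items
}
```

A heading-based parser is simple, but it silently returns no items for plans written as numbered lists, which is part of why the parsing strategy remains an open question.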

---

## Alternatives Considered

| Alternative | Pros | Cons | Decision |
|-------------|------|------|----------|
| Sequential research (one agent) | Simple, cheaper | Slow; misses multi-perspective insights | Rejected — parallel is core value |
| Automatic deepening (always on) | No manual step | Adds latency to every plan; unnecessary for simple tasks | Optional auto-trigger |
| Web search integration | Real-time information | Inconsistent quality; potential hallucination from web results | Deferred — consider for v2 |
| User-provided research questions | Targeted research | Requires user to know what to ask | Complement — support alongside auto-research |
| LLM-only research (no task spawning) | Simpler, no infrastructure | Limited by single context window; no parallelism | Rejected — defeats the purpose |

---

## Priority & Complexity Assessment

- **Priority: MEDIUM.** Plan deepening significantly improves plan quality, but it is an enhancement over an already-functional planning workflow. The compound engineering plugin's data shows a ~40% reduction in plan revisions.
- **Complexity: LOW.** This feature is largely a composition of existing or already-proposed primitives (task spawning, the `spawn-group`/`wait-group` commands from the Multi-Agent Review proposal, plan file updates). The main new work is the research agent prompts and the synthesis logic.
- **Risk: LOW.** In the worst case, plans improve only slightly. No changes to existing systems are required, and the feature can be adopted incrementally.