---
name: makima-directive
description: Directive orchestration tools for autonomous goal-driven execution. Use when working with directives, chains, steps, verifiers, and approvals.
---

# Directive Orchestration Tools

Directives are top-level goals that drive autonomous execution with configurable guardrails. Each directive generates a chain of steps. Each step spawns a contract with a supervisor, and the result is verified by programmatic checks and an LLM evaluation.

## Architecture

```
Directive (goal + requirements + acceptance criteria)
  |
  +-- Chain (generated DAG execution plan)
  |     +-- Step 1 (pending -> ready -> running -> evaluating -> passed)
  |     |     +-- Contract (spawned when step reaches 'ready')
  |     |           +-- Supervisor Task
  |     +-- Step 2 (depends_on: [Step 1])
  |     +-- Step 3 (depends_on: [Step 1], parallel with Step 2)
  |
  +-- Verifiers (test runner, linter, build, type checker)
  +-- Evaluations (programmatic + LLM composite scores)
  +-- Events (audit stream)
  +-- Approvals (human-in-the-loop gates)
```

## Status Flow

### Directive Status
- `draft` - Created but not started
- `planning` - Generating chain from requirements
- `active` - Executing steps
- `paused` - Temporarily stopped
- `completed` - All steps passed
- `archived` - No longer active
- `failed` - Execution failed

### Step Status
- `pending` - Waiting for dependencies
- `ready` - Dependencies met, ready to start
- `running` - Contract executing
- `evaluating` - Running verifiers
- `passed` - Evaluation succeeded
- `failed` - Evaluation failed and rework attempts were exhausted
- `rework` - Sent back for corrections
- `skipped` - Manually skipped
- `blocked` - Blocked by failed dependency

## Autonomy Levels

- `full_auto` - No approval gates, automatic progression
- `guardrails` - Request approval for yellow/red confidence scores
- `manual` - Request approval for all step completions

## Confidence Scoring

Each step evaluation produces a composite confidence score:

1. **Programmatic verifiers** run first (tests, lint, build)
   - Weight: 1.0 each
   - If any required verifier fails: automatic RED

2. **LLM evaluation** runs second
   - Weight: 2.0
   - Evaluates against acceptance criteria

3. **Composite score** is the weighted average of the verifier results and the LLM score
   - GREEN: >= green threshold (`confidenceThresholdGreen`, default 0.8)
   - YELLOW: >= yellow threshold (`confidenceThresholdYellow`, default 0.5)
   - RED: below yellow threshold
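
The scoring rules above can be sketched in a few lines of Python. This is a minimal illustration, not the daemon's actual implementation: the `VerifierResult` type and the `composite_confidence` function are hypothetical names, and it assumes a passing verifier contributes 1.0 and a failing one 0.0 to the weighted average.

```python
from dataclasses import dataclass


@dataclass
class VerifierResult:
    """Outcome of one programmatic verifier (hypothetical type)."""
    name: str
    passed: bool
    weight: float = 1.0
    required: bool = True


def composite_confidence(verifiers, llm_score, llm_weight=2.0,
                         green=0.8, yellow=0.5):
    """Return (score, level) for one step evaluation."""
    # Assumption: pass counts as 1.0 and fail as 0.0 in the average.
    total_weight = llm_weight + sum(v.weight for v in verifiers)
    weighted_sum = llm_weight * llm_score + sum(
        v.weight for v in verifiers if v.passed
    )
    score = weighted_sum / total_weight

    # Any failed required verifier forces RED regardless of the score.
    if any(v.required and not v.passed for v in verifiers):
        return score, "RED"
    if score >= green:
        return score, "GREEN"
    if score >= yellow:
        return score, "YELLOW"
    return score, "RED"
```

With three passing verifiers and an LLM score of 0.7, the weighted average is (3 × 1.0 + 2.0 × 0.7) / 5.0 = 0.88, which is GREEN under the default thresholds.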

## CLI Commands

```bash
# Create a new directive
makima directive create --goal "Add OAuth2 authentication" --repository https://github.com/org/repo

# List directives
makima directive list [--status active]

# Get directive status with progress
makima directive status <directive-id>

# Start execution (generates chain and begins)
makima directive start <directive-id>

# View chain steps
makima directive steps <directive-id>

# View DAG visualization
makima directive graph <directive-id> --with-status

# View recent events
makima directive events <directive-id> --limit 20

# Approve a pending request
makima directive approve <directive-id> <approval-id> [--response "Looks good"]

# Deny a pending request
makima directive deny <directive-id> <approval-id> [--reason "Need more testing"]

# Lifecycle commands
makima directive pause <directive-id>
makima directive resume <directive-id>
makima directive stop <directive-id>
makima directive archive <directive-id>
```

## API Endpoints

### Directive CRUD
```
POST   /api/v1/directives                    # Create from goal
GET    /api/v1/directives                    # List
GET    /api/v1/directives/:id                # Get with progress
PUT    /api/v1/directives/:id                # Update
DELETE /api/v1/directives/:id                # Archive
```

### Lifecycle
```
POST   /api/v1/directives/:id/start          # Plan + execute
POST   /api/v1/directives/:id/pause          # Pause
POST   /api/v1/directives/:id/resume         # Resume
POST   /api/v1/directives/:id/stop           # Stop
```

### Chain & Steps
```
GET    /api/v1/directives/:id/chain          # Current chain + steps
GET    /api/v1/directives/:id/chain/graph    # DAG for visualization
POST   /api/v1/directives/:id/chain/replan   # Force regeneration
POST   /api/v1/directives/:id/chain/steps    # Add step
PUT    /api/v1/directives/:id/chain/steps/:sid   # Modify step
DELETE /api/v1/directives/:id/chain/steps/:sid   # Remove step
```

### Step Operations
```
GET    /api/v1/directives/:id/steps/:sid     # Step detail
POST   /api/v1/directives/:id/steps/:sid/evaluate  # Force re-evaluation
POST   /api/v1/directives/:id/steps/:sid/skip      # Skip step
POST   /api/v1/directives/:id/steps/:sid/rework    # Manual rework
```

### Monitoring
```
GET    /api/v1/directives/:id/evaluations    # List evaluations
GET    /api/v1/directives/:id/events         # Event log (polling)
GET    /api/v1/directives/:id/events/stream  # Event stream (SSE)
```

### Verifiers
```
GET    /api/v1/directives/:id/verifiers             # List verifiers
POST   /api/v1/directives/:id/verifiers             # Add verifier
PUT    /api/v1/directives/:id/verifiers/:vid        # Update verifier
POST   /api/v1/directives/:id/verifiers/auto-detect # Auto-detect
```

### Approvals
```
GET    /api/v1/directives/:id/approvals              # Pending approvals
POST   /api/v1/directives/:id/approvals/:aid/approve # Approve
POST   /api/v1/directives/:id/approvals/:aid/deny    # Deny
```

## Creating a Directive

### Request
```json
POST /api/v1/directives
{
  "goal": "Implement user authentication with OAuth2",
  "repositoryUrl": "https://github.com/org/repo",
  "autonomyLevel": "guardrails",
  "confidenceThresholdGreen": 0.8,
  "confidenceThresholdYellow": 0.5,
  "maxReworkCycles": 3,
  "maxTotalCostUsd": 100.0,
  "maxWallTimeMinutes": 480
}
```

### Response
```json
{
  "id": "uuid",
  "title": "Implement user authentication with OAuth2",
  "goal": "Implement user authentication with OAuth2",
  "status": "draft",
  "autonomyLevel": "guardrails",
  "createdAt": "2026-02-05T12:00:00Z"
}
```
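
For scripted use, the same request can be built with Python's standard library. This is a sketch under assumptions: `build_create_request` is a hypothetical helper, the base URL is a placeholder for wherever the makima API is served, and authentication is not covered because the document does not describe it.

```python
import json
import urllib.request


def build_create_request(base_url, goal, repository_url, **options):
    """Build (but do not send) a POST /api/v1/directives request.

    Extra keyword arguments are passed through as JSON fields, e.g.
    autonomyLevel="guardrails" or maxReworkCycles=3.
    """
    payload = {"goal": goal, "repositoryUrl": repository_url, **options}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/directives",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request is then a call to `urllib.request.urlopen(req)`, with the response body parsed by `json.load`.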

## Starting a Directive

When you start a directive:
1. System generates requirements from the goal
2. Chain planner creates a DAG of steps
3. Root steps (no dependencies) transition to `ready`
4. Contracts spawn for ready steps with supervisors
5. Verifiers are auto-detected from the repository
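
How steps move from `pending` to `ready` (or `blocked`) follows from the DAG. The sketch below is illustrative only: `advance_pending_steps` and the dictionary layout are hypothetical, and it assumes that only a `passed` dependency counts as satisfied.

```python
def advance_pending_steps(steps):
    """Promote pending steps in place and return the names that became ready.

    `steps` maps a step name to {"status": str, "depends_on": [names]}.
    """
    became_ready = []
    changed = True
    # Repeat until stable so that `blocked` propagates down the chain.
    while changed:
        changed = False
        for name, step in steps.items():
            if step["status"] != "pending":
                continue
            dep_statuses = [steps[d]["status"] for d in step["depends_on"]]
            if any(s in ("failed", "blocked") for s in dep_statuses):
                step["status"] = "blocked"
                changed = True
            elif all(s == "passed" for s in dep_statuses):
                # Root steps have no dependencies, so all() is True for them.
                step["status"] = "ready"
                became_ready.append(name)
                changed = True
    return became_ready
```

On a fresh chain only the root steps become ready. Once a root step passes, a second call releases every step that depended on it, which is how Step 2 and Step 3 in the architecture diagram can run in parallel.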

## Evaluation Flow

When a contract completes:

1. Step transitions to `evaluating`
2. **Programmatic verifiers** run (tests, lint, build)
   - Each produces pass/fail + output
3. **LLM evaluation** runs
   - Reviews code against acceptance criteria
   - Provides feedback and score
4. **Composite score** computed
5. Based on confidence level and autonomy:
   - GREEN: Step passes, downstream unblocks
   - YELLOW (guardrails): Request approval
   - RED: Initiate rework or request approval
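
The last step combines the confidence level with the directive's autonomy level. One possible mapping is sketched below. The `decide` function and its return values are hypothetical, and two branches are assumptions where the rules above leave room: under `full_auto` a YELLOW result passes without approval, and under `guardrails` a RED result requests approval rather than starting rework directly.

```python
def decide(level, autonomy):
    """Map a confidence level and an autonomy level to the next action."""
    if level not in ("GREEN", "YELLOW", "RED"):
        raise ValueError(f"unknown confidence level: {level}")

    if autonomy == "manual":
        # Every step completion needs a human decision.
        return "request_approval"
    if autonomy == "guardrails":
        # Only GREEN progresses automatically.
        return "pass" if level == "GREEN" else "request_approval"
    if autonomy == "full_auto":
        # Assumption: YELLOW passes, RED goes straight to rework.
        return "rework" if level == "RED" else "pass"
    raise ValueError(f"unknown autonomy level: {autonomy}")
```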

## Rework Flow

When a step needs rework:

1. Contract phase reset to editing
2. Supervisor receives rework instructions
3. Rework count incremented
4. If the rework count exceeds `maxReworkCycles`: escalate or fail
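
The rework bookkeeping can be sketched as follows. `StepState` and `send_back_for_rework` are hypothetical names, the initial `contract_phase` value is a placeholder, and this sketch chooses to fail the step once the limit is reached, where a real system might escalate to an approval instead.

```python
from dataclasses import dataclass


@dataclass
class StepState:
    """Minimal per-step state for the rework flow (hypothetical type)."""
    status: str = "evaluating"
    rework_count: int = 0
    contract_phase: str = "review"  # placeholder phase name


def send_back_for_rework(step, max_rework_cycles=3):
    """Start another rework cycle, or fail the step if none are left.

    Returns True if a rework cycle was started.
    """
    if step.rework_count >= max_rework_cycles:
        # Assumption: fail here; escalation is the alternative.
        step.status = "failed"
        return False
    step.rework_count += 1
    step.contract_phase = "editing"
    step.status = "rework"
    return True
```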

## Event Types

Events are logged for audit and monitoring:

- `directive_created`, `directive_started`, `directive_paused`, `directive_completed`
- `chain_generated`, `chain_regenerated`
- `step_ready`, `step_started`, `step_evaluating`, `step_passed`, `step_failed`
- `rework_initiated`, `rework_completed`
- `approval_requested`, `approval_granted`, `approval_denied`
- `verifier_run`, `evaluation_completed`
- `circuit_breaker_triggered`

## Verifier Configuration

Verifiers can be auto-detected or manually configured:

```json
POST /api/v1/directives/:id/verifiers
{
  "name": "Test Runner",
  "verifierType": "test_runner",
  "command": "npm test",
  "workingDirectory": ".",
  "timeoutSeconds": 300,
  "weight": 1.0,
  "required": true,
  "enabled": true
}
```
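
To make the fields concrete, here is a rough sketch of how such a configuration could be executed to produce a pass/fail result plus output. `run_verifier` is a hypothetical function. It assumes exit code 0 means pass and that a timeout or a command that cannot be started counts as a failure.

```python
import shlex
import subprocess


def run_verifier(config):
    """Run one verifier config and return (passed, output)."""
    try:
        proc = subprocess.run(
            shlex.split(config["command"]),
            cwd=config.get("workingDirectory", "."),
            timeout=config.get("timeoutSeconds", 300),
            capture_output=True,
            text=True,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    except OSError as exc:
        # Raised, for example, when the executable does not exist.
        return False, f"could not start: {exc}"
    # Assumption: a zero exit code is the pass criterion.
    return proc.returncode == 0, proc.stdout + proc.stderr
```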

### Auto-Detection

The system detects verifiers from:
- `package.json` - npm test, npm run lint, npm run build
- `Cargo.toml` - cargo test, cargo clippy, cargo build
- `pyproject.toml` - pytest, ruff, mypy
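
A simple way to picture this detection is a lookup from manifest file to checks. The sketch below is illustrative: `detect_checks` is a hypothetical function, it only looks at file names in the repository root, and the check names are copied from the list above rather than from the actual detector.

```python
MANIFEST_CHECKS = {
    "package.json": ["npm test", "npm run lint", "npm run build"],
    "Cargo.toml": ["cargo test", "cargo clippy", "cargo build"],
    "pyproject.toml": ["pytest", "ruff", "mypy"],
}


def detect_checks(root_filenames):
    """Return the checks suggested by the manifest files that are present."""
    present = set(root_filenames)
    checks = []
    for manifest, manifest_checks in MANIFEST_CHECKS.items():
        if manifest in present:
            checks.extend(manifest_checks)
    return checks
```

A repository containing both `package.json` and `Cargo.toml` would get six checks, in the order the manifests are listed in `MANIFEST_CHECKS`.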

## Circuit Breakers

Directives have built-in circuit breakers:

- `maxTotalCostUsd` - Stop if cumulative cost exceeds limit
- `maxWallTimeMinutes` - Stop if elapsed time exceeds limit
- `maxReworkCycles` - Fail step after N rework attempts
- `maxChainRegenerations` - Fail if chain regenerated too many times
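
Each breaker is a comparison of a running counter against its limit. A minimal sketch, assuming hypothetical usage counter names (`totalCostUsd`, `wallTimeMinutes`, `reworkCycles`, `chainRegenerations`) and treating every breaker as tripped only when usage strictly exceeds the limit:

```python
# Limit field -> usage counter. The counter names are assumptions.
LIMIT_TO_USAGE = {
    "maxTotalCostUsd": "totalCostUsd",
    "maxWallTimeMinutes": "wallTimeMinutes",
    "maxReworkCycles": "reworkCycles",
    "maxChainRegenerations": "chainRegenerations",
}


def tripped_breakers(limits, usage):
    """Return the names of the limits that current usage exceeds."""
    tripped = []
    for limit_name, usage_name in LIMIT_TO_USAGE.items():
        limit = limits.get(limit_name)
        if limit is None:
            continue  # limit not configured
        if usage.get(usage_name, 0) > limit:
            tripped.append(limit_name)
    return tripped
```

A non-empty result is the point at which a `circuit_breaker_triggered` event would be logged.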

## Example Workflow

```bash
# 1. Create a directive
makima directive create \
  --goal "Add dark mode to the application" \
  --repository https://github.com/myorg/myapp \
  --autonomy guardrails

# Returns directive ID: 123e4567-e89b-12d3-a456-426614174000

# 2. Start execution
makima directive start 123e4567-e89b-12d3-a456-426614174000

# 3. Monitor progress
makima directive status 123e4567-e89b-12d3-a456-426614174000

# 4. View the execution graph
makima directive graph 123e4567-e89b-12d3-a456-426614174000 --with-status

# 5. Watch events
makima directive events 123e4567-e89b-12d3-a456-426614174000

# 6. If approval needed, approve or deny
makima directive approve 123e4567-e89b-12d3-a456-426614174000 <approval-id>
```