| Field | Value | Date |
|---|---|---|
| author | soryu <soryu@soryu.co> | 2026-01-31 22:53:28 +0000 |
| committer | soryu <soryu@soryu.co> | 2026-01-31 22:54:50 +0000 |
| commit | 44bb3fe07ab191abd8260af6975bc175c223878e | |
| tree | 1d7dd73756345f3671af32cc84b9b4235d34d173 /docs | |
| parent | a6e36a8bfecb9ebe6c7b135b9e01557f7ebc3e58 | |

feat: Add contract management system improvements (Phase 1) (`makima/contract-management-improvements`)

- Add docs/contract-management-spec.md with full system design
- Add docs/plans/implementation-plan.md with 5-phase rollout plan
- Add validate_deliverable() function and use in mark_deliverable_complete
- Add PhaseChangeResult enum and change_contract_phase_with_version() with FOR UPDATE locking
- Enforce phase_guard at API level for all callers

This addresses critical issues in contract management:
- Deliverable validation to prevent marking non-existent deliverables complete
- Version conflict detection for phase changes with row locking
- Phase guard enforcement at API level (applies to all callers including supervisors)
- Comprehensive specification and implementation plan for future phases
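
The version-conflict and phase-guard items above only work if the version check, the guard check and the update all happen while the row is locked. The std-only sketch below illustrates that ordering. It makes a few assumptions. A `Mutex` stands in for the `SELECT ... FOR UPDATE` row lock. `ContractRow`, `PhaseChangeOutcome` and `change_phase` are hypothetical names, not the commit's `change_contract_phase_with_version()`. Running the version check before the guard check is also an assumption.

```rust
use std::sync::Mutex;

/// Hypothetical in-memory stand-in for a row of the `contracts` table.
#[derive(Debug, Clone)]
pub struct ContractRow {
    pub phase: String,
    pub version: i32,
    pub phase_guard: bool,
}

/// Hypothetical outcome type (not the commit's `PhaseChangeResult`).
#[derive(Debug, PartialEq)]
pub enum PhaseChangeOutcome {
    Success { phase: String, version: i32 },
    VersionConflict { expected: i32, actual: i32, current_phase: String },
    RequiresConfirmation { current_phase: String, next_phase: String },
}

/// Checks the expected version and the phase guard, then updates, all while
/// holding the lock (the analogue of SELECT ... FOR UPDATE in one transaction).
pub fn change_phase(
    row: &Mutex<ContractRow>,
    new_phase: &str,
    expected_version: Option<i32>,
    confirmed: bool,
) -> PhaseChangeOutcome {
    let mut contract = row.lock().unwrap();

    // Report a stale version explicitly instead of failing silently.
    if let Some(expected) = expected_version {
        if contract.version != expected {
            return PhaseChangeOutcome::VersionConflict {
                expected,
                actual: contract.version,
                current_phase: contract.phase.clone(),
            };
        }
    }

    // The guard applies to every caller; there is no supervisor exemption.
    if contract.phase_guard && !confirmed {
        return PhaseChangeOutcome::RequiresConfirmation {
            current_phase: contract.phase.clone(),
            next_phase: new_phase.to_string(),
        };
    }

    contract.phase = new_phase.to_string();
    contract.version += 1;
    PhaseChangeOutcome::Success {
        phase: contract.phase.clone(),
        version: contract.version,
    }
}
```

An unconfirmed request against a guarded contract leaves the row untouched. A confirmed request bumps the version. A second caller still holding the old version then receives `VersionConflict` instead of overwriting the phase.
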

Diffstat (limited to 'docs'):

| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | docs/contract-management-spec.md | 1337 |
| -rw-r--r-- | docs/plans/implementation-plan.md | 1226 |

2 files changed, 2563 insertions, 0 deletions
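
The `validate_deliverable()` change listed in the commit message is essentially a membership check against the deliverable IDs declared for a phase. The std-only sketch below approximates it. It takes the deliverable list directly instead of resolving it from a `PhaseConfig` or contract type. It also implements `Display` by hand instead of using `thiserror`. `Deliverable` and the example IDs are hypothetical. The message text mirrors the one in section 6.2 of the spec diff.

```rust
use std::fmt;

/// Hypothetical, simplified deliverable definition.
pub struct Deliverable {
    pub id: &'static str,
}

#[derive(Debug, PartialEq)]
pub enum DeliverableValidationError {
    InvalidDeliverable {
        deliverable_id: String,
        phase: String,
        valid_ids: Vec<String>,
    },
}

impl fmt::Display for DeliverableValidationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InvalidDeliverable { deliverable_id, phase, valid_ids } => write!(
                f,
                "Invalid deliverable '{deliverable_id}' for {phase} phase. Valid IDs: {valid_ids:?}"
            ),
        }
    }
}

impl std::error::Error for DeliverableValidationError {}

/// Rejects any ID not declared for the phase, so unknown deliverables
/// can never be marked complete.
pub fn validate_deliverable(
    phase: &str,
    deliverable_id: &str,
    deliverables: &[Deliverable],
) -> Result<(), DeliverableValidationError> {
    if deliverables.iter().any(|d| d.id == deliverable_id) {
        return Ok(());
    }
    Err(DeliverableValidationError::InvalidDeliverable {
        deliverable_id: deliverable_id.to_string(),
        phase: phase.to_string(),
        valid_ids: deliverables.iter().map(|d| d.id.to_string()).collect(),
    })
}
```

Returning the list of valid IDs in the error lets a caller such as a supervisor correct a typo without another lookup.
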

```diff
diff --git a/docs/contract-management-spec.md b/docs/contract-management-spec.md
new file mode 100644
index 0000000..c7f948a
--- /dev/null
+++ b/docs/contract-management-spec.md
@@ -0,0 +1,1337 @@
+# Contract Management System Specification
+
+**Version**: 1.0.0
+**Status**: Draft
+**Author**: AI Assistant
+**Date**: 2025-01-31
+
+## Executive Summary
+
+This specification addresses critical issues in the current contract management system:
+
+1. **Manual Completion Required** - Contracts stay 'active' indefinitely
+2. **No Phase Readiness Validation** - No automatic checking before phase advancement
+3. **Supervisor State Restoration Broken** - Context lost after daemon crash
+4. **Version Conflicts Silent** - Phase changes can fail silently
+5. **No Deliverable Validation** - Can mark non-existent deliverables as complete
+6. **Phase Guard Supervisor Bypass** - Supervisors can bypass phase_guard setting
+
+---
+
+## 1. Contract Lifecycle State Machine
+
+### 1.1 Current State (ContractStatus)
+
+The current implementation uses three states:
+- `active` - Contract is being worked on
+- `completed` - Contract finished successfully
+- `archived` - Contract archived (soft delete)
+
+### 1.2 Proposed State Machine
+
+```
+                   ┌─────────────────────────────────────────┐
+                   │                                         │
+                   ▼                                         │
+┌─────────┐    ┌────────┐    ┌───────────────────┐    ┌──────┴─────┐
+│ created ├───►│ active ├───►│ waiting_for_input ├───►│ completing │
+└─────────┘    └───┬────┘    └─────────┬─────────┘    └──────┬─────┘
+                   │                   │                     │
+                   │                   │                     ▼
+                   │                   │             ┌───────────────┐
+                   │                   │             │   completed   │
+                   │                   │             └───────┬───────┘
+                   │                   │                     │
+                   ▼                   ▼                     ▼
+              ┌─────────┐         ┌─────────┐        ┌───────────────┐
+              │ paused  │         │ blocked │        │   archived    │
+              └────┬────┘         └────┬────┘        └───────┬───────┘
+                   │                   │                     ▲
+                   └───────────────────┴─────────────────────┤
+                                                             │
+              ┌────────┐                                     │
+              │ failed ├─────────────────────────────────────┘
+              └────────┘
+```
+
+### 1.3 State Definitions
+
+```rust
+/// Contract lifecycle states
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "snake_case")]
+pub enum ContractStatus {
+    /// Contract created but not yet started
+    Created,
+    /// Contract is actively being worked on
+    Active,
+    /// Waiting for user input (phase confirmation, question, etc.)
+    WaitingForInput,
+    /// Contract is paused by user request
+    Paused,
+    /// Contract is blocked on external dependency
+    Blocked,
+    /// All phases complete, running final validation
+    Completing,
+    /// Contract completed successfully
+    Completed,
+    /// Contract failed with errors
+    Failed,
+    /// Contract archived (soft delete)
+    Archived,
+}
+```
+
+### 1.4 State Transitions
+
+| From State | To State | Guard Conditions | Trigger |
+|------------|----------|------------------|---------|
+| Created | Active | Has supervisor task | Supervisor starts |
+| Active | WaitingForInput | Pending question exists | Supervisor asks question |
+| Active | Paused | - | User requests pause |
+| Active | Blocked | Has external blocker | Blocker detected |
+| Active | Completing | Final phase, all deliverables met | Auto-completion check |
+| WaitingForInput | Active | Question answered | User responds |
+| WaitingForInput | Paused | - | Timeout or user pause |
+| Paused | Active | - | User resumes |
+| Blocked | Active | Blocker resolved | Blocker cleared |
+| Completing | Completed | All cleanup done | Completion confirmed |
+| Completing | Active | Completion rejected | User rejects |
+| Any | Failed | Unrecoverable error | Error detected |
+| Completed | Archived | - | User archives |
+| Failed | Archived | - | User archives |
+
+### 1.5 Timeout and Stale Detection
+
+```rust
+/// Configuration for contract timeout and stale detection
+pub struct ContractTimeoutConfig {
+    /// Time after last supervisor activity before contract is considered stale
+    pub stale_threshold: Duration, // Default: 30 minutes
+
+    /// Time to wait for user input before timing out
+    pub input_timeout: Duration, // Default: 24 hours
+
+    /// Time before completing contracts are auto-completed
+    pub completion_grace_period: Duration, // Default: 5 minutes
+
+    /// Time before archived contracts are deleted
+    pub archive_retention: Duration, // Default: 30 days
+}
+
+impl Default for ContractTimeoutConfig {
+    fn default() -> Self {
+        Self {
+            stale_threshold: Duration::from_secs(30 * 60),
+            input_timeout: Duration::from_secs(24 * 60 * 60),
+            completion_grace_period: Duration::from_secs(5 * 60),
+            archive_retention: Duration::from_secs(30 * 24 * 60 * 60),
+        }
+    }
+}
+```
+
+### 1.6 Database Schema Changes
+
+```sql
+-- Add new status column options and tracking fields
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS
+    status_changed_at TIMESTAMPTZ DEFAULT NOW();
+
+-- Track last activity for stale detection
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS
+    last_activity_at TIMESTAMPTZ DEFAULT NOW();
+
+-- Track pending input
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS
+    waiting_for TEXT; -- 'question', 'phase_confirmation', 'completion_confirmation'
+
+-- Track blockers
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS
+    blocked_reason TEXT;
+
+-- Track failure reason
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS
+    failure_reason TEXT;
+
+-- Index for status queries
+CREATE INDEX idx_contracts_status ON contracts(status);
+CREATE INDEX idx_contracts_last_activity ON contracts(last_activity_at);
+```
+
+### 1.7 API Changes
+
+```rust
+/// Request to change contract state
+#[derive(Debug, Deserialize)]
+#[serde(rename_all = "camelCase")]
+pub struct ChangeStatusRequest {
+    pub target_status: ContractStatus,
+    /// Required for some transitions
+    pub reason: Option<String>,
+    /// For blocking states, what is blocking
+    pub blocker: Option<String>,
+}
+
+/// Response for status change
+#[derive(Debug, Serialize)]
+#[serde(rename_all = "camelCase")]
+pub struct ChangeStatusResponse {
+    pub success: bool,
+    pub previous_status: ContractStatus,
+    pub new_status: ContractStatus,
+    /// If transition failed, why
+    pub rejection_reason: Option<String>,
+}
+```
+
+---
+
+## 2. Automatic Completion Detection
+
+### 2.1 Current Problem
+
+Contracts currently require manual `supervisor_complete()` calls. Supervisors may exit without completing contracts, leaving them active indefinitely.
+
+### 2.2 Proposed Solution: Completion Gates
+
+#### 2.2.1 Phase Completion Gates
+
+```rust
+/// Gate that must be satisfied before advancing to next phase
+pub struct PhaseCompletionGate {
+    /// Required deliverables for this phase
+    pub required_deliverables: Vec<String>,
+    /// Required tasks to be completed
+    pub required_tasks: TaskRequirement,
+    /// Optional custom validation function
+    pub custom_validator: Option<Box<dyn Fn(&Contract) -> bool>>,
+    /// Whether to auto-advance when gate is satisfied
+    pub auto_advance: bool,
+}
+
+/// Task completion requirements
+pub enum TaskRequirement {
+    /// No task requirements
+    None,
+    /// All spawned tasks must complete
+    AllComplete,
+    /// At least N tasks must complete
+    MinComplete(usize),
+    /// Specific named tasks must complete
+    NamedTasks(Vec<String>),
+}
+```
+
+#### 2.2.2 Contract Completion Detection
+
+```rust
+/// Contract completion detector
+pub struct CompletionDetector {
+    /// Phase-specific gates
+    phase_gates: HashMap<String, PhaseCompletionGate>,
+}
+
+impl CompletionDetector {
+    /// Check if current phase is ready to advance
+    pub fn check_phase_readiness(
+        &self,
+        contract: &Contract,
+        tasks: &[TaskSummary],
+    ) -> PhaseReadinessResult {
+        let gate = match self.phase_gates.get(&contract.phase) {
+            Some(g) => g,
+            None => return PhaseReadinessResult::NoGate,
+        };
+
+        let mut missing = Vec::new();
+
+        // Check deliverables
+        let completed = contract.get_completed_deliverables(&contract.phase);
+        for req in &gate.required_deliverables {
+            if !completed.contains(req) {
+                missing.push(format!("Deliverable: {}", req));
+            }
+        }
+
+        // Check tasks
+        match &gate.required_tasks {
+            TaskRequirement::None => {},
+            TaskRequirement::AllComplete => {
+                let incomplete = tasks.iter()
+                    .filter(|t| !t.is_supervisor && t.status != "done")
+                    .count();
+                if incomplete > 0 {
+                    missing.push(format!("{} tasks incomplete", incomplete));
+                }
+            },
+            TaskRequirement::MinComplete(n) => {
+                let complete = tasks.iter()
+                    .filter(|t| !t.is_supervisor && t.status == "done")
+                    .count();
+                if complete < *n {
+                    missing.push(format!("Need {} tasks complete, have {}", n, complete));
+                }
+            },
+            TaskRequirement::NamedTasks(names) => {
+                for name in names {
+                    let found = tasks.iter()
+                        .find(|t| &t.name == name && t.status == "done");
+                    if found.is_none() {
+                        missing.push(format!("Task '{}' not complete", name));
+                    }
+                }
+            }
+        }
+
+        if missing.is_empty() {
+            PhaseReadinessResult::Ready
+        } else {
+            PhaseReadinessResult::NotReady { missing }
+        }
+    }
+
+    /// Check if contract should auto-complete
+    pub fn check_contract_completion(&self, contract: &Contract) -> bool {
+        // Must be in terminal phase
+        if contract.phase != contract.terminal_phase_id() {
+            return false;
+        }
+
+        // Terminal phase gate must be satisfied
+        matches!(
+            self.check_phase_readiness(contract, &[]),
+            PhaseReadinessResult::Ready
+        )
+    }
+}
+
+/// Result of phase readiness check
+pub enum PhaseReadinessResult {
+    /// Phase is ready to advance
+    Ready,
+    /// Phase is not ready, with list of missing items
+    NotReady { missing: Vec<String> },
+    /// No gate defined for this phase
+    NoGate,
+}
+```
+
+#### 2.2.3 Auto-Completion Flow
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│                        AUTO-COMPLETION FLOW                         │
+└─────────────────────────────────────────────────────────────────────┘
+
+1. Task completes (status = "done")
+   │
+   ▼
+2. Check if any phase gate is now satisfied
+   │
+   ├─ NO ──► Return, wait for more tasks
+   │
+   ▼ YES
+3. Is auto_advance enabled for phase?
+   │
+   ├─ NO ──► Notify user, wait for manual advance
+   │
+   ▼ YES
+4. Is phase_guard enabled?
+   │
+   ├─ YES ─► Set status = WaitingForInput, ask for confirmation
+   │
+   ▼ NO
+5. Auto-advance to next phase
+   │
+   ▼
+6. Is this the terminal phase?
+   │
+   ├─ NO ──► Continue working
+   │
+   ▼ YES
+7. All terminal deliverables complete?
+   │
+   ├─ NO ──► Continue working
+   │
+   ▼ YES
+8. Set status = Completing
+   │
+   ▼
+9. Cleanup worktrees, stop supervisor
+   │
+   ▼
+10. Set status = Completed
+```
+
+### 2.3 Database Schema Changes
+
+```sql
+-- Track auto-completion state
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS
+    auto_complete_enabled BOOLEAN DEFAULT TRUE;
+
+-- Track when completion was detected
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS
+    completion_detected_at TIMESTAMPTZ;
+```
+
+### 2.4 API Endpoints
+
+```rust
+/// Check phase readiness
+/// GET /api/v1/contracts/{id}/phase-readiness
+pub async fn check_phase_readiness(
+    contract_id: Uuid,
+) -> PhaseReadinessResponse;
+
+/// Force completion check
+/// POST /api/v1/contracts/{id}/check-completion
+pub async fn check_completion(
+    contract_id: Uuid,
+) -> CompletionCheckResponse;
+
+/// Enable/disable auto-completion
+/// PUT /api/v1/contracts/{id}/auto-complete
+pub async fn set_auto_complete(
+    contract_id: Uuid,
+    enabled: bool,
+) -> ContractSummary;
+```
+
+---
+
+## 3. Supervisor Status Reporting
+
+### 3.1 Current Problem
+
+The `supervisor_states` table exists but:
+- State is not reliably persisted during daemon operations
+- Restoration after crash doesn't properly resume context
+- No clear indication of supervisor's current activity
+
+### 3.2 Proposed Supervisor States
+
+```rust
+/// Supervisor execution states
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "snake_case")]
+pub enum SupervisorState {
+    /// Supervisor is starting up
+    Initializing,
+    /// Supervisor is idle, no pending work
+    Idle,
+    /// Supervisor is actively working (LLM processing)
+    Working,
+    /// Supervisor is waiting for user input
+    WaitingForUser,
+    /// Supervisor is waiting for child tasks
+    WaitingForTasks,
+    /// Supervisor is blocked on external resource
+    Blocked,
+    /// Supervisor has completed its work
+    Completed,
+    /// Supervisor has failed
+    Failed,
+    /// Supervisor was interrupted
+    Interrupted,
+}
+```
+
+### 3.3 Heartbeat Mechanism
+
+```rust
+/// Heartbeat message from supervisor to server
+#[derive(Debug, Serialize, Deserialize)]
+pub struct SupervisorHeartbeat {
+    pub task_id: Uuid,
+    pub contract_id: Uuid,
+    pub state: SupervisorState,
+    pub phase: String,
+    /// What the supervisor is currently doing
+    pub current_activity: String,
+    /// Progress percentage (0-100)
+    pub progress: u8,
+    /// IDs of tasks supervisor is waiting on
+    pub pending_task_ids: Vec<Uuid>,
+    /// Timestamp
+    pub timestamp: DateTime<Utc>,
+}
+
+/// Heartbeat configuration
+pub struct HeartbeatConfig {
+    /// How often to send heartbeats
+    pub interval: Duration, // Default: 30 seconds
+    /// How long before a supervisor is considered dead
+    pub timeout: Duration, // Default: 2 minutes
+}
+```
+
+### 3.4 State Persistence
+
+```rust
+/// Enhanced supervisor state for persistence
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct SupervisorPersistentState {
+    /// Current supervisor state
+    pub state: SupervisorState,
+    /// Current contract phase
+    pub phase: String,
+    /// Conversation history for resumption
+    pub conversation_history: Vec<ConversationMessage>,
+    /// Currently pending questions
+    pub pending_questions: Vec<PendingQuestion>,
+    /// Tasks spawned by this supervisor
+    pub spawned_task_ids: Vec<Uuid>,
+    /// Tasks we're waiting on
+    pub waiting_on_task_ids: Vec<Uuid>,
+    /// Last checkpoint created
+    pub last_checkpoint: Option<CheckpointInfo>,
+    /// Current activity description
+    pub current_activity: String,
+    /// Error if in failed state
+    pub error: Option<String>,
+    /// Timestamps
+    pub created_at: DateTime<Utc>,
+    pub updated_at: DateTime<Utc>,
+}
+```
+
+### 3.5 Database Schema Changes
+
+```sql
+-- Enhance supervisor_states table
+ALTER TABLE supervisor_states ADD COLUMN IF NOT EXISTS
+    state VARCHAR(50) NOT NULL DEFAULT 'initializing';
+
+ALTER TABLE supervisor_states ADD COLUMN IF NOT EXISTS
+    current_activity TEXT;
+
+ALTER TABLE supervisor_states ADD COLUMN IF NOT EXISTS
+    progress INTEGER DEFAULT 0;
+
+ALTER TABLE supervisor_states ADD COLUMN IF NOT EXISTS
+    error_message TEXT;
+
+ALTER TABLE supervisor_states ADD COLUMN IF NOT EXISTS
+    spawned_task_ids UUID[] DEFAULT ARRAY[]::UUID[];
+
+-- Create heartbeat tracking table
+CREATE TABLE IF NOT EXISTS supervisor_heartbeats (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    supervisor_task_id UUID NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
+    contract_id UUID NOT NULL REFERENCES contracts(id) ON DELETE CASCADE,
+    state VARCHAR(50) NOT NULL,
+    phase VARCHAR(50) NOT NULL,
+    current_activity TEXT,
+    progress INTEGER DEFAULT 0,
+    pending_task_ids UUID[] DEFAULT ARRAY[]::UUID[],
+    timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+
+    -- Keep only recent heartbeats
+    CONSTRAINT heartbeat_ttl CHECK (timestamp > NOW() - INTERVAL '24 hours')
+);
+
+CREATE INDEX idx_heartbeats_supervisor ON supervisor_heartbeats(supervisor_task_id);
+CREATE INDEX idx_heartbeats_timestamp ON supervisor_heartbeats(timestamp);
+```
+
+### 3.6 Restoration Protocol
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│                   SUPERVISOR RESTORATION PROTOCOL                   │
+└─────────────────────────────────────────────────────────────────────┘
+
+1. Daemon restarts or task is assigned to new daemon
+   │
+   ▼
+2. Load supervisor state from supervisor_states table
+   │
+   ├─ NOT FOUND ──► Start fresh, log warning
+   │
+   ▼ FOUND
+3. Validate state consistency
+   │
+   ├─ INVALID ──► Start from last checkpoint
+   │
+   ▼ VALID
+4. Restore conversation history
+   │
+   ▼
+5. Check for pending questions
+   │
+   ├─ HAS PENDING ──► Re-deliver questions to user
+   │
+   ▼ NO PENDING
+6. Check for waiting tasks
+   │
+   ├─ HAS WAITING ──► Resume waiting state
+   │
+   ▼ NO WAITING
+7. Send restoration context to Claude
+   │
+   ▼
+8. Resume execution from last state
+```
+
+### 3.7 API Endpoints
+
+```rust
+/// Get supervisor status
+/// GET /api/v1/contracts/{id}/supervisor/status
+pub async fn get_supervisor_status(
+    contract_id: Uuid,
+) -> SupervisorStatusResponse;
+
+/// Get supervisor heartbeat history
+/// GET /api/v1/contracts/{id}/supervisor/heartbeats
+pub async fn get_heartbeats(
+    contract_id: Uuid,
+    limit: Option<i32>,
+) -> HeartbeatListResponse;
+
+/// Force supervisor state sync
+/// POST /api/v1/contracts/{id}/supervisor/sync
+pub async fn sync_supervisor_state(
+    contract_id: Uuid,
+) -> SyncResponse;
+```
+
+---
+
+## 4. Contract Monitoring Dashboard
+
+### 4.1 Real-Time Status Updates
+
+#### 4.1.1 WebSocket Events
+
+```rust
+/// Contract monitoring events
+#[derive(Debug, Serialize)]
+#[serde(tag = "type", rename_all = "snake_case")]
+pub enum ContractMonitorEvent {
+    /// Contract status changed
+    StatusChanged {
+        contract_id: Uuid,
+        old_status: ContractStatus,
+        new_status: ContractStatus,
+        reason: Option<String>,
+    },
+    /// Phase changed
+    PhaseChanged {
+        contract_id: Uuid,
+        old_phase: String,
+        new_phase: String,
+    },
+    /// Supervisor state changed
+    SupervisorStateChanged {
+        contract_id: Uuid,
+        supervisor_task_id: Uuid,
+        old_state: SupervisorState,
+        new_state: SupervisorState,
+    },
+    /// Supervisor heartbeat received
+    Heartbeat {
+        contract_id: Uuid,
+        state: SupervisorState,
+        activity: String,
+        progress: u8,
+    },
+    /// Contract became stale
+    StaleDetected {
+        contract_id: Uuid,
+        last_activity: DateTime<Utc>,
+        stale_duration: Duration,
+    },
+    /// Task completed
+    TaskCompleted {
+        contract_id: Uuid,
+        task_id: Uuid,
+        task_name: String,
+        success: bool,
+    },
+    /// Deliverable marked complete
+    DeliverableCompleted {
+        contract_id: Uuid,
+        phase: String,
+        deliverable_id: String,
+    },
+    /// Question asked (needs user attention)
+    QuestionAsked {
+        contract_id: Uuid,
+        question_id: Uuid,
+        question: String,
+        question_type: String,
+    },
+}
+```
+
+#### 4.1.2 Subscription API
+
+```rust
+/// Subscribe to contract monitoring events
+/// WS /api/v1/contracts/monitor
+pub async fn monitor_contracts(
+    ws: WebSocket,
+    filter: ContractMonitorFilter,
+) -> Result<(), Error>;
+
+#[derive(Debug, Deserialize)]
+pub struct ContractMonitorFilter {
+    /// Filter by specific contract IDs
+    pub contract_ids: Option<Vec<Uuid>>,
+    /// Filter by status
+    pub statuses: Option<Vec<ContractStatus>>,
+    /// Include stale detection events
+    pub include_stale: bool,
+    /// Include heartbeat events
+    pub include_heartbeats: bool,
+}
+```
+
+### 4.2 Stale Contract Detection
+
+```rust
+/// Stale contract detector service
+pub struct StaleContractDetector {
+    pool: PgPool,
+    config: ContractTimeoutConfig,
+}
+
+impl StaleContractDetector {
+    /// Run stale detection loop
+    pub async fn run(&self, event_tx: Sender<ContractMonitorEvent>) {
+        let mut interval = tokio::time::interval(Duration::from_secs(60));
+
+        loop {
+            interval.tick().await;
+
+            let stale = self.detect_stale_contracts().await;
+            for (contract_id, last_activity) in stale {
+                let _ = event_tx.send(ContractMonitorEvent::StaleDetected {
+                    contract_id,
+                    last_activity,
+                    stale_duration: Utc::now() - last_activity,
+                }).await;
+            }
+        }
+    }
+
+    /// Detect stale contracts
+    async fn detect_stale_contracts(&self) -> Vec<(Uuid, DateTime<Utc>)> {
+        let threshold = Utc::now() - self.config.stale_threshold;
+
+        sqlx::query_as::<_, (Uuid, DateTime<Utc>)>(
+            r#"
+            SELECT id, last_activity_at
+            FROM contracts
+            WHERE status = 'active'
+              AND last_activity_at < $1
+            "#
+        )
+        .bind(threshold)
+        .fetch_all(&self.pool)
+        .await
+        .unwrap_or_default()
+    }
+}
+```
+
+### 4.3 Batch Operations
+
+```rust
+/// Batch operation types
+#[derive(Debug, Deserialize)]
+#[serde(tag = "operation", rename_all = "snake_case")]
+pub enum BatchOperation {
+    /// Archive completed contracts older than threshold
+    ArchiveOld {
+        older_than: Duration,
+        status_filter: Vec<ContractStatus>,
+    },
+    /// Pause all active contracts
+    PauseAll {
+        reason: String,
+    },
+    /// Resume all paused contracts
+    ResumeAll,
+    /// Delete archived contracts older than threshold
+    CleanupArchived {
+        older_than: Duration,
+    },
+    /// Restart stale supervisors
+    RestartStale {
+        stale_threshold: Duration,
+    },
+}
+
+/// Batch operation result
+#[derive(Debug, Serialize)]
+pub struct BatchOperationResult {
+    pub operation: String,
+    pub affected_count: usize,
+    pub affected_ids: Vec<Uuid>,
+    pub errors: Vec<BatchOperationError>,
+}
+
+#[derive(Debug, Serialize)]
+pub struct BatchOperationError {
+    pub contract_id: Uuid,
+    pub error: String,
+}
+```
+
```

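The transition table in section 1.4 can be encoded as a single `match`. That gives the server one place to reject illegal transitions, which corresponds to the `INVALID_TRANSITION` error code in section 8.2. This std-only sketch mirrors the spec's `ContractStatus` variants but omits the serde derives. It does not model guard conditions. It also reads the "Any → Failed" row as "any state not already `Completed`, `Failed` or `Archived`"; that is an assumption, and the spec does not state it.

```rust
/// Mirrors the spec's `ContractStatus` variants (serde derives omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContractStatus {
    Created,
    Active,
    WaitingForInput,
    Paused,
    Blocked,
    Completing,
    Completed,
    Failed,
    Archived,
}

/// Returns true if the section 1.4 table allows `from -> to`.
/// Guard conditions such as "Has supervisor task" are not checked here.
pub fn can_transition(from: ContractStatus, to: ContractStatus) -> bool {
    use ContractStatus::*;
    match (from, to) {
        // Assumption: "Any -> Failed" excludes states that are already terminal.
        (Completed | Failed | Archived, Failed) => false,
        (_, Failed) => true,
        (Created, Active)
        | (Active, WaitingForInput | Paused | Blocked | Completing)
        | (WaitingForInput, Active | Paused)
        | (Paused, Active)
        | (Blocked, Active)
        | (Completing, Completed | Active)
        | (Completed | Failed, Archived) => true,
        // Everything not listed in the table is rejected.
        _ => false,
    }
}
```

The function follows the table rather than the diagram in section 1.2. For example, `Paused -> Archived` is rejected here even though the diagram draws a line from `paused` towards `archived`.
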
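Section 3.3 sets a 30-second heartbeat interval and a 2-minute timeout. It does not say how the server should treat a supervisor whose last heartbeat falls between those two bounds. One possible reading, sketched below with std only, is as follows. A heartbeat older than `interval` counts as late. One at or beyond `timeout` marks the supervisor as dead, which would correspond to `SUPERVISOR_DEAD` in section 8.2. `SupervisorLiveness`, its `Late` state and `classify_liveness` are hypothetical. The elapsed time is passed in as an argument so the function stays deterministic; a real service might derive it from the stored heartbeat timestamp.

```rust
use std::time::Duration;

/// Heartbeat settings with the defaults given in section 3.3.
pub struct HeartbeatConfig {
    pub interval: Duration,
    pub timeout: Duration,
}

impl Default for HeartbeatConfig {
    fn default() -> Self {
        Self {
            interval: Duration::from_secs(30),
            timeout: Duration::from_secs(2 * 60),
        }
    }
}

/// Hypothetical liveness classification; `Late` is not part of the spec.
#[derive(Debug, PartialEq)]
pub enum SupervisorLiveness {
    Healthy,
    Late,
    Dead,
}

/// Classifies a supervisor from the time elapsed since its last heartbeat.
pub fn classify_liveness(since_last: Duration, config: &HeartbeatConfig) -> SupervisorLiveness {
    if since_last >= config.timeout {
        SupervisorLiveness::Dead
    } else if since_last > config.interval {
        // At least one expected heartbeat was missed, but not enough to give up.
        SupervisorLiveness::Late
    } else {
        SupervisorLiveness::Healthy
    }
}
```

With the defaults, a supervisor can miss three consecutive heartbeats before it is declared dead.
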
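The CLI flags in section 5 (`--threshold 30m`, `--older-than 7d`) take duration strings, but the spec does not define their grammar. The minimal std-only parser below assumes a single unsigned integer followed by exactly one of `s`, `m`, `h` or `d`. `parse_threshold` is a hypothetical helper; a real CLI might use an existing duration-parsing crate instead.

```rust
use std::time::Duration;

/// Hypothetical parser for threshold strings such as "30m", "1h" or "7d".
/// Accepts one unsigned integer followed by a single unit suffix.
pub fn parse_threshold(input: &str) -> Option<Duration> {
    let input = input.trim();
    let unit = input.chars().last()?;
    // Everything before the unit character must be the numeric part.
    let digits = &input[..input.len() - unit.len_utf8()];
    let value: u64 = digits.parse().ok()?;
    let secs_per_unit: u64 = match unit {
        's' => 1,
        'm' => 60,
        'h' => 60 * 60,
        'd' => 24 * 60 * 60,
        _ => return None,
    };
    // checked_mul rejects values that would overflow u64 seconds.
    value.checked_mul(secs_per_unit).map(Duration::from_secs)
}
```

Under this grammar the `--older-than` default of `7d` resolves to 604,800 seconds, and malformed input such as `10x` yields `None` rather than a silent default.
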
+### 4.4 Dashboard API + +```rust +/// Get dashboard summary +/// GET /api/v1/contracts/dashboard +pub async fn get_dashboard() -> DashboardResponse; + +#[derive(Debug, Serialize)] +#[serde(rename_all = "camelCase")] +pub struct DashboardResponse { + /// Count by status + pub status_counts: HashMap<ContractStatus, usize>, + /// Count by phase (for active contracts) + pub phase_counts: HashMap<String, usize>, + /// Stale contracts + pub stale_contracts: Vec<StaleContractInfo>, + /// Contracts waiting for input + pub waiting_for_input: Vec<WaitingContractInfo>, + /// Recent activity + pub recent_events: Vec<ContractMonitorEvent>, + /// Resource usage + pub resource_usage: ResourceUsage, +} + +#[derive(Debug, Serialize)] +#[serde(rename_all = "camelCase")] +pub struct StaleContractInfo { + pub id: Uuid, + pub name: String, + pub phase: String, + pub last_activity: DateTime<Utc>, + pub stale_duration_secs: i64, +} + +#[derive(Debug, Serialize)] +#[serde(rename_all = "camelCase")] +pub struct WaitingContractInfo { + pub id: Uuid, + pub name: String, + pub waiting_for: String, // 'question', 'phase_confirmation', etc. + pub waiting_since: DateTime<Utc>, + pub question: Option<String>, +} + +#[derive(Debug, Serialize)] +#[serde(rename_all = "camelCase")] +pub struct ResourceUsage { + pub active_supervisors: usize, + pub running_tasks: usize, + pub pending_tasks: usize, + pub active_daemons: usize, + pub total_worktrees: usize, +} +``` + +--- + +## 5. 
Improved CLI Commands + +### 5.1 Contract Listing with Filters + +```bash +# List all contracts +makima contracts list + +# List with status filter +makima contracts list --status active +makima contracts list --status completed,failed + +# List stale contracts +makima contracts list --stale +makima contracts list --stale --threshold 30m + +# List contracts waiting for input +makima contracts list --waiting + +# List by phase +makima contracts list --phase execute + +# Combine filters +makima contracts list --status active --phase plan --stale + +# Output formats +makima contracts list --format json +makima contracts list --format table +makima contracts list --format compact +``` + +#### Implementation + +```rust +#[derive(Debug, Args)] +pub struct ListContractsArgs { + /// Filter by status (comma-separated) + #[arg(long)] + pub status: Option<String>, + + /// Show only stale contracts + #[arg(long)] + pub stale: bool, + + /// Stale threshold (e.g., "30m", "1h") + #[arg(long, default_value = "30m")] + pub threshold: String, + + /// Show contracts waiting for input + #[arg(long)] + pub waiting: bool, + + /// Filter by phase + #[arg(long)] + pub phase: Option<String>, + + /// Output format + #[arg(long, default_value = "table")] + pub format: OutputFormat, + + /// Limit results + #[arg(long, short = 'n')] + pub limit: Option<usize>, +} +``` + +### 5.2 Cleanup Command + +```bash +# Archive completed contracts older than 7 days +makima contracts cleanup --archive --older-than 7d + +# Delete archived contracts older than 30 days +makima contracts cleanup --delete-archived --older-than 30d + +# Dry run (show what would be affected) +makima contracts cleanup --archive --older-than 7d --dry-run + +# Force cleanup without confirmation +makima contracts cleanup --archive --older-than 7d --force + +# Cleanup stale worktrees +makima contracts cleanup --worktrees + +# Full cleanup: archive old, delete archived, clean worktrees +makima contracts cleanup --all --older-than 7d 
+``` + +#### Implementation + +```rust +#[derive(Debug, Args)] +pub struct CleanupContractsArgs { + /// Archive completed/failed contracts + #[arg(long)] + pub archive: bool, + + /// Delete archived contracts + #[arg(long)] + pub delete_archived: bool, + + /// Clean up orphaned worktrees + #[arg(long)] + pub worktrees: bool, + + /// Run all cleanup operations + #[arg(long)] + pub all: bool, + + /// Threshold for cleanup (e.g., "7d", "30d") + #[arg(long, default_value = "7d")] + pub older_than: String, + + /// Dry run - show what would be affected + #[arg(long)] + pub dry_run: bool, + + /// Skip confirmation prompts + #[arg(long)] + pub force: bool, +} +``` + +### 5.3 Monitor Command + +```bash +# Real-time monitoring dashboard +makima contracts monitor + +# Monitor specific contracts +makima contracts monitor <contract-id> <contract-id> + +# Monitor with filters +makima contracts monitor --status active +makima contracts monitor --stale + +# Quiet mode - only show important events +makima contracts monitor --quiet + +# JSON output for scripting +makima contracts monitor --format json +``` + +#### Implementation + +```rust +#[derive(Debug, Args)] +pub struct MonitorContractsArgs { + /// Contract IDs to monitor (empty = all) + pub contract_ids: Vec<Uuid>, + + /// Filter by status + #[arg(long)] + pub status: Option<String>, + + /// Only show stale contracts + #[arg(long)] + pub stale: bool, + + /// Quiet mode - only important events + #[arg(long, short)] + pub quiet: bool, + + /// Output format + #[arg(long, default_value = "tui")] + pub format: MonitorFormat, +} + +#[derive(Debug, Clone, ValueEnum)] +pub enum MonitorFormat { + /// Terminal UI dashboard + Tui, + /// Plain text output + Text, + /// JSON stream + Json, +} +``` + +### 5.4 Additional Commands + +```bash +# Resume a paused contract +makima contracts resume <contract-id> + +# Pause an active contract +makima contracts pause <contract-id> --reason "Waiting for external review" + +# Force advance phase 
+makima contracts advance <contract-id> --phase execute --force + +# Restart stale supervisor +makima contracts restart-supervisor <contract-id> + +# Show contract details +makima contracts show <contract-id> --verbose + +# Check contract health +makima contracts health <contract-id> + +# Export contract history +makima contracts export <contract-id> --format json --output contract.json +``` + +--- + +## 6. Bug Fixes + +### 6.1 Version Conflicts (Silent Failures) + +**Problem**: Phase changes can fail silently when version conflicts occur. + +**Solution**: Implement explicit version checking and conflict reporting. + +```rust +/// Result type for phase changes with explicit conflict handling +pub enum PhaseChangeResult { + Success(Contract), + VersionConflict { + expected: i32, + actual: i32, + current_phase: String, + }, + ValidationFailed { + reason: String, + missing_requirements: Vec<String>, + }, + Unauthorized, + NotFound, +} + +/// Enhanced phase change handler +pub async fn change_phase_with_validation( + pool: &PgPool, + contract_id: Uuid, + owner_id: Uuid, + new_phase: &str, + expected_version: Option<i32>, +) -> Result<PhaseChangeResult, Error> { + // Start transaction + let mut tx = pool.begin().await?; + + // Get current contract with lock + let contract = sqlx::query_as::<_, Contract>( + "SELECT * FROM contracts WHERE id = $1 AND owner_id = $2 FOR UPDATE" + ) + .bind(contract_id) + .bind(owner_id) + .fetch_optional(&mut *tx) + .await?; + + let contract = match contract { + Some(c) => c, + None => return Ok(PhaseChangeResult::NotFound), + }; + + // Check version if provided + if let Some(expected) = expected_version { + if contract.version != expected { + return Ok(PhaseChangeResult::VersionConflict { + expected, + actual: contract.version, + current_phase: contract.phase.clone(), + }); + } + } + + // Validate phase transition + let validation = validate_phase_transition(&contract, new_phase); + if !validation.valid { + return 
Ok(PhaseChangeResult::ValidationFailed { + reason: validation.reason, + missing_requirements: validation.missing, + }); + } + + // Update phase + let updated = sqlx::query_as::<_, Contract>( + r#" + UPDATE contracts + SET phase = $1, version = version + 1, updated_at = NOW() + WHERE id = $2 + RETURNING * + "# + ) + .bind(new_phase) + .bind(contract_id) + .fetch_one(&mut *tx) + .await?; + + tx.commit().await?; + + Ok(PhaseChangeResult::Success(updated)) +} +``` + +### 6.2 Deliverable Validation + +**Problem**: Can mark non-existent deliverables as complete. + +**Solution**: Validate deliverable IDs before marking complete. + +```rust +/// Validate deliverable exists for contract type and phase +pub fn validate_deliverable( + contract_type: &str, + phase: &str, + deliverable_id: &str, + phase_config: Option<&PhaseConfig>, +) -> Result<(), DeliverableValidationError> { + let deliverables = if let Some(config) = phase_config { + get_phase_deliverables_from_config(phase, config) + } else { + get_phase_deliverables_for_type(phase, contract_type) + }; + + let valid_ids: Vec<&str> = deliverables + .deliverables + .iter() + .map(|d| d.id.as_str()) + .collect(); + + if !valid_ids.contains(&deliverable_id) { + return Err(DeliverableValidationError::InvalidDeliverable { + deliverable_id: deliverable_id.to_string(), + phase: phase.to_string(), + valid_ids: valid_ids.into_iter().map(String::from).collect(), + }); + } + + Ok(()) +} + +#[derive(Debug, thiserror::Error)] +pub enum DeliverableValidationError { + #[error("Invalid deliverable '{deliverable_id}' for {phase} phase. Valid IDs: {valid_ids:?}")] + InvalidDeliverable { + deliverable_id: String, + phase: String, + valid_ids: Vec<String>, + }, +} +``` + +### 6.3 Phase Guard Bypass + +**Problem**: Supervisors can bypass phase_guard setting. + +**Solution**: Enforce phase_guard at the API level, not just in supervisor logic. 
+
+```rust
+/// Enhanced phase change with phase_guard enforcement
+pub async fn change_phase_enforced(
+    pool: &PgPool,
+    contract_id: Uuid,
+    owner_id: Uuid,
+    request: ChangePhaseRequest,
+    is_supervisor: bool, // deliberately not consulted by the phase_guard check
+) -> Result<PhaseChangeResponse, Error> {
+    let contract = get_contract_for_owner(pool, contract_id, owner_id)
+        .await?
+        .ok_or(Error::NotFound)?;
+
+    // Phase guard is enforced for EVERYONE, including supervisors
+    if contract.phase_guard && !request.confirmed.unwrap_or(false) {
+        // Must return phase review info, regardless of caller.
+        // Look up deliverables before `contract.phase` is moved into the response.
+        let deliverables = get_phase_deliverables(&contract.phase);
+        return Ok(PhaseChangeResponse::RequiresConfirmation {
+            current_phase: contract.phase,
+            next_phase: request.phase,
+            deliverables,
+            message: "Phase guard is enabled. User confirmation required.".to_string(),
+        });
+    }
+
+    // Proceed with phase change
+    // ...
+}
+```
+
+---
+
+## 7. Migration Plan
+
+### 7.1 Phase 1: Database Schema (Week 1)
+
+1. Add new columns to `contracts` table
+2. Add new columns to `supervisor_states` table
+3. Create `supervisor_heartbeats` table
+4. Create indexes
+5. Backfill `last_activity_at` from existing data
+
+```sql
+-- Migration 001: Contract status enhancements
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS status_changed_at TIMESTAMPTZ DEFAULT NOW();
+-- Add last_activity_at without a default first: ADD COLUMN ... DEFAULT NOW() would
+-- stamp every existing row with the migration time and make the backfill a no-op.
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS last_activity_at TIMESTAMPTZ;
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS waiting_for TEXT;
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS blocked_reason TEXT;
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS failure_reason TEXT;
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS auto_complete_enabled BOOLEAN DEFAULT TRUE;
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS completion_detected_at TIMESTAMPTZ;
+
+CREATE INDEX IF NOT EXISTS idx_contracts_status ON contracts(status);
+CREATE INDEX IF NOT EXISTS idx_contracts_last_activity ON contracts(last_activity_at);
+
+-- Backfill last_activity_at from updated_at, then set the default for new rows
+UPDATE contracts SET last_activity_at = updated_at WHERE last_activity_at IS NULL;
+ALTER TABLE contracts ALTER COLUMN last_activity_at SET DEFAULT NOW();
+```
+
+### 7.2 Phase 2: Core Logic (Week 2)
+
+1. Implement `ContractStatus` enum with new states
+2. Implement state transition validation
+3. Implement `CompletionDetector`
+4. Update phase change handlers with validation
+5. Implement deliverable validation
+
+### 7.3 Phase 3: Supervisor Enhancements (Week 3)
+
+1. Implement `SupervisorState` enum
+2. Implement heartbeat mechanism
+3. Implement state persistence
+4. Implement restoration protocol
+5. Update supervisor API endpoints
+
+### 7.4 Phase 4: Monitoring (Week 4)
+
+1. Implement WebSocket monitoring events
+2. Implement stale detection service
+3. Implement batch operations
+4. Implement dashboard API
+
+### 7.5 Phase 5: CLI (Week 5)
+
+1. Implement `contracts list` with filters
+2. Implement `contracts cleanup`
+3. Implement `contracts monitor`
+4. Implement additional helper commands
+
+### 7.6 Phase 6: Testing & Rollout (Week 6)
+
+1. Unit tests for all new components
+2. Integration tests for state machines
+3. Load testing for monitoring
+4. Staged rollout with feature flags
+5. Documentation updates
+
+---
+
+## 8. Appendix
+
+### 8.1 Configuration Options
+
+```toml
+[contracts]
+# Timeout configuration
+stale_threshold_minutes = 30
+input_timeout_hours = 24
+completion_grace_period_minutes = 5
+archive_retention_days = 30
+
+# Auto-completion
+auto_complete_enabled = true
+auto_advance_phases = true
+
+# Heartbeat
+heartbeat_interval_seconds = 30
+heartbeat_timeout_seconds = 120
+
+# Monitoring
+monitor_ws_buffer_size = 1000
+stale_detection_interval_seconds = 60
+```
+
+### 8.2 Error Codes
+
+| Code | Description |
+|------|-------------|
+| `CONTRACT_NOT_FOUND` | Contract does not exist |
+| `INVALID_TRANSITION` | State transition not allowed |
+| `VERSION_CONFLICT` | Optimistic locking conflict |
+| `PHASE_GUARD_REQUIRED` | Phase guard confirmation needed |
+| `INVALID_DELIVERABLE` | Deliverable ID not valid for phase |
+| `SUPERVISOR_NOT_FOUND` | No supervisor for contract |
+| `SUPERVISOR_DEAD` | Supervisor heartbeat timeout |
+| `VALIDATION_FAILED` | Phase requirements not met |
+
+### 8.3 Metrics
+
+The following metrics should be tracked:
+
+- `contracts_by_status` (gauge) - Count of contracts by status
+- `contracts_stale_count` (gauge) - Number of stale contracts
+- `phase_transitions_total` (counter) - Phase changes by from/to
+- `completion_detections_total` (counter) - Auto-completions detected
+- `supervisor_heartbeats_total` (counter) - Heartbeats received
+- `supervisor_restarts_total` (counter) - Supervisor restarts
+- `batch_operations_total` (counter) - Batch operations by type
diff --git a/docs/plans/implementation-plan.md b/docs/plans/implementation-plan.md
new file mode 100644
index 0000000..a5ff7c7
--- /dev/null
+++ b/docs/plans/implementation-plan.md
@@ -0,0 +1,1226 @@
+# Contract Management System - Implementation Plan
+
+**Version**: 1.0.0
+**Status**: Draft
+**Based On**: [Contract Management Specification](../contract-management-spec.md)
+**Date**: 2025-01-31
+
+## Executive Summary
+
+This implementation plan breaks down the Contract Management System specification into 5 phases, prioritizing critical bug fixes and core improvements first, then building more advanced features. Each phase is designed to be independently deployable with feature flags for gradual rollout.
+
+---
+
+## Phase Overview
+
+| Phase | Focus | Duration | Risk Level |
+|-------|-------|----------|------------|
+| Phase 1 | Critical Fixes + Core Status | 1 week | Low |
+| Phase 2 | Completion Detection + Auto-Advance | 1 week | Medium |
+| Phase 3 | Supervisor Heartbeat + Restoration | 1.5 weeks | Medium-High |
+| Phase 4 | CLI Improvements | 1 week | Low |
+| Phase 5 | Monitoring Dashboard | 1.5 weeks | Medium |
+
+**Total Estimated Duration**: 6 weeks
+
+---
+
+## Phase 1: Critical Bug Fixes + Core Status Improvements
+
+**Goal**: Fix immediate issues causing data inconsistency and silent failures. Lay groundwork for enhanced status tracking.
+
+**Priority**: CRITICAL
+**Duration**: 1 week
+**Breaking Changes**: None (additive only)
+
+### Task 1.1: Deliverable Validation
+
+**Problem**: Can mark non-existent deliverables as complete, causing data inconsistency.
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Small |
+| Dependencies | None |
+| Risk | Low |
+
+**Files to Modify**:
+- `makima/src/server/handlers/contracts.rs` - Add validation in `mark_deliverable_complete()`
+- `makima/src/db/models.rs` - Add `DeliverableValidationError` type
+
+**Implementation**:
+```rust
+// In contracts.rs, before marking complete:
+fn validate_deliverable(
+    contract_type: &str,
+    phase: &str,
+    deliverable_id: &str,
+    phase_config: Option<&PhaseConfig>,
+) -> Result<(), DeliverableValidationError>
+```
+
+**Testing Requirements**:
+- [ ] Unit test: Reject invalid deliverable IDs
+- [ ] Unit test: Accept valid deliverable IDs for each phase
+- [ ] Integration test: API returns 400 for invalid deliverable
+- [ ] Regression test: Existing valid deliverables still work
+
+---
+
+### Task 1.2: Version Conflict Detection
+
+**Problem**: Phase changes can fail silently when version conflicts occur.
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Medium |
+| Dependencies | None |
+| Risk | Low |
+
+**Files to Modify**:
+- `makima/src/db/repository.rs` - Enhance `change_contract_phase_for_owner()` with explicit version checking
+- `makima/src/server/handlers/contracts.rs` - Return proper error responses for conflicts
+- `makima/src/db/models.rs` - Add `PhaseChangeResult` enum
+
+**Implementation**:
+```rust
+pub enum PhaseChangeResult {
+    Success(Contract),
+    VersionConflict { expected: i32, actual: i32, current_phase: String },
+    ValidationFailed { reason: String, missing_requirements: Vec<String> },
+    Unauthorized,
+    NotFound,
+}
+```
+
+**Testing Requirements**:
+- [ ] Unit test: Detect version mismatch
+- [ ] Unit test: Proper locking with `FOR UPDATE`
+- [ ] Integration test: Concurrent phase changes handled correctly
+- [ ] Integration test: API returns 409 Conflict with details
+
+---
+
+### Task 1.3: Phase Guard Enforcement
+
+**Problem**: Supervisors can bypass `phase_guard` setting.
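Because the check moves into the API handler, its outcome can depend only on the contract's `phase_guard` flag and the request's `confirmed` field, never on who is calling. A minimal std-only sketch of that rule; `Caller`, `PhaseGuardDecision`, and `phase_guard_decision` are hypothetical names, not part of the makima codebase:

```rust
/// Who issued the phase-change request (hypothetical; the spec's handler
/// takes an `is_supervisor: bool` instead).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Caller {
    User,
    Supervisor,
}

/// Outcome of the phase_guard check (hypothetical name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PhaseGuardDecision {
    /// The phase change may proceed.
    Proceed,
    /// Respond with `RequiresConfirmation` and the deliverables list.
    RequiresConfirmation,
}

/// Decide whether a phase change may proceed. The caller is accepted but
/// deliberately ignored, so supervisors are treated exactly like users.
pub fn phase_guard_decision(
    phase_guard: bool,
    confirmed: Option<bool>,
    _caller: Caller,
) -> PhaseGuardDecision {
    if phase_guard && !confirmed.unwrap_or(false) {
        PhaseGuardDecision::RequiresConfirmation
    } else {
        PhaseGuardDecision::Proceed
    }
}
```

The cases line up with the testing requirements for this task: a supervisor without confirmation is blocked, and the same supervisor proceeds once it sends `confirmed: true`.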
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Small |
+| Dependencies | None |
+| Risk | Low |
+
+**Files to Modify**:
+- `makima/src/server/handlers/contracts.rs` - Enforce phase_guard at API level
+- `makima/src/daemon/api/supervisor.rs` - Handle confirmation requirement response
+
+**Implementation**:
+- Move phase_guard check from supervisor logic to API handler
+- All callers (including supervisors) must provide `confirmed: true` if phase_guard is enabled
+- Return `RequiresConfirmation` response with deliverables list
+
+**Testing Requirements**:
+- [ ] Unit test: Supervisor blocked by phase_guard
+- [ ] Unit test: Supervisor proceeds with confirmation
+- [ ] Integration test: API enforces regardless of caller
+
+---
+
+### Task 1.4: Enhanced Contract Status Tracking
+
+**Problem**: The three existing states (active/completed/archived) don't capture the real state of a contract.
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Medium |
+| Dependencies | Task 1.1, 1.2, 1.3 |
+| Risk | Medium (schema change) |
+
+**Files to Create**:
+- `makima/migrations/YYYYMMDDHHMMSS_enhanced_contract_status.sql`
+
+**Files to Modify**:
+- `makima/src/db/models.rs` - Extend `ContractStatus` enum
+- `makima/src/db/repository.rs` - Add status transition validation
+- `makima/src/server/handlers/contracts.rs` - Add status change endpoint
+
+**Database Changes**:
+```sql
+-- New columns for status tracking
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS status_changed_at TIMESTAMPTZ DEFAULT NOW();
+-- No default yet: ADD COLUMN ... DEFAULT NOW() would fill existing rows with the
+-- migration time, so the backfill below would never find a NULL.
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS last_activity_at TIMESTAMPTZ;
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS waiting_for TEXT;
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS blocked_reason TEXT;
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS failure_reason TEXT;
+
+CREATE INDEX IF NOT EXISTS idx_contracts_status ON contracts(status);
+CREATE INDEX IF NOT EXISTS idx_contracts_last_activity ON contracts(last_activity_at);
+
+-- Backfill last_activity_at, then set the default for new rows
+UPDATE contracts SET last_activity_at = updated_at WHERE last_activity_at IS NULL;
+ALTER TABLE contracts ALTER COLUMN last_activity_at SET DEFAULT NOW();
+```
+
+**New Status Values** (additive, backwards compatible):
+```rust
+pub enum ContractStatus {
+    // Existing (keep as-is for backwards compatibility)
+    Active,
+    Completed,
+    Archived,
+    // New states
+    Created,         // Not yet started
+    WaitingForInput, // Pending user input
+    Paused,          // User-requested pause
+    Blocked,         // External dependency
+    Completing,      // Running final validation
+    Failed,          // Unrecoverable error
+}
+```
+
+**Migration Strategy**:
+- Default new contracts to `Created`, transition to `Active` when supervisor starts
+- Existing `active` contracts remain `active` (no migration needed)
+- New states only used for new contracts initially
+
+**Testing Requirements**:
+- [ ] Unit test: Valid state transitions allowed
+- [ ] Unit test: Invalid state transitions rejected
+- [ ] Integration test: Status change triggers event recording
+- [ ] Migration test: Backfill runs correctly
+- [ ] Backwards compatibility: Old API responses still work
+
+---
+
+### Task 1.5: Activity Timestamp Tracking
+
+**Problem**: No reliable way to detect stale contracts.
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Small |
+| Dependencies | Task 1.4 |
+| Risk | Low |
+
+**Files to Modify**:
+- `makima/src/db/repository.rs` - Update `last_activity_at` on relevant operations
+- `makima/src/server/handlers/contracts.rs` - Update activity on API calls
+- `makima/src/server/handlers/mesh_supervisor.rs` - Update activity on supervisor events
+
+**Operations that update activity**:
+- Phase change
+- Deliverable marked complete
+- Task spawned/completed
+- Chat message sent/received
+- Supervisor heartbeat received
+
+**Testing Requirements**:
+- [ ] Unit test: Activity updated on each operation
+- [ ] Integration test: Query stale contracts by threshold
+
+---
+
+### Phase 1 Deliverables Checklist
+
+- [ ] Deliverable validation implemented and tested
+- [ ] Version conflict detection with proper error responses
+- [ ] Phase guard enforced at API level
+- [ ] Database migration for new status columns
+- [ ] Extended ContractStatus enum
+- [ ] Status transition validation
+- [ ] Activity timestamp tracking
+- [ ] All Phase 1 tests passing
+- [ ] Documentation updated
+
+---
+
+## Phase 2: Completion Detection + Auto-Advance
+
+**Goal**: Enable automatic detection when phases/contracts are ready to advance, reducing manual intervention.
+
+**Priority**: HIGH
+**Duration**: 1 week
+**Breaking Changes**: None (opt-in via configuration)
+
+### Task 2.1: Phase Completion Gates
+
+**Problem**: No automatic validation of phase requirements before advancement.
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Medium |
+| Dependencies | Phase 1 complete |
+| Risk | Low (opt-in) |
+
+**Files to Create**:
+- `makima/src/server/services/completion_detector.rs` - New service module
+
+**Files to Modify**:
+- `makima/src/db/models.rs` - Add `PhaseCompletionGate`, `TaskRequirement` types
+- `makima/src/server/mod.rs` - Register completion detector service
+
+**Implementation**:
+```rust
+pub struct PhaseCompletionGate {
+    pub required_deliverables: Vec<String>,
+    pub required_tasks: TaskRequirement,
+    // Send + Sync so the gate can be shared with the async completion detector service
+    pub custom_validator: Option<Box<dyn Fn(&Contract) -> bool + Send + Sync>>,
+    pub auto_advance: bool,
+}
+
+pub enum TaskRequirement {
+    None,
+    AllComplete,
+    MinComplete(usize),
+    NamedTasks(Vec<String>),
+}
+
+pub enum PhaseReadinessResult {
+    Ready,
+    NotReady { missing: Vec<String> },
+    NoGate,
+}
+```
+
+**Testing Requirements**:
+- [ ] Unit test: Gate with all deliverables required
+- [ ] Unit test: Gate with task requirements
+- [ ] Unit test: Mixed deliverable + task requirements
+- [ ] Integration test: API endpoint returns readiness status
+
+---
+
+### Task 2.2: Phase Readiness API
+
+**Problem**: No way to check if phase is ready to advance without attempting transition.
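Checking readiness without attempting the transition amounts to evaluating the Task 2.1 gate read-only and collecting everything that is still missing. The sketch below assumes a simplified gate (no `custom_validator`, no `auto_advance`) and a hypothetical `PhaseSnapshot` input; `evaluate_gate` is an illustrative name, and the missing-item strings only imitate the shape of the example response:

```rust
use std::collections::HashSet;

/// Mirrors the `TaskRequirement` enum proposed in Task 2.1.
pub enum TaskRequirement {
    None,
    AllComplete,
    MinComplete(usize),
    NamedTasks(Vec<String>),
}

/// Simplified `PhaseCompletionGate`: `custom_validator` and `auto_advance` omitted.
pub struct PhaseCompletionGate {
    pub required_deliverables: Vec<String>,
    pub required_tasks: TaskRequirement,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PhaseReadinessResult {
    Ready,
    NotReady { missing: Vec<String> },
    NoGate,
}

/// Hypothetical view of the current phase: completed deliverable IDs and
/// (task name, is_complete) pairs.
pub struct PhaseSnapshot {
    pub completed_deliverables: HashSet<String>,
    pub tasks: Vec<(String, bool)>,
}

/// Evaluate the gate without side effects, collecting every unmet requirement.
pub fn evaluate_gate(gate: Option<&PhaseCompletionGate>, snap: &PhaseSnapshot) -> PhaseReadinessResult {
    let Some(gate) = gate else {
        return PhaseReadinessResult::NoGate;
    };
    let mut missing = Vec::new();

    for id in &gate.required_deliverables {
        if !snap.completed_deliverables.contains(id) {
            missing.push(format!("Deliverable: {id}"));
        }
    }

    let done = snap.tasks.iter().filter(|(_, complete)| *complete).count();
    match &gate.required_tasks {
        TaskRequirement::None => {}
        TaskRequirement::AllComplete => {
            let pending = snap.tasks.len() - done;
            if pending > 0 {
                let noun = if pending == 1 { "task" } else { "tasks" };
                missing.push(format!("{pending} {noun} incomplete"));
            }
        }
        TaskRequirement::MinComplete(min) => {
            if done < *min {
                missing.push(format!("{} more task(s) must complete", min - done));
            }
        }
        TaskRequirement::NamedTasks(names) => {
            for name in names {
                if !snap.tasks.iter().any(|(task, complete)| task == name && *complete) {
                    missing.push(format!("Task: {name}"));
                }
            }
        }
    }

    if missing.is_empty() {
        PhaseReadinessResult::Ready
    } else {
        PhaseReadinessResult::NotReady { missing }
    }
}
```

Collecting all unmet items, rather than stopping at the first, is what lets the response list everything in `missing` at once.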
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Small |
+| Dependencies | Task 2.1 |
+| Risk | Low |
+
+**Files to Modify**:
+- `makima/src/server/handlers/contracts.rs` - Add `check_phase_readiness()` endpoint
+- `makima/src/server/routes.rs` - Register new route
+
+**New Endpoint**:
+```
+GET /api/v1/contracts/{id}/phase-readiness
+
+Response:
+{
+  "phase": "plan",
+  "ready": false,
+  "missing": ["Deliverable: plan-document", "1 task incomplete"],
+  "auto_advance_enabled": true
+}
+```
+
+**Testing Requirements**:
+- [ ] Integration test: Correct readiness for each phase
+- [ ] Integration test: Response includes all missing items
+
+---
+
+### Task 2.3: Auto-Advance Logic
+
+**Problem**: Users must manually advance phases even when all requirements are met.
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Medium |
+| Dependencies | Task 2.1, 2.2 |
+| Risk | Medium |
+
+**Files to Create**:
+- `makima/migrations/YYYYMMDDHHMMSS_auto_complete_settings.sql`
+
+**Files to Modify**:
+- `makima/src/server/services/completion_detector.rs` - Add auto-advance trigger
+- `makima/src/server/handlers/contracts.rs` - Hook into task completion
+- `makima/src/db/models.rs` - Add `auto_complete_enabled` field
+
+**Database Changes**:
+```sql
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS auto_complete_enabled BOOLEAN DEFAULT TRUE;
+ALTER TABLE contracts ADD COLUMN IF NOT EXISTS completion_detected_at TIMESTAMPTZ;
+```
+
+**Auto-Advance Flow**:
+1. Task completes or deliverable marked
+2. Check if phase gate satisfied
+3. If auto_advance enabled and gate satisfied:
+   - If phase_guard enabled: Set status = WaitingForInput
+   - If phase_guard disabled: Auto-advance to next phase
+4. If terminal phase and all requirements met:
+   - Set status = Completing
+   - Run cleanup (worktrees, etc.)
+   - Set status = Completed
+
+**Feature Flag**:
+```toml
+[contracts]
+auto_complete_enabled = true   # Global default
+auto_advance_phases = true     # Allow phase auto-advance
+```
+
+**Testing Requirements**:
+- [ ] Unit test: Auto-advance triggers on gate satisfaction
+- [ ] Unit test: Phase guard blocks auto-advance
+- [ ] Unit test: Terminal phase triggers completion flow
+- [ ] Integration test: Full contract auto-completes
+- [ ] Integration test: Disable auto-complete per contract
+
+---
+
+### Task 2.4: Contract Completion API
+
+**Problem**: No way to force-check or manually trigger completion flow.
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Small |
+| Dependencies | Task 2.3 |
+| Risk | Low |
+
+**Files to Modify**:
+- `makima/src/server/handlers/contracts.rs` - Add `check_completion()` and `set_auto_complete()` endpoints
+
+**New Endpoints**:
+```
+POST /api/v1/contracts/{id}/check-completion
+Response: { "ready": true, "would_complete": true, "missing": [] }
+
+PUT /api/v1/contracts/{id}/auto-complete
+Body: { "enabled": true }
+Response: ContractSummary
+```
+
+**Testing Requirements**:
+- [ ] Integration test: Manual completion check
+- [ ] Integration test: Toggle auto-complete setting
+
+---
+
+### Phase 2 Deliverables Checklist
+
+- [ ] PhaseCompletionGate implementation
+- [ ] TaskRequirement types
+- [ ] CompletionDetector service
+- [ ] Phase readiness API endpoint
+- [ ] Auto-advance logic that respects phase_guard
+- [ ] Contract completion flow
+- [ ] Auto-complete toggle per contract
+- [ ] Feature flags in configuration
+- [ ] All Phase 2 tests passing
+- [ ] Documentation updated
+
+---
+
+## Phase 3: Supervisor Heartbeat + Restoration
+
+**Goal**: Reliable supervisor state tracking and restoration after crashes.
+
+**Priority**: HIGH
+**Duration**: 1.5 weeks
+**Breaking Changes**: Minor (new heartbeat protocol)
+
+### Task 3.1: Enhanced Supervisor State Enum
+
+**Problem**: Current supervisor state doesn't capture activity details.
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Small |
+| Dependencies | Phase 1, Phase 2 |
+| Risk | Low |
+
+**Files to Modify**:
+- `makima/src/db/models.rs` - Add `SupervisorState` enum
+- `makima/src/daemon/ws/protocol.rs` - Add state to heartbeat message
+
+**New Enum**:
+```rust
+pub enum SupervisorState {
+    Initializing,
+    Idle,
+    Working,
+    WaitingForUser,
+    WaitingForTasks,
+    Blocked,
+    Completed,
+    Failed,
+    Interrupted,
+}
+```
+
+**Testing Requirements**:
+- [ ] Unit test: State serialization/deserialization
+- [ ] Unit test: All state values recognized
+
+---
+
+### Task 3.2: Heartbeat Infrastructure
+
+**Problem**: No reliable way to detect dead supervisors.
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Medium |
+| Dependencies | Task 3.1 |
+| Risk | Medium |
+
+**Files to Create**:
+- `makima/migrations/YYYYMMDDHHMMSS_supervisor_heartbeats.sql`
+
+**Files to Modify**:
+- `makima/src/db/models.rs` - Add `SupervisorHeartbeat` struct
+- `makima/src/db/repository.rs` - Add heartbeat storage functions
+- `makima/src/daemon/ws/protocol.rs` - Enhanced heartbeat message
+- `makima/src/daemon/ws/client.rs` - Send enhanced heartbeats
+- `makima/src/server/handlers/mesh_daemon.rs` - Process heartbeats
+
+**Database Changes**:
+```sql
+CREATE TABLE IF NOT EXISTS supervisor_heartbeats (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    supervisor_task_id UUID NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
+    contract_id UUID NOT NULL REFERENCES contracts(id) ON DELETE CASCADE,
+    state VARCHAR(50) NOT NULL,
+    phase VARCHAR(50) NOT NULL,
+    current_activity TEXT,
+    progress INTEGER DEFAULT 0,
+    pending_task_ids UUID[] DEFAULT ARRAY[]::UUID[],
+    timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+
+CREATE INDEX IF NOT EXISTS idx_heartbeats_supervisor ON supervisor_heartbeats(supervisor_task_id);
+CREATE INDEX IF NOT EXISTS idx_heartbeats_timestamp ON supervisor_heartbeats(timestamp);
+
+-- Retention policy: keep only 24 hours.
+-- A CHECK constraint cannot enforce this: it is evaluated only when a row is
+-- inserted or updated, so old rows would never be removed. Prune them
+-- periodically instead (e.g. from the heartbeat cleanup job):
+DELETE FROM supervisor_heartbeats WHERE timestamp < NOW() - INTERVAL '24 hours';
+```
+
+**Enhanced Heartbeat Message**:
+```rust
+pub struct SupervisorHeartbeat {
+    pub task_id: Uuid,
+    pub contract_id: Uuid,
+    pub state: SupervisorState,
+    pub phase: String,
+    pub current_activity: String,
+    pub progress: u8,
+    pub pending_task_ids: Vec<Uuid>,
+    pub timestamp: DateTime<Utc>,
+}
+```
+
+**Configuration**:
+```toml
+[contracts]
+heartbeat_interval_seconds = 30
+heartbeat_timeout_seconds = 120
+```
+
+**Testing Requirements**:
+- [ ] Unit test: Heartbeat message serialization
+- [ ] Integration test: Heartbeats stored correctly
+- [ ] Integration test: Old heartbeats cleaned up
+- [ ] Load test: Handle many concurrent heartbeats
+
+---
+
+### Task 3.3: Supervisor State Persistence
+
+**Problem**: Supervisor context lost after daemon crash.
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Large |
+| Dependencies | Task 3.2 |
+| Risk | High |
+
+**Files to Create**:
+- `makima/migrations/YYYYMMDDHHMMSS_enhanced_supervisor_state.sql`
+
+**Files to Modify**:
+- `makima/src/db/models.rs` - Enhance `SupervisorState` table model
+- `makima/src/db/repository.rs` - State persistence functions
+- `makima/src/server/handlers/mesh_supervisor.rs` - State save/restore logic
+- `makima/src/daemon/ws/client.rs` - Send state updates
+
+**Database Changes**:
+```sql
+ALTER TABLE supervisor_states ADD COLUMN IF NOT EXISTS state VARCHAR(50) NOT NULL DEFAULT 'initializing';
+ALTER TABLE supervisor_states ADD COLUMN IF NOT EXISTS current_activity TEXT;
+ALTER TABLE supervisor_states ADD COLUMN IF NOT EXISTS progress INTEGER DEFAULT 0;
+ALTER TABLE supervisor_states ADD COLUMN IF NOT EXISTS error_message TEXT;
+ALTER TABLE supervisor_states ADD COLUMN IF NOT EXISTS spawned_task_ids UUID[] DEFAULT ARRAY[]::UUID[];
+```
+
+**Persistent State Model**:
+```rust
+pub struct SupervisorPersistentState {
+    pub state: SupervisorState,
+    pub phase: String,
+    pub conversation_history: Vec<ConversationMessage>,
+    pub pending_questions: Vec<PendingQuestion>,
+    pub spawned_task_ids: Vec<Uuid>,
+    pub waiting_on_task_ids: Vec<Uuid>,
+    pub last_checkpoint: Option<CheckpointInfo>,
+    pub current_activity: String,
+    pub error: Option<String>,
+    pub created_at: DateTime<Utc>,
+    pub updated_at: DateTime<Utc>,
+}
+```
+
+**State Save Points**:
+- On every LLM response
+- On task spawn
+- On question asked
+- On phase change
+- On heartbeat (lightweight update)
+
+**Testing Requirements**:
+- [ ] Unit test: State serialization round-trip
+- [ ] Integration test: State persisted on each save point
+- [ ] Integration test: State survives daemon restart
+
+---
+
+### Task 3.4: Supervisor Restoration Protocol
+
+**Problem**: No reliable restoration after crash.
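The restoration flow specified for this task (steps 1-8) is essentially a priority-ordered decision. A minimal sketch of steps 2, 3, 5 and 6 as a pure function, assuming hypothetical `LoadedState` and `RestorationAction` types (task IDs are plain `u64` instead of `Uuid` to keep it std-only). Loading the row, restoring conversation history, and sending the restoration context to Claude are left to the caller, and the sketch simplifies the invalid-state branch to a single checkpoint restore:

```rust
/// What the restored supervisor should do first (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum RestorationAction {
    StartFresh,
    RestoreFromCheckpoint,
    RedeliverQuestions(usize),
    ResumeWaiting(usize),
    ResumeExecution,
}

/// Minimal stand-in for the persisted state: only the fields the flow inspects.
pub struct LoadedState {
    pub consistent: bool,
    pub pending_questions: Vec<String>,
    pub waiting_on_task_ids: Vec<u64>,
}

pub fn plan_restoration(loaded: Option<&LoadedState>) -> RestorationAction {
    // Step 2: no persisted row, so start fresh (the caller logs a warning).
    let Some(state) = loaded else {
        return RestorationAction::StartFresh;
    };
    // Step 3: inconsistent state falls back to the last checkpoint.
    if !state.consistent {
        return RestorationAction::RestoreFromCheckpoint;
    }
    // Step 5: unanswered questions are re-delivered before anything else.
    if !state.pending_questions.is_empty() {
        return RestorationAction::RedeliverQuestions(state.pending_questions.len());
    }
    // Step 6: resume waiting on spawned tasks.
    if !state.waiting_on_task_ids.is_empty() {
        return RestorationAction::ResumeWaiting(state.waiting_on_task_ids.len());
    }
    // Steps 7-8: send restoration context and resume execution.
    RestorationAction::ResumeExecution
}
```

Keeping the decision free of I/O makes each branch in the testing requirements ("pending questions", "waiting tasks", "corrupted state") a one-line unit test.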
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Large |
+| Dependencies | Task 3.3 |
+| Risk | High |
+
+**Files to Modify**:
+- `makima/src/server/handlers/mesh_supervisor.rs` - Restoration logic
+- `makima/src/server/handlers/mesh_daemon.rs` - Task reassignment
+- `makima/src/daemon/ws/client.rs` - Handle restoration handoff
+
+**Restoration Flow**:
+```
+1. Daemon restarts or task reassigned
+   │
+   ▼
+2. Load supervisor state from supervisor_states
+   │
+   ├─ NOT FOUND ──► Start fresh, log warning
+   │
+   ▼ FOUND
+3. Validate state consistency
+   │
+   ├─ INVALID ──► Start from last checkpoint
+   │
+   ▼ VALID
+4. Restore conversation history
+   │
+   ▼
+5. Check for pending questions
+   │
+   ├─ HAS PENDING ──► Re-deliver to user
+   │
+   ▼ NO PENDING
+6. Check for waiting tasks
+   │
+   ├─ HAS WAITING ──► Resume waiting state
+   │
+   ▼ NO WAITING
+7. Send restoration context to Claude
+   │
+   ▼
+8. Resume execution from last state
+```
+
+**Testing Requirements**:
+- [ ] Integration test: Restore after clean shutdown
+- [ ] Integration test: Restore after crash (simulated)
+- [ ] Integration test: Handle corrupted state gracefully
+- [ ] Integration test: Resume with pending questions
+- [ ] Integration test: Resume with waiting tasks
+
+---
+
+### Task 3.5: Supervisor Status API
+
+**Problem**: No way to query supervisor status.
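A status endpoint can derive liveness from the most recent heartbeat and `heartbeat_timeout_seconds` (120 in the proposed configuration). A std-only sketch using `SystemTime` in place of `DateTime<Utc>`; `Liveness` and `supervisor_liveness` are hypothetical names:

```rust
use std::time::{Duration, SystemTime};

/// Liveness derived from the most recent heartbeat (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum Liveness {
    Alive,
    /// Corresponds to the `SUPERVISOR_DEAD` error code (heartbeat timeout).
    Dead { silent_for: Duration },
    /// No heartbeat has been recorded yet.
    NeverSeen,
}

pub fn supervisor_liveness(
    last_heartbeat: Option<SystemTime>,
    now: SystemTime,
    timeout: Duration,
) -> Liveness {
    match last_heartbeat {
        None => Liveness::NeverSeen,
        Some(at) => {
            // A heartbeat stamped slightly in the future (clock skew) counts as fresh.
            let silent_for = now.duration_since(at).unwrap_or(Duration::ZERO);
            if silent_for > timeout {
                Liveness::Dead { silent_for }
            } else {
                Liveness::Alive
            }
        }
    }
}
```

With a 30-second interval and a 120-second timeout, a supervisor has to miss roughly four consecutive heartbeats before it is reported dead.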
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Small |
+| Dependencies | Task 3.3 |
+| Risk | Low |
+
+**Files to Modify**:
+- `makima/src/server/handlers/contracts.rs` - Add supervisor status endpoints
+- `makima/src/server/routes.rs` - Register routes
+
+**New Endpoints**:
+```
+GET /api/v1/contracts/{id}/supervisor/status
+Response: {
+  "task_id": "uuid",
+  "state": "working",
+  "phase": "execute",
+  "current_activity": "Implementing user authentication",
+  "progress": 45,
+  "last_heartbeat": "2025-01-31T10:30:00Z",
+  "pending_task_ids": ["uuid1", "uuid2"]
+}
+
+GET /api/v1/contracts/{id}/supervisor/heartbeats?limit=10
+Response: {
+  "heartbeats": [
+    { "timestamp": "...", "state": "...", "activity": "..." }
+  ]
+}
+
+POST /api/v1/contracts/{id}/supervisor/sync
+Response: { "synced": true, "state": "working" }
+```
+
+**Testing Requirements**:
+- [ ] Integration test: Status endpoint returns correct data
+- [ ] Integration test: Heartbeat history retrieval
+- [ ] Integration test: Force sync triggers state update
+
+---
+
+### Phase 3 Deliverables Checklist
+
+- [ ] SupervisorState enum with all states
+- [ ] Heartbeat infrastructure (DB, messages, storage)
+- [ ] Enhanced supervisor state persistence
+- [ ] Restoration protocol implementation
+- [ ] Supervisor status API endpoints
+- [ ] Configuration for heartbeat intervals
+- [ ] Cleanup job for old heartbeats
+- [ ] All Phase 3 tests passing
+- [ ] Documentation updated
+
+---
+
+## Phase 4: CLI Improvements
+
+**Goal**: Enhanced CLI for contract management and monitoring.
+
+**Priority**: MEDIUM
+**Duration**: 1 week
+**Breaking Changes**: None (additive)
+
+### Task 4.1: Contract List with Filters
+
+**Problem**: No way to filter contracts by status, phase, or staleness.
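The proposed `list` options combine as a conjunction of independent predicates. A sketch over an in-memory slice, assuming a hypothetical `ContractRow` shape and treating an empty status list as "any status"; a real implementation would more likely push these filters into the SQL query against `status` and `last_activity_at`:

```rust
use std::time::Duration;

/// Hypothetical row as returned by the list API.
pub struct ContractRow {
    pub name: String,
    pub status: String,
    pub phase: String,
    /// Time elapsed since `last_activity_at`.
    pub idle_for: Duration,
}

/// Filters mirroring the proposed `makima contracts list` options.
#[derive(Default)]
pub struct ListFilter {
    pub statuses: Vec<String>,             // --status active,completed
    pub phase: Option<String>,             // --phase
    pub stale_threshold: Option<Duration>, // --stale (with --threshold, default 30m)
    pub limit: Option<usize>,              // -n, --limit
}

pub fn apply_filter<'a>(rows: &'a [ContractRow], f: &ListFilter) -> Vec<&'a ContractRow> {
    rows.iter()
        // An empty status list means "any status".
        .filter(|r| f.statuses.is_empty() || f.statuses.iter().any(|s| s == &r.status))
        .filter(|r| f.phase.as_ref().map_or(true, |p| p == &r.phase))
        // Stale means idle for longer than the threshold.
        .filter(|r| f.stale_threshold.map_or(true, |t| r.idle_for > t))
        .take(f.limit.unwrap_or(usize::MAX))
        .collect()
}
```

Applying `limit` last means it caps the filtered result rather than the input, which matches how `-n` reads alongside the other options.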
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Medium |
+| Dependencies | Phase 1 (status tracking) |
+| Risk | Low |
+
+**Files to Modify**:
+- `makima/src/daemon/cli/contract.rs` - Add list subcommand
+- `makima/src/daemon/cli/mod.rs` - Register subcommand
+- `makima/src/daemon/api/contract.rs` - Add list API call
+
+**New Command**:
+```bash
+makima contracts list [OPTIONS]
+
+Options:
+  --status <STATUS>        Filter by status (active,completed,failed)
+  --stale                  Show only stale contracts
+  --threshold <DURATION>   Stale threshold (default: 30m)
+  --waiting                Show contracts waiting for input
+  --phase <PHASE>          Filter by phase
+  --format <FORMAT>        Output format (table,json,compact)
+  -n, --limit <N>          Limit results
+```
+
+**Testing Requirements**:
+- [ ] Unit test: Argument parsing
+- [ ] Integration test: Filter by status
+- [ ] Integration test: Filter by stale
+- [ ] Integration test: Multiple filters combined
+- [ ] Integration test: All output formats
+
+---
+
+### Task 4.2: Cleanup Command
+
+**Problem**: No automated way to clean up old contracts and worktrees.
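Both `contracts list --threshold` (default `30m`) and `contracts cleanup --older-than` (default `7d`) take a `<DURATION>`. A minimal parser, assuming the accepted suffixes are `s`, `m`, `h` and `d` (the plan does not define the grammar); `parse_cli_duration` is a hypothetical helper:

```rust
use std::time::Duration;

/// Parse CLI durations such as `90s`, `30m`, `24h` or `7d`.
/// The suffix set is an assumption; a bare number is rejected.
pub fn parse_cli_duration(input: &str) -> Result<Duration, String> {
    let input = input.trim();
    let unit = input.chars().last().ok_or("empty duration")?;
    // Slicing off the last char's UTF-8 length keeps us on a char boundary.
    let digits = &input[..input.len() - unit.len_utf8()];
    let value: u64 = digits
        .parse()
        .map_err(|_| format!("invalid number in duration '{input}'"))?;
    let secs_per_unit = match unit {
        's' => 1,
        'm' => 60,
        'h' => 60 * 60,
        'd' => 24 * 60 * 60,
        other => return Err(format!("unknown duration unit '{other}' (use s, m, h or d)")),
    };
    value
        .checked_mul(secs_per_unit)
        .map(Duration::from_secs)
        .ok_or_else(|| format!("duration '{input}' is too large"))
}
```

Rejecting a bare number such as `30` avoids silently guessing whether the user meant seconds or minutes.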
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Medium |
+| Dependencies | Phase 1 |
+| Risk | Medium (destructive) |
+
+**Files to Modify**:
+- `makima/src/daemon/cli/contract.rs` - Add cleanup subcommand
+- `makima/src/daemon/api/contract.rs` - Add cleanup API calls
+
+**New Command**:
+```bash
+makima contracts cleanup [OPTIONS]
+
+Options:
+  --archive                Archive completed/failed contracts
+  --delete-archived        Delete archived contracts
+  --worktrees              Clean up orphaned worktrees
+  --all                    Run all cleanup operations
+  --older-than <DURATION>  Threshold (default: 7d)
+  --dry-run                Show what would be affected
+  --force                  Skip confirmation prompts
+```
+
+**Safety Features**:
+- Dry-run mode by default for destructive operations
+- Confirmation prompt unless --force specified
+- Log all deletions
+
+**Testing Requirements**:
+- [ ] Unit test: Argument parsing
+- [ ] Integration test: Dry-run shows correct count
+- [ ] Integration test: Archive older than threshold
+- [ ] Integration test: Delete archived
+- [ ] Integration test: Worktree cleanup
+
+---
+
+### Task 4.3: Monitor Command
+
+**Problem**: No real-time monitoring from CLI.
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Large |
+| Dependencies | Phase 3 (heartbeats) |
+| Risk | Low |
+
+**Files to Create**:
+- `makima/src/daemon/cli/monitor.rs` - Monitor command implementation
+
+**Files to Modify**:
+- `makima/src/daemon/cli/mod.rs` - Register monitor subcommand
+- `makima/src/daemon/tui/` - Enhance TUI for monitoring
+
+**New Command**:
+```bash
+makima contracts monitor [CONTRACT_IDS...] [OPTIONS]
+
+Options:
+  --status <STATUS>  Filter by status
+  --stale            Only show stale contracts
+  --quiet            Only important events
+  --format <FORMAT>  Output format (tui,text,json)
+```
+
+**Output Formats**:
+- `tui`: Full terminal UI dashboard
+- `text`: Plain text event stream
+- `json`: JSON event stream (for scripting)
+
+**Testing Requirements**:
+- [ ] Unit test: Argument parsing
+- [ ] Integration test: WebSocket connection established
+- [ ] Integration test: Events displayed correctly
+- [ ] Integration test: Filter by contract ID
+
+---
+
+### Task 4.4: Additional Helper Commands
+
+**Problem**: Common operations require multiple steps.
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Small per command |
+| Dependencies | Phase 1, 2, 3 |
+| Risk | Low |
+
+**New Commands**:
+```bash
+# Pause/Resume
+makima contracts pause <ID> --reason "Waiting for review"
+makima contracts resume <ID>
+
+# Phase management
+makima contracts advance <ID> --phase execute --force
+
+# Supervisor management
+makima contracts restart-supervisor <ID>
+
+# Information
+makima contracts show <ID> --verbose
+makima contracts health <ID>
+
+# Export
+makima contracts export <ID> --format json --output contract.json
+```
+
+**Testing Requirements**:
+- [ ] Integration test for each command
+- [ ] Help text for all commands
+
+---
+
+### Phase 4 Deliverables Checklist
+
+- [ ] `contracts list` with all filters
+- [ ] `contracts cleanup` with dry-run
+- [ ] `contracts monitor` with TUI
+- [ ] Helper commands (pause, resume, advance, etc.)
+- [ ] JSON output for scripting
+- [ ] All Phase 4 tests passing
+- [ ] CLI documentation updated
+
+---
+
+## Phase 5: Monitoring Dashboard
+
+**Goal**: Real-time monitoring with WebSocket events and batch operations.
+
+**Priority**: MEDIUM
+**Duration**: 1.5 weeks
+**Breaking Changes**: None
+
+### Task 5.1: WebSocket Monitor Events
+
+**Problem**: No real-time event stream for monitoring.
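Each WebSocket subscriber sends a filter message with `contract_ids`, `statuses`, `include_stale` and `include_heartbeats`. A sketch of per-event matching, assuming empty lists mean "no restriction" and using a hypothetical, simplified event type rather than the full `ContractMonitorEvent`:

```rust
/// Simplified event kinds (hypothetical): only the ones the filter treats specially.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EventKind {
    StatusChanged,
    Heartbeat,
    StaleDetected,
    Other,
}

/// Hypothetical flattened event carrying just what a filter needs.
pub struct MonitorEvent {
    pub contract_id: String,
    pub contract_status: String,
    pub kind: EventKind,
}

/// Mirrors the subscriber's filter message.
pub struct MonitorFilter {
    pub contract_ids: Vec<String>,
    pub statuses: Vec<String>,
    pub include_stale: bool,
    pub include_heartbeats: bool,
}

impl MonitorFilter {
    /// True if the event should be forwarded to this subscriber.
    pub fn matches(&self, e: &MonitorEvent) -> bool {
        // Empty lists are treated as "no restriction" (an assumption).
        let id_ok = self.contract_ids.is_empty() || self.contract_ids.contains(&e.contract_id);
        let status_ok = self.statuses.is_empty() || self.statuses.contains(&e.contract_status);
        // Heartbeats and stale notices are high-volume, so they are opt-in.
        let kind_ok = match e.kind {
            EventKind::Heartbeat => self.include_heartbeats,
            EventKind::StaleDetected => self.include_stale,
            _ => true,
        };
        id_ok && status_ok && kind_ok
    }
}
```

Evaluating the filter on the server side, per connection, keeps heartbeat traffic off subscribers that did not ask for it.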
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Large |
+| Dependencies | Phase 3 (heartbeats) |
+| Risk | Medium |
+
+**Files to Create**:
+- `makima/src/server/handlers/contract_monitor.rs` - WebSocket handler
+- `makima/src/server/services/event_broadcaster.rs` - Event broadcast service
+
+**Files to Modify**:
+- `makima/src/server/routes.rs` - Register WebSocket endpoint
+- `makima/src/server/handlers/contracts.rs` - Emit events on changes
+
+**Event Types**:
+```rust
+pub enum ContractMonitorEvent {
+    StatusChanged { contract_id, old_status, new_status, reason },
+    PhaseChanged { contract_id, old_phase, new_phase },
+    SupervisorStateChanged { contract_id, supervisor_task_id, old_state, new_state },
+    Heartbeat { contract_id, state, activity, progress },
+    StaleDetected { contract_id, last_activity, stale_duration },
+    TaskCompleted { contract_id, task_id, task_name, success },
+    DeliverableCompleted { contract_id, phase, deliverable_id },
+    QuestionAsked { contract_id, question_id, question, question_type },
+}
+```
+
+**WebSocket Endpoint**:
+```
+WS /api/v1/contracts/monitor
+
+Filter message:
+{
+  "contract_ids": ["uuid1", "uuid2"],
+  "statuses": ["active", "waiting_for_input"],
+  "include_stale": true,
+  "include_heartbeats": true
+}
+```
+
+**Testing Requirements**:
+- [ ] Unit test: Event serialization
+- [ ] Integration test: WebSocket connection
+- [ ] Integration test: Filter by contract ID
+- [ ] Integration test: Filter by status
+- [ ] Load test: Many concurrent connections
+
+---
+
+### Task 5.2: Stale Contract Detection Service
+
+**Problem**: No automated detection of stale contracts.
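The detector's per-tick logic can be separated from its timer so it is testable without a runtime. A std-only sketch of one detection pass, using `std::sync::mpsc` in place of the tokio channel and `u64` IDs in place of `Uuid`; `StaleDetected` and `detect_stale_once` are hypothetical names:

```rust
use std::sync::mpsc::Sender;
use std::time::{Duration, SystemTime};

/// Simplified stand-in for `ContractMonitorEvent::StaleDetected`.
#[derive(Debug, PartialEq, Eq)]
pub struct StaleDetected {
    pub contract_id: u64,
    pub stale_duration: Duration,
}

/// One detection pass: emit an event for every contract idle longer than
/// `threshold`, given (contract_id, last_activity_at) pairs.
/// Returns the number of events actually delivered.
pub fn detect_stale_once(
    contracts: &[(u64, SystemTime)],
    now: SystemTime,
    threshold: Duration,
    tx: &Sender<StaleDetected>,
) -> usize {
    let mut sent = 0;
    for &(contract_id, last_activity) in contracts {
        let idle = now.duration_since(last_activity).unwrap_or(Duration::ZERO);
        if idle > threshold {
            // A send error means the receiver was dropped; it is not counted.
            if tx.send(StaleDetected { contract_id, stale_duration: idle }).is_ok() {
                sent += 1;
            }
        }
    }
    sent
}
```

Like the proposed `run` loop, this reports a contract again on every pass for as long as it stays stale; subscribers that only want the first notice would need to de-duplicate.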
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Medium |
+| Dependencies | Task 5.1 |
+| Risk | Low |
+
+**Files to Create**:
+- `makima/src/server/services/stale_detector.rs`
+
+**Files to Modify**:
+- `makima/src/server/mod.rs` - Start detector service
+
+**Implementation**:
+```rust
+use std::time::Duration;
+
+use chrono::Utc;
+use sqlx::PgPool;
+use tokio::sync::mpsc::Sender;
+
+pub struct StaleContractDetector {
+    pool: PgPool,
+    config: ContractTimeoutConfig,
+    event_tx: Sender<ContractMonitorEvent>,
+}
+
+impl StaleContractDetector {
+    pub async fn run(&self) {
+        // Tick at the configured stale_detection_interval_seconds (default 60)
+        let mut interval = tokio::time::interval(Duration::from_secs(
+            self.config.stale_detection_interval_seconds,
+        ));
+        loop {
+            interval.tick().await;
+            let stale = self.detect_stale_contracts().await;
+            for (contract_id, last_activity) in stale {
+                // A send error means the event broadcaster has shut down; keep running
+                let _ = self
+                    .event_tx
+                    .send(ContractMonitorEvent::StaleDetected {
+                        contract_id,
+                        last_activity,
+                        stale_duration: Utc::now() - last_activity,
+                    })
+                    .await;
+            }
+        }
+    }
+}
+```
+
+**Configuration**:
+```toml
+[contracts]
+stale_detection_interval_seconds = 60
+stale_threshold_minutes = 30
+```
+
+**Testing Requirements**:
+- [ ] Unit test: Stale detection logic
+- [ ] Integration test: Events emitted for stale contracts
+- [ ] Integration test: Configurable threshold
+
+---
+
+### Task 5.3: Batch Operations API
+
+**Problem**: No way to perform bulk operations on contracts.
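The `archive_old` operation and the "partial failure handling" test suggest that one failing contract should not abort the whole batch. A sketch of that behavior, assuming a hypothetical `BatchContract` row and an injected `archive` callback standing in for the database update:

```rust
use std::time::Duration;

/// Hypothetical candidate row for a batch operation.
pub struct BatchContract {
    pub id: u64,
    pub status: String,
    /// Age used for the `older_than_hours` comparison (assumed).
    pub age: Duration,
}

#[derive(Debug, Default, PartialEq)]
pub struct BatchResult {
    pub affected_ids: Vec<u64>,
    pub errors: Vec<(u64, String)>,
}

/// Archive every matching contract; a failure on one contract is recorded
/// in `errors` and the remaining contracts are still processed.
pub fn archive_old<F>(
    contracts: &[BatchContract],
    older_than: Duration,
    status_filter: &[&str],
    mut archive: F,
) -> BatchResult
where
    F: FnMut(u64) -> Result<(), String>,
{
    let mut result = BatchResult::default();
    for c in contracts {
        let eligible = c.age > older_than && status_filter.contains(&c.status.as_str());
        if !eligible {
            continue;
        }
        match archive(c.id) {
            Ok(()) => result.affected_ids.push(c.id),
            Err(e) => result.errors.push((c.id, e)),
        }
    }
    result
}
```

`affected_count` in the response would then be `affected_ids.len()`, and `older_than_hours: 168` corresponds to seven days.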
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Medium |
+| Dependencies | Phase 1 |
+| Risk | Medium (destructive) |
+
+**Files to Modify**:
+- `makima/src/server/handlers/contracts.rs` - Add batch operation endpoint
+
+**New Endpoint**:
+```
+POST /api/v1/contracts/batch
+
+Body:
+{
+  "operation": "archive_old",
+  "older_than_hours": 168,
+  "status_filter": ["completed", "failed"]
+}
+
+Response:
+{
+  "operation": "archive_old",
+  "affected_count": 15,
+  "affected_ids": ["uuid1", "uuid2", ...],
+  "errors": []
+}
+```
+
+**Supported Operations**:
+- `archive_old` - Archive completed/failed contracts older than threshold
+- `pause_all` - Pause all active contracts
+- `resume_all` - Resume all paused contracts
+- `cleanup_archived` - Delete archived contracts older than threshold
+- `restart_stale` - Restart stale supervisors
+
+**Testing Requirements**:
+- [ ] Integration test: Each operation
+- [ ] Integration test: Partial failure handling
+- [ ] Authorization test: Only admins can run batch operations
+
+---
+
+### Task 5.4: Dashboard API
+
+**Problem**: No aggregated view of contract health.
+
+| Attribute | Value |
+|-----------|-------|
+| Complexity | Medium |
+| Dependencies | Tasks 5.1, 5.2 |
+| Risk | Low |
+
+**Files to Modify**:
+- `makima/src/server/handlers/contracts.rs` - Add dashboard endpoint
+
+**New Endpoint**:
+```
+GET /api/v1/contracts/dashboard
+
+Response:
+{
+  "statusCounts": { "active": 10, "completed": 50, "failed": 2 },
+  "phaseCounts": { "plan": 3, "execute": 5, "review": 2 },
+  "staleContracts": [
+    { "id": "uuid", "name": "...", "lastActivity": "...", "staleDurationSecs": 3600 }
+  ],
+  "waitingForInput": [
+    { "id": "uuid", "name": "...", "waitingFor": "question", "question": "..." }
+  ],
+  "recentEvents": [...],
+  "resourceUsage": {
+    "activeSupervisors": 10,
+    "runningTasks": 25,
+    "pendingTasks": 5,
+    "activeDaemons": 3,
+    "totalWorktrees": 45
+  }
+}
+```
+
+**Testing Requirements**:
+- [ ] Integration test: Correct counts
+- [ ] Integration test: Stale contracts listed
+- [ ] Integration test: Waiting contracts listed
+- [ ] Performance test: Dashboard response time (target: <100ms)
+
+---
+
+### Phase 5 Deliverables Checklist
+
+- [ ] WebSocket monitor endpoint
+- [ ] Event broadcaster service
+- [ ] All event types implemented
+- [ ] Stale detection service
+- [ ] Batch operations API
+- [ ] Dashboard API
+- [ ] All Phase 5 tests passing
+- [ ] API documentation updated
+
+---
+
+## Cross-Cutting Concerns
+
+### Feature Flags
+
+All new features should be behind feature flags for gradual rollout:
+
+```toml
+[feature_flags]
+enhanced_contract_status = true
+auto_complete = true
+supervisor_heartbeat = true
+monitoring_dashboard = true
+```
+
+### Backwards Compatibility
+
+1. **API Responses**: Include both old and new field names during the transition
+2. **Status Values**: Old statuses (`active`, `completed`, `archived`) continue to work
+3. **CLI**: New commands are additive; existing commands are unchanged
+4. **Database**: All migrations are additive (no column drops)
+
+### Migration Strategy
+
+1. **Zero-Downtime**: All migrations can run while the system is live
+2. **Rollback**: Each migration has a corresponding rollback script
+3. **Data Backfill**: Run as a background job that does not block the migration
+
+### Error Codes
+
+| Code | HTTP | Description |
+|------|------|-------------|
+| `CONTRACT_NOT_FOUND` | 404 | Contract does not exist |
+| `INVALID_TRANSITION` | 400 | State transition not allowed |
+| `VERSION_CONFLICT` | 409 | Optimistic locking conflict |
+| `PHASE_GUARD_REQUIRED` | 409 | Phase guard confirmation needed |
+| `INVALID_DELIVERABLE` | 400 | Deliverable ID not valid for phase |
+| `SUPERVISOR_NOT_FOUND` | 404 | No supervisor for contract |
+| `SUPERVISOR_DEAD` | 503 | Supervisor heartbeat timeout |
+| `VALIDATION_FAILED` | 400 | Phase requirements not met |
+
+### Metrics
+
+Add Prometheus metrics:
+- `contracts_by_status` (gauge)
+- `contracts_stale_count` (gauge)
+- `phase_transitions_total` (counter)
+- `completion_detections_total` (counter)
+- `supervisor_heartbeats_total` (counter)
+- `supervisor_restarts_total` (counter)
+- `batch_operations_total` (counter)
+
+---
+
+## Testing Strategy
+
+### Unit Tests
+- All new functions have unit tests
+- Mock database for repository tests
+- State machine transition tests
+
+### Integration Tests
+- API endpoint tests with test database
+- WebSocket connection tests
+- CLI command tests
+
+### Load Tests
+- Heartbeat throughput (target: 1000/sec)
+- WebSocket connections (target: 100 concurrent)
+- Dashboard API response time (target: <100ms)
+
+### Regression Tests
+- Existing functionality unchanged
+- Old API responses compatible
+- Database migrations reversible
+
+---
+
+## Timeline Summary
+
+| Week | Phase | Key Deliverables |
+|------|-------|------------------|
+| 1 | Phase 1 | Bug fixes, enhanced status, activity tracking |
+| 2 | Phase 2 | Completion gates, auto-advance, readiness API |
+| 3-4 | Phase 3 | Heartbeat, state persistence, restoration |
+| 5 | Phase 4 | CLI list, cleanup, monitor commands |
+| 6 | Phase 5 | WebSocket events, dashboard, batch ops |
+
+---
+
+## Appendix: File Change Summary
+
+### New Files
+- `makima/src/server/services/completion_detector.rs`
+- `makima/src/server/services/stale_detector.rs`
+- `makima/src/server/services/event_broadcaster.rs`
+- `makima/src/server/handlers/contract_monitor.rs`
+- `makima/src/daemon/cli/monitor.rs`
+- `makima/migrations/YYYYMMDDHHMMSS_enhanced_contract_status.sql`
+- `makima/migrations/YYYYMMDDHHMMSS_auto_complete_settings.sql`
+- `makima/migrations/YYYYMMDDHHMMSS_supervisor_heartbeats.sql`
+- `makima/migrations/YYYYMMDDHHMMSS_enhanced_supervisor_state.sql`
+
+### Modified Files
+- `makima/src/db/models.rs` - New enums, structs
+- `makima/src/db/repository.rs` - New queries, validation
+- `makima/src/server/handlers/contracts.rs` - New endpoints, validation
+- `makima/src/server/handlers/mesh_supervisor.rs` - State persistence
+- `makima/src/server/handlers/mesh_daemon.rs` - Heartbeat processing
+- `makima/src/server/routes.rs` - New routes
+- `makima/src/daemon/ws/protocol.rs` - Enhanced messages
+- `makima/src/daemon/ws/client.rs` - Heartbeat sending
+- `makima/src/daemon/cli/contract.rs` - New subcommands
+- `makima/src/daemon/cli/mod.rs` - Command registration
+- `makima/src/daemon/api/contract.rs` - New API calls
+- `makima/src/daemon/api/supervisor.rs` - Handle new responses
+- `makima/src/daemon/config.rs` - New configuration options
