# Contract Management System - Implementation Plan

**Version**: 1.0.0
**Status**: Draft
**Based On**: [Contract Management Specification](../contract-management-spec.md)
**Date**: 2025-01-31

## Executive Summary

This implementation plan breaks the Contract Management System specification into 5 phases. Critical bug fixes and core improvements come first, followed by more advanced features. Each phase is designed to be independently deployable, with feature flags for gradual rollout.

---

## Phase Overview

| Phase | Focus | Duration | Risk Level |
|-------|-------|----------|------------|
| Phase 1 | Critical Fixes + Core Status | 1 week | Low |
| Phase 2 | Completion Detection + Auto-Advance | 1 week | Medium |
| Phase 3 | Supervisor Heartbeat + Restoration | 1.5 weeks | Medium-High |
| Phase 4 | CLI Improvements | 1 week | Low |
| Phase 5 | Monitoring Dashboard | 1.5 weeks | Medium |

**Total Estimated Duration**: 6 weeks

---

## Phase 1: Critical Bug Fixes + Core Status Improvements

**Goal**: Fix immediate issues causing data inconsistency and silent failures. Lay groundwork for enhanced status tracking.

**Priority**: CRITICAL
**Duration**: 1 week
**Breaking Changes**: None (additive only)

### Task 1.1: Deliverable Validation

**Problem**: Non-existent deliverables can be marked complete, causing data inconsistency.

| Attribute | Value |
|-----------|-------|
| Complexity | Small |
| Dependencies | None |
| Risk | Low |

**Files to Modify**:
- `makima/src/server/handlers/contracts.rs` - Add validation in `mark_deliverable_complete()`
- `makima/src/db/models.rs` - Add `DeliverableValidationError` type

**Implementation**:
```rust
// In contracts.rs, before marking complete:
fn validate_deliverable(
    contract_type: &str,
    phase: &str,
    deliverable_id: &str,
    phase_config: Option<&PhaseConfig>,
) -> Result<(), DeliverableValidationError>
```

**Testing Requirements**:
- [ ] Unit test: Reject invalid deliverable IDs
- [ ] Unit test: Accept valid deliverable IDs for each phase
- [ ] Integration test: API returns 400 for invalid deliverable
- [ ] Regression test: Existing valid deliverables still work

---

### Task 1.2: Version Conflict Detection

**Problem**: Phase changes can fail silently when version conflicts occur.

| Attribute | Value |
|-----------|-------|
| Complexity | Medium |
| Dependencies | None |
| Risk | Low |

**Files to Modify**:
- `makima/src/db/repository.rs` - Enhance `change_contract_phase_for_owner()` with explicit version checking
- `makima/src/server/handlers/contracts.rs` - Return proper error responses for conflicts
- `makima/src/db/models.rs` - Add `PhaseChangeResult` enum

**Implementation**:
```rust
pub enum PhaseChangeResult {
    Success(Contract),
    VersionConflict { expected: i32, actual: i32, current_phase: String },
    ValidationFailed { reason: String, missing_requirements: Vec<String> },
    Unauthorized,
    NotFound,
}
```

**Testing Requirements**:
- [ ] Unit test: Detect version mismatch
- [ ] Unit test: Proper locking with `FOR UPDATE`
- [ ] Integration test: Concurrent phase changes handled correctly
- [ ] Integration test: API returns 409 Conflict with details

---

### Task 1.3: Phase Guard Enforcement

**Problem**: Supervisors can bypass the `phase_guard` setting.
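
One way to close this gap is to make a single check at the API boundary that treats every caller the same. The following is a minimal sketch, not the actual handler: `GuardDecision` and `check_phase_guard` are hypothetical names, and it assumes the request carries a boolean `confirmed` flag.

```rust
/// Outcome of the guard check for a phase-change request (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum GuardDecision {
    /// The transition may proceed.
    Proceed,
    /// The caller must resend the request with `confirmed: true`.
    RequiresConfirmation { deliverables: Vec<String> },
}

/// Applies the same rule to every caller, supervisors included.
/// `deliverables` is the list echoed back so the caller can show what is pending.
pub fn check_phase_guard(
    phase_guard_enabled: bool,
    confirmed: bool,
    deliverables: &[String],
) -> GuardDecision {
    if phase_guard_enabled && !confirmed {
        GuardDecision::RequiresConfirmation {
            deliverables: deliverables.to_vec(),
        }
    } else {
        GuardDecision::Proceed
    }
}
```

Because the function takes no caller identity at all, there is no code path by which a supervisor could be exempted.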

| Attribute | Value |
|-----------|-------|
| Complexity | Small |
| Dependencies | None |
| Risk | Low |

**Files to Modify**:
- `makima/src/server/handlers/contracts.rs` - Enforce phase_guard at API level
- `makima/src/daemon/api/supervisor.rs` - Handle confirmation requirement response

**Implementation**:
- Move the phase_guard check from supervisor logic to the API handler
- All callers (including supervisors) must provide `confirmed: true` if phase_guard is enabled
- Return a `RequiresConfirmation` response with the deliverables list

**Testing Requirements**:
- [ ] Unit test: Supervisor blocked by phase_guard
- [ ] Unit test: Supervisor proceeds with confirmation
- [ ] Integration test: API enforces regardless of caller

---

### Task 1.4: Enhanced Contract Status Tracking

**Problem**: The three existing states (active/completed/archived) do not capture a contract's real state.

| Attribute | Value |
|-----------|-------|
| Complexity | Medium |
| Dependencies | Task 1.1, 1.2, 1.3 |
| Risk | Medium (schema change) |

**Files to Create**:
- `makima/migrations/YYYYMMDDHHMMSS_enhanced_contract_status.sql`

**Files to Modify**:
- `makima/src/db/models.rs` - Extend `ContractStatus` enum
- `makima/src/db/repository.rs` - Add status transition validation
- `makima/src/server/handlers/contracts.rs` - Add status change endpoint

**Database Changes**:
```sql
-- New columns for status tracking
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS status_changed_at TIMESTAMPTZ DEFAULT NOW();
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS last_activity_at TIMESTAMPTZ;
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS waiting_for TEXT;
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS blocked_reason TEXT;
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS failure_reason TEXT;

CREATE INDEX IF NOT EXISTS idx_contracts_status ON contracts(status);
CREATE INDEX IF NOT EXISTS idx_contracts_last_activity ON contracts(last_activity_at);

-- Backfill last_activity_at before setting its default: adding the column
-- with DEFAULT NOW() would fill existing rows and leave nothing to backfill
UPDATE contracts SET last_activity_at = updated_at WHERE last_activity_at IS NULL;
ALTER TABLE contracts ALTER COLUMN last_activity_at SET DEFAULT NOW();
```

**New Status Values** (additive, backwards compatible):
```rust
pub enum ContractStatus {
    // Existing (keep as-is for backwards compatibility)
    Active,
    Completed,
    Archived,
    // New states
    Created,         // Not yet started
    WaitingForInput, // Pending user input
    Paused,          // User-requested pause
    Blocked,         // External dependency
    Completing,      // Running final validation
    Failed,          // Unrecoverable error
}
```

**Migration Strategy**:
- Default new contracts to `Created`, transition to `Active` when supervisor starts
- Existing `active` contracts remain `active` (no migration needed)
- New states only used for new contracts initially

**Testing Requirements**:
- [ ] Unit test: Valid state transitions allowed
- [ ] Unit test: Invalid state transitions rejected
- [ ] Integration test: Status change triggers event recording
- [ ] Migration test: Backfill runs correctly
- [ ] Backwards compatibility: Old API responses still work

---

### Task 1.5: Activity Timestamp Tracking

**Problem**: No reliable way to detect stale contracts.

| Attribute | Value |
|-----------|-------|
| Complexity | Small |
| Dependencies | Task 1.4 |
| Risk | Low |

**Files to Modify**:
- `makima/src/db/repository.rs` - Update `last_activity_at` on relevant operations
- `makima/src/server/handlers/contracts.rs` - Update activity on API calls
- `makima/src/server/handlers/mesh_supervisor.rs` - Update activity on supervisor events

**Operations that update activity**:
- Phase change
- Deliverable marked complete
- Task spawned/completed
- Chat message sent/received
- Supervisor heartbeat received

**Testing Requirements**:
- [ ] Unit test: Activity updated on each operation
- [ ] Integration test: Query stale contracts by threshold

---

### Phase 1 Deliverables Checklist

- [ ] Deliverable validation implemented and tested
- [ ] Version conflict detection with proper error responses
- [ ] Phase guard enforced at API level
- [ ] Database migration for new status columns
- [ ] Extended ContractStatus enum
- [ ] Status transition validation
- [ ] Activity timestamp tracking
- [ ] All Phase 1 tests passing
- [ ] Documentation updated

---

## Phase 2: Completion Detection + Auto-Advance

**Goal**: Enable automatic detection when phases/contracts are ready to advance, reducing manual intervention.

**Priority**: HIGH
**Duration**: 1 week
**Breaking Changes**: None (opt-in via configuration)

### Task 2.1: Phase Completion Gates

**Problem**: No automatic validation of phase requirements before advancement.

| Attribute | Value |
|-----------|-------|
| Complexity | Medium |
| Dependencies | Phase 1 complete |
| Risk | Low (opt-in) |

**Files to Create**:
- `makima/src/server/services/completion_detector.rs` - New service module

**Files to Modify**:
- `makima/src/db/models.rs` - Add `PhaseCompletionGate`, `TaskRequirement` types
- `makima/src/server/mod.rs` - Register completion detector service

**Implementation**:
```rust
pub struct PhaseCompletionGate {
    pub required_deliverables: Vec<String>,
    pub required_tasks: TaskRequirement,
    // Validator argument type is assumed to be the contract
    pub custom_validator: Option<Box<dyn Fn(&Contract) -> bool>>,
    pub auto_advance: bool,
}

pub enum TaskRequirement {
    None,
    AllComplete,
    MinComplete(usize),
    NamedTasks(Vec<String>),
}

pub enum PhaseReadinessResult {
    Ready,
    NotReady { missing: Vec<String> },
    NoGate,
}
```

**Testing Requirements**:
- [ ] Unit test: Gate with all deliverables required
- [ ] Unit test: Gate with task requirements
- [ ] Unit test: Mixed deliverable + task requirements
- [ ] Integration test: API endpoint returns readiness status

---

### Task 2.2: Phase Readiness API

**Problem**: There is no way to check whether a phase is ready to advance without attempting the transition.

| Attribute | Value |
|-----------|-------|
| Complexity | Small |
| Dependencies | Task 2.1 |
| Risk | Low |

**Files to Modify**:
- `makima/src/server/handlers/contracts.rs` - Add `check_phase_readiness()` endpoint
- `makima/src/server/routes.rs` - Register new route

**New Endpoint**:
```
GET /api/v1/contracts/{id}/phase-readiness

Response:
{
  "phase": "plan",
  "ready": false,
  "missing": ["Deliverable: plan-document", "1 task incomplete"],
  "auto_advance_enabled": true
}
```

**Testing Requirements**:
- [ ] Integration test: Correct readiness for each phase
- [ ] Integration test: Response includes all missing items

---

### Task 2.3: Auto-Advance Logic

**Problem**: Users must manually advance phases even when all requirements are met.
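
Auto-advance rests on the gate evaluation from Task 2.1, so it helps to see how a gate might be turned into a readiness result. The sketch below is illustrative only: `Gate` and `check_readiness` are hypothetical, the `NamedTasks` and custom-validator cases are left out, and task progress is reduced to two counters.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
pub enum TaskRequirement {
    None,
    AllComplete,
    MinComplete(usize),
}

#[derive(Debug, PartialEq)]
pub enum PhaseReadinessResult {
    Ready,
    NotReady { missing: Vec<String> },
}

/// Simplified stand-in for `PhaseCompletionGate` (hypothetical).
pub struct Gate {
    pub required_deliverables: Vec<String>,
    pub required_tasks: TaskRequirement,
}

/// Collects every unmet requirement so the caller can report all of them at once.
pub fn check_readiness(
    gate: &Gate,
    completed_deliverables: &HashSet<String>,
    tasks_total: usize,
    tasks_done: usize,
) -> PhaseReadinessResult {
    let mut missing: Vec<String> = gate
        .required_deliverables
        .iter()
        .filter(|d| !completed_deliverables.contains(*d))
        .map(|d| format!("Deliverable: {d}"))
        .collect();

    let incomplete = tasks_total.saturating_sub(tasks_done);
    match &gate.required_tasks {
        TaskRequirement::None => {}
        TaskRequirement::AllComplete => {
            if incomplete > 0 {
                missing.push(format!("{incomplete} task(s) incomplete"));
            }
        }
        TaskRequirement::MinComplete(n) => {
            if tasks_done < *n {
                missing.push(format!("{} more task(s) must complete", *n - tasks_done));
            }
        }
    }

    if missing.is_empty() {
        PhaseReadinessResult::Ready
    } else {
        PhaseReadinessResult::NotReady { missing }
    }
}
```

Returning the full `missing` list, rather than failing on the first unmet requirement, is what lets a readiness endpoint show the user everything that still blocks the phase.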

| Attribute | Value |
|-----------|-------|
| Complexity | Medium |
| Dependencies | Task 2.1, 2.2 |
| Risk | Medium |

**Files to Create**:
- `makima/migrations/YYYYMMDDHHMMSS_auto_complete_settings.sql`

**Files to Modify**:
- `makima/src/server/services/completion_detector.rs` - Add auto-advance trigger
- `makima/src/server/handlers/contracts.rs` - Hook into task completion
- `makima/src/db/models.rs` - Add `auto_complete_enabled` field

**Database Changes**:
```sql
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS auto_complete_enabled BOOLEAN DEFAULT TRUE;
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS completion_detected_at TIMESTAMPTZ;
```

**Auto-Advance Flow**:
1. Task completes or deliverable marked
2. Check if phase gate satisfied
3. If auto_advance enabled and gate satisfied:
   - If phase_guard enabled: Set status = WaitingForInput
   - If phase_guard disabled: Auto-advance to next phase
4. If terminal phase and all requirements met:
   - Set status = Completing
   - Run cleanup (worktrees, etc.)
   - Set status = Completed

**Feature Flag**:
```toml
[contracts]
auto_complete_enabled = true  # Global default
auto_advance_phases = true    # Allow phase auto-advance
```

**Testing Requirements**:
- [ ] Unit test: Auto-advance triggers on gate satisfaction
- [ ] Unit test: Phase guard blocks auto-advance
- [ ] Unit test: Terminal phase triggers completion flow
- [ ] Integration test: Full contract auto-completes
- [ ] Integration test: Disable auto-complete per contract

---

### Task 2.4: Contract Completion API

**Problem**: No way to force-check or manually trigger completion flow.
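
A manual completion check would most naturally reuse the same decision the auto-advance flow from Task 2.3 makes. That decision can be sketched as a pure function; `PhaseContext`, `AdvanceAction`, and `decide` are hypothetical names. The sketch also assumes the terminal-phase check takes precedence over phase_guard, which the flow does not spell out.

```rust
/// Inputs to the auto-advance decision (hypothetical type).
pub struct PhaseContext {
    pub gate_satisfied: bool,
    pub auto_advance: bool,
    pub phase_guard: bool,
    pub terminal_phase: bool,
}

#[derive(Debug, PartialEq)]
pub enum AdvanceAction {
    /// Gate not satisfied, or auto-advance disabled: leave the contract alone.
    Stay,
    /// Gate satisfied but phase_guard is on: set status to WaitingForInput.
    WaitForInput,
    /// Gate satisfied on a non-terminal phase: move to the next phase.
    AdvancePhase,
    /// Gate satisfied on the terminal phase: Completing, cleanup, Completed.
    CompleteContract,
}

pub fn decide(ctx: &PhaseContext) -> AdvanceAction {
    if !ctx.gate_satisfied || !ctx.auto_advance {
        return AdvanceAction::Stay;
    }
    // Assumption: completion of the last phase is not held back by phase_guard.
    if ctx.terminal_phase {
        return AdvanceAction::CompleteContract;
    }
    if ctx.phase_guard {
        AdvanceAction::WaitForInput
    } else {
        AdvanceAction::AdvancePhase
    }
}
```

Keeping the decision free of side effects means the same function can answer a dry-run question such as "would this contract complete?" without changing any state.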

| Attribute | Value |
|-----------|-------|
| Complexity | Small |
| Dependencies | Task 2.3 |
| Risk | Low |

**Files to Modify**:
- `makima/src/server/handlers/contracts.rs` - Add `check_completion()` and `set_auto_complete()` endpoints

**New Endpoints**:
```
POST /api/v1/contracts/{id}/check-completion
Response: { "ready": true, "would_complete": true, "missing": [] }

PUT /api/v1/contracts/{id}/auto-complete
Body: { "enabled": true }
Response: ContractSummary
```

**Testing Requirements**:
- [ ] Integration test: Manual completion check
- [ ] Integration test: Toggle auto-complete setting

---

### Phase 2 Deliverables Checklist

- [ ] PhaseCompletionGate implementation
- [ ] TaskRequirement types
- [ ] CompletionDetector service
- [ ] Phase readiness API endpoint
- [ ] Auto-advance logic with phase_guard respect
- [ ] Contract completion flow
- [ ] Auto-complete toggle per contract
- [ ] Feature flags in configuration
- [ ] All Phase 2 tests passing
- [ ] Documentation updated

---

## Phase 3: Supervisor Heartbeat + Restoration

**Goal**: Reliable supervisor state tracking and restoration after crashes.

**Priority**: HIGH
**Duration**: 1.5 weeks
**Breaking Changes**: Minor (new heartbeat protocol)

### Task 3.1: Enhanced Supervisor State Enum

**Problem**: Current supervisor state doesn't capture activity details.

| Attribute | Value |
|-----------|-------|
| Complexity | Small |
| Dependencies | Phase 1, Phase 2 |
| Risk | Low |

**Files to Modify**:
- `makima/src/db/models.rs` - Add `SupervisorState` enum
- `makima/src/daemon/ws/protocol.rs` - Add state to heartbeat message

**New Enum**:
```rust
pub enum SupervisorState {
    Initializing,
    Idle,
    Working,
    WaitingForUser,
    WaitingForTasks,
    Blocked,
    Completed,
    Failed,
    Interrupted,
}
```

**Testing Requirements**:
- [ ] Unit test: State serialization/deserialization
- [ ] Unit test: All state values recognized

---

### Task 3.2: Heartbeat Infrastructure

**Problem**: No reliable way to detect dead supervisors.
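
The core of dead-supervisor detection is a comparison between the last heartbeat time and a timeout. A minimal sketch follows, using only the standard library; `Liveness` and `liveness` are hypothetical names, and the timeout is passed in rather than read from configuration.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum Liveness {
    Alive,
    Dead,
}

/// A supervisor is considered dead once more than `timeout` has elapsed
/// since its last heartbeat. `now` is a parameter so the check is testable.
pub fn liveness(last_heartbeat: Instant, now: Instant, timeout: Duration) -> Liveness {
    // saturating_duration_since returns zero if `now` is earlier than the heartbeat.
    if now.saturating_duration_since(last_heartbeat) > timeout {
        Liveness::Dead
    } else {
        Liveness::Alive
    }
}
```

Choosing a timeout several times larger than the send interval tolerates a few dropped heartbeats before a supervisor is declared dead.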

| Attribute | Value |
|-----------|-------|
| Complexity | Medium |
| Dependencies | Task 3.1 |
| Risk | Medium |

**Files to Create**:
- `makima/migrations/YYYYMMDDHHMMSS_supervisor_heartbeats.sql`

**Files to Modify**:
- `makima/src/db/models.rs` - Add `SupervisorHeartbeat` struct
- `makima/src/db/repository.rs` - Add heartbeat storage functions
- `makima/src/daemon/ws/protocol.rs` - Enhanced heartbeat message
- `makima/src/daemon/ws/client.rs` - Send enhanced heartbeats
- `makima/src/server/handlers/mesh_daemon.rs` - Process heartbeats

**Database Changes**:
```sql
CREATE TABLE IF NOT EXISTS supervisor_heartbeats (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    supervisor_task_id UUID NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
    contract_id UUID NOT NULL REFERENCES contracts(id) ON DELETE CASCADE,
    state VARCHAR(50) NOT NULL,
    phase VARCHAR(50) NOT NULL,
    current_activity TEXT,
    progress INTEGER DEFAULT 0,
    pending_task_ids UUID[] DEFAULT ARRAY[]::UUID[],
    timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_heartbeats_supervisor ON supervisor_heartbeats(supervisor_task_id);
CREATE INDEX IF NOT EXISTS idx_heartbeats_timestamp ON supervisor_heartbeats(timestamp);

-- Retention policy: keep only 24 hours.
-- A CHECK constraint cannot expire rows, so the periodic cleanup job runs:
--   DELETE FROM supervisor_heartbeats WHERE timestamp < NOW() - INTERVAL '24 hours';
```

**Enhanced Heartbeat Message**:
```rust
pub struct SupervisorHeartbeat {
    pub task_id: Uuid,
    pub contract_id: Uuid,
    pub state: SupervisorState,
    pub phase: String,
    pub current_activity: String,
    pub progress: u8,
    pub pending_task_ids: Vec<Uuid>,
    pub timestamp: DateTime<Utc>,
}
```

**Configuration**:
```toml
[contracts]
heartbeat_interval_seconds = 30
heartbeat_timeout_seconds = 120
```

**Testing Requirements**:
- [ ] Unit test: Heartbeat message serialization
- [ ] Integration test: Heartbeats stored correctly
- [ ] Integration test: Old heartbeats cleaned up
- [ ] Load test: Handle many concurrent heartbeats

---

### Task 3.3: Supervisor State Persistence

**Problem**: Supervisor context lost after daemon crash.

| Attribute | Value |
|-----------|-------|
| Complexity | Large |
| Dependencies | Task 3.2 |
| Risk | High |

**Files to Create**:
- `makima/migrations/YYYYMMDDHHMMSS_enhanced_supervisor_state.sql`

**Files to Modify**:
- `makima/src/db/models.rs` - Enhance `SupervisorState` table model
- `makima/src/db/repository.rs` - State persistence functions
- `makima/src/server/handlers/mesh_supervisor.rs` - State save/restore logic
- `makima/src/daemon/ws/client.rs` - Send state updates

**Database Changes**:
```sql
ALTER TABLE supervisor_states ADD COLUMN IF NOT EXISTS state VARCHAR(50) NOT NULL DEFAULT 'initializing';
ALTER TABLE supervisor_states ADD COLUMN IF NOT EXISTS current_activity TEXT;
ALTER TABLE supervisor_states ADD COLUMN IF NOT EXISTS progress INTEGER DEFAULT 0;
ALTER TABLE supervisor_states ADD COLUMN IF NOT EXISTS error_message TEXT;
ALTER TABLE supervisor_states ADD COLUMN IF NOT EXISTS spawned_task_ids UUID[] DEFAULT ARRAY[]::UUID[];
```

**Persistent State Model**:
```rust
pub struct SupervisorPersistentState {
    pub state: SupervisorState,
    pub phase: String,
    pub conversation_history: Vec<ConversationMessage>, // element type name assumed
    pub pending_questions: Vec<PendingQuestion>,        // element type name assumed
    pub spawned_task_ids: Vec<Uuid>,
    pub waiting_on_task_ids: Vec<Uuid>,
    pub last_checkpoint: Option<Checkpoint>,            // type name assumed
    pub current_activity: String,
    pub error: Option<String>,
    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
}
```

**State Save Points**:
- On every LLM response
- On task spawn
- On question asked
- On phase change
- On heartbeat (lightweight update)

**Testing Requirements**:
- [ ] Unit test: State serialization round-trip
- [ ] Integration test: State persisted on each save point
- [ ] Integration test: State survives daemon restart

---

### Task 3.4: Supervisor Restoration Protocol

**Problem**: No reliable restoration after crash.
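
One way to make restoration predictable is to reduce it to a single decision over the persisted state. The sketch below is an illustration under stated assumptions: `SavedState`, `RestoreAction`, and `plan_restore` are hypothetical, consistency validation is reduced to a boolean, and the branch order follows the restoration flow given for this task.

```rust
/// Minimal view of the persisted supervisor state (hypothetical type).
pub struct SavedState {
    pub consistent: bool,
    pub pending_questions: usize,
    pub waiting_tasks: usize,
}

#[derive(Debug, PartialEq)]
pub enum RestoreAction {
    /// No saved state found: start fresh and log a warning.
    StartFresh,
    /// Saved state failed validation: fall back to the last checkpoint.
    FromCheckpoint,
    /// Unanswered questions exist: re-deliver them to the user.
    RedeliverQuestions,
    /// Spawned tasks are still running: resume the waiting state.
    ResumeWaiting,
    /// Nothing outstanding: send restoration context and resume execution.
    ResumeExecution,
}

pub fn plan_restore(saved: Option<&SavedState>) -> RestoreAction {
    match saved {
        None => RestoreAction::StartFresh,
        Some(s) if !s.consistent => RestoreAction::FromCheckpoint,
        Some(s) if s.pending_questions > 0 => RestoreAction::RedeliverQuestions,
        Some(s) if s.waiting_tasks > 0 => RestoreAction::ResumeWaiting,
        Some(_) => RestoreAction::ResumeExecution,
    }
}
```

Because each branch maps to exactly one action, every integration scenario for this task (clean shutdown, crash, corrupted state, pending questions, waiting tasks) corresponds to one testable input.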

| Attribute | Value |
|-----------|-------|
| Complexity | Large |
| Dependencies | Task 3.3 |
| Risk | High |

**Files to Modify**:
- `makima/src/server/handlers/mesh_supervisor.rs` - Restoration logic
- `makima/src/server/handlers/mesh_daemon.rs` - Task reassignment
- `makima/src/daemon/ws/client.rs` - Handle restoration handoff

**Restoration Flow**:
```
1. Daemon restarts or task reassigned
   │
   ▼
2. Load supervisor state from supervisor_states
   │
   ├─ NOT FOUND ──► Start fresh, log warning
   │
   ▼ FOUND
3. Validate state consistency
   │
   ├─ INVALID ──► Start from last checkpoint
   │
   ▼ VALID
4. Restore conversation history
   │
   ▼
5. Check for pending questions
   │
   ├─ HAS PENDING ──► Re-deliver to user
   │
   ▼ NO PENDING
6. Check for waiting tasks
   │
   ├─ HAS WAITING ──► Resume waiting state
   │
   ▼ NO WAITING
7. Send restoration context to Claude
   │
   ▼
8. Resume execution from last state
```

**Testing Requirements**:
- [ ] Integration test: Restore after clean shutdown
- [ ] Integration test: Restore after crash (simulated)
- [ ] Integration test: Handle corrupted state gracefully
- [ ] Integration test: Resume with pending questions
- [ ] Integration test: Resume with waiting tasks

---

### Task 3.5: Supervisor Status API

**Problem**: No way to query supervisor status.

| Attribute | Value |
|-----------|-------|
| Complexity | Small |
| Dependencies | Task 3.3 |
| Risk | Low |

**Files to Modify**:
- `makima/src/server/handlers/contracts.rs` - Add supervisor status endpoints
- `makima/src/server/routes.rs` - Register routes

**New Endpoints**:
```
GET /api/v1/contracts/{id}/supervisor/status

Response:
{
  "task_id": "uuid",
  "state": "working",
  "phase": "execute",
  "current_activity": "Implementing user authentication",
  "progress": 45,
  "last_heartbeat": "2025-01-31T10:30:00Z",
  "pending_task_ids": ["uuid1", "uuid2"]
}

GET /api/v1/contracts/{id}/supervisor/heartbeats?limit=10

Response:
{
  "heartbeats": [
    { "timestamp": "...", "state": "...", "activity": "..." }
  ]
}

POST /api/v1/contracts/{id}/supervisor/sync

Response: { "synced": true, "state": "working" }
```

**Testing Requirements**:
- [ ] Integration test: Status endpoint returns correct data
- [ ] Integration test: Heartbeat history retrieval
- [ ] Integration test: Force sync triggers state update

---

### Phase 3 Deliverables Checklist

- [ ] SupervisorState enum with all states
- [ ] Heartbeat infrastructure (DB, messages, storage)
- [ ] Enhanced supervisor state persistence
- [ ] Restoration protocol implementation
- [ ] Supervisor status API endpoints
- [ ] Configuration for heartbeat intervals
- [ ] Cleanup job for old heartbeats
- [ ] All Phase 3 tests passing
- [ ] Documentation updated

---

## Phase 4: CLI Improvements

**Goal**: Enhanced CLI for contract management and monitoring.

**Priority**: MEDIUM
**Duration**: 1 week
**Breaking Changes**: None (additive)

### Task 4.1: Contract List with Filters

**Problem**: No way to filter contracts by status, phase, or staleness.

| Attribute | Value |
|-----------|-------|
| Complexity | Medium |
| Dependencies | Phase 1 (status tracking) |
| Risk | Low |

**Files to Modify**:
- `makima/src/daemon/cli/contract.rs` - Add list subcommand
- `makima/src/daemon/cli/mod.rs` - Register subcommand
- `makima/src/daemon/api/contract.rs` - Add list API call

**New Command**:
```bash
makima contracts list [OPTIONS]

Options:
  --status <STATUS>       Filter by status (active,completed,failed)
  --stale                 Show only stale contracts
  --threshold <DURATION>  Stale threshold (default: 30m)
  --waiting               Show contracts waiting for input
  --phase <PHASE>         Filter by phase
  --format <FORMAT>       Output format (table,json,compact)
  -n, --limit <N>         Limit results
```

**Testing Requirements**:
- [ ] Unit test: Argument parsing
- [ ] Integration test: Filter by status
- [ ] Integration test: Filter by stale
- [ ] Integration test: Multiple filters combined
- [ ] Integration test: All output formats

---

### Task 4.2: Cleanup Command

**Problem**: No automated way to clean up old contracts and worktrees.
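
Age-based filtering and cleanup both rely on compact duration values such as `30m`. A minimal parser sketch follows, assuming only the suffixes `s`, `m`, `h`, and `d` are accepted; `parse_duration` is a hypothetical helper, not an existing function in the CLI.

```rust
use std::time::Duration;

/// Parses values such as "30m" or "7d" into a Duration.
/// Returns None for an unknown suffix, a missing number, or overflow.
pub fn parse_duration(input: &str) -> Option<Duration> {
    // Suffix and the number of seconds one unit represents.
    const UNITS: [(char, u64); 4] = [('s', 1), ('m', 60), ('h', 3_600), ('d', 86_400)];

    let input = input.trim();
    for (suffix, unit_secs) in UNITS {
        if let Some(digits) = input.strip_suffix(suffix) {
            let count: u64 = digits.parse().ok()?;
            return count.checked_mul(unit_secs).map(Duration::from_secs);
        }
    }
    None
}
```

Returning `Option` rather than panicking lets argument parsing turn a malformed value into a normal usage error.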

| Attribute | Value |
|-----------|-------|
| Complexity | Medium |
| Dependencies | Phase 1 |
| Risk | Medium (destructive) |

**Files to Modify**:
- `makima/src/daemon/cli/contract.rs` - Add cleanup subcommand
- `makima/src/daemon/api/contract.rs` - Add cleanup API calls

**New Command**:
```bash
makima contracts cleanup [OPTIONS]

Options:
  --archive                Archive completed/failed contracts
  --delete-archived        Delete archived contracts
  --worktrees              Clean up orphaned worktrees
  --all                    Run all cleanup operations
  --older-than <DURATION>  Threshold (default: 7d)
  --dry-run                Show what would be affected
  --force                  Skip confirmation prompts
```

**Safety Features**:
- Dry-run mode by default for destructive operations
- Confirmation prompt unless `--force` specified
- Log all deletions

**Testing Requirements**:
- [ ] Unit test: Argument parsing
- [ ] Integration test: Dry-run shows correct count
- [ ] Integration test: Archive older than threshold
- [ ] Integration test: Delete archived
- [ ] Integration test: Worktree cleanup

---

### Task 4.3: Monitor Command

**Problem**: No real-time monitoring from CLI.

| Attribute | Value |
|-----------|-------|
| Complexity | Large |
| Dependencies | Phase 3 (heartbeats) |
| Risk | Low |

**Files to Create**:
- `makima/src/daemon/cli/monitor.rs` - Monitor command implementation

**Files to Modify**:
- `makima/src/daemon/cli/mod.rs` - Register monitor subcommand
- `makima/src/daemon/tui/` - Enhance TUI for monitoring

**New Command**:
```bash
makima contracts monitor [CONTRACT_IDS...] [OPTIONS]

Options:
  --status <STATUS>  Filter by status
  --stale            Only show stale contracts
  --quiet            Only important events
  --format <FORMAT>  Output format (tui,text,json)
```

**Output Formats**:
- `tui`: Full terminal UI dashboard
- `text`: Plain text event stream
- `json`: JSON event stream (for scripting)

**Testing Requirements**:
- [ ] Unit test: Argument parsing
- [ ] Integration test: WebSocket connection established
- [ ] Integration test: Events displayed correctly
- [ ] Integration test: Filter by contract ID

---

### Task 4.4: Additional Helper Commands

**Problem**: Common operations require multiple steps.

| Attribute | Value |
|-----------|-------|
| Complexity | Small per command |
| Dependencies | Phase 1, 2, 3 |
| Risk | Low |

**New Commands**:
```bash
# Pause/Resume
makima contracts pause <CONTRACT_ID> --reason "Waiting for review"
makima contracts resume <CONTRACT_ID>

# Phase management
makima contracts advance <CONTRACT_ID> --phase execute --force

# Supervisor management
makima contracts restart-supervisor <CONTRACT_ID>

# Information
makima contracts show <CONTRACT_ID> --verbose
makima contracts health

# Export
makima contracts export <CONTRACT_ID> --format json --output contract.json
```

**Testing Requirements**:
- [ ] Integration test for each command
- [ ] Help text for all commands

---

### Phase 4 Deliverables Checklist

- [ ] `contracts list` with all filters
- [ ] `contracts cleanup` with dry-run
- [ ] `contracts monitor` with TUI
- [ ] Helper commands (pause, resume, advance, etc.)
- [ ] JSON output for scripting
- [ ] All Phase 4 tests passing
- [ ] CLI documentation updated

---

## Phase 5: Monitoring Dashboard

**Goal**: Real-time monitoring with WebSocket events and batch operations.

**Priority**: MEDIUM
**Duration**: 1.5 weeks
**Breaking Changes**: None

### Task 5.1: WebSocket Monitor Events

**Problem**: No real-time event stream for monitoring.

| Attribute | Value |
|-----------|-------|
| Complexity | Large |
| Dependencies | Phase 3 (heartbeats) |
| Risk | Medium |

**Files to Create**:
- `makima/src/server/handlers/contract_monitor.rs` - WebSocket handler
- `makima/src/server/services/event_broadcaster.rs` - Event broadcast service

**Files to Modify**:
- `makima/src/server/routes.rs` - Register WebSocket endpoint
- `makima/src/server/handlers/contracts.rs` - Emit events on changes

**Event Types**:
```rust
pub enum ContractMonitorEvent {
    StatusChanged { contract_id, old_status, new_status, reason },
    PhaseChanged { contract_id, old_phase, new_phase },
    SupervisorStateChanged { contract_id, supervisor_task_id, old_state, new_state },
    Heartbeat { contract_id, state, activity, progress },
    StaleDetected { contract_id, last_activity, stale_duration },
    TaskCompleted { contract_id, task_id, task_name, success },
    DeliverableCompleted { contract_id, phase, deliverable_id },
    QuestionAsked { contract_id, question_id, question, question_type },
}
```

**WebSocket Endpoint**:
```
WS /api/v1/contracts/monitor

Filter message:
{
  "contract_ids": ["uuid1", "uuid2"],
  "statuses": ["active", "waiting_for_input"],
  "include_stale": true,
  "include_heartbeats": true
}
```

**Testing Requirements**:
- [ ] Unit test: Event serialization
- [ ] Integration test: WebSocket connection
- [ ] Integration test: Filter by contract ID
- [ ] Integration test: Filter by status
- [ ] Load test: Many concurrent connections

---

### Task 5.2: Stale Contract Detection Service

**Problem**: No automated detection of stale contracts.

| Attribute | Value |
|-----------|-------|
| Complexity | Medium |
| Dependencies | Task 5.1 |
| Risk | Low |

**Files to Create**:
- `makima/src/server/services/stale_detector.rs`

**Files to Modify**:
- `makima/src/server/mod.rs` - Start detector service

**Implementation**:
```rust
pub struct StaleContractDetector {
    pool: PgPool,
    config: ContractTimeoutConfig,
    event_tx: Sender<ContractMonitorEvent>,
}

impl StaleContractDetector {
    pub async fn run(&self) {
        let mut interval = tokio::time::interval(Duration::from_secs(60));
        loop {
            interval.tick().await;
            let stale = self.detect_stale_contracts().await;
            for (contract_id, last_activity) in stale {
                // A send error means the receiver is gone; ignore it and keep polling
                let _ = self.event_tx.send(ContractMonitorEvent::StaleDetected {
                    contract_id,
                    last_activity,
                    stale_duration: Utc::now() - last_activity,
                }).await;
            }
        }
    }
}
```

**Configuration**:
```toml
[contracts]
stale_detection_interval_seconds = 60
stale_threshold_minutes = 30
```

**Testing Requirements**:
- [ ] Unit test: Stale detection logic
- [ ] Integration test: Events emitted for stale contracts
- [ ] Integration test: Configurable threshold

---

### Task 5.3: Batch Operations API

**Problem**: No way to perform bulk operations on contracts.

| Attribute | Value |
|-----------|-------|
| Complexity | Medium |
| Dependencies | Phase 1 |
| Risk | Medium (destructive) |

**Files to Modify**:
- `makima/src/server/handlers/contracts.rs` - Add batch operation endpoint

**New Endpoint**:
```
POST /api/v1/contracts/batch

Body:
{
  "operation": "archive_old",
  "older_than_hours": 168,
  "status_filter": ["completed", "failed"]
}

Response:
{
  "operation": "archive_old",
  "affected_count": 15,
  "affected_ids": ["uuid1", "uuid2", ...],
  "errors": []
}
```

**Supported Operations**:
- `archive_old` - Archive completed/failed contracts older than threshold
- `pause_all` - Pause all active contracts
- `resume_all` - Resume all paused contracts
- `cleanup_archived` - Delete archived contracts older than threshold
- `restart_stale` - Restart stale supervisors

**Testing Requirements**:
- [ ] Integration test for each operation
- [ ] Integration test: Partial failure handling
- [ ] Authorization test: Only admins can batch

---

### Task 5.4: Dashboard API

**Problem**: No aggregated view of contract health.

| Attribute | Value |
|-----------|-------|
| Complexity | Medium |
| Dependencies | Task 5.1, 5.2 |
| Risk | Low |

**Files to Modify**:
- `makima/src/server/handlers/contracts.rs` - Add dashboard endpoint

**New Endpoint**:
```
GET /api/v1/contracts/dashboard

Response:
{
  "statusCounts": { "active": 10, "completed": 50, "failed": 2 },
  "phaseCounts": { "plan": 3, "execute": 5, "review": 2 },
  "staleContracts": [
    { "id": "uuid", "name": "...", "lastActivity": "...", "staleDurationSecs": 3600 }
  ],
  "waitingForInput": [
    { "id": "uuid", "name": "...", "waitingFor": "question", "question": "..." }
  ],
  "recentEvents": [...],
  "resourceUsage": {
    "activeSupervisors": 10,
    "runningTasks": 25,
    "pendingTasks": 5,
    "activeDaemons": 3,
    "totalWorktrees": 45
  }
}
```

**Testing Requirements**:
- [ ] Integration test: Correct counts
- [ ] Integration test: Stale contracts listed
- [ ] Integration test: Waiting contracts listed
- [ ] Performance test: Dashboard response time

---

### Phase 5 Deliverables Checklist

- [ ] WebSocket monitor endpoint
- [ ] Event broadcaster service
- [ ] All event types implemented
- [ ] Stale detection service
- [ ] Batch operations API
- [ ] Dashboard API
- [ ] All Phase 5 tests passing
- [ ] API documentation updated

---

## Cross-Cutting Concerns

### Feature Flags

All new features should be behind feature flags for gradual rollout:

```toml
[feature_flags]
enhanced_contract_status = true
auto_complete = true
supervisor_heartbeat = true
monitoring_dashboard = true
```

### Backwards Compatibility

1. **API Responses**: Include both old and new field names during transition
2. **Status Values**: Old statuses (`active`, `completed`, `archived`) continue to work
3. **CLI**: New commands are additive, existing commands unchanged
4. **Database**: All migrations are additive (no column drops)

### Migration Strategy

1. **Zero-Downtime**: All migrations can run while system is live
2. **Rollback**: Each migration has a corresponding rollback script
3. **Data Backfill**: Run as background job, not blocking migration

### Error Codes

| Code | HTTP | Description |
|------|------|-------------|
| `CONTRACT_NOT_FOUND` | 404 | Contract does not exist |
| `INVALID_TRANSITION` | 400 | State transition not allowed |
| `VERSION_CONFLICT` | 409 | Optimistic locking conflict |
| `PHASE_GUARD_REQUIRED` | 409 | Phase guard confirmation needed |
| `INVALID_DELIVERABLE` | 400 | Deliverable ID not valid for phase |
| `SUPERVISOR_NOT_FOUND` | 404 | No supervisor for contract |
| `SUPERVISOR_DEAD` | 503 | Supervisor heartbeat timeout |
| `VALIDATION_FAILED` | 400 | Phase requirements not met |

### Metrics

Add Prometheus metrics:
- `contracts_by_status` (gauge)
- `contracts_stale_count` (gauge)
- `phase_transitions_total` (counter)
- `completion_detections_total` (counter)
- `supervisor_heartbeats_total` (counter)
- `supervisor_restarts_total` (counter)
- `batch_operations_total` (counter)

---

## Testing Strategy

### Unit Tests
- All new functions have unit tests
- Mock database for repository tests
- State machine transition tests

### Integration Tests
- API endpoint tests with test database
- WebSocket connection tests
- CLI command tests

### Load Tests
- Heartbeat throughput (target: 1000/sec)
- WebSocket connections (target: 100 concurrent)
- Dashboard API response time (target: <100ms)

### Regression Tests
- Existing functionality unchanged
- Old API responses compatible
- Database migrations reversible

---

## Timeline Summary

| Week | Phase | Key Deliverables |
|------|-------|------------------|
| 1 | Phase 1 | Bug fixes, enhanced status, activity tracking |
| 2 | Phase 2 | Completion gates, auto-advance, readiness API |
| 3-4 | Phase 3 | Heartbeat, state persistence, restoration |
| 5 | Phase 4 | CLI list, cleanup, monitor commands |
| 6 | Phase 5 | WebSocket events, dashboard, batch ops |

---

## Appendix: File Change Summary

### New Files
- `makima/src/server/services/completion_detector.rs`
- `makima/src/server/services/stale_detector.rs`
- `makima/src/server/services/event_broadcaster.rs`
- `makima/src/server/handlers/contract_monitor.rs`
- `makima/src/daemon/cli/monitor.rs`
- `makima/migrations/YYYYMMDDHHMMSS_enhanced_contract_status.sql`
- `makima/migrations/YYYYMMDDHHMMSS_auto_complete_settings.sql`
- `makima/migrations/YYYYMMDDHHMMSS_supervisor_heartbeats.sql`
- `makima/migrations/YYYYMMDDHHMMSS_enhanced_supervisor_state.sql`

### Modified Files
- `makima/src/db/models.rs` - New enums, structs
- `makima/src/db/repository.rs` - New queries, validation
- `makima/src/server/mod.rs` - Service registration
- `makima/src/server/handlers/contracts.rs` - New endpoints, validation
- `makima/src/server/handlers/mesh_supervisor.rs` - State persistence
- `makima/src/server/handlers/mesh_daemon.rs` - Heartbeat processing
- `makima/src/server/routes.rs` - New routes
- `makima/src/daemon/ws/protocol.rs` - Enhanced messages
- `makima/src/daemon/ws/client.rs` - Heartbeat sending
- `makima/src/daemon/cli/contract.rs` - New subcommands
- `makima/src/daemon/cli/mod.rs` - Command registration
- `makima/src/daemon/tui/` - Monitoring TUI
- `makima/src/daemon/api/contract.rs` - New API calls
- `makima/src/daemon/api/supervisor.rs` - Handle new responses
- `makima/src/daemon/config.rs` - New configuration options