# Contract Management System Specification

**Version**: 1.0.0
**Status**: Draft
**Author**: AI Assistant
**Date**: 2025-01-31

## Executive Summary

This specification addresses critical issues in the current contract management system:

1. **Manual Completion Required** - Contracts stay 'active' indefinitely
2. **No Phase Readiness Validation** - No automatic checking before phase advancement
3. **Supervisor State Restoration Broken** - Context lost after daemon crash
4. **Version Conflicts Silent** - Phase changes can fail silently
5. **No Deliverable Validation** - Can mark non-existent deliverables as complete
6. **Phase Guard Supervisor Bypass** - Supervisors can bypass phase_guard setting

---

## 1. Contract Lifecycle State Machine

### 1.1 Current State (ContractStatus)

The current implementation uses three states:

- `active` - Contract is being worked on
- `completed` - Contract finished successfully
- `archived` - Contract archived (soft delete)

### 1.2 Proposed State Machine

```
                    ┌──────────────────────────────────────────────┐
                    │                                              │
                    ▼                                              │
┌─────────┐    ┌─────────┐    ┌──────────────────┐    ┌────────────┴───┐
│ created ├───►│ active  ├───►│ waiting_for_input├───►│   completing   │
└─────────┘    └────┬────┘    └────────┬─────────┘    └───────┬────────┘
                    │                  │                      │
                    │                  │                      ▼
                    │                  │              ┌───────────────┐
                    │                  │              │   completed   │
                    │                  │              └───────┬───────┘
                    │                  │                      │
                    ▼                  ▼                      ▼
               ┌─────────┐        ┌──────────┐        ┌───────────────┐
               │ paused  │        │ blocked  │        │   archived    │
               └────┬────┘        └────┬─────┘        └───────────────┘
                    │                  │                      ▲
                    └──────────────────┴──────────────────────┤
                                                              │
               ┌────────┐                                     │
               │ failed ├─────────────────────────────────────┘
               └────────┘
```

### 1.3 State Definitions

```rust
use serde::{Deserialize, Serialize};

/// Contract lifecycle states
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum ContractStatus {
    /// Contract created but not yet started
    Created,
    /// Contract is actively being worked on
    Active,
    /// Waiting for user input (phase confirmation, question, etc.)
    WaitingForInput,
    /// Contract is paused by user request
    Paused,
    /// Contract is blocked on external dependency
    Blocked,
    /// All phases complete, running final validation
    Completing,
    /// Contract completed successfully
    Completed,
    /// Contract failed with errors
    Failed,
    /// Contract archived (soft delete)
    Archived,
}
```

### 1.4 State Transitions

| From State | To State | Guard Conditions | Trigger |
|------------|----------|------------------|---------|
| Created | Active | Has supervisor task | Supervisor starts |
| Active | WaitingForInput | Pending question exists | Supervisor asks question |
| Active | Paused | - | User requests pause |
| Active | Blocked | Has external blocker | Blocker detected |
| Active | Completing | Final phase, all deliverables met | Auto-completion check |
| WaitingForInput | Active | Question answered | User responds |
| WaitingForInput | Paused | - | Timeout or user pause |
| Paused | Active | - | User resumes |
| Blocked | Active | Blocker resolved | Blocker cleared |
| Completing | Completed | All cleanup done | Completion confirmed |
| Completing | Active | Completion rejected | User rejects |
| Any | Failed | Unrecoverable error | Error detected |
| Completed | Archived | - | User archives |
| Failed | Archived | - | User archives |

### 1.5 Timeout and Stale Detection

```rust
use std::time::Duration;

/// Configuration for contract timeout and stale detection
pub struct ContractTimeoutConfig {
    /// Time after last supervisor activity before contract is considered stale
    pub stale_threshold: Duration, // Default: 30 minutes
    /// Time to wait for user input before timing out
    pub input_timeout: Duration, // Default: 24 hours
    /// Time before completing contracts are auto-completed
    pub completion_grace_period: Duration, // Default: 5 minutes
    /// Time before archived contracts are deleted
    pub archive_retention: Duration, // Default: 30 days
}

impl Default for ContractTimeoutConfig {
    fn default() -> Self {
        Self {
            stale_threshold: Duration::from_secs(30 * 60),
            input_timeout: Duration::from_secs(24 * 60 * 60),
            completion_grace_period: Duration::from_secs(5 * 60),
            archive_retention: Duration::from_secs(30 * 24 * 60 * 60),
        }
    }
}
```

### 1.6 Database Schema Changes

```sql
-- Add new status column options and tracking fields
ALTER TABLE contracts
ADD COLUMN IF NOT EXISTS status_changed_at TIMESTAMPTZ DEFAULT NOW();

-- Track last activity for stale detection
ALTER TABLE contracts
ADD COLUMN IF NOT EXISTS last_activity_at TIMESTAMPTZ DEFAULT NOW();

-- Track pending input
ALTER TABLE contracts
ADD COLUMN IF NOT EXISTS waiting_for TEXT; -- 'question', 'phase_confirmation', 'completion_confirmation'

-- Track blockers
ALTER TABLE contracts
ADD COLUMN IF NOT EXISTS blocked_reason TEXT;

-- Track failure reason
ALTER TABLE contracts
ADD COLUMN IF NOT EXISTS failure_reason TEXT;

-- Indexes for status and stale-detection queries
CREATE INDEX IF NOT EXISTS idx_contracts_status ON contracts(status);
CREATE INDEX IF NOT EXISTS idx_contracts_last_activity ON contracts(last_activity_at);
```

### 1.7 API Changes

```rust
/// Request to change contract state
#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct ChangeStatusRequest {
    pub target_status: ContractStatus,
    /// Required for some transitions
    pub reason: Option<String>,
    /// For blocking states, what is blocking
    pub blocker: Option<String>,
}

/// Response for status change
#[derive(Debug, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct ChangeStatusResponse {
    pub success: bool,
    pub previous_status: ContractStatus,
    pub new_status: ContractStatus,
    /// If transition failed, why
    pub rejection_reason: Option<String>,
}
```

---

## 2. Automatic Completion Detection

### 2.1 Current Problem

Contracts currently require manual `supervisor_complete()` calls. If a supervisor exits without making that call, the contract stays `active` indefinitely.
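
To make the failure mode concrete, here is a minimal sketch of how an abandoned contract could be recognised from its status and last-activity time alone. The `Status`, `ContractSnapshot`, and `looks_abandoned` names are hypothetical and exist only for this illustration; timestamps are plain seconds so the example needs nothing beyond the standard library.

```rust
use std::time::Duration;

/// Hypothetical, simplified status for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Active,
    Completed,
}

/// Hypothetical snapshot of the two fields that matter here.
pub struct ContractSnapshot {
    pub status: Status,
    /// Seconds (since an arbitrary epoch) at which the supervisor was last seen.
    pub last_activity_secs: u64,
}

/// Returns true when a contract is still `Active` but has seen no supervisor
/// activity for longer than `threshold`.
pub fn looks_abandoned(c: &ContractSnapshot, now_secs: u64, threshold: Duration) -> bool {
    c.status == Status::Active
        && now_secs.saturating_sub(c.last_activity_secs) > threshold.as_secs()
}
```

With the 30-minute stale threshold from section 1.5, an active contract last touched an hour ago would be flagged, while a completed one never is.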

### 2.2 Proposed Solution: Completion Gates

#### 2.2.1 Phase Completion Gates

```rust
/// Gate that must be satisfied before advancing to next phase
pub struct PhaseCompletionGate {
    /// Required deliverables for this phase
    pub required_deliverables: Vec<String>,
    /// Required tasks to be completed
    pub required_tasks: TaskRequirement,
    /// Optional custom validation function (argument type assumed)
    pub custom_validator: Option<Box<dyn Fn(&Contract) -> bool>>,
    /// Whether to auto-advance when gate is satisfied
    pub auto_advance: bool,
}

/// Task completion requirements
pub enum TaskRequirement {
    /// No task requirements
    None,
    /// All spawned tasks must complete
    AllComplete,
    /// At least N tasks must complete
    MinComplete(usize),
    /// Specific named tasks must complete
    NamedTasks(Vec<String>),
}
```

#### 2.2.2 Contract Completion Detection

```rust
use std::collections::HashMap;

/// Contract completion detector
pub struct CompletionDetector {
    /// Phase-specific gates, keyed by phase name
    phase_gates: HashMap<String, PhaseCompletionGate>,
}

impl CompletionDetector {
    /// Check if current phase is ready to advance
    pub fn check_phase_readiness(
        &self,
        contract: &Contract,
        tasks: &[TaskSummary],
    ) -> PhaseReadinessResult {
        let gate = match self.phase_gates.get(&contract.phase) {
            Some(g) => g,
            None => return PhaseReadinessResult::NoGate,
        };

        let mut missing = Vec::new();

        // Check deliverables
        let completed = contract.get_completed_deliverables(&contract.phase);
        for req in &gate.required_deliverables {
            if !completed.contains(req) {
                missing.push(format!("Deliverable: {}", req));
            }
        }

        // Check tasks
        match &gate.required_tasks {
            TaskRequirement::None => {}
            TaskRequirement::AllComplete => {
                let incomplete = tasks
                    .iter()
                    .filter(|t| !t.is_supervisor && t.status != "done")
                    .count();
                if incomplete > 0 {
                    missing.push(format!("{} tasks incomplete", incomplete));
                }
            }
            TaskRequirement::MinComplete(n) => {
                let complete = tasks
                    .iter()
                    .filter(|t| !t.is_supervisor && t.status == "done")
                    .count();
                if complete < *n {
                    missing.push(format!("Need {} tasks complete, have {}", n, complete));
                }
            }
            TaskRequirement::NamedTasks(names) => {
                for name in names {
                    let done = tasks
                        .iter()
                        .any(|t| &t.name == name && t.status == "done");
                    if !done {
                        missing.push(format!("Task '{}' not complete", name));
                    }
                }
            }
        }

        if missing.is_empty() {
            PhaseReadinessResult::Ready
        } else {
            PhaseReadinessResult::NotReady { missing }
        }
    }

    /// Check if contract should auto-complete
    pub fn check_contract_completion(&self, contract: &Contract, tasks: &[TaskSummary]) -> bool {
        // Must be in terminal phase
        if contract.phase != contract.terminal_phase_id() {
            return false;
        }

        // Terminal phase gate must be satisfied. The real task list is passed
        // through: an empty slice would satisfy AllComplete vacuously and could
        // never satisfy MinComplete or NamedTasks.
        matches!(
            self.check_phase_readiness(contract, tasks),
            PhaseReadinessResult::Ready
        )
    }
}

/// Result of phase readiness check
pub enum PhaseReadinessResult {
    /// Phase is ready to advance
    Ready,
    /// Phase is not ready, with list of missing items
    NotReady { missing: Vec<String> },
    /// No gate defined for this phase
    NoGate,
}
```

#### 2.2.3 Auto-Completion Flow

```
┌─────────────────────────────────────────────────────────────────────┐
│                        AUTO-COMPLETION FLOW                         │
└─────────────────────────────────────────────────────────────────────┘

1. Task completes (status = "done")
   │
   ▼
2. Check if any phase gate is now satisfied
   │
   ├─ NO ──► Return, wait for more tasks
   │
   ▼ YES
3. Is auto_advance enabled for phase?
   │
   ├─ NO ──► Notify user, wait for manual advance
   │
   ▼ YES
4. Is phase_guard enabled?
   │
   ├─ YES ─► Set status = WaitingForInput, ask for confirmation
   │
   ▼ NO
5. Auto-advance to next phase
   │
   ▼
6. Is this the terminal phase?
   │
   ├─ NO ──► Continue working
   │
   ▼ YES
7. All terminal deliverables complete?
   │
   ├─ NO ──► Continue working
   │
   ▼ YES
8. Set status = Completing
   │
   ▼
9. Cleanup worktrees, stop supervisor
   │
   ▼
10. Set status = Completed
```

### 2.3 Database Schema Changes

```sql
-- Track auto-completion state
ALTER TABLE contracts
ADD COLUMN IF NOT EXISTS auto_complete_enabled BOOLEAN DEFAULT TRUE;

-- Track when completion was detected
ALTER TABLE contracts
ADD COLUMN IF NOT EXISTS completion_detected_at TIMESTAMPTZ;
```

### 2.4 API Endpoints

```rust
/// Check phase readiness
/// GET /api/v1/contracts/{id}/phase-readiness
pub async fn check_phase_readiness(
    contract_id: Uuid,
) -> PhaseReadinessResponse;

/// Force completion check
/// POST /api/v1/contracts/{id}/check-completion
pub async fn check_completion(
    contract_id: Uuid,
) -> CompletionCheckResponse;

/// Enable/disable auto-completion
/// PUT /api/v1/contracts/{id}/auto-complete
pub async fn set_auto_complete(
    contract_id: Uuid,
    enabled: bool,
) -> ContractSummary;
```

---

## 3. Supervisor Status Reporting

### 3.1 Current Problem

The `supervisor_states` table exists but:

- State is not reliably persisted during daemon operations
- Restoration after crash doesn't properly resume context
- No clear indication of supervisor's current activity

### 3.2 Proposed Supervisor States

```rust
/// Supervisor execution states
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum SupervisorState {
    /// Supervisor is starting up
    Initializing,
    /// Supervisor is idle, no pending work
    Idle,
    /// Supervisor is actively working (LLM processing)
    Working,
    /// Supervisor is waiting for user input
    WaitingForUser,
    /// Supervisor is waiting for child tasks
    WaitingForTasks,
    /// Supervisor is blocked on external resource
    Blocked,
    /// Supervisor has completed its work
    Completed,
    /// Supervisor has failed
    Failed,
    /// Supervisor was interrupted
    Interrupted,
}
```

### 3.3 Heartbeat Mechanism

```rust
/// Heartbeat message from supervisor to server
#[derive(Debug, Serialize, Deserialize)]
pub struct SupervisorHeartbeat {
    pub task_id: Uuid,
    pub contract_id: Uuid,
    pub state: SupervisorState,
    pub phase: String,
    /// What the supervisor is currently doing
    pub current_activity: String,
    /// Progress percentage (0-100)
    pub progress: u8,
    /// IDs of tasks supervisor is waiting on
    pub pending_task_ids: Vec<Uuid>,
    /// Timestamp
    pub timestamp: DateTime<Utc>,
}

/// Heartbeat configuration
pub struct HeartbeatConfig {
    /// How often to send heartbeats
    pub interval: Duration, // Default: 30 seconds
    /// How long before a supervisor is considered dead
    pub timeout: Duration, // Default: 2 minutes
}
```

### 3.4 State Persistence

```rust
/// Enhanced supervisor state for persistence
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SupervisorPersistentState {
    /// Current supervisor state
    pub state: SupervisorState,
    /// Current contract phase
    pub phase: String,
    /// Conversation history for resumption
    pub conversation_history: Vec<ConversationMessage>, // element type name assumed
    /// Currently pending questions
    pub pending_questions: Vec<PendingQuestion>, // element type name assumed
    /// Tasks spawned by this supervisor
    pub spawned_task_ids: Vec<Uuid>,
    /// Tasks we're waiting on
    pub waiting_on_task_ids: Vec<Uuid>,
    /// Last checkpoint created
    pub last_checkpoint: Option<Uuid>, // checkpoint ID type assumed
    /// Current activity description
    pub current_activity: String,
    /// Error if in failed state
    pub error: Option<String>,
    /// Timestamps
    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
}
```

### 3.5 Database Schema Changes

```sql
-- Enhance supervisor_states table
ALTER TABLE supervisor_states
ADD COLUMN IF NOT EXISTS state VARCHAR(50) NOT NULL DEFAULT 'initializing';

ALTER TABLE supervisor_states
ADD COLUMN IF NOT EXISTS current_activity TEXT;

ALTER TABLE supervisor_states
ADD COLUMN IF NOT EXISTS progress INTEGER DEFAULT 0;

ALTER TABLE supervisor_states
ADD COLUMN IF NOT EXISTS error_message TEXT;

ALTER TABLE supervisor_states
ADD COLUMN IF NOT EXISTS spawned_task_ids UUID[] DEFAULT ARRAY[]::UUID[];

-- Create heartbeat tracking table
CREATE TABLE IF NOT EXISTS supervisor_heartbeats (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    supervisor_task_id UUID NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
    contract_id UUID NOT NULL REFERENCES contracts(id) ON DELETE CASCADE,
    state VARCHAR(50) NOT NULL,
    phase VARCHAR(50) NOT NULL,
    current_activity TEXT,
    progress INTEGER DEFAULT 0,
    pending_task_ids UUID[] DEFAULT ARRAY[]::UUID[],
    timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_heartbeats_supervisor ON supervisor_heartbeats(supervisor_task_id);
CREATE INDEX IF NOT EXISTS idx_heartbeats_timestamp ON supervisor_heartbeats(timestamp);

-- Keep only recent heartbeats. A CHECK constraint is evaluated only on
-- INSERT/UPDATE and cannot expire existing rows, so prune periodically, e.g.:
--   DELETE FROM supervisor_heartbeats WHERE timestamp < NOW() - INTERVAL '24 hours';
```

### 3.6 Restoration Protocol

```
┌─────────────────────────────────────────────────────────────────────┐
│                   SUPERVISOR RESTORATION PROTOCOL                   │
└─────────────────────────────────────────────────────────────────────┘

1. Daemon restarts or task is assigned to new daemon
   │
   ▼
2. Load supervisor state from supervisor_states table
   │
   ├─ NOT FOUND ──► Start fresh, log warning
   │
   ▼ FOUND
3. Validate state consistency
   │
   ├─ INVALID ──► Start from last checkpoint
   │
   ▼ VALID
4. Restore conversation history
   │
   ▼
5. Check for pending questions
   │
   ├─ HAS PENDING ──► Re-deliver questions to user
   │
   ▼ NO PENDING
6. Check for waiting tasks
   │
   ├─ HAS WAITING ──► Resume waiting state
   │
   ▼ NO WAITING
7. Send restoration context to Claude
   │
   ▼
8. Resume execution from last state
```

### 3.7 API Endpoints

```rust
/// Get supervisor status
/// GET /api/v1/contracts/{id}/supervisor/status
pub async fn get_supervisor_status(
    contract_id: Uuid,
) -> SupervisorStatusResponse;

/// Get supervisor heartbeat history
/// GET /api/v1/contracts/{id}/supervisor/heartbeats
pub async fn get_heartbeats(
    contract_id: Uuid,
    limit: Option<u32>, // integer type assumed
) -> HeartbeatListResponse;

/// Force supervisor state sync
/// POST /api/v1/contracts/{id}/supervisor/sync
pub async fn sync_supervisor_state(
    contract_id: Uuid,
) -> SyncResponse;
```

---

## 4. Contract Monitoring Dashboard

### 4.1 Real-Time Status Updates

#### 4.1.1 WebSocket Events

```rust
/// Contract monitoring events
#[derive(Debug, Serialize)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum ContractMonitorEvent {
    /// Contract status changed
    StatusChanged {
        contract_id: Uuid,
        old_status: ContractStatus,
        new_status: ContractStatus,
        reason: Option<String>,
    },
    /// Phase changed
    PhaseChanged {
        contract_id: Uuid,
        old_phase: String,
        new_phase: String,
    },
    /// Supervisor state changed
    SupervisorStateChanged {
        contract_id: Uuid,
        supervisor_task_id: Uuid,
        old_state: SupervisorState,
        new_state: SupervisorState,
    },
    /// Supervisor heartbeat received
    Heartbeat {
        contract_id: Uuid,
        state: SupervisorState,
        activity: String,
        progress: u8,
    },
    /// Contract became stale
    StaleDetected {
        contract_id: Uuid,
        last_activity: DateTime<Utc>,
        stale_duration: Duration,
    },
    /// Task completed
    TaskCompleted {
        contract_id: Uuid,
        task_id: Uuid,
        task_name: String,
        success: bool,
    },
    /// Deliverable marked complete
    DeliverableCompleted {
        contract_id: Uuid,
        phase: String,
        deliverable_id: String,
    },
    /// Question asked (needs user attention)
    QuestionAsked {
        contract_id: Uuid,
        question_id: Uuid,
        question: String,
        question_type: String,
    },
}
```

#### 4.1.2 Subscription API

```rust
/// Subscribe to contract monitoring events
/// WS /api/v1/contracts/monitor
pub async fn monitor_contracts(
    ws: WebSocket,
    filter: ContractMonitorFilter,
) -> Result<(), Error>;

#[derive(Debug, Deserialize)]
pub struct ContractMonitorFilter {
    /// Filter by specific contract IDs
    pub contract_ids: Option<Vec<Uuid>>,
    /// Filter by status
    pub statuses: Option<Vec<ContractStatus>>,
    /// Include stale detection events
    pub include_stale: bool,
    /// Include heartbeat events
    pub include_heartbeats: bool,
}
```

### 4.2 Stale Contract Detection

```rust
use std::time::Duration;

use chrono::{DateTime, Utc};
use sqlx::PgPool;
use tokio::sync::mpsc::Sender; // tokio mpsc channel assumed
use uuid::Uuid;

/// Stale contract detector service
pub struct StaleContractDetector {
    pool: PgPool,
    config: ContractTimeoutConfig,
}

impl StaleContractDetector {
    /// Run stale detection loop
    pub async fn run(&self, event_tx: Sender<ContractMonitorEvent>) {
        let mut interval = tokio::time::interval(Duration::from_secs(60));

        loop {
            interval.tick().await;

            let stale = self.detect_stale_contracts().await;
            for (contract_id, last_activity) in stale {
                // chrono subtraction yields a signed duration; convert it to
                // std::time::Duration, falling back to zero on clock skew.
                let stale_duration = (Utc::now() - last_activity)
                    .to_std()
                    .unwrap_or_default();
                let _ = event_tx
                    .send(ContractMonitorEvent::StaleDetected {
                        contract_id,
                        last_activity,
                        stale_duration,
                    })
                    .await;
            }
        }
    }

    /// Detect stale contracts.
    /// Query errors are swallowed here and reported as "no stale contracts".
    async fn detect_stale_contracts(&self) -> Vec<(Uuid, DateTime<Utc>)> {
        let threshold = Utc::now() - self.config.stale_threshold;

        sqlx::query_as::<_, (Uuid, DateTime<Utc>)>(
            r#"
            SELECT id, last_activity_at
            FROM contracts
            WHERE status = 'active'
              AND last_activity_at < $1
            "#,
        )
        .bind(threshold)
        .fetch_all(&self.pool)
        .await
        .unwrap_or_default()
    }
}
```

### 4.3 Batch Operations

```rust
/// Batch operation types
#[derive(Debug, Deserialize)]
#[serde(tag = "operation", rename_all = "snake_case")]
pub enum BatchOperation {
    /// Archive completed contracts older than threshold
    ArchiveOld {
        older_than: Duration,
        status_filter: Vec<ContractStatus>,
    },
    /// Pause all active contracts
    PauseAll {
        reason: String,
    },
    /// Resume all paused contracts
    ResumeAll,
    /// Delete archived contracts older than threshold
    CleanupArchived {
        older_than: Duration,
    },
    /// Restart stale supervisors
    RestartStale {
        stale_threshold: Duration,
    },
}

/// Batch operation result
#[derive(Debug, Serialize)]
pub struct BatchOperationResult {
    pub operation: String,
    pub affected_count: usize,
    pub affected_ids: Vec<Uuid>,
    pub errors: Vec<BatchOperationError>,
}

#[derive(Debug, Serialize)]
pub struct BatchOperationError {
    pub contract_id: Uuid,
    pub error: String,
}
```

### 4.4 Dashboard API

```rust
/// Get dashboard summary
/// GET /api/v1/contracts/dashboard
pub async fn get_dashboard() -> DashboardResponse;

#[derive(Debug, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct DashboardResponse {
    /// Count by status
    pub status_counts: HashMap<String, usize>, // key/value types assumed
    /// Count by phase (for active contracts)
    pub phase_counts: HashMap<String, usize>, // key/value types assumed
    /// Stale contracts
    pub stale_contracts: Vec<StaleContractInfo>,
    /// Contracts waiting for input
    pub waiting_for_input: Vec<WaitingContractInfo>,
    /// Recent activity
    pub recent_events: Vec<ContractMonitorEvent>, // element type assumed
    /// Resource usage
    pub resource_usage: ResourceUsage,
}

#[derive(Debug, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct StaleContractInfo {
    pub id: Uuid,
    pub name: String,
    pub phase: String,
    pub last_activity: DateTime<Utc>,
    pub stale_duration_secs: i64,
}

#[derive(Debug, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct WaitingContractInfo {
    pub id: Uuid,
    pub name: String,
    pub waiting_for: String, // 'question', 'phase_confirmation', etc.
    pub waiting_since: DateTime<Utc>,
    pub question: Option<String>,
}

#[derive(Debug, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct ResourceUsage {
    pub active_supervisors: usize,
    pub running_tasks: usize,
    pub pending_tasks: usize,
    pub active_daemons: usize,
    pub total_worktrees: usize,
}
```

---

## 5. Improved CLI Commands

### 5.1 Contract Listing with Filters

```bash
# List all contracts
makima contracts list

# List with status filter
makima contracts list --status active
makima contracts list --status completed,failed

# List stale contracts
makima contracts list --stale
makima contracts list --stale --threshold 30m

# List contracts waiting for input
makima contracts list --waiting

# List by phase
makima contracts list --phase execute

# Combine filters
makima contracts list --status active --phase plan --stale

# Output formats
makima contracts list --format json
makima contracts list --format table
makima contracts list --format compact
```

#### Implementation

```rust
#[derive(Debug, Args)]
pub struct ListContractsArgs {
    /// Filter by status (comma-separated)
    #[arg(long)]
    pub status: Option<String>,

    /// Show only stale contracts
    #[arg(long)]
    pub stale: bool,

    /// Stale threshold (e.g., "30m", "1h")
    #[arg(long, default_value = "30m")]
    pub threshold: String,

    /// Show contracts waiting for input
    #[arg(long)]
    pub waiting: bool,

    /// Filter by phase
    #[arg(long)]
    pub phase: Option<String>,

    /// Output format
    #[arg(long, default_value = "table")]
    pub format: OutputFormat,

    /// Limit results
    #[arg(long, short = 'n')]
    pub limit: Option<usize>, // integer type assumed
}
```

### 5.2 Cleanup Command

```bash
# Archive completed contracts older than 7 days
makima contracts cleanup --archive --older-than 7d

# Delete archived contracts older than 30 days
makima contracts cleanup --delete-archived --older-than 30d

# Dry run (show what would be affected)
makima contracts cleanup --archive --older-than 7d --dry-run

# Force cleanup without confirmation
makima contracts cleanup --archive --older-than 7d --force

# Cleanup stale worktrees
makima contracts cleanup --worktrees

# Full cleanup: archive old, delete archived, clean worktrees
makima contracts cleanup --all --older-than 7d
```

#### Implementation

```rust
#[derive(Debug, Args)]
pub struct CleanupContractsArgs {
    /// Archive completed/failed contracts
    #[arg(long)]
    pub archive: bool,

    /// Delete archived contracts
    #[arg(long)]
    pub delete_archived: bool,

    /// Clean up orphaned worktrees
    #[arg(long)]
    pub worktrees: bool,

    /// Run all cleanup operations
    #[arg(long)]
    pub all: bool,

    /// Threshold for cleanup (e.g., "7d", "30d")
    #[arg(long, default_value = "7d")]
    pub older_than: String,

    /// Dry run - show what would be affected
    #[arg(long)]
    pub dry_run: bool,

    /// Skip confirmation prompts
    #[arg(long)]
    pub force: bool,
}
```

### 5.3 Monitor Command

```bash
# Real-time monitoring dashboard
makima contracts monitor

# Monitor specific contracts
makima contracts monitor <contract-id> [<contract-id>...]

# Monitor with filters
makima contracts monitor --status active
makima contracts monitor --stale

# Quiet mode - only show important events
makima contracts monitor --quiet

# JSON output for scripting
makima contracts monitor --format json
```

#### Implementation

```rust
#[derive(Debug, Args)]
pub struct MonitorContractsArgs {
    /// Contract IDs to monitor (empty = all)
    pub contract_ids: Vec<Uuid>,

    /// Filter by status
    #[arg(long)]
    pub status: Option<String>,

    /// Only show stale contracts
    #[arg(long)]
    pub stale: bool,

    /// Quiet mode - only important events
    #[arg(long, short)]
    pub quiet: bool,

    /// Output format
    #[arg(long, default_value = "tui")]
    pub format: MonitorFormat,
}

#[derive(Debug, Clone, ValueEnum)]
pub enum MonitorFormat {
    /// Terminal UI dashboard
    Tui,
    /// Plain text output
    Text,
    /// JSON stream
    Json,
}
```

### 5.4 Additional Commands

```bash
# Resume a paused contract
makima contracts resume <contract-id>

# Pause an active contract
makima contracts pause <contract-id> --reason "Waiting for external review"

# Force advance phase
makima contracts advance <contract-id> --phase execute --force

# Restart stale supervisor
makima contracts restart-supervisor <contract-id>

# Show contract details
makima contracts show <contract-id> --verbose

# Check contract health
makima contracts health <contract-id>

# Export contract history
makima contracts export <contract-id> --format json --output contract.json
```

---

## 6. Bug Fixes

### 6.1 Version Conflicts (Silent Failures)

**Problem**: Phase changes can fail silently when version conflicts occur.

**Solution**: Implement explicit version checking and conflict reporting.

```rust
/// Result type for phase changes with explicit conflict handling
pub enum PhaseChangeResult {
    Success(Contract),
    VersionConflict {
        expected: i32,
        actual: i32,
        current_phase: String,
    },
    ValidationFailed {
        reason: String,
        missing_requirements: Vec<String>,
    },
    Unauthorized,
    NotFound,
}

/// Enhanced phase change handler
pub async fn change_phase_with_validation(
    pool: &PgPool,
    contract_id: Uuid,
    owner_id: Uuid,
    new_phase: &str,
    expected_version: Option<i32>,
) -> Result<PhaseChangeResult, sqlx::Error> {
    // Start transaction (rolled back automatically on early return)
    let mut tx = pool.begin().await?;

    // Get current contract with a row lock
    let contract = sqlx::query_as::<_, Contract>(
        "SELECT * FROM contracts WHERE id = $1 AND owner_id = $2 FOR UPDATE",
    )
    .bind(contract_id)
    .bind(owner_id)
    .fetch_optional(&mut *tx)
    .await?;

    let contract = match contract {
        Some(c) => c,
        None => return Ok(PhaseChangeResult::NotFound),
    };

    // Check version if provided
    if let Some(expected) = expected_version {
        if contract.version != expected {
            return Ok(PhaseChangeResult::VersionConflict {
                expected,
                actual: contract.version,
                current_phase: contract.phase.clone(),
            });
        }
    }

    // Validate phase transition
    let validation = validate_phase_transition(&contract, new_phase);
    if !validation.valid {
        return Ok(PhaseChangeResult::ValidationFailed {
            reason: validation.reason,
            missing_requirements: validation.missing,
        });
    }

    // Update phase
    let updated = sqlx::query_as::<_, Contract>(
        r#"
        UPDATE contracts
        SET phase = $1, version = version + 1, updated_at = NOW()
        WHERE id = $2
        RETURNING *
        "#,
    )
    .bind(new_phase)
    .bind(contract_id)
    .fetch_one(&mut *tx)
    .await?;

    tx.commit().await?;

    Ok(PhaseChangeResult::Success(updated))
}
```

### 6.2 Deliverable Validation

**Problem**: Can mark non-existent deliverables as complete.

**Solution**: Validate deliverable IDs before marking complete.

```rust
/// Validate deliverable exists for contract type and phase
pub fn validate_deliverable(
    contract_type: &str,
    phase: &str,
    deliverable_id: &str,
    phase_config: Option<&PhaseConfig>,
) -> Result<(), DeliverableValidationError> {
    let deliverables = if let Some(config) = phase_config {
        get_phase_deliverables_from_config(phase, config)
    } else {
        get_phase_deliverables_for_type(phase, contract_type)
    };

    let valid_ids: Vec<&str> = deliverables
        .deliverables
        .iter()
        .map(|d| d.id.as_str())
        .collect();

    if !valid_ids.contains(&deliverable_id) {
        return Err(DeliverableValidationError::InvalidDeliverable {
            deliverable_id: deliverable_id.to_string(),
            phase: phase.to_string(),
            valid_ids: valid_ids.into_iter().map(String::from).collect(),
        });
    }

    Ok(())
}

#[derive(Debug, thiserror::Error)]
pub enum DeliverableValidationError {
    #[error("Invalid deliverable '{deliverable_id}' for {phase} phase. Valid IDs: {valid_ids:?}")]
    InvalidDeliverable {
        deliverable_id: String,
        phase: String,
        valid_ids: Vec<String>,
    },
}
```

### 6.3 Phase Guard Bypass

**Problem**: Supervisors can bypass phase_guard setting.

**Solution**: Enforce phase_guard at the API level, not just in supervisor logic.

```rust
/// Enhanced phase change with phase_guard enforcement
pub async fn change_phase_enforced(
    pool: &PgPool,
    contract_id: Uuid,
    owner_id: Uuid,
    request: ChangePhaseRequest,
    is_supervisor: bool,
) -> Result<PhaseChangeResponse, Error> {
    let contract = get_contract_for_owner(pool, contract_id, owner_id)
        .await?
        .ok_or(Error::NotFound)?;

    // Phase guard is enforced for EVERYONE, including supervisors
    if contract.phase_guard && !request.confirmed.unwrap_or(false) {
        // Must return phase review info, regardless of caller
        return Ok(PhaseChangeResponse::RequiresConfirmation {
            // Clone so contract.phase can still be borrowed below
            current_phase: contract.phase.clone(),
            next_phase: request.phase,
            deliverables: get_phase_deliverables(&contract.phase),
            message: "Phase guard is enabled. User confirmation required.".to_string(),
        });
    }

    // Proceed with phase change
    // ...
}
```

---

## 7. Migration Plan

### 7.1 Phase 1: Database Schema (Week 1)

1. Add new columns to `contracts` table
2. Add new columns to `supervisor_states` table
3. Create `supervisor_heartbeats` table
4. Create indexes
5. Backfill `last_activity_at` from existing data

```sql
-- Migration 001: Contract status enhancements
ALTER TABLE contracts
ADD COLUMN IF NOT EXISTS status_changed_at TIMESTAMPTZ DEFAULT NOW();

ALTER TABLE contracts
ADD COLUMN IF NOT EXISTS last_activity_at TIMESTAMPTZ DEFAULT NOW();

ALTER TABLE contracts
ADD COLUMN IF NOT EXISTS waiting_for TEXT;

ALTER TABLE contracts
ADD COLUMN IF NOT EXISTS blocked_reason TEXT;

ALTER TABLE contracts
ADD COLUMN IF NOT EXISTS failure_reason TEXT;

ALTER TABLE contracts
ADD COLUMN IF NOT EXISTS auto_complete_enabled BOOLEAN DEFAULT TRUE;

ALTER TABLE contracts
ADD COLUMN IF NOT EXISTS completion_detected_at TIMESTAMPTZ;

CREATE INDEX IF NOT EXISTS idx_contracts_status ON contracts(status);
CREATE INDEX IF NOT EXISTS idx_contracts_last_activity ON contracts(last_activity_at);

-- Backfill last_activity_at from updated_at (run once, at migration time).
-- The column is added with DEFAULT NOW(), so existing rows are never NULL
-- and an "IS NULL" filter would match nothing.
UPDATE contracts
SET last_activity_at = updated_at
WHERE updated_at IS NOT NULL;
```

### 7.2 Phase 2: Core Logic (Week 2)

1. Implement `ContractStatus` enum with new states
2. Implement state transition validation
3. Implement `CompletionDetector`
4. Update phase change handlers with validation
5. Implement deliverable validation

### 7.3 Phase 3: Supervisor Enhancements (Week 3)

1. Implement `SupervisorState` enum
2. Implement heartbeat mechanism
3. Implement state persistence
4. Implement restoration protocol
5. Update supervisor API endpoints

### 7.4 Phase 4: Monitoring (Week 4)

1. Implement WebSocket monitoring events
2. Implement stale detection service
3. Implement batch operations
4. Implement dashboard API

### 7.5 Phase 5: CLI (Week 5)

1. Implement `contracts list` with filters
2. Implement `contracts cleanup`
3. Implement `contracts monitor`
4. Implement additional helper commands

### 7.6 Phase 6: Testing & Rollout (Week 6)

1. Unit tests for all new components
2. Integration tests for state machines
3. Load testing for monitoring
4. Staged rollout with feature flags
5. Documentation updates

---

## 8. Appendix

### 8.1 Configuration Options

```toml
[contracts]
# Timeout configuration
stale_threshold_minutes = 30
input_timeout_hours = 24
completion_grace_period_minutes = 5
archive_retention_days = 30

# Auto-completion
auto_complete_enabled = true
auto_advance_phases = true

# Heartbeat
heartbeat_interval_seconds = 30
heartbeat_timeout_seconds = 120

# Monitoring
monitor_ws_buffer_size = 1000
stale_detection_interval_seconds = 60
```

### 8.2 Error Codes

| Code | Description |
|------|-------------|
| `CONTRACT_NOT_FOUND` | Contract does not exist |
| `INVALID_TRANSITION` | State transition not allowed |
| `VERSION_CONFLICT` | Optimistic locking conflict |
| `PHASE_GUARD_REQUIRED` | Phase guard confirmation needed |
| `INVALID_DELIVERABLE` | Deliverable ID not valid for phase |
| `SUPERVISOR_NOT_FOUND` | No supervisor for contract |
| `SUPERVISOR_DEAD` | Supervisor heartbeat timeout |
| `VALIDATION_FAILED` | Phase requirements not met |

### 8.3 Metrics

The following metrics should be tracked:

- `contracts_by_status` (gauge) - Count of contracts by status
- `contracts_stale_count` (gauge) - Number of stale contracts
- `phase_transitions_total` (counter) - Phase changes by from/to
- `completion_detections_total` (counter) - Auto-completions detected
- `supervisor_heartbeats_total` (counter) - Heartbeats received
- `supervisor_restarts_total` (counter) - Supervisor restarts
- `batch_operations_total` (counter) - Batch operations by type
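
### 8.4 Transition Check Sketch

The `INVALID_TRANSITION` error code in 8.2 presupposes a check derived from the transition table in section 1.4. The following is a minimal sketch of such a check, not a definitive implementation. The function name `is_transition_allowed` is hypothetical, guard conditions (for example "Has supervisor task") are not evaluated, and the table's "Any → Failed" row is interpreted here as "any state that is not already Completed, Failed, or Archived", which is an assumption.

```rust
/// Lifecycle states from section 1.3 (serde derives omitted for brevity).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContractStatus {
    Created,
    Active,
    WaitingForInput,
    Paused,
    Blocked,
    Completing,
    Completed,
    Failed,
    Archived,
}

/// Hypothetical helper: returns true if the table in section 1.4 lists a
/// `from -> to` transition. Only the shape of the state machine is checked;
/// guard conditions must be evaluated separately by the caller.
pub fn is_transition_allowed(from: ContractStatus, to: ContractStatus) -> bool {
    use ContractStatus::*;
    match (from, to) {
        // Explicit rows of the transition table
        (Created, Active)
        | (Active, WaitingForInput)
        | (Active, Paused)
        | (Active, Blocked)
        | (Active, Completing)
        | (WaitingForInput, Active)
        | (WaitingForInput, Paused)
        | (Paused, Active)
        | (Blocked, Active)
        | (Completing, Completed)
        | (Completing, Active)
        | (Completed, Archived)
        | (Failed, Archived) => true,
        // Assumption: finished contracts cannot fail after the fact
        (Completed | Failed | Archived, Failed) => false,
        // "Any -> Failed" for every remaining state
        (_, Failed) => true,
        _ => false,
    }
}
```

A status-change handler could call this before evaluating guards and return `INVALID_TRANSITION` when it yields `false`, for example for `Created -> Completed` or any transition out of `Archived`.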