# Contract Management System Specification
**Version**: 1.0.0
**Status**: Draft
**Author**: AI Assistant
**Date**: 2025-01-31
## Executive Summary
This specification addresses six critical issues in the current contract management system:
1. **Manual Completion Required** - Contracts stay 'active' indefinitely
2. **No Phase Readiness Validation** - No automatic checking before phase advancement
3. **Supervisor State Restoration Broken** - Context lost after daemon crash
4. **Version Conflicts Silent** - Phase changes can fail silently
5. **No Deliverable Validation** - Non-existent deliverables can be marked as complete
6. **Phase Guard Supervisor Bypass** - Supervisors can bypass the `phase_guard` setting
---
## 1. Contract Lifecycle State Machine
### 1.1 Current State (ContractStatus)
The current implementation uses three states:
- `active` - Contract is being worked on
- `completed` - Contract finished successfully
- `archived` - Contract archived (soft delete)
### 1.2 Proposed State Machine
```
                   ┌─────────────────────────────────────────┐
                   │                                         │
                   ▼                                         │
┌─────────┐    ┌────────┐    ┌───────────────────┐    ┌──────┴─────┐
│ created ├───►│ active ├───►│ waiting_for_input ├───►│ completing │
└─────────┘    └───┬────┘    └─────────┬─────────┘    └──────┬─────┘
                   │                   │                     │
                   │                   │                     ▼
                   │                   │               ┌───────────┐
                   │                   │               │ completed │
                   │                   │               └─────┬─────┘
                   │                   │                     │
                   ▼                   ▼                     ▼
               ┌────────┐         ┌─────────┐          ┌───────────┐
               │ paused │         │ blocked │          │ archived  │
               └───┬────┘         └────┬────┘          └───────────┘
                   │                   │                     ▲
                   └───────────────────┴─────────────────────┤
                                                             │
               ┌────────┐                                    │
               │ failed ├────────────────────────────────────┘
               └────────┘
```
### 1.3 State Definitions
```rust
/// Contract lifecycle states
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum ContractStatus {
/// Contract created but not yet started
Created,
/// Contract is actively being worked on
Active,
/// Waiting for user input (phase confirmation, question, etc.)
WaitingForInput,
/// Contract is paused by user request
Paused,
/// Contract is blocked on external dependency
Blocked,
/// All phases complete, running final validation
Completing,
/// Contract completed successfully
Completed,
/// Contract failed with errors
Failed,
/// Contract archived (soft delete)
Archived,
}
```
### 1.4 State Transitions
| From State | To State | Guard Conditions | Trigger |
|------------|----------|------------------|---------|
| Created | Active | Has supervisor task | Supervisor starts |
| Active | WaitingForInput | Pending question exists | Supervisor asks question |
| Active | Paused | - | User requests pause |
| Active | Blocked | Has external blocker | Blocker detected |
| Active | Completing | Final phase, all deliverables met | Auto-completion check |
| WaitingForInput | Active | Question answered | User responds |
| WaitingForInput | Paused | - | Timeout or user pause |
| Paused | Active | - | User resumes |
| Blocked | Active | Blocker resolved | Blocker cleared |
| Completing | Completed | All cleanup done | Completion confirmed |
| Completing | Active | Completion rejected | User rejects |
| Any | Failed | Unrecoverable error | Error detected |
| Completed | Archived | - | User archives |
| Failed | Archived | - | User archives |
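
The table above can be checked mechanically before any status write. The following is a minimal sketch, not the system's implementation: `Status` is a stand-in for `ContractStatus`, `is_transition_allowed` is a hypothetical helper, and guard conditions are deliberately left out. It also assumes that the "Any" row excludes the states `Completed`, `Failed`, and `Archived`, which the table does not state.

```rust
/// Stand-in for the `ContractStatus` enum from section 1.3 (no serde derives,
/// so the sketch compiles with the standard library alone).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Created,
    Active,
    WaitingForInput,
    Paused,
    Blocked,
    Completing,
    Completed,
    Failed,
    Archived,
}

/// Returns true if the transition table lists `from -> to`.
/// Guard conditions (pending question, blocker resolved, ...) are NOT checked here.
pub fn is_transition_allowed(from: Status, to: Status) -> bool {
    use Status::*;
    match (from, to) {
        // Assumption: the "Any -> Failed" row does not apply to finished contracts.
        (Completed | Failed | Archived, Failed) => false,
        (_, Failed) => true,
        (Created, Active) => true,
        (Active, WaitingForInput | Paused | Blocked | Completing) => true,
        (WaitingForInput, Active | Paused) => true,
        (Paused, Active) => true,
        (Blocked, Active) => true,
        (Completing, Completed | Active) => true,
        (Completed | Failed, Archived) => true,
        _ => false,
    }
}
```

A status-change handler could call this first and reject the request with `INVALID_TRANSITION` when it returns `false`, then evaluate the guard condition for the specific row.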
### 1.5 Timeout and Stale Detection
```rust
/// Configuration for contract timeout and stale detection
pub struct ContractTimeoutConfig {
/// Time after last supervisor activity before contract is considered stale
pub stale_threshold: Duration, // Default: 30 minutes
/// Time to wait for user input before timing out
pub input_timeout: Duration, // Default: 24 hours
/// How long a contract stays in `Completing` before it is automatically marked `Completed`
pub completion_grace_period: Duration, // Default: 5 minutes
/// Time before archived contracts are deleted
pub archive_retention: Duration, // Default: 30 days
}
impl Default for ContractTimeoutConfig {
fn default() -> Self {
Self {
stale_threshold: Duration::from_secs(30 * 60),
input_timeout: Duration::from_secs(24 * 60 * 60),
completion_grace_period: Duration::from_secs(5 * 60),
archive_retention: Duration::from_secs(30 * 24 * 60 * 60),
}
}
}
```
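
As an illustration of how `stale_threshold` might be applied, here is a small sketch using only the standard library. The function name `idle_time_if_stale` and the use of `SystemTime` are assumptions made for this example; the real system stores `last_activity_at` in the database.

```rust
use std::time::{Duration, SystemTime};

/// Returns the idle time if the contract has been inactive for longer than
/// `stale_threshold`, or `None` if it is still considered fresh.
pub fn idle_time_if_stale(
    last_activity: SystemTime,
    now: SystemTime,
    stale_threshold: Duration,
) -> Option<Duration> {
    // `duration_since` fails when `last_activity` is later than `now`
    // (clock skew); such a contract is treated as fresh.
    let idle = now.duration_since(last_activity).ok()?;
    if idle > stale_threshold {
        Some(idle)
    } else {
        None
    }
}
```

With the default threshold of 30 minutes, a contract last active 31 minutes ago is reported as stale, while one last active 10 minutes ago is not.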
### 1.6 Database Schema Changes
```sql
-- Add new status column options and tracking fields
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS
status_changed_at TIMESTAMPTZ DEFAULT NOW();
-- Track last activity for stale detection
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS
last_activity_at TIMESTAMPTZ DEFAULT NOW();
-- Track pending input
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS
waiting_for TEXT; -- 'question', 'phase_confirmation', 'completion_confirmation'
-- Track blockers
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS
blocked_reason TEXT;
-- Track failure reason
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS
failure_reason TEXT;
-- Index for status queries
CREATE INDEX IF NOT EXISTS idx_contracts_status ON contracts(status);
CREATE INDEX IF NOT EXISTS idx_contracts_last_activity ON contracts(last_activity_at);
```
### 1.7 API Changes
```rust
/// Request to change contract state
#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct ChangeStatusRequest {
pub target_status: ContractStatus,
/// Required for some transitions
pub reason: Option<String>,
/// For blocking states, what is blocking
pub blocker: Option<String>,
}
/// Response for status change
#[derive(Debug, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct ChangeStatusResponse {
pub success: bool,
pub previous_status: ContractStatus,
pub new_status: ContractStatus,
/// If transition failed, why
pub rejection_reason: Option<String>,
}
```
---
## 2. Automatic Completion Detection
### 2.1 Current Problem
Contracts currently require manual `supervisor_complete()` calls. Supervisors may exit without completing contracts, leaving them active indefinitely.
### 2.2 Proposed Solution: Completion Gates
#### 2.2.1 Phase Completion Gates
```rust
/// Gate that must be satisfied before advancing to next phase
pub struct PhaseCompletionGate {
/// Required deliverables for this phase
pub required_deliverables: Vec<String>,
/// Required tasks to be completed
pub required_tasks: TaskRequirement,
/// Optional custom validation function
pub custom_validator: Option<Box<dyn Fn(&Contract) -> bool + Send + Sync>>,
/// Whether to auto-advance when gate is satisfied
pub auto_advance: bool,
}
/// Task completion requirements
pub enum TaskRequirement {
/// No task requirements
None,
/// All spawned tasks must complete
AllComplete,
/// At least N tasks must complete
MinComplete(usize),
/// Specific named tasks must complete
NamedTasks(Vec<String>),
}
```
#### 2.2.2 Contract Completion Detection
```rust
/// Contract completion detector
pub struct CompletionDetector {
/// Phase-specific gates
phase_gates: HashMap<String, PhaseCompletionGate>,
}
impl CompletionDetector {
    /// Check if current phase is ready to advance
    pub fn check_phase_readiness(
        &self,
        contract: &Contract,
        tasks: &[TaskSummary],
    ) -> PhaseReadinessResult {
        let gate = match self.phase_gates.get(&contract.phase) {
            Some(g) => g,
            None => return PhaseReadinessResult::NoGate,
        };
        let mut missing = Vec::new();
        // Check deliverables
        let completed = contract.get_completed_deliverables(&contract.phase);
        for req in &gate.required_deliverables {
            if !completed.contains(req) {
                missing.push(format!("Deliverable: {}", req));
            }
        }
        // Check tasks (the supervisor's own task is never counted)
        match &gate.required_tasks {
            TaskRequirement::None => {}
            TaskRequirement::AllComplete => {
                let incomplete = tasks
                    .iter()
                    .filter(|t| !t.is_supervisor && t.status != "done")
                    .count();
                if incomplete > 0 {
                    missing.push(format!("{} tasks incomplete", incomplete));
                }
            }
            TaskRequirement::MinComplete(n) => {
                let complete = tasks
                    .iter()
                    .filter(|t| !t.is_supervisor && t.status == "done")
                    .count();
                if complete < *n {
                    missing.push(format!("Need {} tasks complete, have {}", n, complete));
                }
            }
            TaskRequirement::NamedTasks(names) => {
                for name in names {
                    let done = tasks
                        .iter()
                        .any(|t| &t.name == name && t.status == "done");
                    if !done {
                        missing.push(format!("Task '{}' not complete", name));
                    }
                }
            }
        }
        // Run the optional custom validator declared on the gate
        if let Some(validate) = &gate.custom_validator {
            if !validate(contract) {
                missing.push("Custom validation failed".to_string());
            }
        }
        if missing.is_empty() {
            PhaseReadinessResult::Ready
        } else {
            PhaseReadinessResult::NotReady { missing }
        }
    }

    /// Check if contract should auto-complete.
    ///
    /// `tasks` must be the contract's current tasks. Passing an empty slice would
    /// trivially satisfy `AllComplete` and could never satisfy `MinComplete` or
    /// `NamedTasks`.
    pub fn check_contract_completion(
        &self,
        contract: &Contract,
        tasks: &[TaskSummary],
    ) -> bool {
        // Must be in terminal phase
        if contract.phase != contract.terminal_phase_id() {
            return false;
        }
        // Terminal phase gate must be satisfied
        matches!(
            self.check_phase_readiness(contract, tasks),
            PhaseReadinessResult::Ready
        )
    }
}
/// Result of phase readiness check
pub enum PhaseReadinessResult {
/// Phase is ready to advance
Ready,
/// Phase is not ready, with list of missing items
NotReady { missing: Vec<String> },
/// No gate defined for this phase
NoGate,
}
```
#### 2.2.3 Auto-Completion Flow
```
┌─────────────────────────────────────────────────────────────────────┐
│ AUTO-COMPLETION FLOW │
└─────────────────────────────────────────────────────────────────────┘
1. Task completes (status = "done")
│
▼
2. Check if any phase gate is now satisfied
│
├─ NO ──► Return, wait for more tasks
│
▼ YES
3. Is auto_advance enabled for phase?
│
├─ NO ──► Notify user, wait for manual advance
│
▼ YES
4. Is phase_guard enabled?
│
├─ YES ─► Set status = WaitingForInput, ask for confirmation
│
▼ NO
5. Auto-advance to next phase
│
▼
6. Is this the terminal phase?
│
├─ NO ──► Continue working
│
▼ YES
7. All terminal deliverables complete?
│
├─ NO ──► Continue working
│
▼ YES
8. Set status = Completing
│
▼
9. Cleanup worktrees, stop supervisor
│
▼
10. Set status = Completed
```
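
The decision points in the flow above can be condensed into one pure function. This is a sketch under stated assumptions rather than the planned implementation: `FlowInputs`, `NextAction`, and `next_action` are hypothetical names, and `in_terminal_phase` is taken to describe the phase the contract is in once step 5 has run.

```rust
/// Inputs to the flow, flattened to booleans for illustration.
#[derive(Debug, Clone, Copy)]
pub struct FlowInputs {
    pub gate_satisfied: bool,
    pub auto_advance: bool,
    pub phase_guard: bool,
    /// Assumption: evaluated for the phase reached after step 5.
    pub in_terminal_phase: bool,
    pub terminal_deliverables_complete: bool,
}

/// What the server should do after a task completes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NextAction {
    /// Step 2, NO branch
    WaitForMoreTasks,
    /// Step 3, NO branch
    NotifyUser,
    /// Step 4, YES branch: status becomes WaitingForInput
    AskForConfirmation,
    /// Steps 5-7 with a NO branch: advance, then keep working
    AdvanceThenContinue,
    /// Steps 5-8: advance, then set status = Completing
    AdvanceThenComplete,
}

pub fn next_action(i: FlowInputs) -> NextAction {
    if !i.gate_satisfied {
        return NextAction::WaitForMoreTasks;
    }
    if !i.auto_advance {
        return NextAction::NotifyUser;
    }
    if i.phase_guard {
        return NextAction::AskForConfirmation;
    }
    if i.in_terminal_phase && i.terminal_deliverables_complete {
        NextAction::AdvanceThenComplete
    } else {
        NextAction::AdvanceThenContinue
    }
}
```

Keeping the decision free of I/O makes each branch of the flow easy to unit test; the caller performs the database writes and notifications for the returned action.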
### 2.3 Database Schema Changes
```sql
-- Track auto-completion state
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS
auto_complete_enabled BOOLEAN DEFAULT TRUE;
-- Track when completion was detected
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS
completion_detected_at TIMESTAMPTZ;
```
### 2.4 API Endpoints
```rust
/// Check phase readiness
/// GET /api/v1/contracts/{id}/phase-readiness
pub async fn check_phase_readiness(
contract_id: Uuid,
) -> PhaseReadinessResponse;
/// Force completion check
/// POST /api/v1/contracts/{id}/check-completion
pub async fn check_completion(
contract_id: Uuid,
) -> CompletionCheckResponse;
/// Enable/disable auto-completion
/// PUT /api/v1/contracts/{id}/auto-complete
pub async fn set_auto_complete(
contract_id: Uuid,
enabled: bool,
) -> ContractSummary;
```
---
## 3. Supervisor Status Reporting
### 3.1 Current Problem
The `supervisor_states` table exists but:
- State is not reliably persisted during daemon operations
- Restoration after crash doesn't properly resume context
- No clear indication of supervisor's current activity
### 3.2 Proposed Supervisor States
```rust
/// Supervisor execution states
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum SupervisorState {
/// Supervisor is starting up
Initializing,
/// Supervisor is idle, no pending work
Idle,
/// Supervisor is actively working (LLM processing)
Working,
/// Supervisor is waiting for user input
WaitingForUser,
/// Supervisor is waiting for child tasks
WaitingForTasks,
/// Supervisor is blocked on external resource
Blocked,
/// Supervisor has completed its work
Completed,
/// Supervisor has failed
Failed,
/// Supervisor was interrupted
Interrupted,
}
```
### 3.3 Heartbeat Mechanism
```rust
/// Heartbeat message from supervisor to server
#[derive(Debug, Serialize, Deserialize)]
pub struct SupervisorHeartbeat {
pub task_id: Uuid,
pub contract_id: Uuid,
pub state: SupervisorState,
pub phase: String,
/// What the supervisor is currently doing
pub current_activity: String,
/// Progress percentage (0-100)
pub progress: u8,
/// IDs of tasks supervisor is waiting on
pub pending_task_ids: Vec<Uuid>,
/// Timestamp
pub timestamp: DateTime<Utc>,
}
/// Heartbeat configuration
pub struct HeartbeatConfig {
/// How often to send heartbeats
pub interval: Duration, // Default: 30 seconds
/// How long before a supervisor is considered dead
pub timeout: Duration, // Default: 2 minutes
}
```
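
One way the server side could apply `HeartbeatConfig::timeout` is sketched below. `HeartbeatTracker` is a hypothetical in-memory helper, and supervisor IDs are plain `u64` values standing in for `Uuid` so the example needs no external crates.

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

/// Remembers the most recent heartbeat per supervisor.
pub struct HeartbeatTracker {
    timeout: Duration,
    last_seen: HashMap<u64, SystemTime>,
}

impl HeartbeatTracker {
    pub fn new(timeout: Duration) -> Self {
        Self { timeout, last_seen: HashMap::new() }
    }

    /// Record a heartbeat received at time `at`.
    pub fn record(&mut self, supervisor_id: u64, at: SystemTime) {
        self.last_seen.insert(supervisor_id, at);
    }

    /// IDs of supervisors whose last heartbeat is older than the timeout,
    /// sorted so the result is deterministic.
    pub fn dead_supervisors(&self, now: SystemTime) -> Vec<u64> {
        let mut dead: Vec<u64> = self
            .last_seen
            .iter()
            .filter(|(_, seen)| {
                // A heartbeat dated after `now` (clock skew) counts as alive.
                now.duration_since(**seen)
                    .map(|idle| idle > self.timeout)
                    .unwrap_or(false)
            })
            .map(|(id, _)| *id)
            .collect();
        dead.sort_unstable();
        dead
    }
}
```

With the default 30 second interval and 2 minute timeout, a supervisor has to miss roughly four consecutive heartbeats before it is reported, which tolerates brief network hiccups.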
### 3.4 State Persistence
```rust
/// Enhanced supervisor state for persistence
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SupervisorPersistentState {
/// Current supervisor state
pub state: SupervisorState,
/// Current contract phase
pub phase: String,
/// Conversation history for resumption
pub conversation_history: Vec<ConversationMessage>,
/// Currently pending questions
pub pending_questions: Vec<PendingQuestion>,
/// Tasks spawned by this supervisor
pub spawned_task_ids: Vec<Uuid>,
/// Tasks we're waiting on
pub waiting_on_task_ids: Vec<Uuid>,
/// Last checkpoint created
pub last_checkpoint: Option<CheckpointInfo>,
/// Current activity description
pub current_activity: String,
/// Error if in failed state
pub error: Option<String>,
/// Timestamps
pub created_at: DateTime<Utc>,
pub updated_at: DateTime<Utc>,
}
```
### 3.5 Database Schema Changes
```sql
-- Enhance supervisor_states table
ALTER TABLE supervisor_states ADD COLUMN IF NOT EXISTS
state VARCHAR(50) NOT NULL DEFAULT 'initializing';
ALTER TABLE supervisor_states ADD COLUMN IF NOT EXISTS
current_activity TEXT;
ALTER TABLE supervisor_states ADD COLUMN IF NOT EXISTS
progress INTEGER DEFAULT 0;
ALTER TABLE supervisor_states ADD COLUMN IF NOT EXISTS
error_message TEXT;
ALTER TABLE supervisor_states ADD COLUMN IF NOT EXISTS
spawned_task_ids UUID[] DEFAULT ARRAY[]::UUID[];
-- Create heartbeat tracking table
CREATE TABLE IF NOT EXISTS supervisor_heartbeats (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
supervisor_task_id UUID NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
contract_id UUID NOT NULL REFERENCES contracts(id) ON DELETE CASCADE,
state VARCHAR(50) NOT NULL,
phase VARCHAR(50) NOT NULL,
current_activity TEXT,
progress INTEGER DEFAULT 0,
pending_task_ids UUID[] DEFAULT ARRAY[]::UUID[],
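timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW()
-- Retention is not enforced by the table: a CHECK on NOW() is only evaluated when a
-- row is inserted or updated and never removes old rows. Prune heartbeats older than
-- 24 hours with a periodic DELETE instead.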
);
CREATE INDEX IF NOT EXISTS idx_heartbeats_supervisor ON supervisor_heartbeats(supervisor_task_id);
CREATE INDEX IF NOT EXISTS idx_heartbeats_timestamp ON supervisor_heartbeats(timestamp);
```
### 3.6 Restoration Protocol
```
┌─────────────────────────────────────────────────────────────────────┐
│ SUPERVISOR RESTORATION PROTOCOL │
└─────────────────────────────────────────────────────────────────────┘
1. Daemon restarts or task is assigned to new daemon
│
▼
2. Load supervisor state from supervisor_states table
│
├─ NOT FOUND ──► Start fresh, log warning
│
▼ FOUND
3. Validate state consistency
│
├─ INVALID ──► Start from last checkpoint
│
▼ VALID
4. Restore conversation history
│
▼
5. Check for pending questions
│
├─ HAS PENDING ──► Re-deliver questions to user
│
▼ NO PENDING
6. Check for waiting tasks
│
├─ HAS WAITING ──► Resume waiting state
│
▼ NO WAITING
7. Send restoration context to Claude
│
▼
8. Resume execution from last state
```
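
The branching in the protocol above can be expressed as a small planning function. This is only a sketch: `SavedSupervisorState`, `RestoreAction`, and `plan_restore` are hypothetical names, and the saved state is reduced to the three facts the protocol branches on.

```rust
/// The parts of the persisted supervisor state that the protocol branches on.
pub struct SavedSupervisorState {
    /// Result of step 3 (state consistency validation)
    pub is_consistent: bool,
    pub pending_questions: usize,
    pub waiting_on_tasks: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RestoreAction {
    /// Step 2, NOT FOUND: start fresh and log a warning
    StartFresh,
    /// Step 3, INVALID: fall back to the last checkpoint
    RestoreFromCheckpoint,
    /// Step 5, HAS PENDING: re-deliver questions to the user
    RedeliverQuestions,
    /// Step 6, HAS WAITING: resume the waiting state
    ResumeWaiting,
    /// Steps 7-8: send restoration context and resume execution
    ResumeExecution,
}

pub fn plan_restore(saved: Option<&SavedSupervisorState>) -> RestoreAction {
    let state = match saved {
        Some(s) => s,
        None => return RestoreAction::StartFresh,
    };
    if !state.is_consistent {
        return RestoreAction::RestoreFromCheckpoint;
    }
    // Step 4 (restoring conversation history) applies to every branch below.
    if state.pending_questions > 0 {
        RestoreAction::RedeliverQuestions
    } else if state.waiting_on_tasks > 0 {
        RestoreAction::ResumeWaiting
    } else {
        RestoreAction::ResumeExecution
    }
}
```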
### 3.7 API Endpoints
```rust
/// Get supervisor status
/// GET /api/v1/contracts/{id}/supervisor/status
pub async fn get_supervisor_status(
contract_id: Uuid,
) -> SupervisorStatusResponse;
/// Get supervisor heartbeat history
/// GET /api/v1/contracts/{id}/supervisor/heartbeats
pub async fn get_heartbeats(
contract_id: Uuid,
limit: Option<i32>,
) -> HeartbeatListResponse;
/// Force supervisor state sync
/// POST /api/v1/contracts/{id}/supervisor/sync
pub async fn sync_supervisor_state(
contract_id: Uuid,
) -> SyncResponse;
```
---
## 4. Contract Monitoring Dashboard
### 4.1 Real-Time Status Updates
#### 4.1.1 WebSocket Events
```rust
/// Contract monitoring events
#[derive(Debug, Serialize)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum ContractMonitorEvent {
/// Contract status changed
StatusChanged {
contract_id: Uuid,
old_status: ContractStatus,
new_status: ContractStatus,
reason: Option<String>,
},
/// Phase changed
PhaseChanged {
contract_id: Uuid,
old_phase: String,
new_phase: String,
},
/// Supervisor state changed
SupervisorStateChanged {
contract_id: Uuid,
supervisor_task_id: Uuid,
old_state: SupervisorState,
new_state: SupervisorState,
},
/// Supervisor heartbeat received
Heartbeat {
contract_id: Uuid,
state: SupervisorState,
activity: String,
progress: u8,
},
/// Contract became stale
StaleDetected {
contract_id: Uuid,
last_activity: DateTime<Utc>,
stale_duration: Duration,
},
/// Task completed
TaskCompleted {
contract_id: Uuid,
task_id: Uuid,
task_name: String,
success: bool,
},
/// Deliverable marked complete
DeliverableCompleted {
contract_id: Uuid,
phase: String,
deliverable_id: String,
},
/// Question asked (needs user attention)
QuestionAsked {
contract_id: Uuid,
question_id: Uuid,
question: String,
question_type: String,
},
}
```
#### 4.1.2 Subscription API
```rust
/// Subscribe to contract monitoring events
/// WS /api/v1/contracts/monitor
pub async fn monitor_contracts(
ws: WebSocket,
filter: ContractMonitorFilter,
) -> Result<(), Error>;
#[derive(Debug, Deserialize)]
pub struct ContractMonitorFilter {
/// Filter by specific contract IDs
pub contract_ids: Option<Vec<Uuid>>,
/// Filter by status
pub statuses: Option<Vec<ContractStatus>>,
/// Include stale detection events
pub include_stale: bool,
/// Include heartbeat events
pub include_heartbeats: bool,
}
```
### 4.2 Stale Contract Detection
```rust
/// Stale contract detector service
pub struct StaleContractDetector {
pool: PgPool,
config: ContractTimeoutConfig,
}
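impl StaleContractDetector {
    /// Run stale detection loop until the event receiver is dropped
    pub async fn run(&self, event_tx: Sender<ContractMonitorEvent>) {
        let mut interval = tokio::time::interval(Duration::from_secs(60));
        loop {
            interval.tick().await;
            let stale = self.detect_stale_contracts().await;
            for (contract_id, last_activity) in stale {
                // Subtracting two `DateTime`s yields a chrono duration, but the event
                // carries a std `Duration`. Negative values (clock skew) become zero.
                let stale_duration = (Utc::now() - last_activity)
                    .to_std()
                    .unwrap_or_default();
                let event = ContractMonitorEvent::StaleDetected {
                    contract_id,
                    last_activity,
                    stale_duration,
                };
                if event_tx.send(event).await.is_err() {
                    // Receiver dropped: nobody is listening, so stop the loop
                    return;
                }
            }
        }
    }

    /// Detect stale contracts
    async fn detect_stale_contracts(&self) -> Vec<(Uuid, DateTime<Utc>)> {
        let threshold = Utc::now() - self.config.stale_threshold;
        sqlx::query_as::<_, (Uuid, DateTime<Utc>)>(
            r#"
            SELECT id, last_activity_at
            FROM contracts
            WHERE status = 'active'
            AND last_activity_at < $1
            "#
        )
        .bind(threshold)
        .fetch_all(&self.pool)
        .await
        .unwrap_or_else(|err| {
            // Report the failure instead of silently treating it as "nothing stale"
            eprintln!("stale contract query failed: {err}");
            Vec::new()
        })
    }
}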
```
### 4.3 Batch Operations
```rust
/// Batch operation types
#[derive(Debug, Deserialize)]
#[serde(tag = "operation", rename_all = "snake_case")]
pub enum BatchOperation {
/// Archive completed contracts older than threshold
ArchiveOld {
older_than: Duration,
status_filter: Vec<ContractStatus>,
},
/// Pause all active contracts
PauseAll {
reason: String,
},
/// Resume all paused contracts
ResumeAll,
/// Delete archived contracts older than threshold
CleanupArchived {
older_than: Duration,
},
/// Restart stale supervisors
RestartStale {
stale_threshold: Duration,
},
}
/// Batch operation result
#[derive(Debug, Serialize)]
pub struct BatchOperationResult {
pub operation: String,
pub affected_count: usize,
pub affected_ids: Vec<Uuid>,
pub errors: Vec<BatchOperationError>,
}
#[derive(Debug, Serialize)]
pub struct BatchOperationError {
pub contract_id: Uuid,
pub error: String,
}
```
### 4.4 Dashboard API
```rust
/// Get dashboard summary
/// GET /api/v1/contracts/dashboard
pub async fn get_dashboard() -> DashboardResponse;
#[derive(Debug, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct DashboardResponse {
/// Count by status
pub status_counts: HashMap<ContractStatus, usize>,
/// Count by phase (for active contracts)
pub phase_counts: HashMap<String, usize>,
/// Stale contracts
pub stale_contracts: Vec<StaleContractInfo>,
/// Contracts waiting for input
pub waiting_for_input: Vec<WaitingContractInfo>,
/// Recent activity
pub recent_events: Vec<ContractMonitorEvent>,
/// Resource usage
pub resource_usage: ResourceUsage,
}
#[derive(Debug, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct StaleContractInfo {
pub id: Uuid,
pub name: String,
pub phase: String,
pub last_activity: DateTime<Utc>,
pub stale_duration_secs: i64,
}
#[derive(Debug, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct WaitingContractInfo {
pub id: Uuid,
pub name: String,
pub waiting_for: String, // 'question', 'phase_confirmation', etc.
pub waiting_since: DateTime<Utc>,
pub question: Option<String>,
}
#[derive(Debug, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct ResourceUsage {
pub active_supervisors: usize,
pub running_tasks: usize,
pub pending_tasks: usize,
pub active_daemons: usize,
pub total_worktrees: usize,
}
```
---
## 5. Improved CLI Commands
### 5.1 Contract Listing with Filters
```bash
# List all contracts
makima contracts list
# List with status filter
makima contracts list --status active
makima contracts list --status completed,failed
# List stale contracts
makima contracts list --stale
makima contracts list --stale --threshold 30m
# List contracts waiting for input
makima contracts list --waiting
# List by phase
makima contracts list --phase execute
# Combine filters
makima contracts list --status active --phase plan --stale
# Output formats
makima contracts list --format json
makima contracts list --format table
makima contracts list --format compact
```
#### Implementation
```rust
#[derive(Debug, Args)]
pub struct ListContractsArgs {
/// Filter by status (comma-separated)
#[arg(long)]
pub status: Option<String>,
/// Show only stale contracts
#[arg(long)]
pub stale: bool,
/// Stale threshold (e.g., "30m", "1h")
#[arg(long, default_value = "30m")]
pub threshold: String,
/// Show contracts waiting for input
#[arg(long)]
pub waiting: bool,
/// Filter by phase
#[arg(long)]
pub phase: Option<String>,
/// Output format
#[arg(long, default_value = "table")]
pub format: OutputFormat,
/// Limit results
#[arg(long, short = 'n')]
pub limit: Option<usize>,
}
```
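
Because `threshold` is kept as a `String`, it has to be converted to a duration before use. A minimal parsing sketch follows; `parse_threshold` is a hypothetical helper, and accepting an `s` suffix for seconds is an assumption beyond the `m`, `h`, and `d` forms shown in this document.

```rust
use std::time::Duration;

/// Parses values such as "30m", "1h", or "7d" into a `Duration`.
/// Returns `None` for empty input, unknown units, or non-numeric amounts.
pub fn parse_threshold(input: &str) -> Option<Duration> {
    let input = input.trim();
    // The last character is the unit; everything before it is the amount.
    let unit = input.chars().last()?;
    let digits = &input[..input.len() - unit.len_utf8()];
    let amount: u64 = digits.parse().ok()?;
    let secs_per_unit: u64 = match unit {
        's' => 1, // assumption: seconds are accepted as well
        'm' => 60,
        'h' => 60 * 60,
        'd' => 24 * 60 * 60,
        _ => return None,
    };
    // `checked_mul` rejects amounts that would overflow instead of wrapping.
    amount.checked_mul(secs_per_unit).map(Duration::from_secs)
}
```

The same helper would also serve `--older-than` on the cleanup command. An alternative is to do the conversion in a clap `value_parser`, so that invalid values are rejected during argument parsing.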
### 5.2 Cleanup Command
```bash
# Archive completed contracts older than 7 days
makima contracts cleanup --archive --older-than 7d
# Delete archived contracts older than 30 days
makima contracts cleanup --delete-archived --older-than 30d
# Dry run (show what would be affected)
makima contracts cleanup --archive --older-than 7d --dry-run
# Force cleanup without confirmation
makima contracts cleanup --archive --older-than 7d --force
# Cleanup stale worktrees
makima contracts cleanup --worktrees
# Full cleanup: archive old, delete archived, clean worktrees
makima contracts cleanup --all --older-than 7d
```
#### Implementation
```rust
#[derive(Debug, Args)]
pub struct CleanupContractsArgs {
/// Archive completed/failed contracts
#[arg(long)]
pub archive: bool,
/// Delete archived contracts
#[arg(long)]
pub delete_archived: bool,
/// Clean up orphaned worktrees
#[arg(long)]
pub worktrees: bool,
/// Run all cleanup operations
#[arg(long)]
pub all: bool,
/// Threshold for cleanup (e.g., "7d", "30d")
#[arg(long, default_value = "7d")]
pub older_than: String,
/// Dry run - show what would be affected
#[arg(long)]
pub dry_run: bool,
/// Skip confirmation prompts
#[arg(long)]
pub force: bool,
}
```
### 5.3 Monitor Command
```bash
# Real-time monitoring dashboard
makima contracts monitor
# Monitor specific contracts
makima contracts monitor <contract-id> <contract-id>
# Monitor with filters
makima contracts monitor --status active
makima contracts monitor --stale
# Quiet mode - only show important events
makima contracts monitor --quiet
# JSON output for scripting
makima contracts monitor --format json
```
#### Implementation
```rust
#[derive(Debug, Args)]
pub struct MonitorContractsArgs {
/// Contract IDs to monitor (empty = all)
pub contract_ids: Vec<Uuid>,
/// Filter by status
#[arg(long)]
pub status: Option<String>,
/// Only show stale contracts
#[arg(long)]
pub stale: bool,
/// Quiet mode - only important events
#[arg(long, short)]
pub quiet: bool,
/// Output format
#[arg(long, default_value = "tui")]
pub format: MonitorFormat,
}
#[derive(Debug, Clone, ValueEnum)]
pub enum MonitorFormat {
/// Terminal UI dashboard
Tui,
/// Plain text output
Text,
/// JSON stream
Json,
}
```
### 5.4 Additional Commands
```bash
# Resume a paused contract
makima contracts resume <contract-id>
# Pause an active contract
makima contracts pause <contract-id> --reason "Waiting for external review"
# Force advance phase
makima contracts advance <contract-id> --phase execute --force
# Restart stale supervisor
makima contracts restart-supervisor <contract-id>
# Show contract details
makima contracts show <contract-id> --verbose
# Check contract health
makima contracts health <contract-id>
# Export contract history
makima contracts export <contract-id> --format json --output contract.json
```
---
## 6. Bug Fixes
### 6.1 Version Conflicts (Silent Failures)
**Problem**: Phase changes can fail silently when version conflicts occur.
**Solution**: Implement explicit version checking and conflict reporting.
```rust
/// Result type for phase changes with explicit conflict handling
pub enum PhaseChangeResult {
Success(Contract),
VersionConflict {
expected: i32,
actual: i32,
current_phase: String,
},
ValidationFailed {
reason: String,
missing_requirements: Vec<String>,
},
Unauthorized,
NotFound,
}
/// Enhanced phase change handler
pub async fn change_phase_with_validation(
    pool: &PgPool,
    contract_id: Uuid,
    owner_id: Uuid,
    new_phase: &str,
    expected_version: Option<i32>,
) -> Result<PhaseChangeResult, Error> {
    // Start transaction
    let mut tx = pool.begin().await?;
    // Get current contract with lock
    let contract = sqlx::query_as::<_, Contract>(
        "SELECT * FROM contracts WHERE id = $1 AND owner_id = $2 FOR UPDATE"
    )
    .bind(contract_id)
    .bind(owner_id)
    .fetch_optional(&mut *tx)
    .await?;
    let contract = match contract {
        Some(c) => c,
        None => return Ok(PhaseChangeResult::NotFound),
    };
    // Check version if provided
    if let Some(expected) = expected_version {
        if contract.version != expected {
            return Ok(PhaseChangeResult::VersionConflict {
                expected,
                actual: contract.version,
                current_phase: contract.phase.clone(),
            });
        }
    }
    // Validate phase transition
    let validation = validate_phase_transition(&contract, new_phase);
    if !validation.valid {
        return Ok(PhaseChangeResult::ValidationFailed {
            reason: validation.reason,
            missing_requirements: validation.missing,
        });
    }
    // Update phase
    let updated = sqlx::query_as::<_, Contract>(
        r#"
        UPDATE contracts
        SET phase = $1, version = version + 1, updated_at = NOW()
        WHERE id = $2
        RETURNING *
        "#
    )
    .bind(new_phase)
    .bind(contract_id)
    .fetch_one(&mut *tx)
    .await?;
    tx.commit().await?;
    Ok(PhaseChangeResult::Success(updated))
}
```
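
To make the version check concrete, here is an in-memory sketch of the same compare-then-bump logic with two writers that both read version 1. `ContractRow`, `ChangeOutcome`, and `change_phase` are hypothetical stand-ins for the database row and `PhaseChangeResult`; no database is involved.

```rust
/// In-memory stand-in for the `contracts` row.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ContractRow {
    pub phase: String,
    pub version: i32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ChangeOutcome {
    Success { new_version: i32 },
    VersionConflict { expected: i32, actual: i32, current_phase: String },
}

/// Applies the phase change only if the caller saw the current version.
pub fn change_phase(
    row: &mut ContractRow,
    new_phase: &str,
    expected_version: Option<i32>,
) -> ChangeOutcome {
    if let Some(expected) = expected_version {
        if row.version != expected {
            // Report the conflict explicitly instead of failing silently
            return ChangeOutcome::VersionConflict {
                expected,
                actual: row.version,
                current_phase: row.phase.clone(),
            };
        }
    }
    row.phase = new_phase.to_string();
    row.version += 1;
    ChangeOutcome::Success { new_version: row.version }
}
```

If the first writer moves the contract from `plan` to `execute` with `Some(1)`, a second writer that still holds version 1 receives `VersionConflict { expected: 1, actual: 2, .. }` and can reload the contract before retrying.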
### 6.2 Deliverable Validation
**Problem**: Deliverables that do not exist for the contract's phase can be marked as complete.
**Solution**: Validate deliverable IDs before marking complete.
```rust
/// Validate deliverable exists for contract type and phase
pub fn validate_deliverable(
contract_type: &str,
phase: &str,
deliverable_id: &str,
phase_config: Option<&PhaseConfig>,
) -> Result<(), DeliverableValidationError> {
let deliverables = if let Some(config) = phase_config {
get_phase_deliverables_from_config(phase, config)
} else {
get_phase_deliverables_for_type(phase, contract_type)
};
let valid_ids: Vec<&str> = deliverables
.deliverables
.iter()
.map(|d| d.id.as_str())
.collect();
if !valid_ids.contains(&deliverable_id) {
return Err(DeliverableValidationError::InvalidDeliverable {
deliverable_id: deliverable_id.to_string(),
phase: phase.to_string(),
valid_ids: valid_ids.into_iter().map(String::from).collect(),
});
}
Ok(())
}
#[derive(Debug, thiserror::Error)]
pub enum DeliverableValidationError {
#[error("Invalid deliverable '{deliverable_id}' for {phase} phase. Valid IDs: {valid_ids:?}")]
InvalidDeliverable {
deliverable_id: String,
phase: String,
valid_ids: Vec<String>,
},
}
```
### 6.3 Phase Guard Bypass
**Problem**: Supervisors can bypass the `phase_guard` setting.
**Solution**: Enforce phase_guard at the API level, not just in supervisor logic.
```rust
/// Enhanced phase change with phase_guard enforcement
pub async fn change_phase_enforced(
    pool: &PgPool,
    contract_id: Uuid,
    owner_id: Uuid,
    request: ChangePhaseRequest,
    is_supervisor: bool,
) -> Result<PhaseChangeResponse, Error> {
    let contract = get_contract_for_owner(pool, contract_id, owner_id)
        .await?
        .ok_or(Error::NotFound)?;
    // Phase guard is enforced for EVERYONE, including supervisors;
    // `is_supervisor` must never be used to skip this check
    if contract.phase_guard && !request.confirmed.unwrap_or(false) {
        // Look up deliverables before `contract.phase` is moved into the response
        let deliverables = get_phase_deliverables(&contract.phase);
        // Must return phase review info, regardless of caller
        return Ok(PhaseChangeResponse::RequiresConfirmation {
            current_phase: contract.phase,
            next_phase: request.phase,
            deliverables,
            message: "Phase guard is enabled. User confirmation required.".to_string(),
        });
    }
    // Proceed with phase change
    // ...
}
```
---
## 7. Migration Plan
### 7.1 Phase 1: Database Schema (Week 1)
1. Add new columns to `contracts` table
2. Add new columns to `supervisor_states` table
3. Create `supervisor_heartbeats` table
4. Create indexes
5. Backfill `last_activity_at` from existing data
```sql
-- Migration 001: Contract status enhancements
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS status_changed_at TIMESTAMPTZ DEFAULT NOW();
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS last_activity_at TIMESTAMPTZ;
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS waiting_for TEXT;
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS blocked_reason TEXT;
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS failure_reason TEXT;
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS auto_complete_enabled BOOLEAN DEFAULT TRUE;
ALTER TABLE contracts ADD COLUMN IF NOT EXISTS completion_detected_at TIMESTAMPTZ;
CREATE INDEX IF NOT EXISTS idx_contracts_status ON contracts(status);
CREATE INDEX IF NOT EXISTS idx_contracts_last_activity ON contracts(last_activity_at);
-- Backfill last_activity_at from updated_at. The column is added without a default so
-- that existing rows start as NULL and are picked up here; the default is set afterwards.
UPDATE contracts SET last_activity_at = updated_at WHERE last_activity_at IS NULL;
ALTER TABLE contracts ALTER COLUMN last_activity_at SET DEFAULT NOW();
```
### 7.2 Phase 2: Core Logic (Week 2)
1. Implement `ContractStatus` enum with new states
2. Implement state transition validation
3. Implement `CompletionDetector`
4. Update phase change handlers with validation
5. Implement deliverable validation
### 7.3 Phase 3: Supervisor Enhancements (Week 3)
1. Implement `SupervisorState` enum
2. Implement heartbeat mechanism
3. Implement state persistence
4. Implement restoration protocol
5. Update supervisor API endpoints
### 7.4 Phase 4: Monitoring (Week 4)
1. Implement WebSocket monitoring events
2. Implement stale detection service
3. Implement batch operations
4. Implement dashboard API
### 7.5 Phase 5: CLI (Week 5)
1. Implement `contracts list` with filters
2. Implement `contracts cleanup`
3. Implement `contracts monitor`
4. Implement additional helper commands
### 7.6 Phase 6: Testing & Rollout (Week 6)
1. Unit tests for all new components
2. Integration tests for state machines
3. Load testing for monitoring
4. Staged rollout with feature flags
5. Documentation updates
---
## 8. Appendix
### 8.1 Configuration Options
```toml
[contracts]
# Timeout configuration
stale_threshold_minutes = 30
input_timeout_hours = 24
completion_grace_period_minutes = 5
archive_retention_days = 30
# Auto-completion
auto_complete_enabled = true
auto_advance_phases = true
# Heartbeat
heartbeat_interval_seconds = 30
heartbeat_timeout_seconds = 120
# Monitoring
monitor_ws_buffer_size = 1000
stale_detection_interval_seconds = 60
```
### 8.2 Error Codes
| Code | Description |
|------|-------------|
| `CONTRACT_NOT_FOUND` | Contract does not exist |
| `INVALID_TRANSITION` | State transition not allowed |
| `VERSION_CONFLICT` | Optimistic locking conflict |
| `PHASE_GUARD_REQUIRED` | Phase guard confirmation needed |
| `INVALID_DELIVERABLE` | Deliverable ID not valid for phase |
| `SUPERVISOR_NOT_FOUND` | No supervisor for contract |
| `SUPERVISOR_DEAD` | Supervisor heartbeat timeout |
| `VALIDATION_FAILED` | Phase requirements not met |
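
On the Rust side, these codes could be represented by a single enum so that handlers cannot misspell them. The sketch below is an assumption about how that might look; `ErrorCode` and `as_str` are hypothetical names, and only the code strings come from the table.

```rust
use std::fmt;

/// One variant per row of the error code table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorCode {
    ContractNotFound,
    InvalidTransition,
    VersionConflict,
    PhaseGuardRequired,
    InvalidDeliverable,
    SupervisorNotFound,
    SupervisorDead,
    ValidationFailed,
}

impl ErrorCode {
    /// The wire representation, exactly as listed in the table.
    pub fn as_str(self) -> &'static str {
        match self {
            ErrorCode::ContractNotFound => "CONTRACT_NOT_FOUND",
            ErrorCode::InvalidTransition => "INVALID_TRANSITION",
            ErrorCode::VersionConflict => "VERSION_CONFLICT",
            ErrorCode::PhaseGuardRequired => "PHASE_GUARD_REQUIRED",
            ErrorCode::InvalidDeliverable => "INVALID_DELIVERABLE",
            ErrorCode::SupervisorNotFound => "SUPERVISOR_NOT_FOUND",
            ErrorCode::SupervisorDead => "SUPERVISOR_DEAD",
            ErrorCode::ValidationFailed => "VALIDATION_FAILED",
        }
    }
}

impl fmt::Display for ErrorCode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}
```

Because the `match` is exhaustive, adding a new variant without a code string fails to compile, which keeps the table and the implementation in step.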
### 8.3 Metrics
The following metrics should be tracked:
- `contracts_by_status` (gauge) - Count of contracts by status
- `contracts_stale_count` (gauge) - Number of stale contracts
- `phase_transitions_total` (counter) - Phase changes by from/to
- `completion_detections_total` (counter) - Auto-completions detected
- `supervisor_heartbeats_total` (counter) - Heartbeats received
- `supervisor_restarts_total` (counter) - Supervisor restarts
- `batch_operations_total` (counter) - Batch operations by type
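
As an illustration of the `contracts_by_status` gauge, the sketch below counts contracts per status and renders one line per label. The helper names are hypothetical, and the `name{label="value"} count` line format is an assumption about the metrics backend, which this document does not specify.

```rust
use std::collections::BTreeMap;

/// Counts contracts per status. A `BTreeMap` keeps the labels sorted,
/// so the rendered output is stable between runs.
pub fn contracts_by_status<'a>(statuses: &[&'a str]) -> BTreeMap<&'a str, usize> {
    let mut counts = BTreeMap::new();
    for status in statuses {
        *counts.entry(*status).or_insert(0) += 1;
    }
    counts
}

/// Renders the gauge as one `name{status="..."} value` line per status.
pub fn render_status_gauge(counts: &BTreeMap<&str, usize>) -> String {
    counts
        .iter()
        .map(|(status, n)| format!("contracts_by_status{{status=\"{}\"}} {}", status, n))
        .collect::<Vec<_>>()
        .join("\n")
}
```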