| author | soryu <soryu@soryu.co> | 2026-01-26 23:39:52 +0000 |
|---|---|---|
| committer | soryu <soryu@soryu.co> | 2026-01-26 23:39:52 +0000 |
| commit | 6105817da79992cee3cd7d9f7127b94d874b6cda (patch) | |
| tree | 941ec6e52cb00ba6fec32788259a56968c1e9b14 | |
| parent | de1eb0a923f6c5b768ac49f0425b7213a89301b7 (diff) | |
Add comprehensive Red Team system specification

Defines the adversarial review feature for contracts that monitors work tasks
in real-time to catch quality issues, plan deviations, and standards violations.

Key components specified:
- Contract configuration (red_team_enabled, red_team_prompt)
- Red team task lifecycle and spawning logic
- makima red-team notify CLI command for supervisor alerts
- Task output subscription for real-time monitoring
- Database schema changes (contracts, tasks, notifications table)
- API endpoints for notification and status
- System prompt template for red team behavior
- Security considerations and access control

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
| -rw-r--r-- | .makima/specs/red-team-system.md | 748 |
1 files changed, 748 insertions, 0 deletions
diff --git a/.makima/specs/red-team-system.md b/.makima/specs/red-team-system.md
new file mode 100644
index 0000000..31f4b78
--- /dev/null
+++ b/.makima/specs/red-team-system.md
@@ -0,0 +1,748 @@
+# Red Team System Specification
+
+## Overview
+
+The Red Team system is an adversarial review feature for makima contracts that provides real-time quality assurance during task execution. When enabled, a parallel "red team" task instance monitors the output of work tasks, verifying that implementations adhere to the contract requirements, repository standards, and the execution plan.
+
+### Goals
+
+1. **Quality Assurance**: Catch deviations from the plan before they compound
+2. **Standards Compliance**: Ensure code follows repository conventions (CONTRIBUTING.md, linting rules, etc.)
+3. **Contract Adherence**: Verify implementations match the specification and requirements
+4. **Proactive Issue Detection**: Flag potential problems early, not after task completion
+
+### Non-Goals
+
+1. The red team should NOT write code or make commits
+2. The red team should NOT be overly pedantic or block progress for minor style issues
+3. The red team is NOT a replacement for code review - it's an early warning system
+
+---
+
+## 1. Feature Overview
+
+### 1.1 Concept
+
+The Red Team operates as a parallel observer task that:
+- Monitors all work task outputs in real-time via the broadcast system
+- Has read-only access to task diffs and outputs
+- Can access contract specifications, plans, and repository standards
+- Can notify the supervisor when it detects issues requiring attention
+
+### 1.2 Relationship to Existing Components
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                         Contract                            │
+│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
+│  │  Supervisor  │    │ Work Task 1  │    │ Work Task 2  │   │
+│  │              │<───│              │    │              │   │
+│  │              │<───│              │    │              │   │
+│  └──────────────┘    └──────────────┘    └──────────────┘   │
+│         ^                   │                   │           │
+│         │                outputs             outputs        │
+│         │                   │                   │           │
+│     [NOTIFY]                v                   v           │
+│         │            ┌─────────────────────────────┐        │
+│         └────────────│       Red Team Task         │        │
+│                      │  (Monitoring & Validation)  │        │
+│                      └─────────────────────────────┘        │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### 1.3 Task Type
+
+The Red Team task is a special task variant with the following characteristics:
+- `is_red_team: true` flag on the Task model
+- Has tool key for API access (like supervisor tasks)
+- Does NOT have write permissions to the repository
+- Subscribes to task output broadcasts
+- Can use `makima red-team notify` command to alert supervisor
+
+---
+
+## 2. Contract Configuration
+
+### 2.1 Contract Model Changes
+
+Add the following field to the `Contract` model in `makima/src/db/models.rs`:
+
+```rust
+/// Contract record from the database
+#[derive(Debug, Clone, FromRow, Serialize, ToSchema)]
+#[serde(rename_all = "camelCase")]
+pub struct Contract {
+    // ... existing fields ...
+
+    /// Whether to spawn a red team task to monitor work tasks.
+    /// When enabled, a parallel task monitors outputs and can alert
+    /// the supervisor about potential issues.
+    #[serde(default)]
+    pub red_team_enabled: bool,
+
+    /// Optional custom prompt/criteria for the red team to use
+    /// when evaluating task outputs. If not provided, uses default
+    /// quality criteria.
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub red_team_prompt: Option<String>,
+}
+```
+
+### 2.2 CreateContractRequest Changes
+
+```rust
+#[derive(Debug, Clone, Deserialize, ToSchema)]
+#[serde(rename_all = "camelCase")]
+pub struct CreateContractRequest {
+    // ... existing fields ...
+
+    /// Enable red team monitoring for this contract.
+    /// When enabled, a parallel task monitors work task outputs
+    /// and can alert the supervisor about potential issues.
+    #[serde(default)]
+    pub red_team_enabled: Option<bool>,
+
+    /// Optional custom criteria for the red team to evaluate.
+    /// Examples: "Focus on security vulnerabilities",
+    /// "Ensure all functions have tests", etc.
+    pub red_team_prompt: Option<String>,
+}
+```
+
+### 2.3 CLI Flag for Contract Creation
+
+The daemon CLI should support red team enablement during contract creation:
+
+```bash
+# Enable red team with default criteria
+makima supervisor create --red-team "Contract Name" "Description"
+
+# Enable red team with custom review criteria
+makima supervisor create --red-team --red-team-prompt "Focus on performance and memory usage" "Contract Name" "Description"
+```
+
+---
+
+## 3. Red Team Task Lifecycle
+
+### 3.1 Spawning
+
+The red team task is spawned automatically when:
+1. A contract has `red_team_enabled: true`
+2. The first work task is spawned (not the supervisor itself)
+
+**Spawn Logic** (in `spawn_task` handler or supervisor spawn logic):
+
+```rust
+// In spawn_task after creating a work task:
+if contract.red_team_enabled && !is_supervisor_task {
+    // Check if red team task already exists
+    let existing_red_team = repository::get_red_team_task_for_contract(pool, contract_id).await?;
+
+    if existing_red_team.is_none() {
+        // Spawn red team task
+        let red_team_task = spawn_red_team_task(
+            pool,
+            state,
+            contract_id,
+            owner_id,
+            contract.red_team_prompt.as_deref(),
+        ).await?;
+
+        tracing::info!(
+            contract_id = %contract_id,
+            red_team_task_id = %red_team_task.id,
+            "Spawned red team task for contract"
+        );
+    }
+}
+```
+
+### 3.2 Task Properties
+
+When creating the red team task:
+
+```rust
+CreateTaskRequest {
+    name: "Red Team Monitor".to_string(),
+    description: Some("Adversarial review task monitoring work task outputs".to_string()),
+    plan: generate_red_team_plan(contract, custom_prompt),
+    contract_id: Some(contract_id),
+    parent_task_id: None, // Not a child of supervisor
+    is_supervisor: false,
+    is_red_team: true, // NEW FIELD
+    // ... other fields ...
+}
+```
+
+### 3.3 Lifespan
+
+The red team task:
+- Lives for the duration of the **execute phase**
+- Is automatically terminated when:
+  - The contract advances past the execute phase
+  - The contract is completed
+  - The contract is archived
+- Can be paused/resumed along with other contract tasks
+- Does NOT restart automatically after daemon failure (not critical path)
+
+### 3.4 Read-Only Enforcement
+
+The red team task:
+- Has NO worktree of its own (or a read-only clone)
+- Cannot use git operations (commit, branch, etc.)
+- Can only READ files, not write them
+- Has API access limited to read operations
+
+---
+
+## 4. Red Team Notification CLI Command
+
+### 4.1 Command Specification
+
+New CLI command available only to red team tasks:
+
+```bash
+makima red-team notify "<message>"
+```
+
+**Arguments:**
+- `<message>`: A detailed description of the issue detected
+
+**Options:**
+- `--severity <level>`: Issue severity: `low`, `medium`, `high`, `critical` (default: `medium`)
+- `--task <task_id>`: The specific task this relates to (optional)
+- `--file <path>`: The file path where the issue was detected (optional)
+- `--context <text>`: Additional context about the issue (optional)
+
+**Example:**
+
+```bash
+makima red-team notify "Task is adding console.log statements which violates the no-debug-logging rule in CONTRIBUTING.md" \
+  --severity medium \
+  --task 550e8400-e29b-41d4-a716-446655440000 \
+  --file "src/api/handler.rs"
+```
+
+### 4.2 CLI Arguments Structure
+
+```rust
+// In makima/src/daemon/cli/mod.rs
+
+/// Red Team subcommand - red team task commands.
+#[derive(Subcommand, Debug)]
+pub enum RedTeamCommand {
+    /// Send a notification to the supervisor about a detected issue.
+    /// Only available to red team tasks.
+    Notify(NotifyArgs),
+}
+
+/// Arguments for red-team notify command.
+#[derive(Args, Debug)]
+pub struct NotifyArgs {
+    /// API URL
+    #[arg(long, env = "MAKIMA_API_URL", default_value = "https://api.makima.jp")]
+    pub api_url: String,
+
+    /// API key for authentication
+    #[arg(long, env = "MAKIMA_API_KEY")]
+    pub api_key: String,
+
+    /// Current task ID (must be a red team task)
+    #[arg(long, env = "MAKIMA_TASK_ID")]
+    pub task_id: Uuid,
+
+    /// Contract ID
+    #[arg(long, env = "MAKIMA_CONTRACT_ID")]
+    pub contract_id: Uuid,
+
+    /// The notification message
+    #[arg(index = 1)]
+    pub message: String,
+
+    /// Severity level: low, medium, high, critical
+    #[arg(long, default_value = "medium")]
+    pub severity: String,
+
+    /// Related task ID (optional)
+    #[arg(long)]
+    pub task: Option<Uuid>,
+
+    /// Related file path (optional)
+    #[arg(long)]
+    pub file: Option<String>,
+
+    /// Additional context (optional)
+    #[arg(long)]
+    pub context: Option<String>,
+}
+```
+
+### 4.3 API Endpoint
+
+**POST** `/api/v1/mesh/red-team/notify`
+
+**Request Body:**
+```json
+{
+  "message": "Issue description",
+  "severity": "medium",
+  "relatedTaskId": "uuid-optional",
+  "filePath": "src/path/optional.rs",
+  "context": "Additional context optional"
+}
+```
+
+**Response:**
+```json
+{
+  "notificationId": "uuid",
+  "delivered": true,
+  "supervisorTaskId": "uuid"
+}
+```
+
+### 4.4 Notification Delivery
+
+When a red team notification is received:
+
+1. **Validate Caller**: Ensure the request comes from a valid red team task
+2. **Find Supervisor**: Get the supervisor task for the contract
+3. **Format Message**: Create an `[ACTION REQUIRED]` formatted message
+4. **Send to Supervisor**: Inject the message into the supervisor's stdin via `SendMessage` command
+
+**Message Format:**
+
+```
+════════════════════════════════════════════════════════════════
+[RED TEAM ALERT] Severity: MEDIUM
+════════════════════════════════════════════════════════════════
+
+Issue: Task is adding console.log statements which violates the
+no-debug-logging rule in CONTRIBUTING.md
+
+Related Task: 550e8400-e29b-41d4-a716-446655440000
+File: src/api/handler.rs
+
+Context: The CONTRIBUTING.md file explicitly states that debug
+logging should use the tracing crate, not console.log or println!
+
+════════════════════════════════════════════════════════════════
+You can:
+- Pause the related task to investigate
+- Send feedback to the task to correct the issue
+- Acknowledge this alert and continue monitoring
+════════════════════════════════════════════════════════════════
+```
+
+### 4.5 Supervisor Response Handling
+
+The supervisor can respond to red team notifications by:
+1. **Pausing the task**: `makima supervisor pause <task_id>`
+2. **Sending feedback**: `makima supervisor message <task_id> "Please use tracing instead of console.log"`
+3. **Acknowledging**: Simply continue (the red team will keep monitoring)
+4. **Dismissing**: Mark the alert as false positive (future consideration)
+
+---
+
+## 5. Red Team Access Patterns
+
+### 5.1 Task Output Subscription
+
+The red team task subscribes to the `task_outputs` broadcast channel:
+
+```rust
+// In red team task initialization
+let mut task_output_rx = state.task_outputs.subscribe();
+
+loop {
+    match task_output_rx.recv().await {
+        Ok(notification) => {
+            // Only process outputs from work tasks in our contract
+            if notification.contract_id == Some(self.contract_id)
+                && !notification.is_supervisor
+                && !notification.is_red_team {
+                self.analyze_output(notification).await;
+            }
+        }
+        Err(e) => {
+            tracing::warn!("Red team task output subscription error: {}", e);
+        }
+    }
+}
+```
+
+### 5.2 Task Diff Access
+
+The red team can request diffs via the supervisor API:
+
+**GET** `/api/v1/mesh/supervisor/tasks/{task_id}/diff`
+
+This endpoint already exists and can be used by the red team (with tool key auth).
+
+### 5.3 Contract Information Access
+
+The red team can read:
+- Contract plan and specifications (via contract files)
+- Repository standards (CONTRIBUTING.md, .editorconfig, etc.)
+- Task descriptions and plans
+
+**Existing endpoints used:**
+- `GET /api/v1/contracts/{id}` - Contract details
+- `GET /api/v1/contracts/{id}/files` - Contract files
+- `GET /api/v1/files/{id}` - File content
+
+### 5.4 Repository File Access
+
+For repository standards, the red team uses the existing daemon file read capability:
+
+```bash
+# Via makima CLI (from within the red team task)
+makima supervisor read-file <task_id> "CONTRIBUTING.md"
+makima supervisor read-file <task_id> ".editorconfig"
+makima supervisor read-file <task_id> "rustfmt.toml"
+```
+
+Or direct filesystem access if the red team has a read-only worktree clone.
+
+---
+
+## 6. System Prompt for Red Team Task
+
+The red team task receives a specialized system prompt that guides its behavior:
+
+```markdown
+# Red Team Monitor
+
+You are an adversarial quality reviewer for a software development contract. Your role is to monitor work task outputs in real-time and flag potential issues BEFORE they compound into larger problems.
+
+## Your Mission
+
+Monitor all task outputs and verify:
+1. **Plan Adherence**: Are tasks following the implementation plan?
+2. **Code Quality**: Does the code meet repository standards?
+3. **Contract Requirements**: Does the implementation match the specification?
+4. **Best Practices**: Are there obvious anti-patterns or issues?
+
+## Access Available
+
+You have read-only access to:
+- Task outputs (streamed in real-time)
+- Task diffs (code changes)
+- Contract specifications and plan documents
+- Repository configuration files (CONTRIBUTING.md, linting configs, etc.)
+
+## How to Monitor
+
+1. **Subscribe to task outputs**: You'll receive outputs from all work tasks
+2. **Analyze code changes**: Request diffs for completed tasks
+3. **Cross-reference**: Compare outputs against the plan and specifications
+4. **Report issues**: Use `makima red-team notify` when you detect problems
+
+## When to Notify
+
+NOTIFY the supervisor when you observe:
+- **Critical**: Security vulnerabilities, data loss risks, breaking changes
+- **High**: Significant deviations from the plan, major code quality issues
+- **Medium**: Missing tests, suboptimal implementations, minor standard violations
+- **Low**: Style inconsistencies, documentation gaps (use sparingly)
+
+## What NOT to Do
+
+- Do NOT nitpick minor style issues (that's what linters are for)
+- Do NOT block progress for trivial concerns
+- Do NOT write code or make changes yourself
+- Do NOT notify for things that are already in progress and being addressed
+- Do NOT create duplicate notifications for the same issue
+
+## Notification Format
+
+When notifying, always include:
+1. A clear, concise description of the issue
+2. The severity level (critical/high/medium/low)
+3. The related task ID if applicable
+4. The specific file or code location if known
+5. Why this matters (reference to plan, spec, or standards)
+
+## Example Notification
+
+```
+makima red-team notify "Task is implementing authentication with plaintext password storage, which contradicts the security requirements in the specification document" \
+  --severity critical \
+  --task <task_id> \
+  --file "src/auth/user.rs" \
+  --context "Specification section 3.2 requires bcrypt hashing for all passwords"
+```
+
+## Custom Review Criteria
+
+{{#if red_team_prompt}}
+Additional review criteria for this contract:
+{{red_team_prompt}}
+{{/if}}
+
+## Contract Context
+
+Contract: {{contract_name}}
+Phase: {{contract_phase}}
+Repository: {{repository_url}}
+
+Focus your monitoring on outputs that relate to the active work tasks. Prioritize issues that could affect the success of the contract or introduce technical debt.
+```
+
+---
+
+## 7. API Changes Summary
+
+### 7.1 New Endpoints
+
+| Method | Path | Description |
+|--------|------|-------------|
+| POST | `/api/v1/mesh/red-team/notify` | Send notification from red team to supervisor |
+| GET | `/api/v1/mesh/red-team/status` | Get red team task status for a contract |
+
+### 7.2 Modified Endpoints
+
+| Method | Path | Change |
+|--------|------|--------|
+| POST | `/api/v1/contracts` | Add `red_team_enabled` and `red_team_prompt` fields |
+| GET | `/api/v1/contracts/{id}` | Include red team task info in response |
+
+### 7.3 New Request/Response Types
+
+**RedTeamNotifyRequest:**
+```rust
+#[derive(Debug, Deserialize, ToSchema)]
+#[serde(rename_all = "camelCase")]
+pub struct RedTeamNotifyRequest {
+    pub message: String,
+    #[serde(default = "default_severity")]
+    pub severity: String,
+    pub related_task_id: Option<Uuid>,
+    pub file_path: Option<String>,
+    pub context: Option<String>,
+}
+```
+
+**RedTeamNotifyResponse:**
+```rust
+#[derive(Debug, Serialize, ToSchema)]
+#[serde(rename_all = "camelCase")]
+pub struct RedTeamNotifyResponse {
+    pub notification_id: Uuid,
+    pub delivered: bool,
+    pub supervisor_task_id: Uuid,
+}
+```
+
+**RedTeamStatusResponse:**
+```rust
+#[derive(Debug, Serialize, ToSchema)]
+#[serde(rename_all = "camelCase")]
+pub struct RedTeamStatusResponse {
+    pub contract_id: Uuid,
+    pub red_team_task_id: Option<Uuid>,
+    pub status: Option<String>,
+    pub notifications_sent: i32,
+    pub last_activity: Option<DateTime<Utc>>,
+}
+```
+
+---
+
+## 8. Database Schema Changes
+
+### 8.1 Contracts Table
+
+```sql
+ALTER TABLE contracts
+ADD COLUMN red_team_enabled BOOLEAN NOT NULL DEFAULT FALSE,
+ADD COLUMN red_team_prompt TEXT;
+```
+
+### 8.2 Tasks Table
+
+```sql
+ALTER TABLE tasks
+ADD COLUMN is_red_team BOOLEAN NOT NULL DEFAULT FALSE;
+```
+
+### 8.3 Red Team Notifications Table (New)
+
+```sql
+CREATE TABLE red_team_notifications (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    contract_id UUID NOT NULL REFERENCES contracts(id) ON DELETE CASCADE,
+    red_team_task_id UUID NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
+    related_task_id UUID REFERENCES tasks(id) ON DELETE SET NULL,
+
+    message TEXT NOT NULL,
+    severity VARCHAR(20) NOT NULL DEFAULT 'medium',
+    file_path TEXT,
+    context TEXT,
+
+    -- Delivery status
+    delivered BOOLEAN NOT NULL DEFAULT FALSE,
+    delivered_at TIMESTAMP WITH TIME ZONE,
+    acknowledged BOOLEAN NOT NULL DEFAULT FALSE,
+    acknowledged_at TIMESTAMP WITH TIME ZONE,
+
+    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
+);
+
+-- Indexes
+CREATE INDEX idx_red_team_notifications_contract_id ON red_team_notifications(contract_id);
+CREATE INDEX idx_red_team_notifications_red_team_task_id ON red_team_notifications(red_team_task_id);
+CREATE INDEX idx_red_team_notifications_created_at ON red_team_notifications(created_at DESC);
+```
+
+### 8.4 Index for Red Team Task Lookup
+
+```sql
+CREATE INDEX idx_tasks_contract_red_team ON tasks(contract_id, is_red_team)
+WHERE is_red_team = TRUE;
+```
+
+---
+
+## 9. Implementation Phases
+
+### Phase 1: Foundation (MVP)
+- [ ] Add `red_team_enabled` and `red_team_prompt` to Contract model
+- [ ] Add `is_red_team` to Task model
+- [ ] Database migrations
+- [ ] Basic red team task spawning logic
+- [ ] `makima red-team notify` CLI command
+- [ ] Red team notification API endpoint
+
+### Phase 2: Monitoring Infrastructure
+- [ ] Task output subscription for red team
+- [ ] Diff access for red team tasks
+- [ ] Red team system prompt generation
+- [ ] Notification delivery to supervisor
+
+### Phase 3: Polish & UX
+- [ ] Red team status in contract view
+- [ ] Notification history and acknowledgment
+- [ ] TUI integration for red team alerts
+- [ ] Frontend display of red team notifications
+
+### Phase 4: Future Enhancements
+- [ ] Configurable notification thresholds
+- [ ] Automatic pause on critical issues
+- [ ] Red team notification digest/summary
+- [ ] Integration with external code review tools
+
+---
+
+## 10. Security Considerations
+
+### 10.1 Access Control
+
+- Red team tasks MUST only have read access
+- Verify `is_red_team` flag before allowing notification API calls
+- Red team cannot spawn tasks or modify contract state
+- Tool key scope should be limited for red team tasks
+
+### 10.2 Abuse Prevention
+
+- Rate limit red team notifications (max 10 per minute per task)
+- Prevent notification spam with deduplication
+- Log all red team activities for audit
+
+### 10.3 Isolation
+
+- Red team task runs in separate worktree (or no worktree)
+- Cannot affect work task execution directly
+- Supervisor controls whether to act on notifications
+
+---
+
+## 11. Testing Strategy
+
+### 11.1 Unit Tests
+
+- Contract model serialization with red team fields
+- Red team task spawning conditions
+- Notification message formatting
+
+### 11.2 Integration Tests
+
+- Full contract lifecycle with red team enabled
+- Notification delivery to supervisor
+- Red team output subscription
+
+### 11.3 E2E Tests
+
+- Create contract with `--red-team` flag
+- Red team detects intentional violation
+- Supervisor receives and responds to notification
+
+---
+
+## 12. Success Metrics
+
+1. **Detection Rate**: Percentage of issues caught by red team before task completion
+2. **False Positive Rate**: Percentage of notifications that are dismissed as not actionable
+3. **Response Time**: Time between red team detection and supervisor acknowledgment
+4. **Contract Success Rate**: Compare success rates for contracts with/without red team
+
+---
+
+## Appendix A: Message Protocol
+
+### Task Output Notification Structure
+
+The red team subscribes to `TaskOutputNotification`:
+
+```rust
+pub struct TaskOutputNotification {
+    pub task_id: Uuid,
+    pub owner_id: Option<Uuid>,
+    pub message_type: String, // "assistant", "tool_use", "tool_result", etc.
+    pub content: String,
+    pub tool_name: Option<String>,
+    pub tool_input: Option<serde_json::Value>,
+    pub is_error: Option<bool>,
+    pub cost_usd: Option<f64>,
+    pub duration_ms: Option<u64>,
+    pub is_partial: bool,
+}
+```
+
+### Daemon Command for Supervisor Message
+
+```rust
+DaemonCommand::SendMessage {
+    task_id: supervisor_id,
+    message: formatted_red_team_alert,
+}
+```
+
+---
+
+## Appendix B: Configuration Examples
+
+### Contract Creation with Red Team (API)
+
+```json
+POST /api/v1/contracts
+{
+  "name": "Implement User Authentication",
+  "description": "Add OAuth2 authentication flow",
+  "contract_type": "specification",
+  "red_team_enabled": true,
+  "red_team_prompt": "Pay special attention to security best practices and OWASP guidelines. Flag any hardcoded secrets or insecure token handling."
+}
+```
+
+### Contract Creation with Red Team (CLI)
+
+```bash
+makima contract create \
+  --type specification \
+  --red-team \
+  --red-team-prompt "Focus on API backwards compatibility and deprecation handling" \
+  "API v2 Migration" \
+  "Migrate public API from v1 to v2"
+```
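Reviewer sketch (not part of the commit): the spec passes `severity` around as a free-form `String` and describes the supervisor alert layout only by example in section 4.4. The block below is a minimal, std-only illustration of how the four documented severity levels could be validated and how that alert text might be assembled. The `Severity` enum, the `Alert` struct, `format_alert`, and the 64-character rule width are all hypothetical names and choices made for illustration; none of them appear in the makima codebase, and task IDs are plain strings here to avoid assuming the `uuid` crate.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical typed form of the `--severity` option (low, medium, high, critical).
/// The spec's default of `medium` is mirrored by `#[default]`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Severity {
    Low,
    #[default]
    Medium,
    High,
    Critical,
}

impl FromStr for Severity {
    type Err = String;

    /// Accepts the four documented levels, ignoring case and surrounding whitespace.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "low" => Ok(Severity::Low),
            "medium" => Ok(Severity::Medium),
            "high" => Ok(Severity::High),
            "critical" => Ok(Severity::Critical),
            other => Err(format!("unknown severity: {other}")),
        }
    }
}

impl fmt::Display for Severity {
    /// Renders the upper-case label used in the alert header.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Severity::Low => "LOW",
            Severity::Medium => "MEDIUM",
            Severity::High => "HIGH",
            Severity::Critical => "CRITICAL",
        })
    }
}

/// Hypothetical stand-in for the fields carried by `RedTeamNotifyRequest`.
pub struct Alert<'a> {
    pub message: &'a str,
    pub severity: Severity,
    pub related_task_id: Option<&'a str>,
    pub file_path: Option<&'a str>,
    pub context: Option<&'a str>,
}

/// Builds alert text resembling the section 4.4 example; optional fields are
/// omitted when absent. The trailing "You can:" menu is left out for brevity.
pub fn format_alert(alert: &Alert<'_>) -> String {
    let rule = "═".repeat(64); // assumed width, chosen for illustration
    let mut out = format!(
        "{rule}\n[RED TEAM ALERT] Severity: {}\n{rule}\n\n",
        alert.severity
    );
    out.push_str(&format!("Issue: {}\n", alert.message));
    if alert.related_task_id.is_some() || alert.file_path.is_some() {
        out.push('\n');
    }
    if let Some(task) = alert.related_task_id {
        out.push_str(&format!("Related Task: {task}\n"));
    }
    if let Some(path) = alert.file_path {
        out.push_str(&format!("File: {path}\n"));
    }
    if let Some(context) = alert.context {
        out.push_str(&format!("\nContext: {context}\n"));
    }
    out.push_str(&format!("\n{rule}\n"));
    out
}
```

Parsing into an enum at the API boundary would reject values such as `urgent` before they reach the `severity VARCHAR(20)` column; whether the project prefers that over a plain string is a design decision the spec leaves open.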
