soryu - soryu-co/soryu mirror

	Commit message (Collapse)	Author	Age	Files	Lines
*	Add Qwen3-TTS streaming endpoint for voice synthesis (#40)	soryu	2026-01-28	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Task completion checkpoint * Task completion checkpoint * Task completion checkpoint * Add Qwen3-TTS research document for live TTS replacement Research findings for replacing Chatterbox TTS with Qwen3-TTS-12Hz-0.6B-Base: - Current TTS: Chatterbox-Turbo-ONNX with batch-only generation, no streaming - Qwen3-TTS: 97ms end-to-end latency, streaming support, 3-second voice cloning - Voice cloning: Requires 3s reference audio + transcript (Makima voice planned) - Integration: Python service with WebSocket bridge (no ONNX export available) - Languages: 10 supported including English and Japanese Document includes: - Current architecture analysis (makima/src/tts.rs) - Qwen3-TTS capabilities and requirements - Feasibility assessment for live/streaming TTS - Audio clip requirements for voice cloning - Preliminary technical approach with architecture diagrams Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * [WIP] Heartbeat checkpoint - 2026-01-27 03:11:15 UTC * Add Qwen3-TTS research documentation Comprehensive research on replacing Chatterbox TTS with Qwen3-TTS-12Hz-0.6B-Base: - Current TTS implementation analysis (Chatterbox-Turbo-ONNX in makima/src/tts.rs) - Qwen3-TTS capabilities: 97ms streaming latency, voice cloning with 3s reference - Cross-lingual support: Japanese voice (Makima/Tomori Kusunoki) speaking English - Python microservice architecture recommendation (FastAPI + WebSocket) - Implementation phases and technical approach - Hardware requirements and dependencies Key findings: - Live/streaming TTS is highly feasible with 97ms latency - Voice cloning fully supported with 0.95 speaker similarity - Recommended: Python microservice with WebSocket streaming Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Add comprehensive Qwen3-TTS integration specification This specification document defines the complete integration of Qwen3-TTS-12Hz-0.6B-Base as a replacement for the existing Chatterbox-Turbo TTS implementation. The document covers: ## Functional Requirements - WebSocket endpoint /api/v1/speak for streaming TTS - Voice cloning with default Makima voice (Japanese VA speaking English) - Support for custom voice references - Detailed client-to-server and server-to-client message protocols - Integration with Listen page for bidirectional speech ## Non-Functional Requirements - Latency targets: < 200ms first audio byte - Audio quality: 24kHz, mono, PCM16/PCM32f - Hardware requirements: CUDA GPU with 4-8GB VRAM - Scalability: 10 concurrent sessions per GPU ## Architecture Specification - Python TTS microservice with FastAPI/WebSocket - Rust proxy endpoint in makima server - Voice prompt caching mechanism (LRU cache) - Error handling and recovery strategies ## API Contract - Complete WebSocket message format definitions (TypeScript) - Error codes and responses (TTS_UNAVAILABLE, SYNTHESIS_ERROR, etc.) - Session state machine and lifecycle management ## Voice Asset Requirements - Makima voice clip specifications (5-10s WAV, transcript required) - Storage location: models/voices/makima/ - Metadata format for voice management ## Testing Strategy - Unit tests for Python TTS service and Rust proxy - Integration tests for WebSocket flow - Latency benchmarks with performance targets - Test data fixtures for various text lengths Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Add Qwen3-TTS implementation plan Comprehensive implementation plan for replacing Chatterbox-TTS with Qwen3-TTS streaming TTS service, including: - Task breakdown with estimated hours for each phase - Phase 1: Python TTS microservice (FastAPI, WebSocket) - Phase 2: Rust proxy integration (speak.rs, tts_client.rs) - Detailed file changes and new module structure - Testing plan with unit, integration, and latency benchmarks - Risk assessment with mitigation strategies - Success criteria for each phase Based on specification in docs/specs/qwen3-tts-spec.md Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Add author and research references to TTS implementation plan Add links to research documentation and author attribution. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * [WIP] Heartbeat checkpoint - 2026-01-27 03:25:06 UTC * Add Python TTS service project structure (Phase 1.1-1.3) Create the initial makima-tts Python service directory structure with: - pyproject.toml with FastAPI, Qwen-TTS, and torch dependencies - config.py with pydantic-settings TTSConfig class - models.py with Pydantic message models (Start, Speak, Stop, Ready, etc.) This implements tasks P1.1, P1.2, and P1.3 from the Qwen3-TTS implementation plan. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Add TTS engine and voice manager for Qwen3-TTS (Phase 1.4-1.5) Implement core TTS functionality: - tts_engine.py: Qwen3-TTS wrapper with streaming audio chunk generation - voice_manager.py: Voice prompt caching with LRU eviction and TTL support Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * [WIP] Heartbeat checkpoint - 2026-01-27 03:30:06 UTC * Add TTS proxy client and message types (Phase 2.1, 2.2, 2.4) - Add tts_client.rs with TtsConfig, TtsCircuitBreaker, TtsError, TtsProxyClient, and TtsConnection structs for WebSocket proxying - Add TTS message types to messages.rs (TtsAudioEncoding, TtsPriority, TtsStartMessage, TtsSpeakMessage, TtsStopMessage, TtsClientMessage, TtsReadyMessage, TtsAudioChunkMessage, TtsCompleteMessage, TtsErrorMessage, TtsStoppedMessage, TtsServerMessage) - Export tts_client module from server mod.rs - tokio-tungstenite already present in Cargo.toml Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Add TTS WebSocket handler and route (Phase 2.3, 2.5, 2.6) - Create speak.rs WebSocket handler that proxies to Python TTS service - Add TtsState fields (tts_client, tts_config) to AppState - Add with_tts() builder and is_tts_healthy() methods to AppState - Register /api/v1/speak route in the router - Add speak module export in handlers/mod.rs The handler forwards WebSocket messages bidirectionally between the client and the Python TTS microservice with proper error handling. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Add Makima voice profile assets for TTS voice cloning Creates the voice assets directory structure with: - manifest.json containing voice configuration (voice_id, speaker, language, reference audio path, and Japanese transcript placeholder) - README.md with instructions for obtaining voice reference audio Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Add Rust-native Qwen3-TTS integration research document Research findings for integrating Qwen3-TTS-12Hz-0.6B-Base directly into the makima Rust codebase without Python. Key conclusions: - ONNX export is not viable (unsupported architecture) - Candle (HF Rust ML framework) is the recommended approach - Model weights available in safetensors format (2.52GB total) - Three components needed: LM backbone, code predictor, speech tokenizer - Crane project has Qwen3-TTS as highest priority (potential upstream) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * [WIP] Heartbeat checkpoint - 2026-01-27 11:21:43 UTC * [WIP] Heartbeat checkpoint - 2026-01-27 11:24:19 UTC * [WIP] Heartbeat checkpoint - 2026-01-27 11:26:43 UTC * feat: implement Rust-native Qwen3-TTS using candle framework Replace monolithic tts.rs with modular tts/ directory structure: - tts/mod.rs: TtsEngine trait, TtsEngineFactory, shared types (AudioChunk, TtsError), and utility functions (save_wav, resample, argmax) - tts/chatterbox.rs: existing ONNX-based ChatterboxTTS adapted to implement TtsEngine trait with Mutex-wrapped sessions for Send+Sync - tts/qwen3/mod.rs: Qwen3Tts entry point with HuggingFace model loading - tts/qwen3/config.rs: Qwen3TtsConfig parsing from HF config.json - tts/qwen3/model.rs: 28-layer Qwen3 transformer with RoPE, GQA (16 heads, 8 KV heads), SiLU MLP, RMS norm, and KV cache - tts/qwen3/code_predictor.rs: 5-layer MTP module predicting 16 codebooks - tts/qwen3/speech_tokenizer.rs: ConvNet encoder/decoder with 16-layer RVQ - tts/qwen3/generate.rs: autoregressive generation loop with streaming support Add candle-core, candle-nn, candle-transformers, safetensors to Cargo.toml. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: integrate TTS engine into speak WebSocket handler - Update speak.rs handler to use TTS engine directly from SharedState instead of returning a stub "not implemented" error - Add TtsEngine (OnceCell lazy-loaded) to AppState in state.rs with get_tts_engine() method for lazy initialization on first connection - Implement full WebSocket protocol: client sends JSON speak/cancel/stop messages, server streams binary PCM audio chunks and audio_end signals - Create voices/makima/manifest.json for Makima voice profile configuration - All files compile successfully with zero errors Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: add /speak TTS page with WebSocket audio playback Add a new /speak frontend page for text-to-speech via WebSocket. The page accepts text input and streams synthesized PCM audio through the Web Audio API. Includes model loading indicator, cancel support, and connection status. Also adds a loading bar to the listen page ControlPanel during WebSocket connection. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
*	Move files tab and file pages to be accessible via contracts (#28)	soryu	2026-01-25	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* feat: remove Files from top-level navigation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: update file links to use contract-scoped routes Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: add contract context to FileDetail component - Add contractId, contractName, and onContractClick props to FileDetailProps - Update breadcrumb navigation to show contract name with path separator when viewing file within a contract context - Fall back to "Back to list" when no contract context is provided - This enables the FileDetail component to be used within the /contracts/:contractId/files/:fileId route Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: update routes to nest files under contracts - Add react-router-dom for client-side routing - Create ContractList component to list all contracts - Create ContractDetail component with tabs (overview, files, tasks, repos) - Create FileDetail component to view individual files - Configure routes: - /contracts - list all contracts - /contracts/:id - view contract details with Files tab - /contracts/:contractId/files/:fileId - view file in contract context - Remove standalone file routes (/files, /files/:id) Files are now only accessible through their parent contract. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Task completion checkpoint * Task completion checkpoint * Task completion checkpoint * Task completion checkpoint * Task completion checkpoint * Task completion checkpoint * Task completion checkpoint * Task completion checkpoint * feat: Add contract-scoped file route /contracts/:id/files/:fileId - Create ContractFilePage component for viewing files within contract context - Add route for /contracts/:id/files/:fileId in main.tsx - Update handleFileSelect in contracts.tsx to navigate to contract-scoped file URL - File viewer now has "Back to Contract" navigation instead of standalone /files This allows files accessed from a contract to maintain context and return to the contract page when going back. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Task completion checkpoint --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
*	feat: Simplify contract deliverables and add Templates UI	soryu	2026-01-24	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## Backend Changes ### Phase Deliverables Simplified - Simple contract type: - Plan phase: Only 'Plan' deliverable (required) - Execute phase: Only 'PR' deliverable (required) - Specification contract type: - Research phase: Only 'Research Notes' deliverable (required) - Specify phase: Only 'Requirements Document' deliverable (required) - Plan phase: Only 'Plan' deliverable (required) - Execute phase: Only 'PR' deliverable (required) - Review phase: Only 'Release Notes' deliverable (required) ### New 'execute' Contract Type - Only has 'execute' phase (no plan or review phases) - NO deliverables at all - executes tasks directly - Added to ContractType enum with proper Display/FromStr implementations - Added helper methods: `initial_phase()`, `terminal_phase()` ### API Updates - Added `get_phase_deliverables_for_type()` for contract-type-aware deliverables - Added `get_phase_checklist_for_type()` for contract-type-aware checklists - Added `check_phase_completion_for_type()` for contract-type-aware completion checks - Added `check_deliverables_met()` function for deliverable validation - Added `should_auto_progress()` for autonomous contract progression - Added new ContractToolRequest::CheckDeliverablesMet tool ## Frontend Changes (makima/frontend) ### Templates Page - Add TemplateEditor component for editing phase deliverables - Create Templates page with template card grid layout - Add navigation link in NavStrip - Implement three built-in templates: Simple, Specification, Execute - Support for creating custom templates with configurable phases/deliverables - Templates are persisted to localStorage Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
*	Add phase guard toggle for contract phase confirmation (#2)	soryu	2026-01-17	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add phase_guard field to Contract model and database This adds a new boolean field to control whether the supervisor should wait for user confirmation before progressing to the next phase. When enabled, users can review and potentially amend phase outputs (like plans, requirements docs) before the contract continues. Changes: - Add migration for phase_guard column (defaults to false) - Add phase_guard to Contract, CreateContractRequest, and UpdateContractRequest structs - Update create_contract_for_owner and update_contract_for_owner repository functions to handle phase_guard - Update all CreateContractRequest instantiations with phase_guard field Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: Add phase_guard for contract phase transitions Implement phase_guard logic in the advance_phase tool. When a contract has phase_guard enabled, the phase transition now: 1. Asks for user confirmation before advancing 2. Allows users to request changes to phase deliverables 3. Passes feedback to the task without advancing if changes requested Changes: - Add phase_guard field to Contract model and CreateContractRequest - Add PhaseTransitionRequest, PhaseFileInfo, PhaseTaskInfo structs - Update ChangePhaseRequest with confirmed and feedback fields - Update ContractToolRequest::AdvancePhase with confirmed/feedback - Modify advance_phase handling in contract_chat.rs with phase_guard logic - Update change_phase endpoint in contracts.rs with phase_guard support - Add database migration for phase_guard column When phase_guard=false: Phase advances immediately (current behavior) When phase_guard=true: Returns pending_confirmation status with deliverables If user provides feedback: Returns feedback to task, doesn't advance Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat(frontend): Add UI for phase transition confirmation requests When phase_guard is enabled and a supervisor tries to advance the contract phase, users now see a confirmation modal with: - Current and proposed next phase visualization - Phase deliverables checklist (if available) - Summary of the phase work - Options to "Approve & Advance" or "Request Changes" with feedback Components added: - PhaseConfirmationModal: Full modal dialog for phase confirmations - PhaseConfirmationInline: Inline variant for task output view - PhaseConfirmationNotification: Global notification wrapper - PhaseConfirmationToast: Alternative toast-style notification Integration: - Added phase_confirmation message type to TaskOutput renderer - Extended PendingQuestion API type with phase confirmation data - Integrated notification into main app layout The UI uses the existing supervisor question infrastructure (polling via /api/v1/mesh/questions) and responds with APPROVE or CHANGES_REQUESTED prefixed feedback. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat(frontend): Add Phase Guard toggle to AutopilotPanel Added the phase_guard toggle to the AutopilotPanel component, which allows users to enable/disable requiring confirmation before phase transitions. Changes: - Added phaseGuard and autonomousLoop fields to Contract interface in api.ts - Added phaseGuard field to UpdateContractRequest in api.ts - Added Phase Guard toggle UI in AutopilotPanel with similar styling to master - Toggle shows an 'active' badge when enabled - Connected toggle to contract update API The toggle appears below the autopilot control buttons and allows users to require confirmation before the supervisor advances to the next phase. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
*	Fixup: use default api.makima.jp URL and fix default branch detection	soryu	2026-01-16	1	-0/+17
\| \| \| \|	Also add checkpointing/history
*	Automatically derive repo URL and add notifications for input	soryu	2026-01-15	1	-5/+10
\|
*	Contract system	soryu	2026-01-15	1	-0/+26
\|
*	Initial Control system	soryu	2026-01-11	1	-9/+62
\|
*	Add Postgres for persistence and File cabinet	soryu	2025-12-23	1	-0/+3
\| \| \| \|	Migrations are local only currently, and must be run manually by setting POSTGRES_CONNECTION_URI
*	Update makima FE to add initial listening system	soryu	2025-12-23	1	-0/+19