<feed xmlns='http://www.w3.org/2005/Atom'>
<title>soryu, branch makima/red-team-makima-frontend</title>
<subtitle>soryu-co/soryu mirror</subtitle>
<id>http://src.eirin.xyz/soryu/atom?h=makima%2Fred-team-makima-frontend</id>
<link rel='self' href='http://src.eirin.xyz/soryu/atom?h=makima%2Fred-team-makima-frontend'/>
<link rel='alternate' type='text/html' href='http://src.eirin.xyz/soryu/'/>
<updated>2026-01-29T02:18:40+00:00</updated>
<entry>
<title>[WIP] Heartbeat checkpoint - 2026-01-29 02:18:40 UTC</title>
<updated>2026-01-29T02:18:40+00:00</updated>
<author>
<name>soryu</name>
<email>soryu@soryu.co</email>
</author>
<published>2026-01-29T02:18:40+00:00</published>
<link rel='alternate' type='text/html' href='http://src.eirin.xyz/soryu/commit/?id=aa6afe77553df206578c3d310e64090f8ad558c4'/>
<id>urn:sha1:aa6afe77553df206578c3d310e64090f8ad558c4</id>
<content type='text'>
</content>
</entry>
<entry>
<title>fix: Remove mistaken red team UI from VN frontend</title>
<updated>2026-01-29T02:17:22+00:00</updated>
<author>
<name>soryu</name>
<email>soryu@soryu.co</email>
</author>
<published>2026-01-29T02:17:22+00:00</published>
<link rel='alternate' type='text/html' href='http://src.eirin.xyz/soryu/commit/?id=30abd41f726bf250a381d62e46052097bbf3b73c'/>
<id>urn:sha1:30abd41f726bf250a381d62e46052097bbf3b73c</id>
<content type='text'>
PR #39 accidentally added red team UI code to the wrong frontend directory
(frontend/ instead of makima/frontend/). The correct implementation is
already in makima/frontend/. This commit removes the mistaken changes:

- Delete ContractCreateModal.tsx (was added by mistake)
- Revert ContractList.tsx to remove red team badge and create modal
- Revert ContractDetail.tsx to remove red team tab and notifications
- Revert types.ts to remove contract/task types
- Revert pc98.css to remove red team styling

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;
</content>
</entry>
<entry>
<title>feat: Add Red Team UI to makima/frontend contract creation</title>
<updated>2026-01-29T01:24:58+00:00</updated>
<author>
<name>soryu</name>
<email>soryu@soryu.co</email>
</author>
<published>2026-01-29T01:24:58+00:00</published>
<link rel='alternate' type='text/html' href='http://src.eirin.xyz/soryu/commit/?id=8846eb27f16a27012f13bfb39e9907bfe7d8bdcd'/>
<id>urn:sha1:8846eb27f16a27012f13bfb39e9907bfe7d8bdcd</id>
<content type='text'>
- Add redTeamEnabled and redTeamPrompt state to contracts page
- Add "Enable Red Team Monitoring" checkbox with description
- Add conditional "Custom Review Criteria" textarea when enabled
- Include redTeamEnabled/redTeamPrompt in CreateContractRequest
- Reset red team fields when canceling contract creation
- Add redTeamEnabled to ContractSummary and Contract types
- Add redTeamEnabled/redTeamPrompt to CreateContractRequest type
- Add Red Team badge (🔍) to ContractList for enabled contracts

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;
</content>
</entry>
<entry>
<title>Fix makima supervisor pr CLI command</title>
<updated>2026-01-29T01:14:17+00:00</updated>
<author>
<name>soryu</name>
<email>soryu@soryu.co</email>
</author>
<published>2026-01-29T01:14:17+00:00</published>
<link rel='alternate' type='text/html' href='http://src.eirin.xyz/soryu/commit/?id=f6a40e2304585f140ed5766b25fe71a6958f4425'/>
<id>urn:sha1:f6a40e2304585f140ed5766b25fe71a6958f4425</id>
<content type='text'>
</content>
</entry>
<entry>
<title>fix: Add Qwen3-TTS model download to Docker build (#44)</title>
<updated>2026-01-29T01:04:42+00:00</updated>
<author>
<name>soryu</name>
<email>soryu@soryu.co</email>
</author>
<published>2026-01-29T01:04:42+00:00</published>
<link rel='alternate' type='text/html' href='http://src.eirin.xyz/soryu/commit/?id=d7b0b576fb43902535f0ae8d4f257b50387ec01a'/>
<id>urn:sha1:d7b0b576fb43902535f0ae8d4f257b50387ec01a</id>
<content type='text'>
* chore: fix unused import warnings in qwen3-tts module

- Remove unused import 'IndexOp' in model.rs
- Remove unused import 'DType' in speech_tokenizer.rs
- Add #[allow(dead_code)] to codebook_dim field in RvqCodebook

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* feat: add voice loading and selection for TTS cloning

Add voice reference audio loading so the TTS speak handler can perform
voice cloning using reference WAV files from the voices/ directory.

- Add voice.rs module: loads manifest.json and reference.wav for a given
  voice_id, decodes via symphonia, resamples to 24kHz for the TTS engine
- Update speak.rs: resolve voice_id from the speak request (default
  "makima"), load reference audio, pass it to engine.generate()
- Add voices/makima/README.md with instructions for obtaining reference
  audio (extraction from YouTube, recording, ffmpeg conversion)
- Graceful fallback: if reference audio is missing, TTS proceeds without
  voice cloning using the model's default voice
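
A minimal Python sketch of the load-or-fall-back behavior described in
the list above; it is not the voice.rs code. The load_voice name and
the tuple it returns are hypothetical, a missing manifest is assumed to
trigger the same fallback as missing reference audio, and decoding and
resampling are left out.

```python
import json
from pathlib import Path


def load_voice(voices_dir, voice_id="makima"):
    """Return (manifest, reference_bytes), or None when the voice cannot be loaded."""
    voice_dir = Path(voices_dir) / voice_id
    manifest_path = voice_dir / "manifest.json"
    reference_path = voice_dir / "reference.wav"
    if not (manifest_path.is_file() and reference_path.is_file()):
        # Graceful fallback: the caller proceeds with the model's default voice.
        return None
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    return manifest, reference_path.read_bytes()
```

A caller would treat None as "no cloning" and pass no reference audio
to the engine.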

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* feat: add inference cancellation support for TTS generation

Add cooperative cancellation via an Arc&lt;AtomicBool&gt; cancel flag that is
threaded through TtsEngine::generate -&gt; Qwen3Tts -&gt; GenerationContext.
The autoregressive loop and streaming decoder check the flag each
iteration and break early when set. The speak WebSocket handler
creates a per-session flag, passes it to generate, and sets it on
Cancel/Stop/Close messages.
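
A minimal Python sketch of the cooperative cancellation pattern
described above, not the Rust implementation. threading.Event stands in
for the shared atomic flag; generate_tokens, max_steps and the integer
"tokens" are hypothetical names used only for illustration.

```python
import threading


def generate_tokens(cancel, max_steps=1000):
    """Yield stand-in tokens, checking the cancel flag on every iteration."""
    for step in range(max_steps):
        # Cooperative check: the loop stops itself, nothing is killed from outside.
        if cancel.is_set():
            break
        yield step  # placeholder for one generated audio token


# One flag per session; the handler sets it on a Cancel/Stop/Close message.
cancel_flag = threading.Event()
received = []
for token in generate_tokens(cancel_flag):
    received.append(token)
    if len(received) == 3:
        cancel_flag.set()  # simulate a Cancel message arriving mid-stream
```

Because the flag is only polled once per iteration, cancellation takes
effect at the next loop step rather than instantly.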

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* Add Qwen3-TTS model download to build process

Fix TTS engine failure due to missing tokenizer by downloading
Qwen3-TTS models during Docker build:
- Download model.safetensors, config.json, tokenizer.json, and
  tokenizer_config.json from Qwen/Qwen3-TTS-12Hz-0.6B-Base
- Download speech tokenizer from Qwen/Qwen3-TTS-Tokenizer-12Hz
- Add QWEN3_TTS_DIR environment variable to Dockerfile
- Download script supports an environment-variable override and a default path
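
The override-or-default lookup can be sketched in Python as below. This
assumes the override is the QWEN3_TTS_DIR variable named above; the
resolve_tts_dir helper and the default path are hypothetical and not
taken from the repository's script.

```python
import os
from pathlib import Path

# Hypothetical default location; the real script's default is not shown here.
DEFAULT_TTS_DIR = Path("models/qwen3-tts")


def resolve_tts_dir(env=None):
    """Return QWEN3_TTS_DIR if it is set and non-empty, else the default path."""
    env = os.environ if env is None else env
    override = env.get("QWEN3_TTS_DIR")
    return Path(override) if override else DEFAULT_TTS_DIR
```

Passing a plain dict as env keeps the lookup easy to exercise without
touching the real process environment.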

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

---------

Co-authored-by: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;</content>
</entry>
<entry>
<title>feat: Add TTS inference cancellation and voice loading (#43)</title>
<updated>2026-01-28T12:47:18+00:00</updated>
<author>
<name>soryu</name>
<email>soryu@soryu.co</email>
</author>
<published>2026-01-28T12:47:18+00:00</published>
<link rel='alternate' type='text/html' href='http://src.eirin.xyz/soryu/commit/?id=fc2aa0e9fc63365a78f983634efb25d4444e64c5'/>
<id>urn:sha1:fc2aa0e9fc63365a78f983634efb25d4444e64c5</id>
<content type='text'>
* chore: fix unused import warnings in qwen3-tts module

- Remove unused import 'IndexOp' in model.rs
- Remove unused import 'DType' in speech_tokenizer.rs
- Add #[allow(dead_code)] to codebook_dim field in RvqCodebook

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* feat: add voice loading and selection for TTS cloning

Add voice reference audio loading so the TTS speak handler can perform
voice cloning using reference WAV files from the voices/ directory.

- Add voice.rs module: loads manifest.json and reference.wav for a given
  voice_id, decodes via symphonia, resamples to 24kHz for the TTS engine
- Update speak.rs: resolve voice_id from the speak request (default
  "makima"), load reference audio, pass it to engine.generate()
- Add voices/makima/README.md with instructions for obtaining reference
  audio (extraction from YouTube, recording, ffmpeg conversion)
- Graceful fallback: if reference audio is missing, TTS proceeds without
  voice cloning using the model's default voice

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* feat: add inference cancellation support for TTS generation

Add cooperative cancellation via an Arc&lt;AtomicBool&gt; cancel flag that is
threaded through TtsEngine::generate -&gt; Qwen3Tts -&gt; GenerationContext.
The autoregressive loop and streaming decoder check the flag each
iteration and break early when set. The speak WebSocket handler
creates a per-session flag, passes it to generate, and sets it on
Cancel/Stop/Close messages.

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

---------

Co-authored-by: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;</content>
</entry>
<entry>
<title>Fix starting phase dropdown to show correct phase names from templates (#42)</title>
<updated>2026-01-28T03:51:07+00:00</updated>
<author>
<name>soryu</name>
<email>soryu@soryu.co</email>
</author>
<published>2026-01-28T03:51:07+00:00</published>
<link rel='alternate' type='text/html' href='http://src.eirin.xyz/soryu/commit/?id=b141fca0c0604bdeba9fa563a8049cf29cc03bcf'/>
<id>urn:sha1:b141fca0c0604bdeba9fa563a8049cf29cc03bcf</id>
<content type='text'>
* Add comprehensive Red Team system specification

Defines the adversarial review feature for contracts that monitors work tasks
in real-time to catch quality issues, plan deviations, and standards violations.

Key components specified:
- Contract configuration (red_team_enabled, red_team_prompt)
- Red team task lifecycle and spawning logic
- makima red-team notify CLI command for supervisor alerts
- Task output subscription for real-time monitoring
- Database schema changes (contracts, tasks, notifications table)
- API endpoints for notification and status
- System prompt template for red team behavior
- Security considerations and access control

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* Task completion checkpoint

* Fix starting phase dropdown to show correct phase names from templates

Add phaseNames map to ContractTypeTemplate to preserve display names
from custom templates loaded from localStorage. The dropdown now uses
the template's phase name (e.g., 'Design &amp; Architecture') instead of
naive capitalization of the phase ID. Falls back to capitalization for
built-in templates that don't provide phaseNames.
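
The lookup-with-fallback rule above can be sketched as follows. This is
an illustrative Python sketch rather than the frontend's TypeScript;
phase_label and the sample template data are hypothetical.

```python
def phase_label(phase_id, phase_names=None):
    """Return the template's display name, else capitalize the phase ID."""
    if phase_names and phase_id in phase_names:
        return phase_names[phase_id]
    # Fallback for built-in templates that provide no phaseNames map.
    return phase_id[:1].upper() + phase_id[1:]


custom = {"design": "Design and Architecture"}  # hypothetical custom template
print(phase_label("design", custom))
print(phase_label("research"))
```

The fallback only uppercases the first character, which is why a
multi-word display name needs the explicit map.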

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

---------

Co-authored-by: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;</content>
</entry>
<entry>
<title>Replace TTS endpoint with Rust-native Qwen3-TTS (#41)</title>
<updated>2026-01-28T03:50:45+00:00</updated>
<author>
<name>soryu</name>
<email>soryu@soryu.co</email>
</author>
<published>2026-01-28T03:50:45+00:00</published>
<link rel='alternate' type='text/html' href='http://src.eirin.xyz/soryu/commit/?id=9b53f6c6b01da85ef73bd5960b32ec319df0b947'/>
<id>urn:sha1:9b53f6c6b01da85ef73bd5960b32ec319df0b947</id>
<content type='text'>
* chore: fix unused import warnings in qwen3-tts module

- Remove unused import 'IndexOp' in model.rs
- Remove unused import 'DType' in speech_tokenizer.rs
- Add #[allow(dead_code)] to codebook_dim field in RvqCodebook

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* feat: add voice loading and selection for TTS cloning

Add voice reference audio loading so the TTS speak handler can perform
voice cloning using reference WAV files from the voices/ directory.

- Add voice.rs module: loads manifest.json and reference.wav for a given
  voice_id, decodes via symphonia, resamples to 24kHz for the TTS engine
- Update speak.rs: resolve voice_id from the speak request (default
  "makima"), load reference audio, pass it to engine.generate()
- Add voices/makima/README.md with instructions for obtaining reference
  audio (extraction from YouTube, recording, ffmpeg conversion)
- Graceful fallback: if reference audio is missing, TTS proceeds without
  voice cloning using the model's default voice

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* [WIP] Heartbeat checkpoint - 2026-01-28 03:49:13 UTC

---------

Co-authored-by: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;</content>
</entry>
<entry>
<title>Fix frontend build due to incorrect types</title>
<updated>2026-01-28T03:45:36+00:00</updated>
<author>
<name>soryu</name>
<email>soryu@soryu.co</email>
</author>
<published>2026-01-28T03:45:36+00:00</published>
<link rel='alternate' type='text/html' href='http://src.eirin.xyz/soryu/commit/?id=c14192cc8b0e82369c93c1aee615fcc9cfad5911'/>
<id>urn:sha1:c14192cc8b0e82369c93c1aee615fcc9cfad5911</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Add Qwen3-TTS streaming endpoint for voice synthesis (#40)</title>
<updated>2026-01-28T02:54:17+00:00</updated>
<author>
<name>soryu</name>
<email>soryu@soryu.co</email>
</author>
<published>2026-01-28T02:54:17+00:00</published>
<link rel='alternate' type='text/html' href='http://src.eirin.xyz/soryu/commit/?id=eabd1304cce0e053cd32ec910d2f0ea429e8af14'/>
<id>urn:sha1:eabd1304cce0e053cd32ec910d2f0ea429e8af14</id>
<content type='text'>
* Task completion checkpoint

* Add Qwen3-TTS research document for live TTS replacement

Research findings for replacing Chatterbox TTS with Qwen3-TTS-12Hz-0.6B-Base:

- Current TTS: Chatterbox-Turbo-ONNX with batch-only generation, no streaming
- Qwen3-TTS: 97ms end-to-end latency, streaming support, 3-second voice cloning
- Voice cloning: Requires 3s reference audio + transcript (Makima voice planned)
- Integration: Python service with WebSocket bridge (no ONNX export available)
- Languages: 10 supported including English and Japanese

Document includes:
- Current architecture analysis (makima/src/tts.rs)
- Qwen3-TTS capabilities and requirements
- Feasibility assessment for live/streaming TTS
- Audio clip requirements for voice cloning
- Preliminary technical approach with architecture diagrams

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* [WIP] Heartbeat checkpoint - 2026-01-27 03:11:15 UTC

* Add Qwen3-TTS research documentation

Comprehensive research on replacing Chatterbox TTS with Qwen3-TTS-12Hz-0.6B-Base:

- Current TTS implementation analysis (Chatterbox-Turbo-ONNX in makima/src/tts.rs)
- Qwen3-TTS capabilities: 97ms streaming latency, voice cloning with 3s reference
- Cross-lingual support: Japanese voice (Makima/Tomori Kusunoki) speaking English
- Python microservice architecture recommendation (FastAPI + WebSocket)
- Implementation phases and technical approach
- Hardware requirements and dependencies

Key findings:
- Live/streaming TTS is highly feasible with 97ms latency
- Voice cloning fully supported with 0.95 speaker similarity
- Recommended: Python microservice with WebSocket streaming

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* Add comprehensive Qwen3-TTS integration specification

This specification document defines the complete integration of
Qwen3-TTS-12Hz-0.6B-Base as a replacement for the existing Chatterbox-Turbo
TTS implementation. The document covers:

## Functional Requirements
- WebSocket endpoint /api/v1/speak for streaming TTS
- Voice cloning with default Makima voice (Japanese VA speaking English)
- Support for custom voice references
- Detailed client-to-server and server-to-client message protocols
- Integration with Listen page for bidirectional speech

## Non-Functional Requirements
- Latency target: under 200ms to first audio byte
- Audio quality: 24kHz, mono, PCM16/PCM32f
- Hardware requirements: CUDA GPU with 4-8GB VRAM
- Scalability: 10 concurrent sessions per GPU

## Architecture Specification
- Python TTS microservice with FastAPI/WebSocket
- Rust proxy endpoint in makima server
- Voice prompt caching mechanism (LRU cache)
- Error handling and recovery strategies

## API Contract
- Complete WebSocket message format definitions (TypeScript)
- Error codes and responses (TTS_UNAVAILABLE, SYNTHESIS_ERROR, etc.)
- Session state machine and lifecycle management

## Voice Asset Requirements
- Makima voice clip specifications (5-10s WAV, transcript required)
- Storage location: models/voices/makima/
- Metadata format for voice management

## Testing Strategy
- Unit tests for Python TTS service and Rust proxy
- Integration tests for WebSocket flow
- Latency benchmarks with performance targets
- Test data fixtures for various text lengths

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* Add Qwen3-TTS implementation plan

Comprehensive implementation plan for replacing Chatterbox TTS with
a Qwen3-TTS streaming service, including:

- Task breakdown with estimated hours for each phase
- Phase 1: Python TTS microservice (FastAPI, WebSocket)
- Phase 2: Rust proxy integration (speak.rs, tts_client.rs)
- Detailed file changes and new module structure
- Testing plan with unit, integration, and latency benchmarks
- Risk assessment with mitigation strategies
- Success criteria for each phase

Based on specification in docs/specs/qwen3-tts-spec.md

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* Add author and research references to TTS implementation plan

Add links to research documentation and author attribution.

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* [WIP] Heartbeat checkpoint - 2026-01-27 03:25:06 UTC

* Add Python TTS service project structure (Phase 1.1-1.3)

Create the initial makima-tts Python service directory structure with:
- pyproject.toml with FastAPI, Qwen-TTS, and torch dependencies
- config.py with pydantic-settings TTSConfig class
- models.py with Pydantic message models (Start, Speak, Stop, Ready, etc.)

This implements tasks P1.1, P1.2, and P1.3 from the Qwen3-TTS implementation plan.

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* Add TTS engine and voice manager for Qwen3-TTS (Phase 1.4-1.5)

Implement core TTS functionality:
- tts_engine.py: Qwen3-TTS wrapper with streaming audio chunk generation
- voice_manager.py: Voice prompt caching with LRU eviction and TTL support
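
A small Python sketch of a cache with LRU eviction and a per-entry TTL,
to illustrate the caching idea named above. It is not the contents of
voice_manager.py; the VoicePromptCache class, its parameters and the
injectable clock are assumptions made for this example.

```python
import time
from collections import OrderedDict


class VoicePromptCache:
    """Tiny LRU cache with a per-entry TTL (illustrative only)."""

    def __init__(self, max_entries=8, ttl_seconds=3600.0, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._items = OrderedDict()  # voice_id mapped to (stored_at, prompt)

    def get(self, voice_id):
        entry = self._items.get(voice_id)
        if entry is None:
            return None
        stored_at, prompt = entry
        if self._clock() - stored_at >= self.ttl_seconds:
            del self._items[voice_id]  # entry has expired
            return None
        self._items.move_to_end(voice_id)  # mark as most recently used
        return prompt

    def put(self, voice_id, prompt):
        self._items[voice_id] = (self._clock(), prompt)
        self._items.move_to_end(voice_id)
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # evict the least recently used
```

Injecting the clock lets expiry be exercised without sleeping.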

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* [WIP] Heartbeat checkpoint - 2026-01-27 03:30:06 UTC

* Add TTS proxy client and message types (Phase 2.1, 2.2, 2.4)

- Add tts_client.rs with TtsConfig, TtsCircuitBreaker, TtsError,
  TtsProxyClient, and TtsConnection structs for WebSocket proxying
- Add TTS message types to messages.rs (TtsAudioEncoding, TtsPriority,
  TtsStartMessage, TtsSpeakMessage, TtsStopMessage, TtsClientMessage,
  TtsReadyMessage, TtsAudioChunkMessage, TtsCompleteMessage,
  TtsErrorMessage, TtsStoppedMessage, TtsServerMessage)
- Export tts_client module from server mod.rs
- tokio-tungstenite already present in Cargo.toml

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* Add TTS WebSocket handler and route (Phase 2.3, 2.5, 2.6)

- Create speak.rs WebSocket handler that proxies to Python TTS service
- Add TtsState fields (tts_client, tts_config) to AppState
- Add with_tts() builder and is_tts_healthy() methods to AppState
- Register /api/v1/speak route in the router
- Add speak module export in handlers/mod.rs

The handler forwards WebSocket messages bidirectionally between
the client and the Python TTS microservice with proper error handling.

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* Add Makima voice profile assets for TTS voice cloning

Creates the voice assets directory structure with:
- manifest.json containing voice configuration (voice_id, speaker,
  language, reference audio path, and Japanese transcript placeholder)
- README.md with instructions for obtaining voice reference audio

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* Add Rust-native Qwen3-TTS integration research document

Research findings for integrating Qwen3-TTS-12Hz-0.6B-Base directly into
the makima Rust codebase without Python. Key conclusions:
- ONNX export is not viable (unsupported architecture)
- Candle (HF Rust ML framework) is the recommended approach
- Model weights available in safetensors format (2.52GB total)
- Three components needed: LM backbone, code predictor, speech tokenizer
- The Crane project has Qwen3-TTS as its highest priority (potential upstream)

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* [WIP] Heartbeat checkpoint - 2026-01-27 11:21:43 UTC

* [WIP] Heartbeat checkpoint - 2026-01-27 11:24:19 UTC

* [WIP] Heartbeat checkpoint - 2026-01-27 11:26:43 UTC

* feat: implement Rust-native Qwen3-TTS using candle framework

Replace monolithic tts.rs with modular tts/ directory structure:

- tts/mod.rs: TtsEngine trait, TtsEngineFactory, shared types (AudioChunk,
  TtsError), and utility functions (save_wav, resample, argmax)
- tts/chatterbox.rs: existing ONNX-based ChatterboxTTS adapted to implement
  TtsEngine trait with Mutex-wrapped sessions for Send+Sync
- tts/qwen3/mod.rs: Qwen3Tts entry point with HuggingFace model loading
- tts/qwen3/config.rs: Qwen3TtsConfig parsing from HF config.json
- tts/qwen3/model.rs: 28-layer Qwen3 transformer with RoPE, GQA (16 heads,
  8 KV heads), SiLU MLP, RMS norm, and KV cache
- tts/qwen3/code_predictor.rs: 5-layer MTP module predicting 16 codebooks
- tts/qwen3/speech_tokenizer.rs: ConvNet encoder/decoder with 16-layer RVQ
- tts/qwen3/generate.rs: autoregressive generation loop with streaming support

Add candle-core, candle-nn, candle-transformers, safetensors to Cargo.toml.

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* feat: integrate TTS engine into speak WebSocket handler

- Update speak.rs handler to use TTS engine directly from SharedState
  instead of returning a stub "not implemented" error
- Add TtsEngine (OnceCell lazy-loaded) to AppState in state.rs with
  get_tts_engine() method for lazy initialization on first connection
- Implement full WebSocket protocol: client sends JSON speak/cancel/stop
  messages, server streams binary PCM audio chunks and audio_end signals
- Create voices/makima/manifest.json for Makima voice profile configuration
- All files compile successfully with zero errors

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

* feat: add /speak TTS page with WebSocket audio playback

Add a new /speak frontend page for text-to-speech via WebSocket.
The page accepts text input and streams synthesized PCM audio through
the Web Audio API. Includes model loading indicator, cancel support,
and connection status. Also adds a loading bar to the listen page
ControlPanel during WebSocket connection.

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;

---------

Co-authored-by: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;</content>
</entry>
</feed>
