<feed xmlns='http://www.w3.org/2005/Atom'>
<title>soryu/docs, branch makima/task-task-e397cd28-e397cd28</title>
<subtitle>soryu-co/soryu mirror</subtitle>
<id>http://src.eirin.xyz/soryu/atom?h=makima%2Ftask-task-e397cd28-e397cd28</id>
<link rel='self' href='http://src.eirin.xyz/soryu/atom?h=makima%2Ftask-task-e397cd28-e397cd28'/>
<link rel='alternate' type='text/html' href='http://src.eirin.xyz/soryu/'/>
<updated>2026-01-27T03:11:08+00:00</updated>
<entry>
<title>Add Qwen3-TTS research document for live TTS replacement</title>
<updated>2026-01-27T03:11:08+00:00</updated>
<author>
<name>soryu</name>
<email>soryu@soryu.co</email>
</author>
<published>2026-01-27T03:11:08+00:00</published>
<link rel='alternate' type='text/html' href='http://src.eirin.xyz/soryu/commit/?id=ebe029483184d51e702adb9ed79ea70d681a35f8'/>
<id>urn:sha1:ebe029483184d51e702adb9ed79ea70d681a35f8</id>
<content type='text'>
Research findings for replacing Chatterbox TTS with Qwen3-TTS-12Hz-0.6B-Base:

- Current TTS: Chatterbox-Turbo-ONNX with batch-only generation, no streaming
- Qwen3-TTS: 97ms end-to-end latency, streaming support, voice cloning from 3 seconds of audio
- Voice cloning: requires 3s of reference audio plus its transcript (Makima voice planned)
- Integration: Python service with WebSocket bridge (no ONNX export available)
- Languages: 10 supported, including English and Japanese

Document includes:
- Current architecture analysis (makima/src/tts.rs)
- Qwen3-TTS capabilities and requirements
- Feasibility assessment for live/streaming TTS
- Audio clip requirements for voice cloning
- Preliminary technical approach with architecture diagrams

Co-Authored-By: Claude Opus 4.5 &lt;noreply@anthropic.com&gt;
</content>
</entry>
</feed>
