| author | soryu <soryu@soryu.co> | 2026-01-27 03:11:08 +0000 |
|---|---|---|
| committer | soryu <soryu@soryu.co> | 2026-01-27 03:11:08 +0000 |
| commit | ebe029483184d51e702adb9ed79ea70d681a35f8 (patch) | |
| tree | e69c54d7016fef0b2e6c5c3635f4a8501f66ae64 /apps/mobile/components/TaskStatusBadge.tsx | |
| parent | f6b4d06a0158fb7803a2d7a861cf891cb3b202b4 (diff) | |

Add Qwen3-TTS research document for live TTS replacement

Research findings for replacing Chatterbox TTS with Qwen3-TTS-12Hz-0.6B-Base:
- Current TTS: Chatterbox-Turbo-ONNX with batch-only generation, no streaming
- Qwen3-TTS: 97ms end-to-end latency, streaming support, 3-second voice cloning
- Voice cloning: Requires 3s reference audio + transcript (Makima voice planned)
- Integration: Python service with WebSocket bridge (no ONNX export available)
- Languages: 10 supported including English and Japanese

Document includes:
- Current architecture analysis (makima/src/tts.rs)
- Qwen3-TTS capabilities and requirements
- Feasibility assessment for live/streaming TTS
- Audio clip requirements for voice cloning
- Preliminary technical approach with architecture diagrams

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Diffstat (limited to 'apps/mobile/components/TaskStatusBadge.tsx')
0 files changed, 0 insertions, 0 deletions
