fix: Use correct hf command for Qwen3-TTS download (#46)

* chore: fix unused import warnings in qwen3-tts module

- Remove unused import 'IndexOp' in model.rs
- Remove unused import 'DType' in speech_tokenizer.rs
- Add #[allow(dead_code)] to codebook_dim field in RvqCodebook

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: add voice loading and selection for TTS cloning

Add voice reference audio loading so the TTS speak handler can perform
voice cloning using reference WAV files from the voices/ directory.

- Add voice.rs module: loads manifest.json and reference.wav for a given
  voice_id, decodes via symphonia, resamples to 24kHz for the TTS engine
- Update speak.rs: resolve voice_id from the speak request (default
  "makima"), load reference audio, pass it to engine.generate()
- Add voices/makima/README.md with instructions for obtaining reference
  audio (extraction from YouTube, recording, ffmpeg conversion)
- Graceful fallback: if reference audio is missing, TTS proceeds without
  voice cloning using the model's default voice

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: add inference cancellation support for TTS generation

Add cooperative cancellation via Arc<AtomicBool> cancel flag that threads
through TtsEngine::generate -> Qwen3Tts -> GenerationContext. The
autoregressive loop and streaming decoder check the flag each iteration
and break early when set. The speak WebSocket handler creates a
per-session flag, passes it to generate, and sets it on Cancel/Stop/Close
messages.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add Qwen3-TTS model download to build process

Fix TTS engine failure due to missing tokenizer by downloading Qwen3-TTS
models during Docker build:

- Download model.safetensors, config.json, tokenizer.json, and
  tokenizer_config.json from Qwen/Qwen3-TTS-12Hz-0.6B-Base
- Download speech tokenizer from Qwen/Qwen3-TTS-Tokenizer-12Hz
- Add QWEN3_TTS_DIR environment variable to Dockerfile
- Script supports both env var override and default path

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: use correct hf command for Qwen3-TTS download

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
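The squashed message says the download script "supports both env var override and default path". A minimal sketch of that pattern in shell, assuming a hypothetical default of `./models/qwen3-tts` and a hypothetical helper `resolve_qwen3_tts_dir` (neither appears in this diff; the script's real default is not shown):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical default; the real default in download-models.sh is not shown in this diff.
DEFAULT_QWEN3_TTS_DIR="./models/qwen3-tts"

# Print the target directory: a non-empty QWEN3_TTS_DIR from the environment
# (for example, set via ENV in the Dockerfile) wins; otherwise use the default.
resolve_qwen3_tts_dir() {
  echo "${QWEN3_TTS_DIR:-$DEFAULT_QWEN3_TTS_DIR}"
}

# Usage: QWEN3_TTS_DIR="$(resolve_qwen3_tts_dir)"; mkdir -p "$QWEN3_TTS_DIR"
```

The `:-` expansion treats an empty `QWEN3_TTS_DIR` the same as an unset one, so an accidentally blank `ENV` line still falls back to the default.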
author: soryu <soryu@soryu.co> 2026-01-29 02:24:04 +0000
committer: GitHub <noreply@github.com> 2026-01-29 02:24:04 +0000
commit: 764ace78046e78cce36b64cb3682cc5489bcf9d7 (patch)
tree: 274faa2561f99bda2fd8137e2d3f4ac9ae980a7f
parent: 45a433c0eb63cae1322203ee14292f1c427a09c9 (diff)
download: soryu-764ace78046e78cce36b64cb3682cc5489bcf9d7.tar.gz
soryu-764ace78046e78cce36b64cb3682cc5489bcf9d7.zip
1 file changed, 2 insertions, 2 deletions
diff --git a/makima/sh/download-models.sh b/makima/sh/download-models.sh
index 1aefad8..4d791af 100755
--- a/makima/sh/download-models.sh
+++ b/makima/sh/download-models.sh
@@ -128,7 +128,7 @@ download_qwen3_tts() {
 
     # Download base TTS model files from Qwen/Qwen3-TTS-12Hz-0.6B-Base
     echo "Downloading Qwen3-TTS-12Hz-0.6B-Base..."
-    huggingface-cli download Qwen/Qwen3-TTS-12Hz-0.6B-Base \
+    hf download Qwen/Qwen3-TTS-12Hz-0.6B-Base \
         model.safetensors \
         config.json \
         tokenizer.json \
@@ -138,7 +138,7 @@ download_qwen3_tts() {
     # Download speech tokenizer from Qwen/Qwen3-TTS-Tokenizer-12Hz
     echo "Downloading Qwen3-TTS-Tokenizer-12Hz..."
     local tmpdir=$(mktemp -d)
-    huggingface-cli download Qwen/Qwen3-TTS-Tokenizer-12Hz \
+    hf download Qwen/Qwen3-TTS-Tokenizer-12Hz \
         model.safetensors \
         --local-dir "$tmpdir"
     mv "$tmpdir/model.safetensors" "$QWEN3_TTS_DIR/speech_tokenizer.safetensors"
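The diff swaps `huggingface-cli download` for `hf download`; `hf` is the newer entry point shipped with recent `huggingface_hub` releases, and `huggingface-cli` is deprecated. If a build image might still carry an older `huggingface_hub`, one option is to use whichever CLI is installed. A sketch under that assumption, where `pick_hf_cli` and `fetch_renamed` are hypothetical helpers (not part of download-models.sh); `fetch_renamed` mirrors the temp-dir-then-`mv` step used for the speech tokenizer:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the Hugging Face CLI to use: prefer the newer `hf` entry point and
# fall back to the deprecated `huggingface-cli` on older installs.
pick_hf_cli() {
  if command -v hf >/dev/null 2>&1; then
    echo "hf"
  elif command -v huggingface-cli >/dev/null 2>&1; then
    echo "huggingface-cli"
  else
    echo "error: neither 'hf' nor 'huggingface-cli' found (pip install -U huggingface_hub)" >&2
    return 1
  fi
}

# fetch_renamed REPO FILE DEST
# Download one FILE from REPO into a temp dir, then move it to DEST.
# Note: there is no trap, so the temp dir may be left behind if a step fails.
fetch_renamed() {
  local repo=$1 file=$2 dest=$3
  local cli tmpdir
  # Assign separately from `local` so a failing substitution is not masked.
  cli=$(pick_hf_cli)
  tmpdir=$(mktemp -d)
  "$cli" download "$repo" "$file" --local-dir "$tmpdir"
  mv "$tmpdir/$file" "$dest"
  rm -rf "$tmpdir"
}
```

For example, `fetch_renamed Qwen/Qwen3-TTS-Tokenizer-12Hz model.safetensors "$QWEN3_TTS_DIR/speech_tokenizer.safetensors"` performs the same second download as the patched script.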