soryu - soryu-co/soryu mirror

	Commit message	Author	Age	Files	Lines
*	Fix downloading too many models	soryu	2026-02-02	1	-3/+21
*	Use chatterbox TTS	soryu	2026-02-01	1	-35/+19
*	Download vocab.json and merges.txt in container image	soryu	2026-01-30	1	-2/+5
*	Ensure tokenizer exists for TTS model	soryu	2026-01-29	1	-1/+5
*	fix: Use correct hf command for Qwen3-TTS download (#46)	soryu	2026-01-29	1	-2/+2
	* chore: fix unused import warnings in qwen3-tts module
	  - Remove unused import 'IndexOp' in model.rs
	  - Remove unused import 'DType' in speech_tokenizer.rs
	  - Add #[allow(dead_code)] to codebook_dim field in RvqCodebook
	  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
	* feat: add voice loading and selection for TTS cloning
	  Add voice reference audio loading so the TTS speak handler can perform voice cloning using reference WAV files from the voices/ directory.
	  - Add voice.rs module: loads manifest.json and reference.wav for a given voice_id, decodes via symphonia, resamples to 24kHz for the TTS engine
	  - Update speak.rs: resolve voice_id from the speak request (default "makima"), load reference audio, pass it to engine.generate()
	  - Add voices/makima/README.md with instructions for obtaining reference audio (extraction from YouTube, recording, ffmpeg conversion)
	  - Graceful fallback: if reference audio is missing, TTS proceeds without voice cloning using the model's default voice
	  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
	* feat: add inference cancellation support for TTS generation
	  Add cooperative cancellation via Arc<AtomicBool> cancel flag that threads through TtsEngine::generate -> Qwen3Tts -> GenerationContext. The autoregressive loop and streaming decoder check the flag each iteration and break early when set. The speak WebSocket handler creates a per-session flag, passes it to generate, and sets it on Cancel/Stop/Close messages.
	  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
	* Add Qwen3-TTS model download to build process
	  Fix TTS engine failure due to missing tokenizer by downloading Qwen3-TTS models during Docker build:
	  - Download model.safetensors, config.json, tokenizer.json, and tokenizer_config.json from Qwen/Qwen3-TTS-12Hz-0.6B-Base
	  - Download speech tokenizer from Qwen/Qwen3-TTS-Tokenizer-12Hz
	  - Add QWEN3_TTS_DIR environment variable to Dockerfile
	  - Script supports both env var override and default path
	  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
	* fix: use correct hf command for Qwen3-TTS download
	  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
	---------
	Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
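	The cooperative cancellation described in the commit body above can be illustrated with a minimal sketch. It is not the repository's code: `generate` and `on_token` are hypothetical stand-ins for the autoregressive loop and its per-token consumer, and the only detail taken from the commit message is that a shared Arc<AtomicBool> flag is checked on each iteration and the loop breaks early once the flag is set.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for an autoregressive generation loop.
/// Emits up to `max_steps` dummy tokens and checks the shared cancel
/// flag at the top of every iteration, stopping early once it is set.
fn generate(
    max_steps: u32,
    cancel: &Arc<AtomicBool>,
    mut on_token: impl FnMut(u32),
) -> Vec<u32> {
    let mut tokens = Vec::new();
    for step in 0..max_steps {
        // Cooperative check: the loop itself decides when to stop.
        if cancel.load(Ordering::Relaxed) {
            break;
        }
        tokens.push(step);
        // The consumer (e.g. a session handler) may set the flag here.
        on_token(step);
    }
    tokens
}
```

	In this sketch a caller would create one flag per session with `Arc::new(AtomicBool::new(false))`, hand a clone to whatever receives Cancel/Stop/Close messages, and call `cancel.store(true, Ordering::Relaxed)` there. Generation then stops at the next iteration boundary rather than being interrupted mid-step.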
*	fix: Add Qwen3-TTS model download to Docker build (#44)	soryu	2026-01-29	1	-0/+35
	* chore: fix unused import warnings in qwen3-tts module
	  - Remove unused import 'IndexOp' in model.rs
	  - Remove unused import 'DType' in speech_tokenizer.rs
	  - Add #[allow(dead_code)] to codebook_dim field in RvqCodebook
	  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
	* feat: add voice loading and selection for TTS cloning
	  Add voice reference audio loading so the TTS speak handler can perform voice cloning using reference WAV files from the voices/ directory.
	  - Add voice.rs module: loads manifest.json and reference.wav for a given voice_id, decodes via symphonia, resamples to 24kHz for the TTS engine
	  - Update speak.rs: resolve voice_id from the speak request (default "makima"), load reference audio, pass it to engine.generate()
	  - Add voices/makima/README.md with instructions for obtaining reference audio (extraction from YouTube, recording, ffmpeg conversion)
	  - Graceful fallback: if reference audio is missing, TTS proceeds without voice cloning using the model's default voice
	  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
	* feat: add inference cancellation support for TTS generation
	  Add cooperative cancellation via Arc<AtomicBool> cancel flag that threads through TtsEngine::generate -> Qwen3Tts -> GenerationContext. The autoregressive loop and streaming decoder check the flag each iteration and break early when set. The speak WebSocket handler creates a per-session flag, passes it to generate, and sets it on Cancel/Stop/Close messages.
	  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
	* Add Qwen3-TTS model download to build process
	  Fix TTS engine failure due to missing tokenizer by downloading Qwen3-TTS models during Docker build:
	  - Download model.safetensors, config.json, tokenizer.json, and tokenizer_config.json from Qwen/Qwen3-TTS-12Hz-0.6B-Base
	  - Download speech tokenizer from Qwen/Qwen3-TTS-Tokenizer-12Hz
	  - Add QWEN3_TTS_DIR environment variable to Dockerfile
	  - Script supports both env var override and default path
	  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
	---------
	Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
*	Add Postgres for persistence and File cabinet	soryu	2025-12-23	4	-0/+65
	Migrations are local only currently, and must be run manually by setting POSTGRES_CONNECTION_URI
*	Bump diarization version to 2.1 and fix downloading the tokenizer	soryu	2025-12-23	1	-11/+45
*	Use hf cli to download models	soryu	2025-12-23	1	-2/+2
*	Use HF to download models	soryu	2025-12-23	1	-18/+42
*	Create container image and move parakeet fork to vendor dir	soryu	2025-12-23	1	-0/+60