summaryrefslogtreecommitdiff
path: root/parakeet-rs/README.md
diff options
context:
space:
mode:
authorsoryu <soryu@soryu.co>2025-12-21 01:27:02 +0000
committersoryu <soryu@soryu.co>2025-12-23 14:47:18 +0000
commit3c696cfc9005e73be5ed46f8941dfc8f0aca7102 (patch)
tree497bffd67001501a003739cfe0bb790502ffd50a /parakeet-rs/README.md
parent55cacf6e1a087c0fa6950a1ddeb09060f787e541 (diff)
downloadsoryu-3c696cfc9005e73be5ed46f8941dfc8f0aca7102.tar.gz
soryu-3c696cfc9005e73be5ed46f8941dfc8f0aca7102.zip
Create container image and move parakeet fork to vendor dir
Diffstat (limited to 'parakeet-rs/README.md')
-rw-r--r--parakeet-rs/README.md122
1 files changed, 0 insertions, 122 deletions
diff --git a/parakeet-rs/README.md b/parakeet-rs/README.md
deleted file mode 100644
index 75dfe85..0000000
--- a/parakeet-rs/README.md
+++ /dev/null
@@ -1,122 +0,0 @@
-# parakeet-rs
-[![Rust](https://github.com/altunenes/parakeet-rs/actions/workflows/rust.yml/badge.svg)](https://github.com/altunenes/parakeet-rs/actions/workflows/rust.yml)
-[![crates.io](https://img.shields.io/crates/v/parakeet-rs.svg)](https://crates.io/crates/parakeet-rs)
-
-Fast speech recognition with NVIDIA's Parakeet models via ONNX Runtime.
-Note: CoreML doesn't stable with this model - stick w/ CPU (or other GPU EP like CUDA). But its incredible fast in my Mac M3 16gb' CPU compared to Whisper metal! :-)
-
-## Models
-
-**CTC (English-only)**: Fast & accurate
-```rust
-use parakeet_rs::Parakeet;
-
-let mut parakeet = Parakeet::from_pretrained(".", None)?;
-let result = parakeet.transcribe_file("audio.wav")?;
-println!("{}", result.text);
-
-// Or transcribe in-memory audio
-// let result = parakeet.transcribe_samples(audio, 16000, 1)?;
-
-// Token-level timestamps
-for token in result.tokens {
- println!("[{:.3}s - {:.3}s] {}", token.start, token.end, token.text);
-}
-```
-
-**TDT (Multilingual)**: 25 languages with auto-detection
-```rust
-use parakeet_rs::ParakeetTDT;
-
-let mut parakeet = ParakeetTDT::from_pretrained("./tdt", None)?;
-let result = parakeet.transcribe_file("audio.wav")?;
-println!("{}", result.text);
-
-// Or transcribe in-memory audio
-// let result = parakeet.transcribe_samples(audio, 16000, 1)?;
-
-// Token-level timestamps
-for token in result.tokens {
- println!("[{:.3}s - {:.3}s] {}", token.start, token.end, token.text);
-}
-```
-
-**EOU (Streaming)**: Real-time ASR with end-of-utterance detection
-```rust
-use parakeet_rs::ParakeetEOU;
-
-let mut parakeet = ParakeetEOU::from_pretrained("./eou", None)?;
-
-// Prepare your audio (Vec<f32>, 16kHz mono, normalized)
-let audio: Vec<f32> = /* your audio samples */;
-
-// Process in 160ms chunks for streaming
-const CHUNK_SIZE: usize = 2560; // 160ms at 16kHz
-for chunk in audio.chunks(CHUNK_SIZE) {
- let text = parakeet.transcribe(chunk, false)?;
- print!("{}", text);
-}
-```
-
-**Sortformer v2 (Speaker Diarization)**: Streaming 4-speaker diarization
-```toml
-parakeet-rs = { version = "0.2", features = ["sortformer"] }
-```
-```rust
-use parakeet_rs::sortformer::{Sortformer, DiarizationConfig};
-
-let mut sortformer = Sortformer::with_config(
- "diar_streaming_sortformer_4spk-v2.onnx",
- None,
- DiarizationConfig::callhome(), // or dihard3(),custom()
-)?;
-let segments = sortformer.diarize(audio, 16000, 1)?;
-for seg in segments {
- println!("Speaker {} [{:.2}s - {:.2}s]", seg.speaker_id, seg.start, seg.end);
-}
-```
-See `examples/diarization.rs` for combining with TDT transcription.
-
-
-## Setup
-
-**CTC**: Download from [HuggingFace](https://huggingface.co/onnx-community/parakeet-ctc-0.6b-ONNX/tree/main/onnx): `model.onnx`, `model.onnx_data`, `tokenizer.json`
-
-**TDT**: Download from [HuggingFace](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx): `encoder-model.onnx`, `encoder-model.onnx.data`, `decoder_joint-model.onnx`, `vocab.txt`
-
-**EOU**: Download from [HuggingFace](https://huggingface.co/altunenes/parakeet-rs/tree/main/realtime_eou_120m-v1-onnx): `encoder.onnx`, `decoder_joint.onnx`, `tokenizer.json`
-
-**Diarization (Sortformer v2)**: Download from [HuggingFace](https://huggingface.co/altunenes/parakeet-rs/blob/main/diar_streaming_sortformer_4spk-v2.onnx): `diar_streaming_sortformer_4spk-v2.onnx`
-
-Quantized versions available (int8). All files must be in the same directory.
-
-GPU support (auto-falls back to CPU if fails):
-```toml
-parakeet-rs = { version = "0.1", features = ["cuda"] } # or tensorrt, webgpu, directml, rocm
-```
-
-```rust
-use parakeet_rs::{Parakeet, ExecutionConfig, ExecutionProvider};
-
-let config = ExecutionConfig::new().with_execution_provider(ExecutionProvider::Cuda);
-let mut parakeet = Parakeet::from_pretrained(".", Some(config))?;
-```
-
-
-## Features
-
-- [CTC: English with punctuation & capitalization](https://huggingface.co/nvidia/parakeet-ctc-0.6b)
-- [TDT: Multilingual (auto lang detection) ](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3)
-- [EOU: Streaming ASR with end-of-utterance detection](https://huggingface.co/nvidia/parakeet_realtime_eou_120m-v1)
-- [Sortformer v2: Streaming speaker diarization (up to 4 speakers)](https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2)
-- Token-level timestamps (CTC, TDT)
-
-## Notes
-
-- Audio: 16kHz mono WAV (16-bit PCM or 32-bit float)
-
-## License
-
-Code: MIT OR Apache-2.0
-
-FYI: The Parakeet ONNX models (downloaded separately from HuggingFace) are licensed under **CC-BY-4.0** by NVIDIA. This library does not distribute the models.