Motivation
Omi currently uses Deepgram for speech-to-text, which requires API keys and incurs costs. FunASR (16K+ GitHub stars) is a fully self-hosted, open-source alternative with an OpenAI-compatible API, so integration effort should be minimal.
Why FunASR
- Free & self-hosted: No API keys, no per-minute billing, data stays on your infrastructure
- OpenAI-compatible API: /v1/audio/transcriptions endpoint, a drop-in replacement
- 50+ languages including English, Chinese, Japanese, Korean, etc.
- Industrial-grade accuracy: Paraformer (non-autoregressive, 170x realtime on GPU), SenseVoice (50+ languages, emotion detection)
- Built-in VAD + punctuation + speaker diarization (cam++)
- Runs on consumer GPUs: SenseVoice-Small (234M params) works on 4GB VRAM
Quick Start
pip install funasr vllm
funasr-server --device cuda # starts OpenAI-compatible server at :8000
# Test
curl http://localhost:8000/v1/audio/transcriptions \
-F file=@audio.wav -F model=sensevoice
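The curl test above can also be sketched from Python using only the standard library. This is a minimal sketch, not a confirmed FunASR client: it assumes the server from the Quick Start is listening on localhost:8000, that it accepts the same multipart fields as the curl call (file and model), and that it returns OpenAI-style JSON with a "text" field. The helper names build_multipart and transcribe are hypothetical.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode plain form fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, base_url="http://localhost:8000", model="sensevoice"):
    """POST an audio file to the transcription endpoint and return parsed JSON.

    Assumption: the response body is JSON, as with OpenAI's endpoint.
    """
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart(
        {"model": model}, "file", os.path.basename(path), audio
    )
    req = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With the server running, transcribe("audio.wav") would return the parsed response, and the transcript would be under the "text" key if the response follows the OpenAI shape. Keeping the base URL and model as parameters means the same call could be pointed at a different host without code changes.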
Since Omi already has a self-hosted Deepgram deployment (Helm charts in backend/charts/), FunASR could serve as a lighter-weight, truly open-source alternative that is easier to deploy.