Open speech-to-speech models and pipelines by Hugging Face
Robust Speech Recognition via Large-Scale Weak Supervision
Multilingual speech recognition and audio understanding model
Audio foundation model excelling in audio understanding
StreamSpeech is a seamless model for offline speech recognition
Uses Qwen3-ASR, a local LLM, Whisper, and TEN-VAD
Fast multimodal LLM for real-time voice interaction and AI apps
Run local models such as Llama, DeepSeek, and Kokoro inside your browser
Framework for building real-time voice and multimodal AI agents
End-to-end speech processing toolkit
Real-time voice interactive digital human
Large language model for live-commerce sales hosts
Open source AI VTuber platform with voice chat and Live2D avatars
Qwen3-ASR is an open-source series of ASR models
Large Audio Language Model built for natural interactions
AI-powered tool for generating, optimizing, and translating subtitles
Production-ready toolkit for running AI locally
Framework for building neural networks
Workflow and speech recognition app
Textream is a free macOS teleprompter app for streamers and interviewers
Open source AI wearable platform for recording and summarizing speech
Real-time AI voice agents with state-of-the-art multimodal AI models on Arduino ESP
A web UI for easy subtitle generation using the Whisper model
Foundational Models for State-of-the-Art Speech and Text Translation
The media player for language learning, with dual subtitles