Audio foundation model excelling in audio understanding
Open speech-to-speech models and pipelines by Hugging Face
Framework for building real-time voice and multimodal AI agents
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Automatic Speech Recognition with Word-level Timestamps
Open source AI model for generating full songs from lyrics prompts
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Open source AI wearable platform for recording and summarizing speech
Build multimodal language agents for fast prototyping and production
Official repository for LTX-Video
Qwen3-TTS is an open-source series of TTS models
Instill Core is a full-stack AI infrastructure tool for data
WhatsApp MCP server enabling AI access to chats and messaging
Multi-lingual large voice generation model, providing inference
Omnilingual ASR: open-source multilingual speech recognition
Build multimodal AI applications with cloud-native stack
TorchMultimodal is a PyTorch library
32/64 bit multi-platform Ethernet S7 PLC communication suite
An extremely simple tool for separating vocals and background music
A web server for web developers and web designers
Real-time music generation using stable diffusion techniques
Defeating Google's audio reCaptcha with 85% accuracy
A GUI foundation for creating UI-based content