Framework for building real-time multimodal voice AI agent apps
Omilo is a simple text-to-speech application
Voice Recognition to Text Tool
The official Python SDK for the ElevenLabs API
High-Quality Voice Cloning TTS for 600+ Languages
A simple, high-quality voice conversion tool focused on ease of use
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Controllable & emotion-expressive zero-shot TTS
The Python library for real-time communication
Run local LLMs like Llama, DeepSeek, Kokoro, etc. inside your browser
A single Gradio + React WebUI with extensions for ACE-Step
AI teacher that lives as a buddy next to your cursor
Offline inference engine for art, real-time voice conversations
Faster Whisper transcription with CTranslate2
State-of-the-art TTS model under 25MB
Subtitle Creation Assistant
Readest is a modern, feature-rich ebook reader
Gp.nvim (GPT prompt) Neovim AI plugin
Towards Human-Sounding Speech
Foundational model for human-like, expressive TTS
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Speakr is a personal, self-hosted web application
A speech-text foundation model for real-time dialogue
Stanford CoreNLP, a Java suite of core NLP tools
Framework for building real-time voice and multimodal AI agents