Showing 1082 open source projects for "voice"

  • 1
    Open-LLM-VTuber

    Open source AI VTuber platform with voice chat and Live2D avatars

    Open-LLM-VTuber is an open source platform designed to create AI-powered VTuber characters that can interact with users through voice and animated avatars. It enables hands-free conversations with large language models by combining speech recognition, language processing, and text-to-speech synthesis into a single system. Users can speak directly to the AI character, and the system can respond with a generated voice while animating a Live2D avatar to simulate a talking virtual personality. ...
    Downloads: 19 This Week
  • 2
    IndexTTS2

    Industrial-level controllable zero-shot text-to-speech system

    IndexTTS is a modern, zero-shot text-to-speech (TTS) system engineered to deliver high-quality, natural-sounding speech synthesis with few requirements and strong voice-cloning capabilities. It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output. The system supports zero-shot voice cloning — meaning it can mimic a target speaker’s voice from a short reference sample — making it versatile for multi-voice uses. ...
    Downloads: 7 This Week
  • 3
    Mumble

    Mumble is an open-source, low-latency, high-quality voice chat

    Mumble is open-source, low-latency, high-quality voice chat software written on top of Qt and Opus. It has two modules: the client (mumble) and the server (murmur). The client works on Windows, Linux, FreeBSD, OpenBSD, and macOS, while the server should work on anything Qt can be installed on. Administrators appreciate Mumble because they can self-host it and keep control over data security and privacy.
    Downloads: 25 This Week
  • 4
    Rhino

    On-device Speech-to-Intent engine powered by deep learning

    Rhino is Picovoice's Speech-to-Intent engine. It directly infers intent from spoken commands within a given context of interest, in real-time. The end-to-end platform for embedding private voice AI into any software in a few lines of code. Design with no limits on top of a modular platform. Create use-case-specific voice AI models in seconds. Develop voice features with a few lines of code using intuitive and cross-platform SDKs. Deliver voice AI everywhere: on-device, mobile, web browsers, on-premise, or cloud. Measure adoption, learn, and iterate. ...
    Downloads: 0 This Week
  • 5
    MegaTTS 3

    Official PyTorch Implementation

    MegaTTS3 is an open-source text-to-speech (TTS) and voice-cloning system from ByteDance that aims to deliver high-quality, expressive speech synthesis, including zero-shot voice cloning of previously unseen speakers. Its backbone is a lightweight diffusion-transformer (on the order of ~0.45 B parameters), which enables efficient inference while still producing high-fidelity audio. Given a reference audio sample (and corresponding latent representation), MegaTTS3 can generate speech in the style and voice timbre of that speaker — useful for personalized TTS, voice-overs, dubbing, or multi-speaker applications. ...
    Downloads: 1 This Week
  • 6
    ebook2audiobook

    Generate audiobooks from e-books, voice cloning & 1107+ languages

    ...It automates the pipeline: it reads the eBook file, splits it into appropriate segments (chapters, paragraphs), uses text-to-speech (TTS) models to synthesize audio, optionally applies voice cloning, and outputs a final audiobook — ideal for people who prefer listening over reading, or for accessibility purposes. The tool supports a wide array of underlying TTS backends (XTTSv2, Bark, VITS, Fairseq, Tacotron2, YourTTS and more), which gives flexibility depending on hardware availability, voice preference, and language. ...
    Downloads: 32 This Week
  • 7
    OmniVoice

    High-Quality Voice Cloning TTS for 600+ Languages

    The OmniVoice project is a cutting-edge multilingual text-to-speech system designed to generate high-quality speech across more than 600 languages. Built on a diffusion language model-style architecture, it combines scalability with strong performance, enabling both natural-sounding voice synthesis and efficient inference speeds. One of its most notable capabilities is zero-shot voice cloning, allowing users to replicate a speaker’s voice using only a short reference audio clip. In addition, it supports voice design through configurable attributes such as gender, accent, pitch, and speaking style, giving users fine-grained control over generated speech. ...
    Downloads: 1 This Week
  • 8
    Project AIRI

    Self hosted, you-owned Grok Companion

    AIRI is a self-hosted AI companion platform designed to create interactive virtual characters capable of real-time conversation, gameplay interaction, and multimedia presence. The project aims to emulate advanced AI personalities similar to popular autonomous VTuber-style agents, combining voice interaction, animation, and behavioral logic into a unified system. It supports deployment across web, macOS, and Windows environments, making it accessible for hobbyists and developers building digital companions. AIRI integrates real-time voice chat capabilities and can interact with external applications such as games, enabling more immersive and dynamic experiences. ...
    Downloads: 82 This Week
  • 9
    Qwen3-TTS

    Qwen3-TTS is an open-source series of TTS models

    ...Developers can customize voice output parameters like speed, pitch, and volume, and combine the TTS stack with other AI components.
    Downloads: 7 This Week
  • 10
    LuxTTS

    A high-quality rapid TTS voice cloning model

    ...The project supports zero-shot voice cloning, meaning it can adapt to a reference speaker’s voice with minimal example data, enabling realistic and personalized synthetic speech. Intended for developers, hobbyists, and creators, the repository includes installation instructions, usage examples, and Python APIs that make it feasible to integrate the model in local workflows, web demos, or production systems.
    Downloads: 2 This Week
  • 11
    DiscordBotClient

    A patched version of Discord, with bot login support

    A patched version of Discord, with bot login support. Discord Bot Client lets you use your bot like any other user account, except for Friends and Groups.
    Downloads: 111 This Week
  • 12
    RealtimeSTT

    A robust, efficient, low-latency speech-to-text library

    RealtimeSTT is a Python-based realtime speech-to-text engine emphasizing low latency, wake-word detection, voice activity detection, and automatic speech segmentation. It provides asynchronous callbacks, nanosecond-precision timestamps, and CLI tools, suitable for building voice assistants, meeting transcribers, or live caption systems.
    Downloads: 6 This Week
  • 13
    MetaVoice-1B

    Foundational model for human-like, expressive TTS

    ...The goal is to provide human-like, expressive, and flexible TTS: able to generate natural-sounding speech that can handle diverse inputs and likely generalize over voice styles, intonation, prosody, and perhaps multiple languages or accents. With that scale and dataset volume, MetaVoice aims to push the boundary of what open-source TTS models can achieve: high fidelity, natural prosody, and robustness even for edge cases. As a foundational model, it can serve as the backbone for downstream tasks — such as voice generation, voice cloning, speech generation for virtual agents, or even audio production pipelines.
    Downloads: 0 This Week
  • 14
    ElevenLabs Python

    The official Python SDK for the ElevenLabs API

    elevenlabs-python is the official Python SDK for the ElevenLabs API, giving developers a convenient way to access ElevenLabs’ high-quality, lifelike voices. The library wraps the HTTP API into a typed Python client, so you can perform text-to-speech, streaming, voice cloning, voice management, and agents-related operations with simple method calls. It exposes ElevenLabs’ main models such as Eleven Multilingual v2, Eleven Flash v2.5, and Eleven Turbo v2.5, each targeting different trade-offs between latency, cost, and quality. The SDK is designed for quick setup: after installing the package and setting an API key, you can generate speech in multiple languages and play or process the resulting audio bytes. ...
    Downloads: 7 This Week
  • 15
    Bailing

    Bailing is a voice dialogue robot similar to GPT-4o

    Bailing is an open-source voice-dialogue assistant designed to deliver natural voice-based conversations by combining automatic speech recognition (ASR), voice activity detection (VAD), a large language model (LLM), and text-to-speech (TTS) in a single pipeline. Its goal is to offer a “voice-first” chat experience similar to what one might expect from a system like GPT-4o, but fully open and deployable by users.
    Downloads: 1 This Week
  • 16
    VoxCPM

    TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

    ...Trained on a large 1.8-million-hour bilingual corpus, VoxCPM can infer appropriate speaking style from context, dynamically adjusting intonation, rhythm, and emotional tone. It supports zero-shot voice cloning from a short reference audio clip, capturing timbre, accent, and pacing to closely mimic a target speaker without per-speaker fine-tuning.
    Downloads: 44 This Week
  • 17
    Rasa

    Open source machine learning framework to automate text conversations

    Rasa is an open source machine learning framework to automate text- and voice-based conversations. With Rasa, you can build contextual assistants on Facebook Messenger, Slack, Google Hangouts, Webex Teams, Microsoft Bot Framework, Rocket.Chat, Mattermost, Telegram, and Twilio, or on your own custom conversational channels. Rasa helps you build contextual assistants capable of having layered conversations with lots of back-and-forth.
    Downloads: 12 This Week
  • 18
    Kaset

    The missing YouTube Music macOS app

    Kaset is a social audio platform framework that allows users to host, share, and interact with audio content in community-oriented spaces, combining elements of podcasting, voice rooms, and feedback-driven discovery. It provides an interface where creators can upload episodes, host live or scheduled voice sessions, and cultivate listener communities through comments, reactions, and follow systems. The platform emphasizes audio discovery with playlists, curated channels, and trending audio feeds, helping users find relevant voice content without sifting through noise. ...
    Downloads: 5 This Week
  • 19
    VoxCPM2

    Tokenizer-Free TTS for Multilingual Speech Generation

    ...The system is trained on massive multilingual datasets, enabling support for dozens of languages and dialects while maintaining high fidelity and realism in generated audio. VoxCPM stands out for its ability to perform voice cloning with minimal input, capturing not only the speaker’s timbre but also nuanced features such as rhythm, accent, and emotional delivery. It also introduces voice design capabilities, allowing users to generate entirely new voices from natural language descriptions without requiring reference audio.
    Downloads: 12 This Week
  • 20
    Meshenger

    P2P Voice/Video phone App for local networks

    Meshenger is an open-source, serverless P2P voice and video calling app for Android that works over local networks or directly between devices without the internet. It facilitates direct communication using QR codes or IP addresses, bypassing the need for any central infrastructure or account registration. Meshenger is particularly suited for emergency scenarios, privacy-focused users, and mesh networks where conventional communication tools fail.
    Downloads: 15 This Week
  • 21
    KrillinAI

    Video translation and dubbing tool powered by LLMs

    ...It integrates several stages of the pipeline: video acquisition (either from local files or remote via download tools), speech recognition (ASR), subtitle segmentation and alignment, machine translation (with context-aware translation to preserve semantics), and voice cloning + text-to-speech (TTS) to produce dubbed audio tracks. KrillinAI supports both landscape and portrait videos, which makes it suitable for a wide range of platforms — from YouTube to TikTok or other vertical-video sites — and ensures correct formatting and layout for the final video. The tool offers “one-click” workflows and desktop versions, lowering the barrier for users who may not be familiar with video editing or audio processing pipelines.
    Downloads: 14 This Week
  • 22
    CosyVoice

    Multi-lingual large voice generation model, providing inference

    CosyVoice is a multilingual large voice generation model that offers a full-stack solution for training, inference, and deployment of high-quality TTS systems. The model supports multiple languages, including Chinese, English, Japanese, Korean, and a range of Chinese dialects such as Cantonese, Sichuanese, Shanghainese, Tianjinese, and Wuhanese. It is designed for zero-shot voice cloning and cross-lingual or mix-lingual scenarios, so a single reference voice can be used to synthesize speech across languages and in code-switching contexts. ...
    Downloads: 1 This Week
  • 23
    Clicky

    AI teacher that lives as a buddy next to your cursor

    Clicky is an experimental AI-powered desktop companion designed to act as an interactive, real-time teaching assistant that lives directly alongside the user’s cursor on macOS. It functions as a menu bar application that can observe the user’s screen, interpret context, and provide guidance through both voice and visual cues, effectively simulating the experience of having a human tutor sitting next to you. The system captures screenshots and combines them with voice input to send contextual queries to AI models, which then respond with both spoken explanations and on-screen visual pointers. One of its defining features is the ability to physically “point” at UI elements across multiple monitors using a cursor overlay, helping users navigate complex software step by step. ...
    Downloads: 7 This Week
  • 24
    VibeVoice ComfyUI

    ComfyUI integration for Microsoft's VibeVoice text-to-speech model

    ...The project also introduces first-class LoRA support, making it possible to fine-tune and load custom LoRA adapters that modify voice identity or style while keeping the base VibeVoice model intact.
    Downloads: 4 This Week
  • 25
    Jovo Framework

    The React for Voice and Chat, build apps for Alexa, Google Assistant

    The multimodal experience platform enables professional teams to build and run apps that work across smart speakers, the web, mobile, and more. Fully customizable and open source. The Jovo product ecosystem allows you to build, test, and run powerful experiences for voice, chat, and web platforms. From local development to production, Jovo allows you to build robust experiences, faster. Build across devices and platforms and use all supported modalities thanks to the Jovo output template engine. Our component and plugin architecture makes it possible to make Jovo work for your specific use case, across projects. ...
    Downloads: 6 This Week