Search Results for "/storage/emulated/0/android/data/net.sourceforge.uiq3.fx603p/files"

Showing 70 open source projects for "/storage/emulated/0/android/data/net.sourceforge.uiq3.fx603p/files"

View related business solutions
  • Create stunning, professional email signatures in minutes Icon
    Create stunning, professional email signatures in minutes

    For companies looking to create, assign and manage all their employees email signatures and add targeted marketing banners.

    Create, assign and manage all your employees’ email signatures and add targeted marketing banners. Stop getting worked up about your signatures! Leverage a centralized interface to easily create and manage the email signatures of all your employees. Take advantage of each email to broadcast and amplify your brand. Letsignit helps you regain control over your digital identity. Harmonize 100% of your employee’s email signatures in just a few clicks! 121 professional emails are received and 40 are sent every day by an employee. With Letsignit, turn every email into a powerful communication opportunity: send the right message to the right person at the right time! Innovative more than tech, inspiring more than following. Authentic more than overrated, close more than "think big", trustworthy more than doubtful. Hands-on more than complex, available but yet premium, fun but yet expert.
    Learn More
  • Stigg | SaaS Monetization and Entitlements API Icon
    Stigg | SaaS Monetization and Entitlements API

    For developers in need of a tool to launch pricing plans faster and build better buying experiences

    A monetization platform is a standalone middleware that sits between your application and your business applications, as part of the modern enterprise billing stack. Stigg unifies all the APIs and abstractions billing and platform engineers had to build and maintain in-house otherwise. Acting as your centralized source of truth, with a highly scalable and flexible entitlements management, rolling out any pricing and packaging change is now a self-service, risk-free, exercise.
    Learn More
  • 1
    IndexTTS2

    IndexTTS2

    Industrial-level controllable zero-shot text-to-speech system

    IndexTTS is a modern, zero-shot text-to-speech (TTS) system engineered to deliver high-quality, natural-sounding speech synthesis with few requirements and strong voice-cloning capabilities. It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 2
    VoxCPM

    VoxCPM

    TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

    ...Trained on a large 1.8-million-hour bilingual corpus, VoxCPM can infer appropriate speaking style from context, dynamically adjusting intonation, rhythm, and emotional tone. It supports zero-shot voice cloning from a short reference audio clip, capturing timbre, accent, and pacing to closely mimic a target speaker without per-speaker fine-tuning.
    Downloads: 58 This Week
    Last Update:
    See Project
  • 3
    LuxTTS

    LuxTTS

    A high-quality rapid TTS voice cloning model

    ...It implements a lightweight architecture based on ZipVoice and optimized sampling techniques so that it can generate speech at speeds up to roughly 150 times real-time on a single GPU and faster than real-time on CPU, all while producing audio at high fidelity with 48 kHz quality. The project supports zero-shot voice cloning, meaning it can adapt to a reference speaker’s voice with minimal example data, enabling realistic and personalized synthetic speech. Intended for developers, hobbyists, and creators, the repository includes installation instructions, usage examples, and Python APIs that make it feasible to integrate the model in local workflows, web demos, or production systems. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 4
    GLM-TTS

    GLM-TTS

    Controllable & emotion-expressive zero-shot TTS

    GLM-TTS is an advanced text-to-speech synthesis system built on large language model technologies that focuses on producing high-quality, expressive, and controllable spoken output, including features like emotion modulation and zero-shot voice cloning. It uses a two-stage architecture where a generative LLM first converts text into intermediate speech token sequences and then a Flow-based neural model converts those tokens into natural audio waveforms, enabling rich prosody and voice character even for unseen speakers. The system introduces a multi-reward reinforcement learning framework that jointly optimizes for voice similarity, emotional expressiveness, pronunciation, and intelligibility, yielding output that can rival commercial options in naturalness and expressiveness. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • Enterprise Job Scheduling Software Icon
    Enterprise Job Scheduling Software

    Unify Enterprise Job Scheduling for Scale, Visibility, and Control

    Managing your sprawling data center and cloud with disparate native schedulers creates chaos. Achieve unparalleled control and efficiency over your entire IT environment with JAMS job orchestration tools. JAMS provides the singular, centralized platform required to overcome the complexities of disparate native schedulers. Automate, secure, and govern all your workloads, eliminating fragmented control, compliance risks, and operational bottlenecks. JAMS streamlines operations and ensures audit-ready history, transforming your enterprise automation with confidence and precision.
    Learn More
  • 5
    IMS Toucan

    IMS Toucan

    Controllable and fast Text-to-Speech for over 7000 languages

    ...The toolkit focuses on being fast and controllable while not requiring huge amounts of compute, making it practical for research labs and smaller teams. It includes complete pipelines for preprocessing datasets, training models, and running inference, plus a storage configuration system to manage where models and caches are stored. IMS-Toucan ships with several ready-to-run scripts, including GUIs for interactive demos, prosody override tools, zero-shot language embedding injection, and text-to-audio file generation. Pretrained models are automatically downloaded when needed, and there is an online demo instance hosted on GPU that anyone can try.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    OpenVoice

    OpenVoice

    Instant voice cloning by MIT and MyShell. Audio foundation model

    ...It is designed not only to match the timbre of the reference voice, but also to give granular control over style parameters such as emotion, accent, rhythm, pauses, and intonation. The model supports cross-lingual and even zero-shot cross-lingual voice cloning, so a speaker recorded in one language can be made to speak naturally in others. Architecturally, OpenVoice separates “tone color” cloning from style control, which makes it easier to keep a consistent identity while flexibly changing prosody or language. The project provides open-weight models, inference code, and examples, making it suitable both for research and for building production voice experiences. ...
    Downloads: 21 This Week
    Last Update:
    See Project
  • 7
    Sopro TTS

    Sopro TTS

    A lightweight text-to-speech model with zero-shot voice cloning

    Sopro TTS is an open-source text-to-speech (TTS) project that implements a lightweight model capable of producing speech from text with zero-shot voice cloning, meaning it can mimic a speaker’s voice from only a few seconds of reference audio. Built with a 169 million-parameter architecture that uses dilated convolutions and cross-attention layers instead of large Transformer stacks, it achieves relatively fast real-time performance even on CPUs (about a 0.25 real-time factor measured on an M3 base). ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Spark TTS

    Spark TTS

    Spark-TTS Inference Code

    ...It uses an efficient single-stream architecture where speech tokens are directly reconstructed from the predictions of an LLM, removing the need for external acoustic models or complex vocoders and making the generation pipeline cleaner and faster. The project supports zero-shot voice cloning, meaning it can imitate a new speaker’s voice without dedicated training for that specific voice, and works across languages, including English and Chinese, even in cross-lingual code-switching scenarios. Spark-TTS allows users to control speech characteristics like gender, pitch, and speaking rate to customize synthesized output and support virtual speaker creation.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9
    OmniVoice

    OmniVoice

    High-Quality Voice Cloning TTS for 600+ Languages

    ...Built on a diffusion language model-style architecture, it combines scalability with strong performance, enabling both natural-sounding voice synthesis and efficient inference speeds. One of its most notable capabilities is zero-shot voice cloning, allowing users to replicate a speaker’s voice using only a short reference audio clip. In addition, it supports voice design through configurable attributes such as gender, accent, pitch, and speaking style, giving users fine-grained control over generated speech. The system also includes advanced features like non-verbal expression tags and pronunciation overrides, enabling expressive and precise output. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • Ditto Edge Server is a lightweight standalone server for resource-constrained edge environments, based on the core Ditto Edge SDK. Icon
    Ditto Edge Server is a lightweight standalone server for resource-constrained edge environments, based on the core Ditto Edge SDK.

    With Ditto Edge Server, you can join devices as small as a Raspberry Pi to a local mesh network and synchronize data across edge environments.

    Ditto's Edge SDK is the only thing your edge devices need to ensure your application is operational in any environment, regardless of network conditions.
    Learn More
  • 10
    Voice-Pro

    Voice-Pro

    Comprehensive Gradio WebUI for audio processing

    Voice-Pro is the best gradio WebUI for transcription, translation and text-to-speech. It can be easily installed with one click. Create a virtual environment using Miniconda, running completely separate from the Windows system (fully portable). Supports real-time transcription and translation, as well as batch mode.
    Downloads: 36 This Week
    Last Update:
    See Project
  • 11
    Kitten TTS

    Kitten TTS

    State-of-the-art TTS model under 25MB

    KittenTTS is an open-source, ultra-lightweight, and high-quality text-to-speech model featuring just 15 million parameters and a binary size under 25 MB. It is designed for real-time CPU-based deployment across diverse platforms. Ultra-lightweight, model size less than 25MB. CPU-optimized, runs without GPU on any device. High-quality voices, several premium voice options available. Fast inference, optimized for real-time speech synthesis.
    Downloads: 21 This Week
    Last Update:
    See Project
  • 12
    CosyVoice

    CosyVoice

    Multi-lingual large voice generation model, providing inference

    ...The model supports multiple languages, including Chinese, English, Japanese, Korean, and a range of Chinese dialects such as Cantonese, Sichuanese, Shanghainese, Tianjinese, and Wuhanese. It is designed for zero-shot voice cloning and cross-lingual or mix-lingual scenarios, so a single reference voice can be used to synthesize speech across languages and in code-switching contexts. CosyVoice 2.0 significantly improves on version 1.0 by boosting accuracy, stability, speed, and overall speech quality, making it more suitable for production environments. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Orpheus TTS

    Orpheus TTS

    Towards Human-Sounding Speech

    ...It is designed to produce human-like speech with natural intonation, emotion, and rhythm, targeting quality comparable to or better than many closed-source systems. The project ships both pretrained and finetuned English models, as well as a family of multilingual models released as a research preview, and includes data-processing scripts so users can train or finetune their own variants. Inference is provided through a Python package that uses vLLM under the hood for high-throughput, low-latency generation, including streaming examples that show how to generate audio chunks in real time. The maintainers provide Colab notebooks, a standardized prompting format, and one-click deployment via Baseten for production-grade, FP8/FP16 optimized inference with ~200 ms streaming latency.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 14
    Fish Speech

    Fish Speech

    SOTA Open Source TTS

    Fish Speech is a state-of-the-art open-source text-to-speech project that has evolved into the OpenAudio series of advanced TTS models. The repository hosts the code and tooling for training, fine-tuning, and serving high-quality TTS, while the current flagship models (OpenAudio-S1 and S1-mini) are distributed via Fish Audio’s playground and Hugging Face. The models are evaluated with Seed TTS metrics and achieve exceptionally low word and character error rates, indicating strong...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 15
    gTTS

    gTTS

    Python library and CLI tool to interface with Google Translate

    ...A small CLI utility, gtts-cli, makes it easy to test or batch-generate MP3 files right from the shell.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 16
    edge-tts

    edge-tts

    Use Microsoft Edge's online text-to-speech service from Python

    ...It wraps the same cloud voices used by Edge, exposing them through a simple CLI (edge-tts, edge-playback) and a Python API, so you can script high-quality speech generation in your own applications. The tool lets you list available voices, specify locale and voice name, and generate audio files in common formats like MP3 or WAV. It also supports generating subtitle files (such as SRT or VTT) alongside the speech, which is handy for video narration, e-learning, or accessibility workflows. From the CLI you can adjust parameters such as speaking rate, volume, and pitch, giving you some control over prosody without diving into SSML. ...
    Downloads: 23 This Week
    Last Update:
    See Project
  • 17
    kokoro-onnx

    kokoro-onnx

    TTS with kokoro and onnx runtime

    ...It focuses on running efficiently on commodity hardware, including macOS with Apple Silicon, while still delivering near real-time performance for many use cases. The project ships prebuilt model files and a simple example script, so you can go from installation to producing an audio.wav file in just a few steps. It supports multiple languages and voices, with a curated voice list and configuration via a VOICES file hosted alongside the models. The package is distributed on PyPI, meaning you can integrate it directly into applications or scripts using standard Python tooling. ...
    Downloads: 142 This Week
    Last Update:
    See Project
  • 18
    pyttsx3

    pyttsx3

    Offline Text To Speech synthesis for python

    ...The library exposes a simple but flexible API for controlling voice selection, speaking rate, volume, and other synthesis parameters from Python code. It supports both a high-level speak convenience function and a lower-level engine object with event hooks, queuing, and saving output to audio files. The repository includes examples and documentation that show how to adjust properties dynamically, persist synthesized output, and integrate pyttsx3 into GUIs or background services.
    Downloads: 17 This Week
    Last Update:
    See Project
  • 19
    WhisperLive

    WhisperLive

    A nearly-live implementation of OpenAI's Whisper

    ...The project supports multiple inference backends, including Faster-Whisper, NVIDIA TensorRT, and OpenVINO, allowing you to target GPUs and different CPU architectures efficiently. It can handle microphone input, pre-recorded audio files, and network streams such as RTSP and HLS, making it flexible for live events, monitoring, or accessibility workflows. Configuration options let you control the number of clients, maximum connection time, and threading behavior so the server can be tuned for different deployment environments. On the client side, you can set the language, whether to translate into English, model size, voice activity detection, and output recording behavior.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 20
    Voicebox

    Voicebox

    The open-source voice synthesis studio powered by Qwen3-TTS

    Voicebox is a local-first voice synthesis studio that aims to bring professional, DAW-like voice generation workflows to a desktop app while keeping models and voice data entirely on your machine. It positions itself as an open-source alternative to cloud voice platforms by emphasizing privacy, offline use, and freedom from subscriptions or usage caps. The tool supports downloading voice models, cloning voices from short audio samples, and generating speech locally, then organizing the results using studio-oriented editing concepts. ...
    Downloads: 68 This Week
    Last Update:
    See Project
  • 21
    Auto Synced & Translated Dubs

    Auto Synced & Translated Dubs

    Automatically translates the text of a video based on a subtitle file

    ...The tool then time-stretches or compresses each TTS clip to match the original speech duration exactly, which preserves lip-sync and rhythm as closely as possible without manual editing. Finally, it combines all the clips into a single dubbed audio track that can be muxed with the original video, along with new translated subtitle files.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    Open Vision Agents by Stream

    Open Vision Agents by Stream

    Build Vision Agents quickly with any model or video provider

    ...Developers work with an agent abstraction that connects video edge providers, LLMs, and processors into pipelines, making it easier to orchestrate tasks like object detection, pose estimation, and conversational guidance. The project includes SDKs for React, Android, iOS, Flutter, React Native, and Unity, enabling integration into a wide variety of client environments such as mobile apps, web apps, and games.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    Chatterbox

    Chatterbox

    SoTA open-source TTS

    Chatterbox is Resemble AI's first production-grade open source TTS model. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs and is consistently preferred in side-by-side evaluations. Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. It's also the first open source TTS model to support emotion exaggeration control, a powerful feature that makes your voices stand out. Try it now on our...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 24
    ChatTTS

    ChatTTS

    A generative speech model for daily dialogue

    ChatTTS is an open-source conversational text-to-speech model optimized for dialogue, developed by 2Noise. Trained on 100,000+ hours of English and Chinese conversation data, it excels at generating expressive prosody—pauses, interjections, laughter—for more natural-sounding speech synthesis in assistant and chatbot applications.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 25
    ESPnet

    ESPnet

    End-to-end speech processing toolkit

    ESPnet is a comprehensive end-to-end speech processing toolkit covering a wide spectrum of tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speech translation (ST), speech enhancement, speaker diarization, and spoken language understanding. It uses PyTorch as its deep learning engine and adopts a Kaldi-style data processing pipeline for features, data formats, and experimental recipes. This combination allows researchers to leverage modern neural architectures while still benefiting from the robust data preparation practices developed in the speech community. ESPnet provides many ready-to-run recipes for popular academic benchmarks, making it straightforward to reproduce published results or serve as baselines for new research. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
MongoDB Logo MongoDB