Search Results for "content based audio retrievel"

Showing 156 open source projects for "content based audio retrievel"

1

Step-Audio-EditX

LLM-based Reinforcement Learning audio edit model

Step-Audio-EditX is an open-source, 3 billion-parameter audio model from StepFun AI designed to make expressive and precise editing of speech and audio as easy as text editing. Rather than treating audio editing as low-level waveform manipulation, this model converts speech into a sequence of discrete “audio tokens” (via a dual-codebook tokenizer) — combining a linguistic token stream and a semantic (prosody/emotion/style) token stream — thereby abstracting audio editing into high-level...

Downloads: 3 This Week

Last Update: 2026-04-09
See Project
2

Podcastfy.ai

Transforming Multimodal Content into Captivating Multilingual Audio

Podcastfy is an open-source Python package that transforms multi-modal content (text, images) into engaging, multi-lingual audio conversations using GenAI. Input content includes websites, PDFs, and YouTube videos, as well as images. Unlike UI-based tools focused primarily on note-taking or research synthesis (e.g. NotebookLM), Podcastfy focuses on the programmatic and bespoke generation of engaging, conversational transcripts and audio from a multitude of multi-modal sources, enabling customization and scale.

Downloads: 10 This Week

Last Update: 2024-11-16
See Project
3

Step-Audio 2

Multi-modal large language model designed for audio understanding

Step-Audio2 is an advanced, end-to-end multimodal large language model designed for high-fidelity audio understanding and natural speech conversation: unlike many pipelines that separate speech recognition, processing, and synthesis, Step-Audio2 processes raw audio, reasons about semantic and paralinguistic content (like emotion, speaker characteristics, non-verbal cues), and can generate contextually appropriate responses — including potentially generating or transforming audio output. ...

Downloads: 0 This Week

Last Update: 2026-03-16
See Project
4

BlogWizard

Generate blog articles from video or audio

BlogWizard is a demo/utility project built on top of Groq’s LLM infrastructure that converts video or audio content into well-structured blog posts, enabling creators to repurpose multimedia content into text — useful for SEO, accessibility, or reaching audiences that prefer reading. The tool uses transcription (e.g. via Whisper) to extract text from audio/video, then runs an LLM-based generation pipeline to transform that content into coherent, readable blog-format posts — with sections, formatting, and possibly metadata. ...

Downloads: 0 This Week

Last Update: 2025-12-19
See Project
5

AI-Media2Doc

AI tool converting video/audio into structured documents instantly

AI-Media2Doc is a web-based application that uses large language models to convert video and audio content into structured, readable documents in a single workflow. It is designed to transform multimedia inputs into formats such as knowledge notes, summaries, mind maps, and social-style articles, making content easier to review and reuse. AI-Media2Doc emphasizes privacy by processing media locally in the browser using WebAssembly-based ffmpeg, ensuring that original video files are not uploaded externally. ...

Downloads: 8 This Week

Last Update: 2026-03-18
See Project
6

Markdownify MCP Server

Convert files and web content into clean, usable Markdown easily

Markdownify MCP is a Model Context Protocol server that converts many types of files and web content into clean Markdown. It supports formats such as PDFs, images, audio with transcription, DOCX, XLSX, and PPTX, along with web sources like YouTube transcripts, Bing results, and general webpages. Markdownify MCP is designed to simplify content extraction and make data easier to read, share, and reuse in structured workflows. Developers can install dependencies, build, and run the server locally, then extend functionality by modifying its TypeScript-based tools and server logic. ...

Downloads: 9 This Week

Last Update: 2026-04-09
See Project
7

Navidrome

Your Personal Streaming Service

...Navidrome also implements the Subsonic API, making it compatible with many third-party players and apps across different platforms. It automatically monitors and indexes your library for new content and supports on-the-fly transcoding to adapt audio streams to different network conditions.

Downloads: 97 This Week

Last Update: 4 days ago
See Project
8

Unrud Video Downloader

Download videos from websites like YouTube and many others

Video Downloader is a desktop application designed to simplify the process of downloading videos from various online platforms through a user-friendly graphical interface. Built on top of yt-dlp, it abstracts the complexity of command-line tools and provides an accessible way for users to retrieve video and audio content. The application supports a wide range of features, including downloading entire playlists, handling private or password-protected content, and automatically selecting optimal formats based on user preferences. It also allows users to convert videos into audio files such as MP3, making it useful for media extraction workflows. ...

Downloads: 23 This Week

Last Update: 7 days ago
See Project
9

FineTune

FineTune, a macOS menu bar app to control volume for each app

...Through a clean, minimal interface accessible from the menu bar, FineTune lets users isolate and balance application volumes, assign specific outputs (like headphones versus speakers), and tweak equalization to enhance or tailor audio based on content or personal preference. Its integration into the OS workflow means that these adjustments persist across sessions and respect the user’s choices without requiring constant interaction with deeper system settings.

Downloads: 56 This Week

Last Update: 2026-04-08
See Project
10

Bili23 Downloader

Cross platform GUI tool for downloading videos from Bilibili sites

...It can parse different types of links such as standard video pages, short links, and collection or activity pages to automatically retrieve downloadable media. It also allows users to choose video resolution, audio quality, and encoding format based on the available sources. Additional features include downloading subtitles, comments, metadata, and artwork associated with videos.

Downloads: 27 This Week

Last Update: 2026-04-07
See Project
11

AudioMuse-AI

AudioMuse-AI is an Open Source Dockerized environment

AudioMuse-AI is an open-source system designed to automatically generate playlists and analyze music libraries using artificial intelligence and audio signal processing techniques. The platform runs locally in a Dockerized environment and performs detailed sonic analysis on audio files to understand characteristics such as tempo, mood, and acoustic similarity. By analyzing the underlying audio content rather than relying on external metadata services, the system can organize large personal...

Downloads: 4 This Week

Last Update: 2026-04-06
See Project
12

YouTube Playlist Downloader

A tool to download whole playlists, channels or single videos

YoutubePlaylistDownloader is a desktop-based utility designed to simplify the process of downloading entire YouTube playlists with minimal user interaction. The tool allows users to input a playlist URL and automatically retrieve all associated videos, handling the sequence and download process in a structured way. It supports multiple output formats and quality settings, enabling users to choose between audio or video downloads depending on their needs.

Downloads: 312 This Week

Last Update: 2026-03-18
See Project
13

AI YouTube Shorts Generator

A Python tool that uses GPT-4, FFmpeg, and OpenCV

AI-YouTube-Shorts-Generator is a Python-based tool that automates the creation of short-form vertical video clips (“shorts”) from longer source videos — ideal for adapting content for platforms like YouTube Shorts, Instagram Reels, or TikTok. It analyzes input video (whether a local file or a YouTube URL), transcribes audio (with optional GPU-accelerated speech-to-text), uses an AI model to identify the most compelling or engaging segments, and then crops/resizes the video and applies subtitle overlays, producing a polished short video without manual editing. ...

Downloads: 13 This Week

Last Update: 2026-02-05
See Project
14

CloudReader

A NetEase Cloud Music-based UI

A NetEase Cloud Music-based UI, developed with the WanAndroid API and following Google Material Design; an open-source reading app project. Built with Kotlin, a NetEase Cloud Music-style UI, Retrofit2 + RxJava2 + Room + MVVM-databinding, and the WanAndroid API. NetEase Cloud Music was officially released on April 23, 2013. It is an online music product that focuses on discovery and sharing and has a strong social component. I believe that everyone who has used it will know that the experience it gives is excellent. The...

Downloads: 1 This Week

Last Update: 2025-03-31
See Project
15

WhisperJAV

Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD

...WhisperJAV introduces a specialized pipeline that separates text generation from timestamp alignment, allowing the system to generate transcripts and then align them with audio using forced alignment techniques. The framework supports several speech recognition models, including Qwen-based ASR systems and fine-tuned Whisper models trained on domain-specific dialogue.

Downloads: 22 This Week

Last Update: 7 days ago
See Project
16

Canvas LMS

The open LMS by Instructure, Inc.

Canvas LMS is a full-featured learning management system designed for K–12, higher-ed, and professional training, with a strong emphasis on usability and openness. Instructors build courses from modular content—pages, assignments, discussions, quizzes—and organize them into learning paths with prerequisites and due dates. Rich grading tools like SpeedGrader streamline assessment with rubrics, inline annotations, and audio/video feedback, while the gradebook supports weighting, outcomes, and late/missing policies. A robust API, standards like LTI/IMS Common Cartridge, and SIS integrations make it straightforward to connect Canvas with publisher content, analytics tools, proctoring, and institutional systems. ...

Downloads: 33 This Week

Last Update: 13 hours ago
See Project
17

Ultravox

Fast multimodal LLM for real-time voice interaction and AI apps

Ultravox is an open source multimodal large language model designed specifically for real-time voice-based interactions. It is built to process both text and spoken audio directly, eliminating the need for a separate speech recognition stage and enabling more seamless conversational experiences. Ultravox works by combining text prompts with encoded audio inputs, allowing it to understand spoken language alongside written instructions in a unified pipeline.

Downloads: 6 This Week

Last Update: 2026-03-18
See Project
18

ReClip

Download videos from almost any website

ReClip is a lightweight, self-hosted media downloader that provides a simple web-based interface for downloading videos and audio from a wide range of online platforms. Built around the yt-dlp engine, it supports over a thousand websites, including major platforms like YouTube, TikTok, and Instagram, allowing users to retrieve media content in various formats. The application emphasizes simplicity and minimalism, featuring a clean interface built with plain HTML, CSS, and JavaScript without requiring complex build steps or frameworks. ...

Downloads: 118 This Week

Last Update: 7 days ago
See Project
19

Whisper-WebUI

A Web UI for easy subtitle generation using the Whisper model

Whisper WebUI is an open-source browser-based interface that simplifies the use of Whisper speech recognition models by providing an intuitive graphical environment for transcription, translation, and subtitle generation. Built with Gradio, it allows users to upload audio or video files, process them locally, and generate accurate text outputs without relying on command-line tools.

Downloads: 19 This Week

Last Update: 2026-03-18
See Project
20

NExT-GPT

Code and models for ICML 2024 paper, NExT-GPT

NExT-GPT is an open-source research framework that implements an advanced multimodal large language model capable of understanding and generating content across multiple modalities. Unlike traditional models that primarily handle text, NExT-GPT supports input and output combinations involving text, images, video, and audio in a unified architecture. The system connects a large language model with multimodal encoders and diffusion-based decoders so it can interpret information from different sensory formats and generate responses in different media types. ...

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
21

SoniTranslate

Synchronized Translation for Videos

SoniTranslate is a video translation and dubbing system that produces synchronized target-language audio tracks for existing video content. It provides a web UI built with Gradio, allowing users to upload a video, choose source and target languages, and then run a pipeline that handles transcription, translation and re-synthesis of speech. Under the hood, it uses advanced speech and diarization models to separate speakers, align audio with timecodes and respect subtitle timing, which lets the generated dub track stay in sync with the original video structure. ...

Downloads: 27 This Week

Last Update: 2025-11-28
See Project
22

LatentSync

Taming Stable Diffusion for Lip Sync

LatentSync is an open-source framework from ByteDance that produces high-quality lip-synchronization for video by using an audio-conditioned latent diffusion model, bypassing traditional intermediate motion representations. In effect, given a source video (with masked or reference frames) and an audio track, LatentSync directly generates frames whose lip motions and expressions align with the audio, producing convincing talking-head or animated lip-sync output. The system leverages a U-Net...

Downloads: 1 This Week

Last Update: 2025-12-02
See Project
23

comfyui-mixlab-nodes

Workflow and speech recognition app

comfyui-mixlab-nodes is a large collection of custom nodes for ComfyUI that turns workflows into interactive apps and adds real-time multimedia, LLM, and TTS capabilities. It introduces a “Workflow-to-APP” concept, where a ComfyUI graph can be transformed into a Web App through an AppInfo node, complete with categories, batch prompts, and editable configurations. The project also brings Real-time Design features like screen capture and floating video nodes, enabling creative pipelines that...

Downloads: 9 This Week

Last Update: 2025-11-28
See Project
24

ChatTTS_colab

One-click deployment (including offline integration package)

ChatTTS_colab is a wrapper project around the ChatTTS model that focuses on “one-click” deployment, especially in Google Colab. It provides an integrated offline bundle and scripts for Windows and macOS so users can run ChatTTS locally without wrestling with complex environment setup. The repository includes Colab notebooks that launch a Gradio-based web UI and expose streaming TTS, making it possible to listen to generated audio as it is produced. A distinctive feature is the “voice gacha”...

Downloads: 1 This Week

Last Update: 2025-11-28
See Project
25

AutoSubs

Instantly generate AI-powered subtitles on your device

...Users can customize subtitle styling, adjust timing, and export results in multiple formats, making it suitable for content creators, filmmakers, and editors. AutoSubs is designed with performance in mind, offering efficient processing through a Rust-based backend and supporting multiple operating systems including Windows, macOS, and Linux.

Downloads: 14 This Week

Last Update: 2026-03-18
See Project