TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
SOTA Open Source TTS
Multimodal Diffusion with Representation Alignment
RGBD video generation model conditioned on camera input
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline
Stable Diffusion web UI
Agent framework and applications built upon Qwen>=3.0
ChatGLM2-6B: An Open Bilingual Chat LLM
Open-source template for AI-powered code generation apps with sandboxes
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
Private chat with a local GPT over documents, images, video, etc.
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
GPT4V-level open-source multi-modal model based on Llama3-8B
A state-of-the-art open visual language model
Visual Instruction Tuning: Large Language-and-Vision Assistant
A Colab Gradio web UI for running Large Language Models
Run any Llama 2 model locally with a Gradio UI on GPU or CPU from anywhere
A web UI for different audio-related neural networks
Lightweight Stable Diffusion v2.1 web UI: txt2img, img2img, depth2img
Hebrew text generation models based on EleutherAI's gpt-neo