Towards Real-World Vision-Language Understanding
Multimodal Diffusion with Representation Alignment
Industrial-level controllable zero-shot text-to-speech system
Sharp Monocular Metric Depth in Less Than a Second
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Qwen3-ASR is an open-source series of ASR models
Foundation model for image generation
Block Diffusion for Ultra-Fast Speculative Decoding
Official implementation of Watermark Anything with Localized Messages
A Unified Framework for Text-to-3D and Image-to-3D Generation
Open-source framework for intelligent speech interaction
ChatGPT interface with better UI
DeepSeek Coder: Let the Code Write Itself
Qwen2.5-VL is a multimodal large language model series
Tiny vision language model
Repo for SeedVR2 & SeedVR
Inference code for scalable emulation of protein equilibrium ensembles
Programmatic access to the AlphaGenome model
Easy Docker setup for Stable Diffusion with user-friendly UI
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Open-source large language model family from Tencent Hunyuan
Generate Any 3D Scene in Seconds
The Clay Foundation Model - An open-source AI model and interface
GLM-4 series: Open Multilingual Multimodal Chat LMs
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline