Analyze computation-communication overlap in V3/R1
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
Industrial-level controllable zero-shot text-to-speech system
Generating Immersive, Explorable, and Interactive 3D Worlds
A Powerful Native Multimodal Model for Image Generation
Language modeling in a sentence representation space
CLIP: predict the most relevant text snippet given an image
Implementation of "MobileCLIP" CVPR 2024
Controllable & emotion-expressive zero-shot TTS
Revolutionizing Database Interactions with Private LLM Technology
HY-Motion model for 3D character animation generation
Sharp Monocular Metric Depth in Less Than a Second
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Powerful AI language model (MoE) optimized for efficiency/performance
State-of-the-art TTS model under 25MB
Let's make video diffusion practical
Open-source, high-performance AI model with advanced reasoning
Easy Docker setup for Stable Diffusion with user-friendly UI
Pretrained time-series foundation model developed by Google Research
Wan2.2: Open and Advanced Large-Scale Video Generative Model
From Vibe Coding to Agentic Engineering
The Clay Foundation Model - An open-source AI model and interface
Code for running inference and finetuning with SAM 3 model
Uncommon Objects in 3D dataset
Tongyi Deep Research, the Leading Open-source Deep Research Agent