Analyze computation-communication overlap in V3/R1
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
Industrial-level controllable zero-shot text-to-speech system
A Powerful Native Multimodal Model for Image Generation
Generating Immersive, Explorable, and Interactive 3D Worlds
Language modeling in a sentence representation space
CLIP: predict the most relevant text snippet given an image
Implementation of "MobileCLIP" CVPR 2024
Controllable & emotion-expressive zero-shot TTS
Revolutionizing Database Interactions with Private LLM Technology
HY-Motion model for 3D character animation generation
Sharp Monocular Metric Depth in Less Than a Second
Powerful AI language model (MoE) optimized for efficiency/performance
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Open-source, high-performance AI model with advanced reasoning
Let's make video diffusion practical
State-of-the-art TTS model under 25MB
Easy Docker setup for Stable Diffusion with user-friendly UI
Pretrained time-series foundation model developed by Google Research
Wan2.2: Open and Advanced Large-Scale Video Generative Model
From Vibe Coding to Agentic Engineering
Code for running inference and finetuning with SAM 3 model
Open-source deep-learning framework
Tongyi Deep Research, the Leading Open-source Deep Research Agent
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming