Analyze computation-communication overlap in V3/R1
Industrial-level controllable zero-shot text-to-speech system
A Powerful Native Multimodal Model for Image Generation
Language modeling in a sentence representation space
CLIP: predict the most relevant text snippet given an image
Implementation of "MobileCLIP" (CVPR 2024)
Controllable & emotion-expressive zero-shot TTS
Sharp Monocular Metric Depth in Less Than a Second
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Powerful AI language model (MoE) optimized for efficiency/performance
Open-source, high-performance AI model with advanced reasoning
Easy Docker setup for Stable Diffusion with user-friendly UI
Pretrained time-series foundation model developed by Google Research
From Vibe Coding to Agentic Engineering
Code for running inference and finetuning with SAM 3 model
Open-source deep-learning framework
Official DeiT repository
Production-tested AI infrastructure tools
Open-Source Financial Large Language Models
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Strong, Economical, and Efficient Mixture-of-Experts Language Model
Foundation Models for Time Series
Scaling Reinforcement Learning with LLMs
PyTorch implementation of JiT
RGBD video generation model conditioned on camera input