Qwen2.5-VL is the multimodal large language model series
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Open-Source Financial Large Language Models
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training
Foundation Models for Time Series
AlphaFold 3 inference pipeline
Open-source framework for intelligent speech interaction
LLM-based reinforcement learning audio editing model
RGBD video generation model conditioned on camera input
An experimental version of DeepSeek model
Reference PyTorch implementation and models for DINOv3
Towards Real-World Vision-Language Understanding
FAIR Sequence Modeling Toolkit 2
Global weather forecasting model using graph neural networks and JAX
A trainable PyTorch reproduction of AlphaFold 3
An AI-powered security review GitHub Action using Claude
The ChatGPT Retrieval Plugin lets you easily find personal documents
Qwen3-Coder is the code version of Qwen3
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
A Pragmatic VLA Foundation Model
Models for object and human mesh reconstruction
GLM-4 series: Open Multilingual Multimodal Chat LMs
Contexts Optical Compression
Memory-efficient and performant finetuning of Mistral's models