A Pragmatic VLA Foundation Model
LTX-Video Support for ComfyUI
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Memory-efficient and performant finetuning of Mistral's models
Pokee Deep Research Model Open Source Repo
Tooling for the Common Objects In 3D dataset
Qwen2.5-VL is a multimodal large language model series
PyTorch code and models for DINOv2 self-supervised learning
Tool for exploring and debugging transformer model behaviors
CLIP: predict the most relevant text snippet given an image
HY-Motion model for 3D character animation generation
OCR expert VLM powered by Hunyuan's native multimodal architecture
Open Source Speech Language Model
Collection of Gemma 3 variants that are trained for performance
Implementation of "MobileCLIP" (CVPR 2024)
Pretrained time-series foundation model developed by Google Research
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
The ChatGPT Retrieval Plugin lets you easily find personal documents
Release for Improved Denoising Diffusion Probabilistic Models
StudioOllamaUI is a local, portable interface for Ollama
Open Multilingual Multimodal Chat LMs
Chinese LLaMA-2 & Alpaca-2 Large Model Phase II Project
Official repo for consistency models