Marrying Grounding DINO with Segment Anything & Stable Diffusion
Fast-stable-diffusion + DreamBooth
A Pragmatic VLA Foundation Model
Ultimate meta-skill for generating best-in-class Claude Code skills
End-to-end pipeline for converting generative videos
Motion-controllable Video Generation via Latent Trajectory Guidance
A tool for using the Ai2 Open Coding Agents' soft-verified agents
Hunyuan Translation Model Version 1.5
Persistent context and multi-instance coordination
Multimodal embedding and reranking models built on Qwen3-VL
SimpleMem: Efficient Lifelong Memory for LLM Agents
A New Axis of Sparsity for Large Language Models
"Big Model" trains a 26M-parameter multimodal vision-language model (VLM)
Collection of Gemma 3 variants that are trained for performance
Frameworks for language-model reinforcement learning environments
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
A desktop version of UI-TARS that can operate on your local personal device
State-of-the-art (SoTA) text-to-video pre-trained model
OCR expert VLM powered by Hunyuan's native multimodal architecture
Collection of reference environments for offline reinforcement learning
Less Code, Lower Barrier, Faster Deployment
A simple, secure MCP-to-OpenAPI proxy server
Implementation of "MobileCLIP" (CVPR 2024)
Code release for "Cut and Learn for Unsupervised Object Detection"
Training Large Language Model to Reason in a Continuous Latent Space