Ling-V2 is an MoE LLM open-sourced by InclusionAI
Multimodal model achieving SOTA performance
A Systematic Framework for Interactive World Modeling
RGBD video generation model conditioned on camera input
Provides convenient access to the Anthropic REST API from any Python 3 application
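The entry above refers to the Anthropic Python SDK. As a minimal sketch of what "access to the Anthropic REST API" involves, the snippet below builds (but does not send) a Messages API request using only the standard library; the endpoint, headers, and model name are assumptions drawn from Anthropic's public API documentation, not from this list, and the model name in particular is only illustrative.

```python
import json
import os
import urllib.request

# Assumed public Messages API endpoint (not stated in this list).
API_URL = "https://api.anthropic.com/v1/messages"

def build_request(prompt: str,
                  model: str = "claude-3-5-sonnet-20241022",
                  max_tokens: int = 256) -> urllib.request.Request:
    """Build a Messages API request object without sending it."""
    body = json.dumps({
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            # The real key is read from the environment; empty here if unset.
            "x-api-key": os.environ.get("ANTHROPIC_API_KEY", ""),
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )
```

In practice the SDK wraps this call (plus retries, streaming, and typed responses); sending the request would be `urllib.request.urlopen(build_request("Hello"))` with a valid key set.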
A Unified Framework for Text-to-3D and Image-to-3D Generation
Easy Docker setup for Stable Diffusion with user-friendly UI
Inference script for Oasis 500M
Foundational Models for State-of-the-Art Speech and Text Translation
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1 benchmark
The ChatGPT Retrieval Plugin lets you easily find personal documents
Safety reasoning models built upon gpt-oss
Open source large language model by Alibaba
Detect faces in an image
Open Multilingual Multimodal Chat LMs
Let us control diffusion models
Custom BLEURT model for evaluating text similarity using PyTorch
Reasoning-powered OCR VLM for converting complex documents to Markdown
Hermes 4 FP8: hybrid reasoning Llama-3.1-405B model by Nous Research
Dia-1.6B generates lifelike English dialogue and vocal expressions
Multimodal 7B model for image, video, and text understanding tasks
685B-parameter model with improved agent capabilities and consistency