OCRmyPDF adds an OCR text layer to scanned PDF files
Uncover insights, surface problems, monitor, and fine-tune your LLM
A Simple and Universal Swarm Intelligence Engine
Benchmarking synthetic data generation methods
ComfyUI wrapper nodes for WanVideo and related models
Conditional GAN for generating synthetic tabular data
AI coding assistant skill (Claude Code, Codex, OpenCode, OpenClaw)
Cloud-native open source data warehouse for analytics and AI queries
ExtractThinker is a Document Intelligence library for LLMs
Training data (data labeling, annotation, workflow) for all data types
Video-based AI memory library. Store millions of text chunks in MP4
One minute of voice data can also be used to train a good TTS model
AI-data warehouse to enrich, transform and analyze unstructured data
AI multi-agent platform for automated code security auditing
Detecting silent model failure. NannyML estimates performance
Claude Code skill for generating production-quality SVG+PNG technical
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Inference script for Oasis 500M
AutoGluon: AutoML for Image, Text, and Tabular Data
A reactive notebook for Python
Effortless data labeling with AI support from Segment Anything
Anomaly detection related books, papers, videos, and toolboxes
The standard data-centric AI package for data quality and ML
Instill Core is a full-stack AI infrastructure tool for data