MII makes low-latency and high-throughput inference possible
State-of-the-art Parameter-Efficient Fine-Tuning
Multilingual Automatic Speech Recognition with word-level timestamps
A high-performance ML model serving framework that offers dynamic batching
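Dynamic batching means requests arriving within a short window are grouped into one batch before the model runs, trading a few milliseconds of latency for much higher throughput. A minimal sketch of the idea, not tied to any specific framework; `MAX_BATCH`, `MAX_WAIT_MS`, and `run_model` are illustrative placeholders:

```python
import queue
import time

MAX_BATCH = 8      # cap on batch size (placeholder value)
MAX_WAIT_MS = 5    # how long to wait for more requests (placeholder)

def run_model(batch):
    # Stand-in for a real forward pass: double each input.
    return [x * 2 for x in batch]

def collect_batch(q):
    """Block for the first request, then drain more from the queue
    until the batch is full or the wait window expires."""
    batch = [q.get()]
    deadline = time.monotonic() + MAX_WAIT_MS / 1000
    while len(batch) < MAX_BATCH and time.monotonic() < deadline:
        try:
            batch.append(q.get(timeout=max(deadline - time.monotonic(), 0.0)))
        except queue.Empty:
            break
    return batch

q = queue.Queue()
for i in range(3):
    q.put(i)
results = run_model(collect_batch(q))
print(results)  # [0, 2, 4]
```

Real serving frameworks do this on a background thread per model and fan the batched outputs back to the waiting requests.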
Large Language Model Text Generation Inference
Replace OpenAI GPT with another LLM in your app
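Swapping GPT for another LLM usually works because self-hosted servers expose an OpenAI-compatible `/v1/chat/completions` endpoint, so only the base URL and model name in the app change. A hedged sketch using only the standard library; the base URL and model name are placeholders, not real endpoints:

```python
import json
from urllib import request

# Placeholder endpoint of a self-hosted, OpenAI-compatible server.
BASE_URL = "http://localhost:8000/v1"

def chat_request(model, messages):
    """Build a standard chat-completions HTTP request; pointing an
    existing OpenAI app at another LLM is mostly just this URL swap."""
    payload = {"model": model, "messages": messages}
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("my-local-model",
                   [{"role": "user", "content": "Hello!"}])
print(req.full_url)  # http://localhost:8000/v1/chat/completions
```

Sending the request with `urllib.request.urlopen(req)` (or any HTTP client) returns the familiar `choices[0].message.content` response shape when the server follows the OpenAI wire format.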
20+ high-performance LLMs with recipes to pretrain and finetune at scale
GPU environment management and cluster orchestration
A set of Docker images for training and serving models in TensorFlow
A Pythonic framework to simplify AI service building
Library for OCR-related tasks powered by Deep Learning
PyTorch domain library for recommendation systems
PyTorch extensions for fast R&D prototyping and Kaggle farming
Lightweight Python library for adding real-time multi-object tracking to any detector
Low-latency REST API for serving text-embeddings
Standardized Serverless ML Inference Platform on Kubernetes
Simplifies the local serving of AI models from any source
PyTorch library of curated Transformer models and their components
Libraries for applying sparsification recipes to neural networks
Library for serving Transformers models on Amazon SageMaker
A library for accelerating Transformer models on NVIDIA GPUs
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
Neural Network Compression Framework for enhanced OpenVINO
OpenAI-style API for open large language models
Sparsity-aware deep learning inference runtime for CPUs