Insanely Fast Whisper is a high-performance command-line tool that dramatically accelerates speech-to-text transcription with OpenAI's Whisper models on local hardware. It applies modern optimizations such as batched inference, mixed precision, and advanced attention mechanisms like Flash Attention to cut inference time significantly while preserving transcription accuracy.

The project is built on top of the Transformers ecosystem and integrates with libraries such as Optimum to maximize GPU efficiency. It is engineered for environments with CUDA-enabled GPUs or Apple Silicon devices, allowing users to process hours of audio in minutes or even seconds, depending on hardware.

The tool provides a streamlined CLI, so transcription jobs can be run on local files or URLs without writing custom code. It supports multiple Whisper model variants, including distilled versions that trade minimal accuracy loss for faster inference.
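As a sketch of typical usage (the install route and flag names below follow the project's published CLI at the time of writing; verify them against the current README, since they may change between releases):

```shell
# Install the CLI into an isolated environment.
pipx install insanely-fast-whisper

# Transcribe a local file on the first CUDA GPU.
# --batch-size controls how many audio chunks are decoded in parallel;
# --flash True enables Flash Attention 2 if the flash-attn package is installed.
insanely-fast-whisper \
  --file-name audio.mp3 \
  --device-id 0 \
  --model-name openai/whisper-large-v3 \
  --batch-size 24 \
  --flash True

# On Apple Silicon, target the Metal backend instead of a CUDA index:
#   insanely-fast-whisper --file-name audio.mp3 --device-id mps
```

Larger batch sizes generally improve throughput but raise peak GPU memory use, so the batch size is worth tuning to your card.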
Features
- Blazing-fast transcription using GPU acceleration and batching
- Support for multiple Whisper models including large and distilled variants
- Command-line interface for easy local or remote file processing
- Integration with Transformers and Optimum for performance optimization
- Optional Flash Attention support for further speed improvements
- Compatibility with CUDA GPUs and Apple Silicon devices
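The features above map onto a few Transformers primitives: a chunked, batched ASR pipeline run in half precision, optionally with Flash Attention 2. A minimal Python sketch of that recipe follows (the model IDs and parameter values are illustrative, and running it requires a GPU plus a one-time download of the model weights):

```python
import torch
from transformers import pipeline

# Chunked, batched automatic-speech-recognition pipeline in half precision --
# the core recipe behind the speedups described above.
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",  # or a distilled variant such as distil-whisper/distil-large-v2
    torch_dtype=torch.float16,        # mixed precision
    device="cuda:0",                  # use "mps" on Apple Silicon
    # Optional: requires the flash-attn package and a supported GPU.
    model_kwargs={"attn_implementation": "flash_attention_2"},
)

result = pipe(
    "audio.mp3",
    chunk_length_s=30,      # split long audio into 30-second windows
    batch_size=24,          # decode many windows in parallel on the GPU
    return_timestamps=True,
)
print(result["text"])
```

Batching the 30-second windows is what lets the GPU stay saturated on long recordings; without it, Whisper decodes one window at a time and most of the hardware sits idle.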