Name: faster-whisper
Author: cmdop

Overview

Local speech-to-text using faster-whisper, a CTranslate2 reimplementation of OpenAI’s Whisper, for fast and accurate transcription with GPU acceleration.
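For orientation, here is a minimal sketch of the underlying library call. It assumes the `faster-whisper` Python package is installed. The model size `"small"`, `device="auto"` and `compute_type="default"` are illustrative choices, not the skill's own settings. The helper names `transcribe_file` and `segments_to_text` are hypothetical.

```python
from typing import Iterable, Protocol


class SegmentLike(Protocol):
    """Anything exposing the start/end/text fields that faster-whisper segments have."""
    start: float
    end: float
    text: str


def segments_to_text(segments: Iterable[SegmentLike]) -> str:
    """Join non-empty segment texts into one transcript string."""
    return " ".join(seg.text.strip() for seg in segments if seg.text.strip())


def transcribe_file(path: str, model_size: str = "small") -> str:
    """Transcribe one audio/video file locally (hypothetical helper)."""
    # Imported lazily so segments_to_text works without the package installed.
    from faster_whisper import WhisperModel

    # device="auto" uses CUDA when it is available and falls back to CPU otherwise.
    model = WhisperModel(model_size, device="auto", compute_type="default")
    # `segments` is a lazy generator: decoding happens while it is iterated.
    segments, info = model.transcribe(path, beam_size=5)
    print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
    return segments_to_text(segments)
```

Usage, with a hypothetical file name: `print(transcribe_file("meeting.mp3"))`.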

Key Features

Transcribe audio/video files
Generate subtitles (SRT, VTT, ASS, LRC, TTML)
Identify speakers (diarization labels)
Transcribe from URLs (YouTube links and direct audio URLs)
Batch process files (glob patterns, directories, skip-existing support)
Convert speech to text locally (no API costs, works offline)
Translate speech to English
Transcribe in 99+ languages with automatic language detection
Transcribe a batch of files in different languages
Transcribe multilingual audio
Transcribe audio containing domain-specific terms
Preprocess noisy audio before transcription
Stream output as it is transcribed
Clip to specific time ranges
Search the transcript
Detect chapters
Export each speaker's audio
Export to spreadsheet formats (CSV)
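Several of the features above correspond to real `WhisperModel.transcribe()` keyword arguments: `task`, `language` and `initial_prompt`. The sketch below shows one plausible mapping. The function `build_transcribe_options` and the "Glossary:" prompt wording are hypothetical, and the skill may wire these options differently.

```python
from typing import Any, Optional, Sequence


def build_transcribe_options(
    translate: bool = False,
    language: Optional[str] = None,
    terms: Sequence[str] = (),
) -> dict[str, Any]:
    """Map translation, language choice and custom terms onto transcribe() kwargs."""
    options: dict[str, Any] = {
        # "translate" makes Whisper emit English text; "transcribe" keeps the source language.
        "task": "translate" if translate else "transcribe",
        # None lets faster-whisper auto-detect the language from the start of the audio.
        "language": language,
    }
    if terms:
        # Priming the decoder with expected vocabulary nudges it toward those spellings.
        options["initial_prompt"] = "Glossary: " + ", ".join(terms) + "."
    return options
```

With a loaded model this would be used as `segments, info = model.transcribe("talk.mp3", **build_transcribe_options(translate=True))`, where the file name is hypothetical.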
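Subtitle generation comes down to formatting timed segments. Here is a minimal sketch for two of the listed formats, SRT and WebVTT. It assumes segments arrive as `(start_seconds, end_seconds, text)` tuples. The function names are hypothetical, not the skill's internals.

```python
from typing import Iterable, Tuple

Segment = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def format_timestamp(seconds: float, sep: str = ",") -> str:
    """HH:MM:SS,mmm for SRT (sep=",") or HH:MM:SS.mmm for WebVTT (sep=".")."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Numbered cues separated by blank lines, as SRT players expect."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


def to_vtt(segments: Iterable[Segment]) -> str:
    """WebVTT needs a WEBVTT header and uses '.' before the milliseconds."""
    cues = [
        f"{format_timestamp(s, '.')} --> {format_timestamp(e, '.')}\n{t.strip()}\n"
        for s, e, t in segments
    ]
    return "WEBVTT\n\n" + "\n".join(cues)
```

faster-whisper segments expose `.start`, `.end` and `.text`, so they can be converted with `[(s.start, s.end, s.text) for s in segments]` before formatting.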
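Batch processing with skip-existing support can be sketched as a filter over input paths. Each input may be a file or a directory, and glob patterns are expanded beforehand with `Path.glob`. The media suffix set, the `.srt` output convention and the name `pending_files` are assumptions for illustration.

```python
from pathlib import Path
from typing import Iterable, List

# Illustrative set of extensions treated as transcribable media.
MEDIA_SUFFIXES = {".mp3", ".wav", ".m4a", ".flac", ".mp4", ".mkv"}


def pending_files(inputs: Iterable[Path], out_suffix: str = ".srt") -> List[Path]:
    """Expand directories, keep media files, and skip those whose output already exists."""
    candidates: List[Path] = []
    for item in inputs:
        if item.is_dir():
            candidates.extend(p for p in sorted(item.iterdir()) if p.is_file())
        elif item.is_file():
            candidates.append(item)
    return [
        p for p in candidates
        # "talk.mp3" counts as done when "talk.srt" sits next to it.
        if p.suffix.lower() in MEDIA_SUFFIXES and not p.with_suffix(out_suffix).exists()
    ]
```

For a glob pattern, pass the expanded matches, for example `pending_files(sorted(Path("recordings").glob("*.mp3")))`, where `recordings` is a hypothetical folder.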

How It Works

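Use the faster-whisper skill to transcribe audio/video files, generate subtitles, and more. The skill is built on the faster-whisper library, which runs Whisper models through CTranslate2 and is reported to be 4-6x faster than OpenAI’s Whisper implementation at the same accuracy. With GPU acceleration, expect roughly 20x realtime transcription, that is, about 3 minutes for an hour of audio.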
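"20x realtime" means 20 seconds of audio are processed per second of wall-clock time, so a 60-minute file takes about 3600 / 20 = 180 seconds. The sketch below measures that factor on your own hardware. It assumes `faster-whisper` is installed and that model loading should be excluded from the timing. The helper names are hypothetical.

```python
import time


def realtime_factor(audio_seconds: float, elapsed_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return audio_seconds / elapsed_seconds


def timed_transcription(path: str, model_size: str = "small") -> float:
    """Measure the realtime factor of one transcription (hypothetical helper)."""
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="auto", compute_type="default")
    started = time.perf_counter()  # start after the model has loaded
    segments, info = model.transcribe(path)
    # Decoding is lazy, so the generator must be consumed before stopping the clock.
    list(segments)
    # info.duration is the length of the input audio in seconds.
    return realtime_factor(info.duration, time.perf_counter() - started)
```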

Use Cases

Transcribe a meeting or interview
Generate subtitles for a YouTube video
Identify speakers in a podcast
Transcribe a batch of files in different languages
Transcribe multilingual audio
Preprocess noisy audio before transcription
Stream output in real time as transcription progresses
Transcribe only specific sections by clipping time ranges
Search the transcript for specific terms
Detect chapters to build a table of contents
Export each speaker's audio to a separate WAV file
Export the transcript as CSV for use in spreadsheets
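Searching a transcript for specific terms can be as simple as a case-insensitive match over timed segments. The sketch below returns where each hit starts so the audio can be located. It assumes `(start, end, text)` tuples, and `search_transcript` is a hypothetical name.

```python
import re
from typing import Iterable, List, Tuple

Segment = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def search_transcript(segments: Iterable[Segment], term: str) -> List[Tuple[float, str]]:
    """Return (start_seconds, text) for every segment containing `term`, ignoring case."""
    # re.escape keeps terms like "C++" from being read as regex syntax.
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return [(start, text.strip()) for start, _end, text in segments if pattern.search(text)]
```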
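One way to restrict output to a specific section is to filter timed segments after transcription. This is a simplifying assumption: the skill may instead cut the audio before transcribing it, which also saves compute. The sketch keeps segments that overlap a window and trims their times to it. `clip_segments` is a hypothetical name.

```python
from typing import Iterable, List, Tuple

Segment = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def clip_segments(segments: Iterable[Segment], start: float, end: float) -> List[Segment]:
    """Keep segments overlapping [start, end), with times trimmed to the window."""
    clipped: List[Segment] = []
    for seg_start, seg_end, text in segments:
        if seg_end > start and seg_start < end:  # any overlap with the window
            clipped.append((max(seg_start, start), min(seg_end, end), text))
    return clipped
```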
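Spreadsheet output can use Python's standard `csv` module, which quotes fields containing commas automatically. This sketch assumes one row per segment with an optional speaker label. The column names, the `SPEAKER_00` label style and `transcript_to_csv` are hypothetical.

```python
import csv
import io
from typing import Iterable, Optional, Tuple

Row = Tuple[float, float, Optional[str], str]  # (start, end, speaker or None, text)


def transcript_to_csv(rows: Iterable[Row]) -> str:
    """Render transcript rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # default dialect: "\r\n" line endings, minimal quoting
    writer.writerow(["start", "end", "speaker", "text"])
    for start, end, speaker, text in rows:
        writer.writerow([f"{start:.2f}", f"{end:.2f}", speaker or "", text.strip()])
    return buffer.getvalue()
```

To write a file directly, pass `open(path, "w", newline="", encoding="utf-8")` to `csv.writer` instead of a `StringIO`.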
