Faster-whisper

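faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, a fast inference engine for Transformer models. This implementation is up to 4 times faster than openai/whisper at the same accuracy while using less memory. Efficiency can be improved further with 8-bit quantization on both CPU and GPU.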

Usage

from faster_whisper import WhisperModel

model_size = "large-v3"

# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")

# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
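
The device and compute-type combinations listed in the comments above can be folded into a small helper. This is only a sketch: the function name whisper_model_kwargs and its two flags are hypothetical and not part of faster-whisper; the returned values are the three combinations shown in the snippet.

```python
def whisper_model_kwargs(use_gpu: bool, quantize: bool = False) -> dict:
    """Return device/compute_type keyword arguments for WhisperModel.

    Hypothetical helper: it only mirrors the three combinations
    shown in the usage snippet (GPU FP16, GPU INT8, CPU INT8).
    """
    if use_gpu:
        # On GPU, choose between plain FP16 and INT8 weights with FP16 compute.
        compute_type = "int8_float16" if quantize else "float16"
        return {"device": "cuda", "compute_type": compute_type}
    # The snippet above only shows INT8 for CPU, so `quantize` is ignored here.
    return {"device": "cpu", "compute_type": "int8"}


# Intended use (requires faster-whisper to be installed):
# model = WhisperModel("large-v3", **whisper_model_kwargs(use_gpu=False))
```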

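Note: segments is a generator, so the transcription only starts when you iterate over it. To run the transcription to completion, collect the segments into a list or consume them in a for loop: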

segments, _ = model.transcribe("audio.mp3")
segments = list(segments)  # The transcription will actually run here.
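
The lazy behaviour can be seen without loading a model. The sketch below uses a stand-in generator, fake_transcribe, which is hypothetical and is not the faster-whisper API; it only imitates the way segments are produced on demand, one per iteration step.

```python
from collections import namedtuple

# Stand-in for the segment objects; only the three fields used above.
Segment = namedtuple("Segment", ["start", "end", "text"])

calls = []  # records each unit of "transcription" work as it happens


def fake_transcribe(chunks):
    """Hypothetical stand-in for model.transcribe: yields segments lazily."""
    for i, text in enumerate(chunks):
        calls.append(text)  # the work happens only when this item is requested
        yield Segment(start=float(i), end=float(i + 1), text=text)


segments = fake_transcribe(["hello", "world"])
work_before = len(calls)   # nothing has run yet: creating the generator does no work

segments = list(segments)  # consuming the generator runs it to completion
work_after = len(calls)    # every chunk has now been processed
```

Because a generator can be consumed only once, converting it to a list is also the simplest way to keep the results if they need to be traversed more than once.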

Links

https://github.com/SYSTRAN/faster-whisper
https://github.com/openai/whisper