Comment by trenchpilgrim

6 months ago

Whisper has quite bad issues with hallucination. It will inject sentences that were never said in the audio.

It's decent for classification but poor at transcription.

Pre-processing with a vocal extraction model (bs-rofomer or similar) helps a lot with the hallucinations, especially with poor quality sources.