Comment by trenchpilgrim

10 days ago

Whisper has serious problems with hallucination. It will inject sentences that were never said in the audio.

It's decent for classification but poor at transcription.

Pre-processing with a vocal extraction model (BS-RoFormer or similar) helps a lot with the hallucinations, especially with poor-quality sources.
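
A rough sketch of that two-stage pipeline in Python, under some assumptions. `separate_vocals` is a hypothetical placeholder for whatever wraps your separation model (no particular BS-RoFormer API is assumed here); it only needs to take an input path and return the path of the isolated vocal stem. `whisper_transcribe` assumes the `openai-whisper` package is installed, and `"base"` is just an example model size.

```python
from typing import Callable


def transcribe_with_vocal_isolation(
    audio_path: str,
    separate_vocals: Callable[[str], str],
    transcribe: Callable[[str], str],
) -> str:
    """Isolate the vocal stem first, then transcribe only that stem."""
    # Hypothetical hook: plug in a wrapper around BS-RoFormer or a similar model.
    vocals_path = separate_vocals(audio_path)
    # Whisper only ever sees the cleaned-up vocals, not the original mix.
    return transcribe(vocals_path).strip()


def whisper_transcribe(path: str, model_name: str = "base") -> str:
    """Transcribe a file with openai-whisper (assumed to be installed)."""
    import whisper  # imported lazily so the sketch loads without the package

    model = whisper.load_model(model_name)
    return model.transcribe(path)["text"]
```

Usage would look something like `transcribe_with_vocal_isolation("talk.wav", my_separator, whisper_transcribe)`, where `my_separator` is your own wrapper. Keeping the separator as a plain callable makes it easy to swap models or skip the step for clean recordings.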