Comment by trenchpilgrim
6 months ago
Whisper has quite bad issues with hallucination. It will inject sentences that were never said in the audio.
It's decent for classification but poor at transcription.
Pre-processing with a vocal extraction model (BS-RoFormer or similar) helps a lot with the hallucinations, especially with poor-quality sources.
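For anyone wanting to try this, here is a rough sketch of the two-stage idea: isolate the vocals first, then hand only the vocal stem to the transcriber. The function name `transcribe_with_vocal_isolation` and both callables are hypothetical placeholders, not part of Whisper or any separation library; plug in whatever separation tool and transcriber you actually use.

```python
from typing import Callable


def transcribe_with_vocal_isolation(
    audio_path: str,
    separate_vocals: Callable[[str], str],
    transcribe: Callable[[str], str],
) -> str:
    """Run vocal separation first, then transcribe only the isolated stem.

    separate_vocals: hypothetical callable that takes an audio path and
        returns the path of a vocals-only file (e.g. a wrapper around a
        BS-RoFormer-style separation model).
    transcribe: hypothetical callable that takes an audio path and returns
        the transcript text.
    """
    vocals_path = separate_vocals(audio_path)
    return transcribe(vocals_path)


# Assumed wiring with the openai-whisper package (not executed here):
#   import whisper
#   model = whisper.load_model("base")
#   text = transcribe_with_vocal_isolation(
#       "input.wav",
#       separate_vocals=my_separator,  # your own wrapper, hypothetical
#       transcribe=lambda p: model.transcribe(p)["text"],
#   )
```

Keeping the two stages as plain callables makes it easy to swap the separation model or compare transcripts with and without the pre-processing step.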
I'm working with fairly "clean" audio (voice only) and still see ridiculous hallucinations.