Comment by trenchpilgrim
10 days ago
Whisper has serious problems with hallucination: it will inject sentences that were never spoken in the audio.
It's decent for classification but poor at transcription.
Pre-processing with a vocal extraction model (BS-RoFormer or similar) helps a lot with the hallucinations, especially with poor-quality sources.
I'm working with fairly "clean" audio (voice only) and still see ridiculous hallucinations.