Comment by sandreas
12 hours ago
Thanks, I'm going to read through the link. I also found some Python libraries that do this. Since I need to run Whisper on the backend anyway to transcribe the speech to text, I think it would make sense to use Python for tokenization as well, maybe spaCy (https://www.geeksforgeeks.org/tokenization-using-spacy-libra...).
A much less traumatic programming exercise than using awk. :-) In other words, realistic programming tool(s) for the required task.
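For illustration, here is a rough sketch of what the tokenization step could look like in Python. It assumes spaCy's blank English pipeline (`spacy.blank("en")`), which provides only the rule-based tokenizer and needs no model download. The `tokenize` helper and the sample transcript are hypothetical names made up for this example, and the regex fallback is only a crude stand-in for when spaCy isn't installed, not a match for spaCy's actual rules.

```python
import re


def tokenize(text: str) -> list:
    """Split text into word and punctuation tokens (hypothetical helper)."""
    try:
        import spacy  # optional dependency
    except ImportError:
        # Fallback: crude regex split, only a rough stand-in for spaCy's rules.
        return re.findall(r"\w+|[^\w\s]", text)
    # Blank pipeline: tokenizer only, no trained model required.
    nlp = spacy.blank("en")
    return [token.text for token in nlp(text)]


# Hypothetical transcript; in practice this would be the text Whisper produced.
transcript = "Hello world. This is a test."
print(tokenize(transcript))
```

Because the Whisper transcript is already a plain Python string on the backend, it can be passed straight into a function like this without any intermediate file or shell step.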