Comment by zxvkhkxvdvbdxz
4 days ago
Interesting. Being able to use regexps for text processing through my career has probably saved me a few thousand hours of programming one-off solutions so far. It is one of those skills that really pays off to learn proper.
And speaking of ffmpeg, or tooling in general, I tend to make notes. After a while you end up with a pretty decent curated reference.
I use regexes a lot. The main thing that always trips me up is dealing with escaping, because different tools I use – vim, sed, rg, and so on – sometimes have different meanings for when to escape or not.
In one tool you’ll use + to match one or more times, and \+ to mean literal plus sign.
In another tool you’ll use \+ to match one or more time, and + to mean literal plus sign.
In one tool you’ll use ( and ) to create a match group, and \( and \) to mean literal open and close parentheses.
In another tool you’ll use \( and \) to create a match group, and ( and ) to mean literal open and close parentheses.
This is basically the only problem I have when writing regexes, for the kinds of regexes I write.
Also, one thing that’s not a problem per se but something that leads me to write my regexes with more characters than strictly necessary is that I rarely use shorthand for groups of characters. For example the tool might have a shorthand for digit but I always write [0-9] when I need to match a digit. Also probably because the shorthand might or might not be different for different tools.
Regexes are also known to be “write once read never”, in that writing a regex is relatively easy, but revisiting a semi-complicated regex you or someone else wrote in the past takes a little bit of extra effort to figure out what it’s matching and what edits one should make to it. In this case, tools like https://regex101.com/ or https://www.debuggex.com/ help a lot.
The problem with escaping (like with using quotes) is often that you need to know through how many parsers the string goes. The shell or editor, the language you are programming in and the regexp engine each time can strip off an escape character or a set of outer quotes. That and of course different dialects of regexp makes things complicated.
This is like saying that words have different meaning if you talk to someone in english or french. Most tools have a switch to inform them that you're going to talk in english (perl-compatible regular expressions).
No one doubts the power or utility of regexes or ffmpeg, but they are both complicated beasts that really take a lot of skill.
They're both tools where if they're part of your daily workflow you'll get immense value out of learning them thoroughly. If instead you need a regex once or twice a week, the benefit is not greater than the cost of learning to do it myself. I have a hundred other equally complicated things to learn and remember, half the job of the computer is to know things I can't put in my brain. If it can do the regex for me, I suddenly get 70% of the value at no cost.
Regex is not a tool I need often enough to justify the hours and brain space. But it is still an indespensible tool. So when I need a regex, I either ask a human wizard I know, or now I ask my computer directly.
It's self-reinforcing though. If you invest the time to learn, then you may find yourself (i a beafutiful house :)) using it a lot more than two times a week.
I think this way of thinking is an uphill battle.
My kid uses wifi, google classroom tools, youtube, games,... I can tell him if only you knew command.com, ipconfig, doom.wad formats, lateg,... you could be so much more proficient. I already know this will never happen, just like I never learned x86 assembly.
The same goes for tools like LLMs, once you are used to them, your knowledge shifts.
1 reply →