Comment by hbn

4 days ago

I've enjoyed using ffmpeg 1000% more since I stopped the tedious manual work of Googling for Stack Overflow answers and cobbling them into a command, and got ChatGPT to write my commands instead.

I use ffmpeg multiple times a week thanks to LLMs. It's my top use-case for my "llm cmd" tool:

  uv tool install llm
  llm install llm-cmd

  llm cmd use ffmpeg to extract audio from myfile.mov and save that as mp3

https://github.com/simonw/llm-cmd

  • I tried this (though with a different tool called aichat) for extremely simple stuff like just "convert this mov to mp4" and it generated overly complex commands that failed due to missing libraries. When I removed the "crap" from the commands, they worked.

    So much like code assistance, they still need a fair amount of babysitting. A good boost for experienced operators, but it might suck for beginners.

    • Plus you need to know the format of your source file to design the command correctly. How many audio tracks, is the first video track a thumbnail or the video, are the subtitles tracks forced, etc.

      And in some situations ffmpeg has some warts you have to work around. For example, they recently introduced a moronic change of behaviour where the first subtitle track becomes forced/default irrespective of the original forced/default flag in the source. You need to add "-default_mode infer_no_subs" to counter that.

    • My feelings exactly, but I think that's OK!

      It's another tool and one that might actually improve with time. I don't see GNU's man pages getting any better spontaneously.

      Whoa, what if they started to use AI to auto-generate man pages...

    • My experience exactly.

      I no longer check with these AI tools after a number of attempts. Unrelated, a friend thought there was an NFL football game last Saturday at noon. Checking with Google's Gemini, it said "no", but then claimed there was one, between two teams whose season had ended two weeks before, at 1:00 Eastern Time and 2:00 Central. (The times are backwards.)

    • what exactly do you want the llm to do here? if the ask was so unambiguous and simple that it could be reliably generated, then the interface wouldn't be so complicated to use in the first place! LLMs are not in any way best suited for one-shot prompt => perfect output, and expectations to that effect are extremely unreasonable. the reason why LLMs are still hard for beginners to use is because the software is hard to use correctly. as with LLM output goes life itself: the results you get from using a tool can only ever be as good as the (mental) model used to choose that tool & the inputs to begin with. if all the information required to generate the output were contained by the initial prompt, then there would be absolutely no need to use the LLM at all in the first place.

    • Hate to be that guy, but which LLM was doing the generation? GPT-4 Turbo / Claude 3.x have not really let me down in generating ffmpeg commands - especially for basic requests - with most of their failures resulting from domain-specific vagaries that an expert would need to weigh in on m

  • A while back I simply wrote my own bash function for this called `please`

    as in

        bash> please "use ffmpeg to extract audio from myfile.mov and save it as mp3"
    

    It will then courteously show you the command it wants to run before you agree to do it.

    Here is the whole thing, with its two dependent functions, so that people stop writing their own versions of this lol. All it needs is an OPENAI_API_KEY, feel free to modify for other LLMs

    EDIT: Moved to a gist: https://gist.github.com/pmarreck/9ce17f7996347dd532f3e20a2a3...

    Suggestions welcome. For example, I want to add a feature that either just copies it (for further modification) or prepopulates the command line with it somehow (possibly for further modification, or even for skipping the approval step).
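
    For anyone who wants the idea without opening the gist, here is a minimal sketch of just the approval step. It is not the gist's code: `confirm_and_run` is a made-up name, and the command is typed by hand here rather than coming from an LLM.

    ```shell
    # Hypothetical helper: show a command, then run it only after an explicit "y".
    confirm_and_run() {
        printf 'About to run: %s\n' "$*"
        printf 'Proceed? [y/N] '
        read -r answer
        case "$answer" in
            y|Y) "$@" ;;                      # run the command exactly as given
            *)   echo "Skipped."; return 1 ;; # anything else (or no input) cancels
        esac
    }
    ```

    Something like `confirm_and_run ffmpeg -i myfile.mov -vn myfile.mp3` would print the command and wait for a y before running it.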

  • Did you just invent the LLM-equivalent of curl-piping unread shell scripts into sh?

    I am sure that will never cause any problems.

  • "The future is already here. It's just not very well distributed"

    (honestly, the work you share is very inspiring)

  • >This will then be displayed in your terminal ready for you to edit it, or hit <enter> to execute the prompt. If the command doesn't look right, hit Ctrl+C to cancel.

    I appreciate the UI choice here. I have yet to do anything with AI (consciously and deliberately, anyway) but this sort of thing is exactly what I imagine as a proper use case.

    • Just like all other code. There will be user-respecting open source code and tools, and there's user-disrespecting profitable closed code that makes too many decisions for you.

  • You should figure out what went wrong for the other commenter and fix your tool.

  • While I love that that works, I still feel like just maybe ffmpeg needs a better interface. Not necessarily a GUI, just a better designed command line.

  • I think I’m finally sold on actually attempting to add some LLM to my toolbelt.

    As a helper and not a replacement, this sounds grand. Like the most epic autocomplete. Because I hate how much time I waste trying to figure out the command line incantation when I already know precisely what I want to do. It’s the weakest part of the command line experience.

For the longest time I had ffmpeg in the same bucket as regex: "God I really need to learn this but I'm going to hate it so much." Then ChatGPT came along and solved both problems!

  • Interesting. Being able to use regexps for text processing throughout my career has probably saved me a few thousand hours of programming one-off solutions so far. It is one of those skills that really pays off to learn properly.

    And speaking of ffmpeg, or tooling in general, I tend to make notes. After a while you end up with a pretty decent curated reference.

    • I use regexes a lot. The main thing that always trips me up is dealing with escaping, because different tools I use – vim, sed, rg, and so on – sometimes have different meanings for when to escape or not.

      In one tool you’ll use + to match one or more times, and \+ to mean literal plus sign.

      In another tool you’ll use \+ to match one or more times, and + to mean literal plus sign.

      In one tool you’ll use ( and ) to create a match group, and \( and \) to mean literal open and close parentheses.

      In another tool you’ll use \( and \) to create a match group, and ( and ) to mean literal open and close parentheses.

      This is basically the only problem I have when writing regexes, for the kinds of regexes I write.

      Also, one thing that’s not a problem per se but something that leads me to write my regexes with more characters than strictly necessary is that I rarely use shorthand for groups of characters. For example the tool might have a shorthand for digit but I always write [0-9] when I need to match a digit. Also probably because the shorthand might or might not be different for different tools.

      Regexes are also known to be “write once read never”, in that writing a regex is relatively easy, but revisiting a semi-complicated regex you or someone else wrote in the past takes a little bit of extra effort to figure out what it’s matching and what edits one should make to it. In this case, tools like https://regex101.com/ or https://www.debuggex.com/ help a lot.
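
      A small illustration of the escaping flip, using only grep, which happens to ship both dialects (basic by default, extended with -E). This is a sketch of the general pattern, not a statement about how vim, sed or rg behave; check each tool's docs.

      ```shell
      # Basic regex (grep default): a bare + is a literal plus sign.
      printf 'a+b\naab\n' | grep 'a+b'       # prints: a+b
      # Extended regex (grep -E): + means "one or more", \+ is the literal.
      printf 'a+b\naab\n' | grep -E 'a+b'    # prints: aab
      printf 'a+b\naab\n' | grep -E 'a\+b'   # prints: a+b
      # Groups flip the same way: \( \) in basic, ( ) in extended.
      printf 'abab\n' | grep -c '\(ab\)\{2\}'   # prints: 1
      printf 'abab\n' | grep -cE '(ab){2}'      # prints: 1
      # A bracket expression like [0-9] is written the same in both dialects.
      printf 'x7\n' | grep -c '[0-9]'           # prints: 1
      printf 'x7\n' | grep -cE '[0-9]'          # prints: 1
      ```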

    • No one doubts the power or utility of regexes or ffmpeg, but they are both complicated beasts that really take a lot of skill.

      They're both tools where, if they're part of your daily workflow, you'll get immense value out of learning them thoroughly. If instead I need a regex only once or twice a week, the benefit is not greater than the cost of learning to do it myself. I have a hundred other equally complicated things to learn and remember, and half the job of the computer is to know things I can't put in my brain. If it can do the regex for me, I suddenly get 70% of the value at no cost.

      Regex is not a tool I need often enough to justify the hours and brain space. But it is still an indispensable tool. So when I need a regex, I either ask a human wizard I know, or now I ask my computer directly.

  • For me, it wasn’t so much learning ffmpeg, as it was understanding containers/codecs/encoders/streams/etc. Learning all of the intricacies there made ffmpeg make a lot more sense.

    • Almost no one cares to understand the domain of the tool anymore. They only want results and expect a simplified interface that already does the one thing they want to do, and can’t accept that a power tool can only be used with training.

  • CSS has entered the ChatGPT.

    • My rule for using LLMs is that anything that's one-off is okay. Anything that's more permanent and committed to a repo needs a human review. I strongly suggest you have an understanding of the basics (at least the box model) so that you are competent at reviewing CSS code before using an LLM for that.

Same here, it's one of these things where AI has taken over completely and I'm just a broker that copy-pastes error traces.

My experience got even better once I learned how complex filters worked.

  • learning how to use splits to do multiple things all in one command is a godsend. only needing to read the source and convert to baseband video once is a great savings.

    i started with avisynth, and it took time for my brain to switch to ffmpeg. i don't know how i could function without ffmpeg at this point
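
    A rough sketch of what that looks like, assuming a hypothetical input file and two output sizes picked for illustration. `split_two_sizes` is a made-up helper that only prints the command so it can be read before running; the stream labels ([a], [big] and so on) are arbitrary names.

    ```shell
    # Hypothetical helper: print (not run) one ffmpeg command that decodes the
    # source once, splits the video, and writes two differently scaled outputs.
    split_two_sizes() {
        # $1 is the input file; names containing spaces would need extra quoting.
        graph='[0:v]split=2[a][b];[a]scale=1280:-2[big];[b]scale=640:-2[small]'
        echo "ffmpeg -i $1 -filter_complex \"$graph\"" \
             '-map "[big]" -map "0:a?" big.mp4' \
             '-map "[small]" -map "0:a?" small.mp4'
    }
    split_two_sizes input.mov
    ```

    Each output file gets its own -map options, and "0:a?" carries the audio along only if the source has any.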

Truly, a net positive to my life. Just a few days ago I asked my AI buddy (Claude) to create a zsh script to organize my downloads folder according to the Johnny Decimal system. I’ve since modified it to move the files to a JD setup on my desktop.

The sense of elation I get when I wonder aloud to my digital friend and they generate what I thought was too much to expect. Well worth the subscription.

I think you're onto something. I've had hit or miss experiences with code from LLMs but it definitely makes the searching part different.

I had a problem I'd been thinking about for some time and I thought "I'll have some LLM give me an answer" and it did - it was wrong and didn't work, but it got me thinking about the problem in a slightly different way, and my quacks after that got me an exact solution to this problem.

So I'm willing to give the AI more than partial credit.

Basic syntax for re-encoding a video file did take me some time to memorize, but isn't in fact too hard:

  ffmpeg <Input file(s)> <Codec(s)> <MAPping of streams> <Video Filters> output_file

- input file: -i, can be repeated for multiple input files, like so:

  ffmpeg -i file1.mp4 -i file2.mkv

If there is more than one input file then some mapping is needed to decide what goes out in the output file.

- codec: -c:x where x is the type of codec (v: video, a: audio or s:subtitles), followed by its name, like so:

  -c:v libx265

I usually don't set the audio codec as the guesses made by ffmpeg, based on output file type, are always right (in my experience), but deciding the video codec is useful, and so is the subtitle codec, as not all containers (file formats) support all codecs; mkv is the most flexible for subtitle codecs.

- mapping of streams: -map <input_file>:<stream_type>:<order>, like so:

  -map 0:v:0 -map 1:a:1 -map 1:a:0 -map 1:s:4

Map tells ffmpeg what stream from the input files to put in the output file. The first number is the position of the input file in the command, so if we're following the same example as above, '0' would be 'file1.mp4' and '1' would be 'file2.mkv'. The parameter in the middle is the stream type (v for video, a for audio, s for subtitles). The last number is the position of the stream IN THE INPUT FILE (NOT in the output file).

The position of the stream in the output file is determined by the position of the map command in the command line, so for example in the command above we are inverting the position of the audio streams (taken from 'file2.mkv'), as audio stream 1 will be in first position in the output file, and audio stream 0 (the first in the second input file) will be in second position in the output file.

This map thing is for me the most counter-intuitive because it's unusual for a CLI to be order-dependent. But, well, it is.
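
Putting the pieces so far together, a sketch of the full command might look like this. `remux_cmd` is a hypothetical helper that only prints the command; the file names are the ones from the examples above, and `-c:s copy` is an assumption for keeping the subtitle stream as-is in an mkv output.

```shell
# Hypothetical helper: assemble inputs, codecs and maps into one command
# and print it, following the order of the syntax line above.
remux_cmd() {
    inputs='-i file1.mp4 -i file2.mkv'
    codecs='-c:v libx265 -c:s copy'   # -c:s copy is an assumption
    maps='-map 0:v:0 -map 1:a:1 -map 1:a:0 -map 1:s:4'
    echo "ffmpeg $inputs $codecs $maps $1"
}
remux_cmd output.mkv
# Output stream order follows the -map order:
#   1. first video stream of file1.mp4
#   2. second audio stream of file2.mkv
#   3. first audio stream of file2.mkv
#   4. fifth subtitle stream of file2.mkv
```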

- video filters: -vf

Video filters can be extremely complex and I don't pretend to know how to use them by heart. But one simple video filter that I use often is 'scale', for resizing a video:

  -vf scale=<width>:<height>

width and height can be exact values in pixels, or one of them can be '-1' and then ffmpeg computes it based on the current aspect ratio and the other provided value, like this for example:

  -vf scale=320:-1

This doesn't always work because the computed value should be an even integer; if it's not, ffmpeg will raise an error and tell you why; then you can replace the -1 with the nearest even integer (I wonder why it can't do that by itself, but apparently, it can't).
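
The manual replacement can be worked out with integer arithmetic. A sketch, where `even_height` is a hypothetical helper and the source dimensions have to be known beforehand:

```shell
# Hypothetical helper: height that keeps the aspect ratio for a new width,
# rounded to the nearest even integer (integer arithmetic only).
even_height() {
    src_w=$1 src_h=$2 new_w=$3
    # round(new_w * src_h / src_w / 2) * 2
    echo $(( (new_w * src_h + src_w) / (2 * src_w) * 2 ))
}
even_height 1920 1080 320   # prints 180
even_height 854 480 320     # prints 180 (the exact value would be about 179.86)
even_height 1280 720 500    # prints 282 (the exact value would be 281.25)
```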

And that's about it! ffmpeg options are immense, but this gets me through 90% of my video encoding needs, without looking at a manual or asking an LLM. (The only other options I use often are -ss and -t for start time and duration, to time-crop a video.)
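
Since -t is a duration and not an end time, a small helper can do the subtraction. This is only a sketch: `to_seconds` and `trim_cmd` are hypothetical names, times are assumed to be in HH:MM:SS form, and the command is printed rather than run.

```shell
# Hypothetical helper: convert HH:MM:SS to seconds.
to_seconds() {
    # Split on ":" and strip one leading zero per field, so that a value
    # like "08" is not read as an invalid octal number.
    echo "$1" | {
        IFS=: read -r h m s
        echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
    }
}

# Hypothetical helper: print a command that keeps the part between two times.
trim_cmd() {
    # $1 input, $2 start, $3 end, $4 output
    start=$(to_seconds "$2")
    end=$(to_seconds "$3")
    echo "ffmpeg -i $1 -ss $2 -t $(( end - start )) $4"
}
trim_cmd in.mp4 00:01:30 00:02:15 clip.mp4
# prints: ffmpeg -i in.mp4 -ss 00:01:30 -t 45 clip.mp4
```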

  • > This doesn't always work because the computed value should be an even integer; if it's not, ffmpeg will raise an error and tell you why; then you can replace the -1 with the nearest even integer (I wonder why it can't do that by itself, but apparently, it can't).

    It's not about being an integer; some of the sizes need to be even. You can use `-vf scale=320:-2` to ensure that.

  • > then you can replace the -1 with the nearest even integer (I wonder why it can't do that by itself, but apparently, it can't).

    Likely because the aspect ratio will no longer be the same. There will either be lost information (cropping), compression/stretching, or black bars, none of which should be default behaviour. Hence, the warning.

I would like to throw in a tool that I built into the ring: gencmd - https://gencmd.com/. There is a web version and also a CLI version.

If the CLI is installed, you can do: gencmd -c ffmpeg extract first 1 minute of video

Or you can just search for the same in the browser page.

I do it the old way: I write down the commands as a shell script, and reuse later.

But really what ffmpeg is missing is an expressive language to describe its operation. Something well-structured, like what jq does for JSON.

  • It already does. It’s the cli flags. What you’re missing is the semantics, which you can get by learning about containers, codecs, and other stuff. You don’t use grep and sed with no understanding of what a text file is.

ffmpeg and jq are 2 commands I've about given up trying to "use" with any facility and am more than happy to pawn that off to one of the Gippity's; chat, claude, etc.

For me it was using a container of it, instead of having to install all the things FFmpeg needs on a machine.