
Comment by jazzyjackson

4 days ago

This reminds me I need to publish my write-up on how I've been converting digitized home video tapes into clips using scene detection. In case anyone is googling for it, here's a gist I landed on that does a good job of it [0], but it's sometimes fooled by e.g. camera flashes or camera shake, so I give it a start and end file and have ffmpeg concatenate the pieces back together [1].
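For anyone googling: the overall workflow boils down to two ffmpeg invocations. This is a sketch of that shape, not the linked gists themselves; the filenames and the 0.4 scene threshold are placeholders I picked, so tune them for your own footage.

```python
import subprocess

# Placeholder names/threshold -- adjust for your own tapes.
SRC = "tape.mp4"
THRESHOLD = 0.4  # scene-change score in 0..1; higher means fewer detected cuts

# Step 1: log scene-change timestamps (printed to stderr by showinfo),
# which you then use to decide where each clip starts and ends.
detect = [
    "ffmpeg", "-i", SRC,
    "-vf", f"select='gt(scene,{THRESHOLD})',showinfo",
    "-f", "null", "-",
]

# Step 2: when a false cut (flash, shake) splits one scene in two,
# stitch the pieces back with the concat demuxer, no re-encode.
# parts.txt contains lines like:  file 'clip_a.mp4'
concat = [
    "ffmpeg", "-f", "concat", "-safe", "0",
    "-i", "parts.txt", "-c", "copy", "joined.mp4",
]

# subprocess.run(detect, check=True)  # uncomment with ffmpeg installed
# subprocess.run(concat, check=True)
```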

Weird thing is, I got better performance without "-c:v h264_videotoolbox" on the latest Mac update; maybe some performance regression in Sequoia? I don't know. The equivalent flag for my Windows machine with an Nvidia GPU is "-c:v h264_nvenc". I wonder why ffmpeg doesn't just auto-detect this? I get about an 8x performance boost from it. Probably the one time I actually earned my salary at work was when we were about to pay through the nose for more cloud servers with GPUs to process video, and I noticed the version of ffmpeg that came installed on the machines was compiled without GPU acceleration!

[0] https://gist.githubusercontent.com/nielsbom/c86c504fa5fd61ae...

[1] https://gist.githubusercontent.com/jazzyjackson/bf9282df0a40...

> Probably the one time I actually earned my salary at work was when we were about to pay through the nose for more cloud servers with GPUs to process video, and I noticed the version of ffmpeg that came installed on the machines was compiled without GPU acceleration!

The issue with cloud CPUs is that they don't come with the built-in hardware video encoders that consumer-grade CPUs have, so you have to go with the GPU machines, which cost much more. To be honest, I haven't tried using HW accel in the cloud, so I can't make a proper price comparison; are you saying you did it and it was worth it?

  • Are the hardware encoders even good? I thought that unless you need something realtime, it's always better to spend the CPU cycles on a better encode with the software encoder. Or have things changed?

    • They still suck compared to software encoders. This is true for both H.264 and H.265 on AMD, Nvidia, and Intel GPUs. They’re “good enough” for live streaming, or for things like Plex transcoding, or where you care only about encoding speed and have a large bandwidth budget. They’re better than they used to be, but not worth using for anything you really care about.

    • That's my experience too. I transcode a lot of video for a personal project and hardware acceleration isn't much faster. I figure that's because on CPU I can max out my 12 cores.

      The file size is also problematic: I've had hardware encodes twice as large as the same video encoded on the CPU.

      4 replies →

  • We were a quick and dirty R&D team that had to do a lot of video processing quickly, we were not very cost sensitive and didn’t have anything other than AWS to work with, so I can’t speak to whether it was worth it :)

In your snippets, you don't appear to be deinterlacing. If your pre-digitized clips are already deinterlaced, that's fine, but if they're not, you're encoding interlaced material as progressive, and mangling the quality. Try adding a bwdif filter so that your 30i content gets encoded as 60p (which will look more like the original videotapes).
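The bwdif suggestion above, as a concrete command. This is a sketch with my own placeholder filenames and a libx264 CRF I chose for illustration; the substantive part is the filter: mode=send_field emits one output frame per field, so 30i input becomes 60p output.

```python
import subprocess

# Placeholder names; only the bwdif filter and its mode are the point here.
cmd = [
    "ffmpeg", "-i", "capture_30i.avi",
    "-vf", "bwdif=mode=send_field",   # deinterlace: one frame per field -> 60p
    "-c:v", "libx264", "-crf", "18",  # illustrative quality setting
    "deinterlaced_60p.mp4",
]
# subprocess.run(cmd, check=True)  # uncomment with ffmpeg installed
```

If the source were already progressive, bwdif should pass it through mostly untouched, but it's cheaper to just omit the filter in that case.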

I used ffmpeg for empty-scene detection: I have a camera pointed at the flight path for SFO, and stripped out all the frames that didn't have motion in them. You end up with a continuous movie of planes passing through, with none of the boring bits.
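One way to approximate that setup (not necessarily the commenter's actual script; the filename and the 0.003 threshold are invented): keep only frames whose scene-change score exceeds a small threshold, then re-stamp timestamps so the survivors play back-to-back.

```python
import subprocess

# Placeholder input; threshold 0.003 is a guess -- a static sky scores ~0,
# a passing plane nudges the scene score above it.
cmd = [
    "ffmpeg", "-i", "skycam.mp4",
    # select keeps frames with inter-frame change above the threshold;
    # setpts renumbers the kept frames into a continuous timeline.
    "-vf", "select='gt(scene,0.003)',setpts=N/FRAME_RATE/TB",
    "-an",  # drop audio, which no longer lines up after frame removal
    "planes_only.mp4",
]
# subprocess.run(cmd, check=True)  # uncomment with ffmpeg installed
```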

  • Then can you merge all the clips starting when motion starts and see hundreds of planes fly across at once?

    • Interesting. Yes, I assume that's possible, although I'm not sure how you handle the background: I guess you find an empty frame and subtract that from every image with a plane.

      One of the advantages of working with image data is that movies are really just 3D data, and as long as all the movies you work with are the same size, if you have enough RAM, or use dask, you could basically do this in a couple of lines of numpy.
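A toy version of the "couple lines of numpy" idea, on synthetic uint8 grayscale data I made up here: estimate the background as the per-pixel median across all frames, then composite every clip at once by taking the per-pixel max (which works when the planes are brighter than the sky).

```python
import numpy as np

# Synthetic stand-in for real footage: three "clips" of shape (frames, H, W).
background = np.full((4, 8, 8), 50, dtype=np.uint8)  # uniform gray sky

clips = []
for i in range(3):
    clip = background.copy()
    clip[:, i, i] = 255  # a bright "plane" in a different spot per clip
    clips.append(clip)

# Background estimate: per-pixel median over all frames of all clips.
stack = np.concatenate(clips, axis=0)            # (12, 8, 8)
bg = np.median(stack, axis=0).astype(np.uint8)   # (8, 8)

# Composite: brightest value across clips at each pixel -> all planes at once.
merged = np.max(np.stack(clips), axis=0)         # (4, 8, 8)
```

With real footage you'd align clips to start when motion begins, and subtracting `bg` before compositing handles planes darker than the sky.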

      1 reply →

> I wonder why ffmpeg doesn't just auto detect this?

Hardware encoders are often less configurable and involve greater trade-offs than sophisticated software codecs, and they don't produce exactly equivalent results even with equivalent parameters. On top of that, systems often have multiple hardware APIs to choose from, each offering different features.

FFmpeg is a complex command-line tool intended for users who are willing to learn its intricacies, so I'm not sure it makes sense for it to set defaults based on assumptions.
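To see why auto-detection is awkward, here's the naive heuristic spelled out (my own sketch, keyed only on platform names). A robust version would have to probe ffmpeg itself, e.g. parse `ffmpeg -encoders` output and test-encode, since a compiled-in encoder doesn't guarantee working hardware or drivers.

```python
import sys

def pick_h264_encoder() -> str:
    # Crude per-platform guess; illustrative only.
    if sys.platform == "darwin":
        return "h264_videotoolbox"   # Apple VideoToolbox
    if sys.platform == "win32":
        return "h264_nvenc"          # assumes an Nvidia GPU is present!
    return "libx264"                 # safe software fallback

print(pick_h264_encoder())
```

Even this trivial version bakes in a wrong assumption (not every Windows box has an Nvidia GPU), which is roughly the argument for leaving the choice to the user.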

  -c:v h264_nvenc

This is useful for batch encoding, when you're encoding a lot of different videos at once, because you can get better encoding throughput.

But in my limited experiments a while back, I found the output quality to be slightly worse than with libx264. I don't know if there's a way around it, but I'm not the only one who had that experience.

  • IIRC they have improved the hardware encoder over the generations of cards, but yes NVENC has worse quality than libx264. NVENC is really meant for running the compression in real-time with minimal performance impact to the system. Basically for recording/streaming games.

  • It's counterintuitive that nvenc gives worse quality than the QSV/x264 variants, but it holds both in theory and in my testing as well.

    But for multiple streams or speed requirements, nvenc is the only way to fly.

    • Why's that counterintuitive? It makes intuitive sense to me that an approach optimized for throughput would make trade-offs that are less optimized for quality.

  • Co-signing. Encode time was faster with nvenc, but quality was noticeably worse even to my untrained eye.

      Fascinating; it didn't occur to me that quality could take a hit. I thought the flag merely meant "perform the h264 encoding over here."

      Edit: relevant docs from ffmpeg [0]; they back up your perception, and now I'm left to wonder how much I want to learn about profiles in order to cut up these videos. I suppose I'll run an overnight job to re-encode them from AVI to h264 at high quality, and make sure the scene-detect script is only doing copies, not re-encoding, since that's the part I'm doing interactively; there's no real reason I should be sitting at the computer while it's transcoding.

      Hardware encoders typically generate output of significantly lower quality than good software encoders like x264, but are generally faster and do not use much CPU resource. (That is, they require a higher bitrate to make output with the same perceptual quality, or they make output with a lower perceptual quality at the same bitrate.)

      [0] https://trac.ffmpeg.org/wiki/HWAccelIntro
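The "re-encode overnight, then only copy interactively" plan above would look something like this. Filenames, timestamps, and the CRF/preset values are placeholders I chose; CRF 17-18 with libx264 is commonly treated as near-visually-lossless.

```python
import subprocess

# Step 1 (overnight, once per tape): high-quality software re-encode.
encode = [
    "ffmpeg", "-i", "tape.avi",
    "-c:v", "libx264", "-preset", "slow", "-crf", "17",
    "-c:a", "aac",
    "tape_h264.mp4",
]

# Step 2 (interactive): cut clips with stream copy -- no re-encode, so it's
# near-instant. Note copy cuts snap to keyframes, so boundaries can shift
# by up to a GOP; re-encode just the clip if you need frame accuracy.
cut = [
    "ffmpeg", "-ss", "00:01:30", "-to", "00:02:45",
    "-i", "tape_h264.mp4",
    "-c", "copy", "clip.mp4",
]

# subprocess.run(encode, check=True)  # uncomment with ffmpeg installed
# subprocess.run(cut, check=True)
```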

      3 replies →