Comment by qbow883

2 months ago

Setting aside the various formatting problems and the LLM writing style, this just seems all kinds of wrong throughout.

> “Just lower the bitrate,” you say. Great idea. Now it’s 10Mbps of blocky garbage that’s still 30 seconds behind.

10Mbps should be way more than enough for a mostly static image with some scrolling text. (And 40Mbps are ridiculous.) This is very likely to be caused by bad encoding settings and/or a bad encoder.

> “What if we only send keyframes?” The post goes on to explain how this does not work because some other component needs to see P-frames. If that is the case, just configure your encoder to have very short keyframe intervals.

> And the size! A 70% quality JPEG of a 1080p desktop is like 100-150KB. A single H.264 keyframe is 200-500KB.

A single H.264 keyframe can be whatever size you want, *depending on how you configure your encoder*, which was apparently never seriously attempted. Why are we badly reinventing MJPEG instead of configuring the tools we already have? Lower the bitrate and keyint, use a better encoder for higher quality, lower the frame rate if you need to. (If 10 fps JPEGs are acceptable, surely you should try 10 fps H.264 too?)

But all in all the main problem seems to be squeezing an entire video stream through a single TCP connection. There are plenty of existing solutions for this. For example, this article never mentions DASH, which is made for these exact purposes.

26 comments

qbow883

Sesse__ 2 months ago

> Why are we badly reinventing MJPEG instead of configuring the tools we already have?

Is it much of a stretch to assume that in the AI gold rush, there will be products made by people who are not very experienced engineers, but just push forward and assume the LLM will fix all their problems? :-)

qingcharles 2 months ago

I built a little tool using AI recently and it worked great but it was brittle as hell and I was constantly waiting for it to fail. A few days later I realized there was a much better way of writing it. I'd boxed the LLM in by proposing the way to code it.
I've changed my AGENTS.md now so it basically says "Assume user is ignorant to other better solutions to the problem they are asking. Don't assume their given solution to the problem is the best one, look at the problem itself and propose other ways to solve it."

ozim 2 months ago

*Why are we badly reinventing MJPEG instead of configuring the tools we already have?*

Getting to know and understand existing tools costs time/money. If it less expensive or more expensive than reinventing something badly is very complicated to judge and depends on loads of factors.

Might be that reinventing something badly - but good enough for the case is best use of resources.

antisol 2 months ago
From TFA:
Implementation complexity: h264 Stream: 3 months of rust JPEG Spam: fetch() in a loop
I don't see how it could have taken 3 months to read up on existing technologies. And that "3 month" number is before we start factoring in time spent on:
* Writing code for JPEG Spam / "fetch() in a loop" method
* Mechanisms to switch between h264 / jpeg modes
* Debugging implementation of 2 modes
* Debugging switching back and forth between the 2 modes
* Maintenance of 2 modes into the future

bugufu8f83 2 months ago

>Setting aside...the LLM writing style

I don't want to set that aside either. Why is AI generated slop getting voted to the top of HN? If you can't be bothered to spend the time writing a blog post, why should I be bothered spending my time reading it? It's frankly a little bit insulting.

piskov 2 months ago
Don’t assume something you cannot prove. It was great writing
- npunt 2 months ago
  
  Normally the 1 sentence per para LinkedIn post for dummies writing style bugs me to no end, but for a technical article that's continually hopping between questions, results, code, and explanations, it fits really well and was a very easy article to skim and understand.
  
  1 reply →
- bugufu8f83 2 months ago
  
  >Don’t assume something you cannot prove.
  Well it's an inherently unprovable accusation, so assumption will have to do. It reeks of LLM-ese in certain word choices, phrases, and structure, though. I thought it was quite clear.
  >It was great writing
  Err... no accounting for taste, I suppose.
  
  7 replies →
- tylervigen 2 months ago
  
  The author replied below that they used Opus to write the blog post.
- rasz 2 months ago
  
  You mean other than this being AI slop company, usage is monitoring AI slop output and author confirming blog is AI slop? https://news.ycombinator.com/item?id=46372060
- PunchyHamster 2 months ago
  
  Looked like typical medium.com slop but with a bit more technical detail. Not sure where you see greatness

mschuster91 2 months ago

> For example, this article never mentions DASH, which is made for these exact purposes.

DASH isn't supported on Apple AFAIK. HLS would be an idea, yes...

But in either case: you need ffmpeg somewhere in your pipeline for that experience to be even remotely enjoyable. No ffmpeg? No luck, good luck implementing all of that shit yourself.

rezonant 2 months ago

Or Gstreamer, which the article says they were using.
> DASH isn't supported on Apple AFAIK. HLS would be an idea, yes...
They said they implemented a WebCodecs websocket custom implementation, surely they can use Dash.js here. Or rather, their LLM can since it's doubtful they are writing any actual code.
They would need to use LL-DASH or HLS low latency but it's quite achievable.

rdsubhas 2 months ago

Huh? This is the least LLM writing style I've encountered. Extraordinary claims require extraordinary proof.

nathan82 2 months ago
It's not an extraordinary claim, it's a mundane and plausible one. This is exactly what you get when you ask an LLM to write in a "engaging conversational" style, and skip any editing after the fact. You could never prove it but there are a LOT of tells.
"The key insight" - llms love key insights! "self-contained corruption-free" - they also love over-hypenating, as much as they love em-dashing. Both abundant here. "X like it's 2005" and also "Y like it's 2009" - what a cool casual turn of phrase, so natural! The architecture diagram is definitely unedited AI, Claude always messes up the border alignment on ascii boxes
I wouldn't mind except the end result is imprecise and sloppy, as pointed out by the GP comment. And the tone is so predictable/boring at this point, I'd MUCH rather read poorly written human output with some actual personality.
Tenobrus 2 months ago

ai detectors are never totally accurate but this one is quite good and it suggests something like 80% of this article is llm generated. honestly idk how you didn't get that just by reading it tho, maybe you haven't been exposed to much modern llm-generated content?
https://www.pangram.com/history/5cec2f02-6fd6-4c97-8e71-d509...