Comment by ga2mer

24 days ago

> Data duplication can't just be banned by Steam

Steam compresses games as much as possible, so in the case of Helldivers 2 you had to download between ~30 and ~40 GB, which was then unpacked to 150 GB (according to SteamDB[0]).

[0] https://steamdb.info/app/553850/depots/

You are missing that each update takes AGES while it tortures your disk patching the files (on my machine it takes 15 minutes or so, and that's on an SSD). So I agree that this is careless, and it reminds me of the GTA Online loading times that were finally fixed by a dedicated player who had enough and reverse engineered the problem (see https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times...). I still find these things hard to accept.

  • Steam update durations depend on compression, CPU performance, and SSD I/O. Things get harder when the disk is almost full and live defragmentation kicks in to free up space for contiguous files. Some SSDs are fast enough to keep up with such a load, but a lot of them will quickly hit their DRAM limits, and suddenly that advertised gigabyte-per-second write speed isn't all that fast. Bonus points for when your SSD has no heatsink and no air moving over it, making the controller throttle hard.

    Patching 150 GiB with a compressed 15 GiB download just takes a lot of I/O. The alternative is downloading a fresh copy of the 150 GiB install, but those playing on DSL will probably let their SSD whizz a few minutes longer rather than spend another day downloading updates.

    If your SSD is slower than your internet connection, deleting the install files and re-downloading the entire game will probably save you some time.
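    That trade-off can be put into rough numbers. A minimal sketch, where every figure (50 Mbit/s DSL, 1 Gbit/s fiber, 200 MB/s sustained SSD writes, a 2x read+write factor for in-place patching) is my own illustrative assumption, not anything from Steam:

```python
# Rough break-even between in-place patching and a full re-download.
# All figures are illustrative assumptions, not measurements.

GIB = 2**30
INSTALL_SIZE = 150 * GIB      # full install that gets rewritten
PATCH_DOWNLOAD = 15 * GIB     # compressed patch download

def patch_time(ssd_write_bps, net_bps):
    """Download the small patch, then rewrite the install in place.

    Reading old data and writing new data roughly doubles the I/O,
    so 2x is used as a crude factor.
    """
    return PATCH_DOWNLOAD / net_bps + 2 * INSTALL_SIZE / ssd_write_bps

def redownload_time(net_bps):
    """Fresh copy: the network is the only bottleneck."""
    return INSTALL_SIZE / net_bps

dsl = 50e6 / 8        # 50 Mbit/s DSL, in bytes/s
fiber = 1e9 / 8       # 1 Gbit/s fiber, in bytes/s
slow_ssd = 200e6      # throttled SSD: 200 MB/s sustained writes

for label, net in [("DSL", dsl), ("fiber", fiber)]:
    print(f"{label}: patch {patch_time(slow_ssd, net)/60:.0f} min, "
          f"re-download {redownload_time(net)/60:.0f} min")
```

    With these numbers, patching wins by hours on DSL, while re-downloading comes out slightly ahead on fiber, matching the rule of thumb above.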

In this case, the bug was 131 GB of wasted disk space after installation. Because the waste came from duplicate files, it should have had little impact on download size (unless there's a separate bug in the installer...)

This is why the cost of the bug was so easy for the studio to ignore. An extra 131 GB of bandwidth per download would have cost Steam several million dollars over the last two years, so they might have asked the game studio to look into it.

  • This article presents it as a big success, but it could be read the opposite way: "the developers of Helldivers 2 wasted 130 GB for years and didn't care because it was on other people's computers".

  • > An extra 131 GB of bandwidth per download would have cost Steam several million dollars over the last two years

    Nah, not even close. Let's guess and say there were about 15 million copies sold. 15M * 131GB is about 2M TB (2000 PB / 2 EB). At 30% mean utilisation, a 100Gb/s port will do 10 PB in a month, and at most IXPs that costs $2000-$3000/month. That makes it about $400k in bandwidth charges (I imagine 90%+ is peered or hosted inside ISPs, not via transit), and you could quite easily build a server that would push 100Gb/s of static objects for under $10k a pop.

    It would surprise me if the total additional costs were over $1M, considering they already have their own CDN setup. One of the big cloud vendors would charge $100M just for the bandwidth, let alone the infrastructure to serve it, based on some quick calculation I've done (probably incorrectly) -- though interestingly, HN's fave non-cloud vendor Hetzner would only charge $2M :P
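    The estimate above can be reproduced in a few lines; every input (copies sold, mean utilisation, port price) is a guess carried over from the comment, not a measured value:

```python
# Back-of-envelope reproduction of the transit-cost estimate above.
# Every input is a guess taken from the comment, not a measured value.

copies_sold = 15e6
extra_gb_per_copy = 131
total_pb = copies_sold * extra_gb_per_copy / 1e6   # GB -> PB, ~1965 PB

port_gbps = 100
utilisation = 0.30
seconds_per_month = 30 * 24 * 3600
# One 100 Gb/s port at 30% mean utilisation moves roughly 9.7 PB/month.
pb_per_port_month = port_gbps * utilisation / 8 * seconds_per_month / 1e6

port_cost_per_month = 2000   # low end of the $2000-$3000 IXP range
cost = total_pb / pb_per_port_month * port_cost_per_month
print(f"{total_pb:.0f} PB total, transit cost ~${cost:,.0f}")
```

    At the low end of the quoted port price that lands right around the ~$400k figure; the high end pushes it toward $600k, still nowhere near millions.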

    • > Isn't it a little reductive to look at basic infrastructure costs?

      I used Hetzner as a surrogate for the raw cost of bandwidth, plus overheads. If you need to serve data outside Europe, the budget tier of BunnyCDN is four times more expensive than Hetzner.

      But you might be right - in a market where the price of the same good varies by two orders of magnitude, I could believe that even the nice vendors are charging a 400% markup.

    • Yea, I always laugh when folks talk about how expensive bandwidth supposedly is for companies. Large “internet” companies are just paying a small monthly cost for transit at an IX. They aren't paying per-gigabyte rates (think $1/GB) like the average consumer is. If you buy a 100-gig port for $2k, it costs the same whether you're pushing 5 GB a day or 8 PB a day.

    • Off-topic question.

      > I imagine 90%+ is peered or hosted inside ISPs, not via transit

      How does hosting inside ISPs work? Does the ISP have to MITM the traffic? I've heard similar claims about Netflix and other streaming media, i.e. that ISPs host/cache the data themselves. Do they have to have some agreement with Steam/Netflix?


Makes sense; the initial claim was that HD2's size was mainly due to duplicated assets, and any compression worth its salt would de-duplicate those effectively.
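As a toy illustration of why duplicates are cheap to ship even when they waste disk space, here is a minimal content-addressed sketch: hash fixed-size chunks and store each unique chunk once. Steam's actual depot/manifest format is assumed to be far more sophisticated; this only shows the principle.

```python
# Toy chunk-level de-duplication: identical chunks are stored once.
# Steam's real depot format is assumed to be more sophisticated;
# this only illustrates why duplicate files barely grow a download.
import hashlib
import os

CHUNK = 64 * 1024

def dedup_size(blobs):
    """Bytes needed to store all blobs if identical chunks are shared."""
    seen = set()
    unique = 0
    for blob in blobs:
        for i in range(0, len(blob), CHUNK):
            chunk = blob[i:i + CHUNK]
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                unique += len(chunk)
    return unique

# One incompressible 1 MiB asset duplicated five times across "levels":
asset = os.urandom(1024 * 1024)
print(dedup_size([asset] * 5))   # 1 MiB, not 5 MiB
```

On disk the five copies take 5 MiB, but a chunk-hashed download only ever transfers the unique 1 MiB, which is why the duplicate-file bug barely showed up in download size.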