Comment by nickpsecurity

10 years ago

That's unreal. On what kind of graphics hardware, though? It seems like it probably offloads most of the work to the GPU, whereas we'd have had to do most of it in software on hardware weak enough that a 4KB size actually mattered. And probably not have achieved this demo.

>Seems like it probably offloads most of the work on GPU

It does just about everything on the GPU. All the CPU does is repeatedly render two triangles and play music: https://www.shadertoy.com/view/MdX3Rr

Edit: I'm wrong about the two triangles. From the .nfo-file:

  for those wondering, this a (too) low density flat mesh displaced with
  a procedural vertex shader. there arent any texturemaps for texturing,
  instead texturing (and shading) is defferred and computed procedurally
  in a full screen quad. this means there is zero overdraw for the quite
  expensive material at the cost of a single geometry pass. then another
  second full screen quad computes the motion blur. camera movements are 
  computed by a shader too and not in the cpu, as only the gpu knows the
  procedural definition of the landscape.
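
  In case it helps to picture the technique the .nfo describes, here's a rough CPU-side sketch in Python of a flat grid displaced by a fractal (fBm) noise height function. The real demo does this in a GLSL vertex shader on the GPU, and its actual terrain function is different; the value-noise function below is a generic stand-in:

```python
import math, random

def value_noise(x, z, seed=0):
    """Smooth 2D value noise: random values at integer lattice
    points, blended with a smoothstep weight."""
    def lattice(ix, iz):
        # Deterministic pseudo-random value per lattice point.
        rng = random.Random((ix * 73856093) ^ (iz * 19349663) ^ seed)
        return rng.random()
    ix, iz = math.floor(x), math.floor(z)
    fx, fz = x - ix, z - iz
    u, v = fx * fx * (3 - 2 * fx), fz * fz * (3 - 2 * fz)  # smoothstep fade
    a, b = lattice(ix, iz), lattice(ix + 1, iz)
    c, d = lattice(ix, iz + 1), lattice(ix + 1, iz + 1)
    return (a * (1 - u) + b * u) * (1 - v) + (c * (1 - u) + d * u) * v

def height(x, z, octaves=6):
    """Fractal (fBm) terrain height: sum octaves of noise, each at
    double the frequency and half the amplitude of the last."""
    h, amp, freq = 0.0, 1.0, 1.0
    for _ in range(octaves):
        h += amp * value_noise(x * freq, z * freq)
        amp *= 0.5
        freq *= 2.0
    return h

def displaced_grid(n=16, size=10.0):
    """A flat n x n vertex grid, displaced vertically by the procedural
    height function -- the same idea as the demo's vertex shader."""
    verts = []
    for j in range(n):
        for i in range(n):
            x = (i / (n - 1) - 0.5) * size
            z = (j / (n - 1) - 0.5) * size
            verts.append((x, height(x, z), z))
    return verts
```

  The point is that no terrain data is stored anywhere: the whole landscape is implied by a few dozen lines of math, which is why the .nfo says only the GPU "knows" the landscape.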

  • Thanks for the detailed response. I figured it mostly did GPU stuff. So the real computing necessary here is a massively parallel chip with generic and custom hardware and a bunch of memory, plus a regular core using 4KB on the other end. I think a more interesting challenge would be to force use of a subset of GPU functions or memory, plus tiny memory on the CPU side. I don't follow the demoscene closely enough to know if they subset GPUs like that. The idea being to make them run closer to the old Voodoo or pre-GeForce GPUs, to see just how much 2D or 3D performance one could squeeze out of them.

    Tricks like that could have long-term benefit, since any emerging FOSS GPU is more likely to resemble one of the older designs, given the complexity of new ones. I'd clone something like the SGI Octanes they used to make movies on with mere 200MHz processors. Meanwhile, similar tricks might let one squeeze more out of the existing embedded GPUs in use. Maybe subset a PC GPU in demoscene competitions to match one of the smartphone GPUs. Yeah, that's got some interesting potential.

    • You seem to think that GPU programming is somehow easy. You should try it and see what you think.

      Yes, there is a massive amount of power available, but it's not easy to use effectively. You need a different mental model of how things work, there's very little shared state, and all the algorithms used have to match the model of computation.

      Using the GPU almost exclusively and generating everything procedurally is a massive accomplishment, and much more difficult than "normal" CPU+GPU programming or using just the CPU.

      I do not share your view that this would be somehow less impressive because it uses the GPU.


    • >plus a regular core using 4KB on other end.

      The 4K refers to the .exe size (it has been compressed using Crinkler), not the application's RAM requirements. The game .kkrieger, for example, is a 96K .exe but uses several hundred MB of RAM when run.

      Also, the strict size requirements can interfere with execution speed. From the .nfo again:

         believe it or not, this was running at 30 fps in a gefoce 7900 at some
         point, but size optimizations forced us to ask you for a pretty decent
         graphics card, like a geforce 8800gtx or hd4850. please, make sure you
         have d3d9_33.dll somewhere there. also, you only need windows xp.


The 4KB restriction isn't there to make it run on weak hardware; it's there to push people.

  • However, it does irritate me that it's pregenerating the entire scene in memory. Being allowed to use 300MB of RAM doesn't strike me as very limiting.

    • You wanted optimized code size and optimized performance?

      I mean, sure, but think about how small 4KB is: the tricks being used to create the scenes are crazy hacks, using default Windows sound files and literally anything the executable can reference on the cheap.

      Procedural content generation is really expensive (in general), but that's the beauty of it. You find a way to abstract the content into an algorithm, and then you can reduce the size of the assets, but you pretty much always need to pay the price somewhere.

      But hey, I understand the sentiment, I wish Slack didn't consume 2 GB of RAM on my machine.
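
      To make that size-for-compute trade concrete, here's a toy sketch (the numbers and the gradient function are illustrative, not from the demo): a 256x256 RGBA texture stored as an asset costs 256 KB, while generating equivalent pixels procedurally costs only a few lines of code plus per-pixel math at runtime:

```python
# Toy illustration of the size-vs-compute trade in procedural content:
# a 256x256 RGBA texture stored raw is 262,144 bytes, but the tiny
# procedure below generates an equivalent radial-gradient texture --
# the cost moves from storage into per-pixel math at runtime.
import math

WIDTH = HEIGHT = 256

def pixel(x, y):
    """Compute one RGBA pixel procedurally (radial gradient:
    white at the center, fading to black at the corners)."""
    cx, cy = WIDTH / 2, HEIGHT / 2
    d = math.hypot(x - cx, y - cy) / math.hypot(cx, cy)  # 0 center, 1 corner
    v = int(255 * (1.0 - min(d, 1.0)))
    return (v, v, v, 255)

def generate():
    """'Unpack' the texture: run the procedure for every pixel."""
    return bytes(c for y in range(HEIGHT)
                   for x in range(WIDTH)
                   for c in pixel(x, y))

tex = generate()
stored_size = len(tex)  # what a raw asset would cost on disk: 262144 bytes
```

      The generator is a few hundred bytes of code, while the data it stands in for is a quarter megabyte; that's the abstraction-for-size trade, paid for in generation time.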

    • "Being allowed to use 300MB of RAM doesn't strike me as very limiting."

      BOOM! I knew it was going to be huge. That's a beefy GPU + 300MB in RAM + pregenerating. I'd have... made sacrifices to have that even in the Half-Life 1 days. :)

  • I figured that. It's just that almost everything's done on the GPU for a rendering demo. That's really pushing people. ;)

    • The wink face makes it seem like you think this is easy because using a GPU to execute the program is allowed. No?

      Edit: just read your other comment about real challenges in the C64 subset of the demoscene. That's like "You set a record in a 1600m race? For a real challenge, set a record in a marathon." It's just arbitrarily moving the totally legitimate goalposts to a different challenge because you prefer it.


The C64 subset of the demoscene is still going if you want it.

  • That's a real challenge. :) My comment to Kristine has some other details on how we might do something between that and a full GPU.

    • If we go that way, a real challenge would be designing your own computer then making a demo to run on it.

      Just because the tools are more capable doesn't mean the challenge is any less real or the result less impressive.
