Comment by Kristine1975

10 years ago

>Seems like it probably offloads most of the work on GPU

It does just about everything on the GPU. All the CPU does is repeatedly render two triangles and play music: https://www.shadertoy.com/view/MdX3Rr

Edit: I'm wrong about the two triangles. From the .nfo file:

  for those wondering, this a (too) low density flat mesh displaced with
  a procedural vertex shader. there arent any texturemaps for texturing,
  instead texturing (and shading) is defferred and computed procedurally
  in a full screen quad. this means there is zero overdraw for the quite
  expensive material at the cost of a single geometry pass. then another
  second full screen quad computes the motion blur. camera movements are 
  computed by a shader too and not in the cpu, as only the gpu knows the
  procedural definition of the landscape.
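
For illustration, here's a minimal CPU-side sketch (not elevated's actual code) of that displacement idea: a flat grid whose vertex heights come from a stateless fbm noise function, the kind of procedural heightfield the demo's vertex shader evaluates per vertex. The hash, noise, and constants are generic stand-ins:

  /* Sketch: displace a flat grid with a procedural fbm heightfield,
     as a vertex shader would do per vertex on the GPU. */
  #include <stdio.h>
  #include <math.h>

  /* Stateless hash noise: the same (x,y) always yields the same value,
     which is what lets a shader evaluate terrain with no stored data. */
  static float hash2(int x, int y) {
      unsigned n = (unsigned)x * 374761393u + (unsigned)y * 668265263u;
      n = (n ^ (n >> 13)) * 1274126177u;
      return (float)(n & 0xffffff) / 16777215.0f;  /* in [0,1] */
  }

  /* Bilinear value noise with smoothstep interpolation. */
  static float value_noise(float x, float y) {
      int ix = (int)floorf(x), iy = (int)floorf(y);
      float fx = x - ix, fy = y - iy;
      float ux = fx * fx * (3.0f - 2.0f * fx);
      float uy = fy * fy * (3.0f - 2.0f * fy);
      float a = hash2(ix, iy),     b = hash2(ix + 1, iy);
      float c = hash2(ix, iy + 1), d = hash2(ix + 1, iy + 1);
      float ab = a + (b - a) * ux, cd = c + (d - c) * ux;
      return ab + (cd - ab) * uy;
  }

  /* Fractional Brownian motion: sum noise octaves at doubling frequency. */
  static float fbm(float x, float y) {
      float h = 0.0f, amp = 0.5f;
      for (int o = 0; o < 6; o++) {
          h += amp * value_noise(x, y);
          x *= 2.0f; y *= 2.0f; amp *= 0.5f;
      }
      return h;
  }

  int main(void) {
      /* "Displace" an 8x8 patch of the flat mesh and print the heights;
         the GPU would do this for every vertex in parallel. */
      for (int j = 0; j < 8; j++) {
          for (int i = 0; i < 8; i++)
              printf("%5.2f ", fbm(i * 0.37f, j * 0.37f));
          printf("\n");
      }
      return 0;
  }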

Thanks for the detailed response. I figured it mostly did GPU work. So the real computing needed here is a massively parallel chip with generic and custom hardware and a bunch of memory, plus a regular core running 4KB of code on the other end. I think a more interesting challenge would be forcing use of only a subset of GPU functions or memory, plus tiny memory on the CPU side. I don't follow the demoscene closely enough to know if they subset GPUs like that. The idea would be making demos run closer to the old Voodoo or pre-GeForce GPUs, to see just how much 2D or 3D performance one could squeeze out of them.

Such tricks could have long-term benefit, since any emerging FOSS GPU is more likely to resemble one of the older designs, given the complexity of new ones. I'd clone something like the GPUs in SGI's Octane workstations, which were used for movie work with mere 200MHz processors. Meanwhile, similar tricks might let one squeeze more out of the embedded GPUs already in use. Maybe subset a PC GPU in demos down to something like a smartphone GPU. Yeah, that's got some interesting potential.

  • You seem to think that GPU programming is somehow easy. You should try it and see what you think.

    Yes, there is a massive amount of power available, but it's not easy to use effectively. You need a different mental model of how things work: there's very little shared state, and all the algorithms used have to match the model of computation (a toy sketch follows at the end of this comment).

    Using the GPU almost exclusively and generating everything procedurally is a massive accomplishment, and much more difficult than "normal" CPU+GPU programming or using just the CPU.

    I do not share your view that this would be somehow less impressive because it uses the GPU.
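
    As a toy illustration of that model (hypothetical code, nothing from the demo): summing an array. A serial loop threads all its state through one accumulator; restructured as a tree reduction, every addition within a pass is independent, which is the shape of algorithm that maps onto GPU threads.

      /* Toy example: summing 8 numbers. A serial loop (sum += a[i])
         chains every step through one accumulator. The tree reduction
         below finishes in log2(N) passes; within a pass each addition
         touches disjoint data, so each could be its own GPU thread. */
      #include <stdio.h>

      #define N 8

      int main(void) {
          float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};

          for (int stride = 1; stride < N; stride *= 2)    /* log2(N) passes */
              for (int i = 0; i + stride < N; i += 2 * stride)
                  a[i] += a[i + stride];                   /* independent within a pass */

          printf("sum = %g\n", a[0]);                      /* prints 36 */
          return 0;
      }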

    • I used to do GPU programming: a brief foray into it for game programming, plus a then-new field called "GPGPU" that pushed its limits. I think I implemented some crypto or physics stuff on one. I've followed some of the recent efforts.

      My points of comparison are what they're doing versus what the hardware is designed to do, and versus what other people achieve with that and other hardware. It looks great and runs efficiently; I'll give them that. It's just way less impressive to me given that they're using a powerful graphics card mostly to do what it's designed to do, plus their innovations on top.


  • >plus a regular core using 4KB on other end.

    The 4K refers to the size of the .exe (compressed with Crinkler), not the application's RAM requirements. The game .kkrieger, for example, is a 96K .exe but uses several hundred MB of RAM when run.
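
    To make the distinction concrete, here's a hypothetical toy (not .kkrieger's actual code) where a tiny binary balloons to hundreds of MB at runtime by generating its data procedurally:

       /* Hypothetical illustration: the binary stays tiny because content
          is computed at runtime instead of stored. A few lines of code
          expand into ~256 MB of procedural "texture" data. */
       #include <stdio.h>
       #include <stdlib.h>

       int main(void) {
           size_t bytes = 256u * 1024 * 1024;       /* ~256 MB at runtime */
           unsigned char *tex = malloc(bytes);
           if (!tex) return 1;
           for (size_t i = 0; i < bytes; i++)       /* procedural fill */
               tex[i] = (unsigned char)((i * 2654435761u) >> 24);
           printf("generated %zu MB from a few bytes of code\n", bytes >> 20);
           free(tex);
           return 0;
       }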

    Also, the strict size requirements can interfere with execution speed. From the .nfo again:

       believe it or not, this was running at 30 fps in a gefoce 7900 at some
       point, but size optimizations forced us to ask you for a pretty decent
       graphics card, like a geforce 8800gtx or hd4850. please, make sure you
       have d3d9_33.dll somewhere there. also, you only need windows xp.

    • Oh yeah, I forgot about that. I wonder what this one's RAM usage at runtime is. Regarding the GPU quote, that's exactly the sort of thing I'm talking about. It's sort of a cheat, where a massive amount of resources is used in one place to reduce a tiny amount in another. An impressive optimization requires little to no extra resources in B when optimizing A. Some types of optimization straight-up can't seem to avoid that tradeoff. Yet the more constrained demoscene platforms forced people to figure out a bunch that did.

      So I think GPU subsets or CPU/GPU tradeoffs could create interesting opportunities for people to show off their brilliance.
