Simplifying Vulkan one subsystem at a time

1 day ago (khronos.org)

The main problem with Vulkan isn't the programming model or the lack of features. These are tackled by Khronos. The problem is with coverage and update distribution. It's all over the place! If you develop general-purpose software (like Zed), you can't assume that even basic things like dynamic rendering are supported uniformly. There are always weird systems with old drivers (looking at Ubuntu 22 LTS), hardware vendors abandoning and forcefully deprecating working hardware, and of course driver bugs... So, by the time I can rely on the new shiny descriptor heap/buffer features, I'll have more gray hair and other things on the horizon.

  • > Ubuntu LTS

    This is why I try to encourage new Linux users away from Ubuntu: it lags behind, often on important functionality. It is now an enterprise OS (where durability is more important than functionality), so it's not really suitable for a power user (like someone who would use Zed).

    • My understanding is that Mesa has very few dependencies and is ABI stable, so freezing Mesa updates is counterproductive. I'm not sure about Snaps, but Flatpak ships its own system for managing Mesa versions.

      2 replies →

    • " It is now an enterprise OS"

      You really want enterprise standards support for your graphics API.

      Bleeding edge... is not nice in graphics. The more complex the systems get, the more edge cases there are.

      I mean in general. If you are writing a high-end game engine, don't listen to me, you know better. But if you are a mid-tier graphics wonk like myself, 20-year-old concepts are usually quite Pareto-optimal for _lots_ of stuff and should be robustly covered by most APIs.

      If I could give one piece of advice to myself 20 years ago, it would be this.

      For anything practical - focus on the platform-native graphics API. Windows - DirectX. Mac - OpenGL (20 years ago! Predates Metal! Today of course it would be Metal).

      I don't think that advice would be much different today (apart from Metal) IF you don't know what to do and just want to start doing graphics. For senior peeps who know the field, do whatever's right for you, of course.

      Linux - good luck. Find the API that has the best support for your card & driver combo - meaning likely the most stabilized one with the most users.

    • I encourage them away from Ubuntu because of the Snaps. If people want an enterprise distro that lags upstreams by a lot they should go with Debian.

    • And this is a prime example of development-centric thinking prioritizing developer comfort over the capabilities and usability of the actual software. Rather than targeting stable older feature sets, it's always targeting the bleeding edge, then being confused that this doesn't work on machines that aren't their own, and then blaming everyone else for that decision. 4 years is not a long time (LTS). 4 years is the minimum that software should be able to live.

  • Yes, this is the problem. They tout this new latest and greatest extension that fixes and simplifies a lot, yet you go look up the extension on vulkan.gpuinfo.org and see ... currently 0.3% of all devices support it. Which means you can't in any way use it. So you wait 5 years, and now maybe 20% of devices support it. Then you wait another 5 years, and maybe 75% of devices support it. And maybe you can get away with limiting your code to running on 75% of devices. Or, you wait another 5 years to get into the 90s.

    • > look up the extension on vulkan.gpuinfo.org and see ... currently 0.3% of all devices support it.

      Afaik the extension isn't even finalized yet and they are pre-releasing it to gather feedback.

      And you can't use gpuinfo for assessing how widely available something is or isn't. The stats contain reports from old drivers too so the numbers you see are no indication of hardware support.

      To assess how widely supported something is, you need to look at gpuinfo, sort by date or driver version, and cross-reference something like the Steam hardware survey.

  • > There are always weird systems with old drivers (looking at Ubuntu 22 LTS)

    While I agree with your general point, RHEL stands out way, way more to me. Ubuntu 22.04 and RHEL 9 were both released in 2022. Where Ubuntu 22.04 has general support until mid-2027 and security support until mid-2032, RHEL 9 has "production" support through mid-2032 and extended support until mid-2034.

    Wikipedia sources for Ubuntu [0] and RHEL [1]:

    [0] https://en.wikipedia.org/wiki/Ubuntu#Releases

    [1] https://upload.wikimedia.org/wikipedia/en/timeline/fcppf7prx...

  • Tbh, we should more readily abandon GPU vendors that refuse to go with the times. If we cater to them for too long, they have no reason to adapt.

    • I had a relatively recent graphics card (5 years old perhaps?). I don't care about 3D or games, or whatever.

      So I was sad not to be able to run a text editor (let's be honest, Zed is nice but it's just displaying text). And somehow the non-accelerated version is eating 24 cores. Just for text.

      https://github.com/zed-industries/zed/discussions/23623

      I ended up buying a new graphics card in the end.

      I just wish everyone could get along somehow.

      8 replies →

    • No. I remember a phone app (WhatsApp?) doggedly supporting every godforsaken phone, even the Nokias with the zillion incompatible Java versions. A developer should go where the customers are.

      What does help is an industry-accepted benchmark, easily run by everyone. I remember browser CSS being all over the place, until that whatsitsname benchmark (with the smiley face) demonstrated which emperors had no clothes. Everyone could surf to the test and check how well their favorite browser did. Scores went up quickly, and today CSS is in a lot better shape.

      1 reply →

    • > we should more readily abandon GPU vendors

      This was so much more practical before the market coalesced to just 3 players. Matrox, it's time for your comeback arc! And maybe a desktop PCIe package for Mali?

      1 reply →

    • NVidia says no new gamer GPUs in 2026, and increasing prices through 2030. They're too focused on enterprise AI machines.

  • Some just ignore it and require a recent Vulkan (see for example DXVK etc.). Do that. Ubuntu LTS isn't something you should be using for graphics-dependent desktop scenarios anyway. Limiting features based on that is a bad idea.

I wish they would just allow us to push everything to the GPU as buffer pointers, like the buffer_device_address extension allows, and then reconstruct the data into your required format via shaders.

GPU programming seems to be both super low level and also high level, because textures and descriptors need these ultra-specific data formats, and the way you construct and upload those formats is very complicated and changes all the time.

Is there really no way to simplify this?

Regular vertex data was supposed to be strictly pre-formatted in the pipeline too, until suddenly it wasn't, and now we can just give the shader a `buffer_device_address` memory pointer and construct the data from that.
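
For what it's worth, the host side of that is already pretty small with buffer_device_address; a rough sketch (the helper name and the push-constant layout are mine, the entry points are the real ones, and the device/buffer creation flags are assumed to be set up elsewhere):

    #include <vulkan/vulkan.h>

    /* Sketch: fetch a raw GPU pointer for a buffer and hand it to the shader.
       Assumes the device was created with the bufferDeviceAddress feature,
       the buffer with VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT, and the
       pipeline layout with a matching push-constant range. */
    static void push_buffer_address(VkDevice device, VkCommandBuffer cmd,
                                    VkPipelineLayout layout, VkBuffer buffer)
    {
        VkBufferDeviceAddressInfo info = {
            .sType  = VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO,
            .buffer = buffer,
        };
        VkDeviceAddress addr = vkGetBufferDeviceAddress(device, &info);

        /* The shader casts this 64-bit value to a buffer_reference block
           (GLSL) or a pointer type (Slang) and reads whatever layout it wants. */
        vkCmdPushConstants(cmd, layout, VK_SHADER_STAGE_ALL,
                           0, sizeof(addr), &addr);
    }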

  • I also want what you're describing. It seems like the ideal "data-in-out" pipeline for purely compute based shaders.

    I've brought it up several times when talking with folks who work down in the chip level for optimizing these operations and all I can say is, there are a lot of unforeseen complications to what we're suggesting.

    It's not that we can't have a GPU that does these things; it's apparently more that a combination of previous and current architectural decisions works against it. For instance, an Nvidia GPU is focused on providing the hardware optimizations necessary for either LLM compute or graphics acceleration, both essentially proprietary technologies.

    The proprietariness isn't why it's obtuse, though. You can make a chip go super-duper fast for specific tasks, or more general for all kinds of tasks. Somewhere, folks are making a tradeoff between backwards compatibility and supporting new hardware-accelerated tasks.

    Neither of these are "general purpose compute and data flow" focuses. As such, you get the GPU that only sorta is configurable for what you want to do. Which in my opinion explains your "GPU programming seems to be both super low level, but also high level" comment.

    That's been my experience. I still think what you're suggesting is a great idea and would make GPU's a more open compute platform for a wider variety of tasks, while also simplifying things a lot.

    • This is true, but what the parent comment is getting at is that we really just want to be able to address graphics memory the same way it's exposed in CUDA, for example, where you can just have pointers to GPU memory in structures visible to the CPU, without this song and dance with descriptor set bindings.

  • If you got what you're asking for you'd presumably lose access to any fixed function hardware. Re your example: knowing the data format permits automagic hardware-accelerated translations between image formats.

    You're free to do what you're asking for by simply performing all operations manually in a compute shader. You can manually clip, transform, rasterize, and even sample textures. But you'll lose the implicit use of various fixed function hardware that you currently benefit from.

    • > If you got what you're asking for you'd presumably lose access to any fixed function hardware.

      Are there any fixed functions left that aren't just being implemented by the general compute shader hardware?

      I guess the ray tracing stuff would qualify, but that isn't what people are complaining about here.

  • I’m not watching Rust as closely as I once did, but it seems like buffer ownership is something it should be leaning on more fully.

    There’s an old concurrency pattern where a producer and consumer tag team on two sets of buffers to speed up throughput. Producer fills a buffer, transfers ownership to the consumer, and is given the previous buffer in return.

    It is structurally similar to double buffered video, but for any sort of data.

    It seems like Rust would be good for proving the soundness. And it should be a library now rather than a roll your own.
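
    For concreteness, the shape of that handoff, sketched in C with pthreads (one producer, one consumer, two pre-allocated buffers; the struct and names are made up for illustration - a Rust version would encode the ownership transfer in the type system instead of by convention):

        #include <pthread.h>
        #include <stddef.h>

        /* Two buffers ping-pong between one producer and one consumer.
           Init: producer starts owning buffer A, spare = buffer B, full = NULL. */
        typedef struct {
            unsigned char  *full;    /* filled buffer waiting for the consumer, or NULL */
            unsigned char  *spare;   /* drained buffer waiting for the producer, or NULL */
            size_t          filled;  /* bytes valid in `full` */
            pthread_mutex_t lock;
            pthread_cond_t  cond;
        } handoff_t;

        /* Producer: hand over the buffer just filled, take back the drained one. */
        static unsigned char *producer_swap(handoff_t *h, unsigned char *buf, size_t n)
        {
            pthread_mutex_lock(&h->lock);
            h->full = buf;
            h->filled = n;
            pthread_cond_signal(&h->cond);
            while (h->spare == NULL)
                pthread_cond_wait(&h->cond, &h->lock);
            unsigned char *next = h->spare;
            h->spare = NULL;
            pthread_mutex_unlock(&h->lock);
            return next;
        }

        /* Consumer: return the buffer just drained (NULL on the first call),
           take the next full one and its size. */
        static unsigned char *consumer_swap(handoff_t *h, unsigned char *drained, size_t *n)
        {
            pthread_mutex_lock(&h->lock);
            if (drained) {
                h->spare = drained;
                pthread_cond_signal(&h->cond);
            }
            while (h->full == NULL)
                pthread_cond_wait(&h->cond, &h->lock);
            unsigned char *next = h->full;
            *n = h->filled;
            h->full = NULL;
            pthread_mutex_unlock(&h->lock);
            return next;
        }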

    • > There’s an old concurrency pattern where a producer and consumer tag team on two sets of buffers to speed up throughput. Producer fills a buffer, transfers ownership to the consumer, and is given the previous buffer in return.

      Isn't this just called a swapchain?

At least they are making an effort to correct the extension spaghetti, already worse than OpenGL.

Additionally, most of these fixes aren't coming to Android, which is now getting WebGPU for Java/Kotlin [0] after so many refused to move away from OpenGL ES, nor, naturally, to any card not lucky enough to get new driver releases.

Still, better now than never.

[0] - https://developer.android.com/jetpack/androidx/releases/webg...

  • As someone from game development: not supporting Vulkan on Android and sticking with OpenGL ES instead is a safer bet. There are always some devices that bug out badly on Vulkan. Nobody wants to sit and find workarounds for some obscure vendor.

  • Bizarre take. Notice how that WebGPU is an AndroidX library? That means WebGPU API support is built into apps via that library and runs on top of the system's Vulkan or OpenGL ES API.

    Do you work for Google or an Android OEM? If not, you have no basis to make the claim that Android will cease updating Vulkan API support.

      I did not make such a claim.

      WebGPU on Android runs on top of Vulkan.

      If you knew about 3D programming on Android, you would know that there are ongoing efforts to have only Vulkan, with OpenGL ES on top.

      However Java and Kotlin devs refuse to bother with the NDK for Vulkan, and keep reaching for OpenGL ES instead.

      Please refer to Google talks on Vulkanised conferences.

      6 replies →

  • > Additionally, most of these fixes aren't coming to Android

    The fuck are you talking about? Of course they'll come to Android

    • Thanks for showing the audience the lack of experience with Vulkan drivers on Android.

    • The chain for GPU driver updates on Android phones is incredibly long.

      Google > Phone Vendor > SoC Vendor > Arm > SoC Vendor > Phone Vendor > Android Update

      and that is the happy case. The general case looks more like this:

      Google > Phone Vendor > SoC Vendor > "Have you considered buying a new SoC?"

I suspect we are only 5-10 years away from Vulkan finally being usable. There are so many completely needlessly complex things, or things that should have an easy path for the common case.

BDA, dynamic rendering and shader objects almost make Vulkan bearable. What's still sorely missing is a single-line device malloc, a default queue that can be used without ever touching the queue family API, and an entirely descriptor-free code path. The latter would involve making the NV bindless extension the standard which simply gives you handles to textures, without making you manage descriptor buffers/sets/heaps. Maybe also put an easy-path for synchronization on that list and making the explicit API optional.

Until then I'll keep enjoying OpenGL 4.6, which has had BDA with C-style pointer syntax in GLSL shaders since 2010 (NV_shader_buffer_load), and which allows hassle-free buffer allocation and descriptor-set-free bindless textures.
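
For anyone who hasn't seen the GL flavor of descriptor-set-free textures: with ARB_bindless_texture it's roughly this (sketch only; `tex` is any fully-initialized GL texture object, error handling omitted):

    /* Turn a regular GL texture into a 64-bit handle the shader can use
       directly - no descriptor sets, no binding slots. */
    static GLuint64 make_bindless(GLuint tex)
    {
        GLuint64 handle = glGetTextureHandleARB(tex);
        glMakeTextureHandleResidentARB(handle);
        return handle;  /* store it in any UBO/SSBO; GLSL with the extension
                           enabled can construct a sampler2D from the value */
    }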

  • I use Vulkan on a daily basis. Some examples:

    - with DXVK to play games
    - with llama.cpp to run local LLMs

    Vulkan is already everywhere, from games to AI.

I'm really enjoying these changes. Going from render passes to dynamic rendering really simplified my code. I wonder how this new feature compares to existing bindless rendering.

From the linked video, "Feature parity with OpenCL" is the thing I'm most looking forward to.
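
For anyone who hasn't tried dynamic rendering yet, the per-frame side is basically just two structs instead of render pass + framebuffer objects - a minimal sketch, assuming a 1.3 device (`cmd`, `view` and `extent` are placeholders):

    #include <vulkan/vulkan.h>

    /* Begin rendering straight into an image view - no VkRenderPass,
       no VkFramebuffer. */
    static void begin_color_pass(VkCommandBuffer cmd, VkImageView view, VkExtent2D extent)
    {
        VkRenderingAttachmentInfo color = {
            .sType       = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO,
            .imageView   = view,
            .imageLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
            .loadOp      = VK_ATTACHMENT_LOAD_OP_CLEAR,
            .storeOp     = VK_ATTACHMENT_STORE_OP_STORE,
            .clearValue  = { .color = { .float32 = { 0.0f, 0.0f, 0.0f, 1.0f } } },
        };
        VkRenderingInfo info = {
            .sType                = VK_STRUCTURE_TYPE_RENDERING_INFO,
            .renderArea           = { .offset = { 0, 0 }, .extent = extent },
            .layerCount           = 1,
            .colorAttachmentCount = 1,
            .pColorAttachments    = &color,
        };
        vkCmdBeginRendering(cmd, &info);
        /* ... draw calls ... then vkCmdEndRendering(cmd) */
    }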

  • You can use descriptor heaps with existing bindless shaders if you configure the optional "root signature".

    However, it looks like it's simpler to change your shaders (if you can) to use the new GLSL/SPIR-V functionality (or Slang) and not specify the root signature at all (it's complex and verbose).

    Descriptor heaps really reduce the amount of setup code needed; with pipeline layouts gone you can drop like a third of the code needed to get started.

    Similar in magnitude to dynamic rendering.

    • Having quite recently written a (still experimental) Vulkan backend for sokol_gfx.h, my impression is that starting with `VK_EXT_descriptor_buffer` (soon-ish to be replaced with `VK_EXT_descriptor_heap`), the "core API" is in pretty good shape now. The remaining problem is that all the outdated and deprecated sediment layers are still part of the core API; this should really be kicked out. E.g. when I explicitly request a specific API version like 1.4, I don't care about any features that have been deprecated in versions up to 1.4, nor about any extensions that have been incorporated into the core API up until 1.4, so I'd really like them to at least not show up in the Vulkan header, so that code completion cannot sneak in outdated code (like EXT/KHR postfixes for things that have been moved into core).

      The current OpenGL-like sediment-layer model (e.g. never remove old stuff) is extremely confusing if you haven't been following Vulkan development closely since 2016, since there are often 5 ways to do the same thing, 3 of which are deprecated - but finding out whether a feature is deprecated is its own sidequest.

      What I actually wrestled with most was getting the outer frame-loop right without validation layer errors. I feel like this should be the next thing which the "Eye of Khronos" should focus on.

      None of the official tutorial/example code I've tried runs without swapchain-sync-related validation errors on one configuration or another. Even this 'best practices' example code, which demonstrates how to do the frame-loop scaffolding correctly, produces validation layer errors, so it's also quite useless:

      https://docs.vulkan.org/guide/latest/swapchain_semaphore_reu...

      What's worse: different hardware/driver combos produce different validation layer errors (even in the swapchain-code which really shouldn't have different implementations across GPU vendors - e.g. shouldn't Khronos provide common reference code for those GPU-independent parts of drivers?). I wonder if there is actually any Vulkan code out there which is completely validation-layer-clean across all possible configs (I seriously doubt it).

      Also the VK_[EXT/KHR]_swapchain_maintenance1 extension which is supposed to fix all those little warts has such a low coverage that it's not worth supporting (but it should really be part of the core API by now - the extension is from 2019).

      Anyway... baby steps in the right direction, only a shame that it took a decade ;)
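
      For context, the frame-loop scaffolding in question has roughly this shape (a minimal sketch of one frame with a per-frame fence and semaphores; all handles are assumed to already exist, error handling and resize are left out - and the semaphore-reuse page linked above is about how `renderFinishedSem` here gets reused across frames):

        vkWaitForFences(device, 1, &inFlightFence, VK_TRUE, UINT64_MAX);
        vkResetFences(device, 1, &inFlightFence);

        uint32_t imageIndex;
        vkAcquireNextImageKHR(device, swapchain, UINT64_MAX,
                              imageAvailableSem, VK_NULL_HANDLE, &imageIndex);

        /* ... record `cmd` for imageIndex ... */

        VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
        VkSubmitInfo submit = {
            .sType                = VK_STRUCTURE_TYPE_SUBMIT_INFO,
            .waitSemaphoreCount   = 1,
            .pWaitSemaphores      = &imageAvailableSem,
            .pWaitDstStageMask    = &waitStage,
            .commandBufferCount   = 1,
            .pCommandBuffers      = &cmd,
            .signalSemaphoreCount = 1,
            .pSignalSemaphores    = &renderFinishedSem,
        };
        vkQueueSubmit(queue, 1, &submit, inFlightFence);

        VkPresentInfoKHR present = {
            .sType              = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,
            .waitSemaphoreCount = 1,
            .pWaitSemaphores    = &renderFinishedSem,
            .swapchainCount     = 1,
            .pSwapchains        = &swapchain,
            .pImageIndices      = &imageIndex,
        };
        vkQueuePresentKHR(queue, &present);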

      23 replies →

    • Are there any good Vulkan tutorials that are continuously updated to reflect these advancements and ease-of-use improvements?

      It's a similar challenge to the many different historical strata of C++ resources.

      3 replies →

I would like to / am "supposed to" use Vulkan, but it's a massive pain coming from OpenCL, with all kinds of issues that need safe handling which simply don't come up with OpenCL workloads.

Everyone keeps telling me OpenCL is deprecated (which is true, although it's also true that it continues to work superbly in 2026) but there isn't a good / official OpenCL to Vulkan wrapper out there to justify it for what I do.

Not sure if this is an "oh, no" event.

So this goes into Vulkan. Then it has to ship with the OS. Then it has to go into intermediate layers such as WGPU. Which will probably have to support both old and new mode. Then it has to go into renderers. Which will probably have to support both old and new mode. Maybe at the top of the renderer you can't tell if you're in old or new mode, but it will probably leak through. In that case game engines have to know about this. Which will cause churn in game code.

And Apple will do something different, in Metal.

Unreal Engine and Unity have the staff to handle this, but few others do. The Vulkan-based renderers which use Vulkan concurrency to get performance OpenGL can't deliver are few. Probably only Unreal Engine and Unity really exploit Vulkan properly.

Here's the top level of the Vulkan changes.[1] It doesn't look simple.

(I'm mostly grumbling because the difficulty and churn in Vulkan/WGPU has resulted in three abandoned renderers in Rust land through developer burnout. I'm a user of renderers, and would like them to Just Work.)

[1] https://docs.vulkan.org/refpages/latest/refpages/source/VK_E...

  • > And Apple will do something different, in Metal.

    Microsoft, Sony and Nintendo as well.

  • > Not sure if this is an "oh, no" event.

    it's not.

    descriptor sets are realistically never getting deprecated. old code doesn't have to be rewritten if it works. there's no point.

    if you're doing bindless (which you most certainly aren't if you're still stuck with descriptor sets) this offers a better way of handling that.

    if you care to upgrade your descriptor set based path to use heaps, this extension offers a very nice pathway to doing so _without having to even recompile shaders_.

    for new/future code, this is a solid improvement.

    if you're happy where you are with your renderer, there isn't a need to do anything.

    • And apparently if you do mobile you stay away from a big chunk of dynamic rendering and use Vulkan 1.0-style render passes... or you leave performance on the floor (based on guidelines from various mobile GPU vendors).

      3 replies →

Does this evolution of the Vulkan API get closer to the model explained in https://news.ycombinator.com/item?id=46293062 ?

  • Yes, you can get very close to that API with this extension + existing Vulkan extensions. The main difference is that you still kind of need opaque buffer and texture objects instead of raw pointers, but you can get GPU pointers for them and still work with those. In theory I think you could do the malloc API design there but it's fairly unintuitive in Vulkan and you'd still need VkBuffers internally even if you didn't expose them in a wrapper layer. I've got a (not yet ready for public) wrapper on Vulkan that mostly matches this blog post, and so far it's been a really lovely way to do graphics programming.

    The main thing that's not possible at all on top of Vulkan is his signals API, which I would enjoy seeing - it could be done if timeline semaphores could be waited on/signalled inside a command buffer, rather than just on submission boundaries. Not sure how feasible that is with existing hardware though.

  • It's a baby-step in this direction, e.g. from Seb's article:

    > Vulkan’s VK_EXT_descriptor_buffer (https://www.khronos.org/blog/vk-ext-descriptor-buffer) extension (2022) is similar to my proposal, allowing direct CPU and GPU write. It is supported by most vendors, but unfortunately is not part of the Vulkan 1.4 core spec.

    The new `VK_EXT_descriptor_heap` extension described in the Khronos post is a replacement for `VK_EXT_descriptor_buffer` which fixes some problems but otherwise is the same basic idea (e.g. "descriptors are just memory").

I personally just switched to using push descriptors everywhere. On desktops, the real-world limits are high enough that it ends up working out fine, and you get a nice immediate-mode API like OpenGL.
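
Rough shape of that, for anyone curious (VK_KHR_push_descriptor; sketch only - the set layout needs the push-descriptor flag, and `cmd`, `pipelineLayout` and `uniformBuffer` are placeholders):

    /* Push a single uniform-buffer binding straight into the command buffer:
       no descriptor pool, no vkAllocateDescriptorSets, no lifetime tracking. */
    VkDescriptorBufferInfo bufInfo = {
        .buffer = uniformBuffer,
        .offset = 0,
        .range  = VK_WHOLE_SIZE,
    };
    VkWriteDescriptorSet write = {
        .sType           = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
        .dstBinding      = 0,
        .descriptorCount = 1,
        .descriptorType  = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
        .pBufferInfo     = &bufInfo,
    };
    vkCmdPushDescriptorSetKHR(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS,
                              pipelineLayout, 0 /* set */, 1, &write);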

  • That's the right way to go for simple use cases and especially getting started on a new project.

Vulkan takes like 600+ lines to do what Metal does in 50.

I'm sure the comments will be all excuses and whys but they're all nonsense. It's just a poorly thought out API.

  • My understanding of API standards that need to be implemented by multiple vendors is that there's a tradeoff between having something that's easy for the programmer to use and something that's easy for vendors to implement.

    A big complaint I hear about OpenGL is that it has inconsistent behavior across drivers, which you could argue is because of the amount of driver code that needs to be written to support its high-level nature. A lower-level API can require less driver code to implement, effectively moving all of that complexity into the open source libraries that eventually get written to wrap it. As a graphics programmer you can then just vendor one of those libraries and win better cross-platform support for free.

    For example: I've never used Vulkan personally, but I still benefit from it in my OpenGL programs thanks to ANGLE.

  • Agreed. It has way too much completely unnecessary verbosity. Like, why the hell does it take 30 lines to allocate memory rather than one single malloc.

    • just use the vma library. the low level memory allocation interface is for those who care to have precise control over allocations. vma has shipped in production software and is a safe choice for those who want to "just allocate memory".
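
      e.g. the whole "create a buffer and back it with memory" dance becomes one call (sketch; `allocator` is a VmaAllocator created once at startup with vmaCreateAllocator):

        #include "vk_mem_alloc.h"

        /* VMA creates the VkBuffer, picks a suitable memory type,
           allocates and binds the memory in one call. */
        VkBufferCreateInfo bufInfo = {
            .sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO,
            .size  = 65536,
            .usage = VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT,
        };
        VmaAllocationCreateInfo allocInfo = {
            .usage = VMA_MEMORY_USAGE_AUTO,
        };
        VkBuffer buffer;
        VmaAllocation allocation;
        vmaCreateBuffer(allocator, &bufInfo, &allocInfo, &buffer, &allocation, NULL);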

      13 replies →

  • Same with DirectX, if only COM actually had better tooling, instead of a pick-your-adventure C++ framework, or first-class support for .NET.

    • DXGI+D3D11 via C is actually fine and is close to or even lower than Metal v1 when it comes to "lines of code needed to get a triangle on screen". D3D12 is more boilerplate-heavy, but still not as bad as Vulkan.

      1 reply →

How are folks feeling about WebGPU these days?

Once Vulkan is finally in good order, descriptor_heap and others, I really really hope we can get a WebGPU.next.

Where are we at with the "what's next for webgpu" post, from 5 quarters ago? https://news.ycombinator.com/item?id=42209272

  • This is my point of view as someone who learned WebGPU as a precursor to learning Vulkan, and who is definitely not a graphics programming expert:

    My personal experience with WebGPU wasn't the best. One of my dislikes was pipelines, which is something that other people also discuss in this comment thread. Pipeline state objects are awkward to use without an extension like dynamic rendering. You get a combinatorial explosion of pipelines and usually end up storing them in a hash map.

    In my opinion, pipeline state objects are a leaky abstraction that exposes the way that GPUs work: namely that some state changes may require some GPUs to recompile the shader, so all of the state should be bundled together. In my opinion, an API for the web should be concerned with abstractions from the point of view of the programmer designing the application: which state logically acts as a single unit, and which state may change frequently?

    It seems that many modern APIs have gone with the pipeline abstraction; for example, SDL_GPU also has pipelines. I'm still not sure what the "best practices" are supposed to be for modern graphics programming regarding how to structure your program around pipelines.

    I also wish that WebGPU had push constants, so that I do not have to use a bind group for certain data such as transformation matrices.

    Because WebGPU is design-by-committee and must support the lowest common denominator hardware, I'm worried whether it will evolve too slowly to reflect whatever the best practices are in "modern" Vulkan. I hope that WebGPU could be a cross-platform API similar to Vulkan, but less verbose. However, it seems to me that by using WebGPU instead of Vulkan, you currently lose out on a lot of features. Since I'm still a beginner, I could have misconceptions that I hope other people will correct.

  • As always, the only two positive things about WebGL and WebGPU are being available in browsers and having been designed for managed languages.

    They lag behind modern hardware, and after almost 15 years, there are zero developer tools to debug from browser vendors, other than the oldie SpectorJS that hardly counts.

    • This is kind of a ridiculous take.

      You can use wgpu or dawn in a native app and use native tools for GPU debugging if that's what you want

      You can then take that and also run it in the browser, and, you can debug the browser in the same tools. Google it for instructions

      The positive things about WebGPU is it's actually portable, unlike Vulkan. And, it's easy to use, unlike Vulkan.

      1 reply →

  • WebGPU is kinda meh, a 2010s graphics programmer's vision of a modern API. It follows Vulkan 1.0, and while Vulkan is finally getting rid of most of the mess like pipelines, WebGPU went all in. It's surprisingly cumbersome to bind stuff to shaders, and everything is static and has to be hashed & cached, which sucks for streaming/LOD systems. Nowadays you can easily pass arbitrary amounts of buffers and entire scene descriptions via GPU memory pointers to OpenGL, Vulkan, CUDA, etc. with BDA and change them dynamically each frame. But not in WebGPU, which does not support BDA and is unlikely to support it anytime soon.

    It's also disappointing that OpenGL 4.6, released in 2017, is a decade ahead of WebGPU.

    • WebGPU has the problem of needing to handle the lowest common denominator (so GLES 3 if not GLES 2 because of low end mobile), and also needing to deal with Apple's refusal to do anything with even a hint of Khronos (hence why no SPIR-V even though literally everything else including DirectX has adopted it)

      Web graphics have never and will never be cutting edge, they can't as they have to sit on top of browsers that have to already have those features available to it. It can only ever build on top of something lower level. That's not inherently bad, not everything needs cutting edge, but "it's outdated" is also just inherently going to be always true.

      6 replies →

  • I think in the end it all depends on Android. Average Vulkan driver quality on Android doesn't seem to be great in the first place, and getting up-to-date Vulkan API support, at high enough quality and performance for a modernized WebGPU version to build on, might be too much to ask of the Android ecosystem for the next one or two decades.

  • I try my best to push ML things into WebGPU and I think it has a future, but performance is not there yet. I have little experience with Vulkan except toy projects, but WebGPU and Vulkan seem very similar

  • WebGPU is kinda meh. It's for when you need to do something in the browser that you can't with WebGL. GLES is the compatibility king and runs pretty much everywhere, if not natively then through a compatibility layer like ANGLE. I'm sad that WebGPU killed WebGL 3, which was supposed to add compute shaders. Maybe WebGPU would've been more interesting if it wasn't made to replace WebGL but instead was a non-compatibility API targeting modern rendering and actually supporting SPIR-V.

Uuugh, graphics. So many smart people expending great energy to look busy while doing nothing particularly profound.

Graphics people, here is what you need to do.

1) Figure out a machine abstraction.

2) Figure out an abstraction for how these machines communicate with each other and the cpu on a shared memory bus.

3) Write a binary spec for code for this abstract machine.

4) Compilers target this abstract machine.

5) Programs submit code to driver for AoT compilation, and cache results.

6) Driver has some linker and dynamic module loading/unloading capability.

7) Signal the driver to start that code.

AMD64, ARM, and RISC-V are all basically differing binary specs for a C-machine+MMU+MMIO compute abstraction.

Figure out your machine abstraction and let us normies write code that's accelerated without having to throw the baby out with the bathwater every few years.

Oh yes, give us timing information so we can adapt workload as necessary to achieve soft real-time scheduling on hardware with differing performance.

  • They have done it. The current modern abstraction is called Vulkan, and the binary spec code for this machine is called SPIR-V.

  • I don’t know which of my detractors to respond to, so I’ll respond here.

    It should be clear that I’m only interested in compute and not a GPU expert.

    GPUs, from my understanding, have lost the majority of fixed-function units as they’ve become more programmable. Furthermore, GPUs clearly have a hidden scheduler and this is not fully exposed by vendors. In other words we have no control over what is being run on a GPU at any given instant, we simply queue work for it.

    Given all these contrivances, why shouldn't the interface exposed to the user be absolutely simple? It should then be up to vendors to produce hardware (and co-designed compilers) to run our software as fast as possible.

    Graphics developers need to develop a narrow-waist abstraction for wide, latency-hiding, SIMD compute. On top of this Vulkan, or OpenGL, or ML inference, or whatever can be done. The memory space should also be fully unified.

    This is what needs to be worked on. If you don’t agree, that’s fine, but don’t pretend that you’re not protecting entrenched interests from the likes of Microsoft, Nvidia, Epic Games, Valve and others.

    Telling people to just use Unreal Engine, or Unity, or even Godot, is just like telling people to just use Python, or TypeScript, or Go to get their sequential compute done.

    Expose the compute!

    • > GPUs, from my understanding, have lost the majority of fixed-function units as they’ve become more programmable.

      That would be nice, but unfortunately it doesn't match reality; there are even new fixed-function units added from time to time (e.g. for ray tracing).

      Texture sampling units also seem to be critical for performance and probably won't go away for a while.

      It should be possible to hide a lot of the fixed-function magic behind high level GPU instructions (e.g. for sampling a texture), but GPU vendors still don't agree about details like how the texture and sampler properties are managed on the GPU (see: https://www.gfxstrand.net/faith/blog/2022/08/descriptors-are...).

      I.e. the problem isn't in the software but in the differing hardware designs. GPU vendors don't seem to like the idea of harmonizing their GPU architectures, and they're also not fans of creating a common ISA as a compatibility shim (as is common for CPUs). Instead the 3D API, driver and high-level shader bytecode (e.g. SPIR-V) are the common interface, and that's how we landed in the current situation with all its downsides (most of the reasons are probably not even technical, but legal/strategic - patents and stuff).

  • Wow, you should get NVIDIA, AMD and Intel on the phone ASAP! Really strange that they didn't come up with such a simple and straightforward idea in the last 3 decades ;)

  • some of this is what khronos standards are theoretically supposed to achieve.

    surprise, it's very difficult to do across many hw vendors and classes of devices. it's not a coincidence that metal is much easier to program for.

    maybe consider joining khronos since you apparently know exactly how to achieve this very simple goal...

    • > it's not a coincidence that metal is much easier to program for

      Tbf, Metal also works on non-Apple GPUs and with only minimal additional hints to manage resources in non-unified memory.