Comment by nicebyte

14 days ago

> I want a single-line malloc with zero care about usage flags and which only produces one single pointer value

That's not realistic on non-UMA systems. I doubt you want to go over PCIe every time you sample a texture, so the allocator has to know what you're allocating memory _for_. Even with CUDA you have to do that.

And even with unified memory, only the implementation knows exactly how much space is needed for a texture with a given format and configuration (e.g. due to different alignment requirements and such). "just" malloc-ing gpu memory sounds nice and would be nice, but given many vendors and many devices the complexity becomes irreducible. If your only use case is compute on nvidia chips, you shouldn't be using vulkan in the first place.

11 comments

nicebyte

m-schuetz 14 days ago

> Even with CUDA you have to do that.

No you don't, cuMemAlloc(&ptr, size) will just give you device memory, and cuMemAllocHost will give you pinned host memory. The usage flags are entirely pointless. Why would UMA be necessary for this? There is a clear separation between device and host memory. And of course you'd use device memory for the texture data. Not sure why you're constructing a case where I'd fetch them from host over PCI, that's absurd.

> only the implementation knows exactly how much space is needed for a texture with a given format and configuration

OpenGL handles this trivially, and there is also no reason for a device malloc to not also work trivially with that. Let me create a texture handle, and give me a function that queries the size that I can feed to malloc. That's it. No heap types, no usage flags. You're making things more complicated than they need to be.

nice_byte 14 days ago
> No you don't, cuMemAlloc(&ptr, size) will just give you device memory, and cuMemAllocHost will give you pinned host memory.
that's exactly what i said. You have to explicitly allocate one or the other type of memory. I.e. you have to think about what you need this memory _for_. It's literally just usage flags with extra steps.
> Why would UMA be necessary for this?
UMA is necessary if you want to be able to "just allocate some memory without caring about usage flags". Which is something you're not doing with CUDA.
> OpenGL handles this trivially,
OpenGL also doesn't allow you to explicitly manage memory. But you were asking for an explicit malloc. So which one do you want, "just make me a texture" or "just give me a chunk of memory"?
> Let me create a texture handle, and give me a function that queries the size that I can feed to malloc. That's it. No heap types, no usage flags.
Sure, that's what VMA gives you (modulo usage flags, which as we had established you can't get rid of). Excerpt from some code:
``` VmaAllocationCreateInfo vma_alloc_info = { .usage = VMA_MEMORY_USAGE_GPU_ONLY, .requiredFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT};
VkImage img; VmaAllocation allocn; const VkResult create_alloc_vkerr = vmaCreateImage( vma_allocator, &vk_image_info, // <-- populated earlier with format, dimensions, etc. &vma_alloc_info, &img, &allocn, NULL); ```
Since i dont care about reslurce aliasing, that's the extent of "memory management" that i do in my rhi. The last time i had to think about different heap types or how to bind memory was approximately never.
- m-schuetz 14 days ago
  
  No, it's not usage flags with extra steps, it's less steps. It's explicitly saying you want device memory without any kind of magical guesswork of what your numerous potential combinations of usage flags may end up giving you. Just one simple device malloc.
  Likewise, your claim about UMA makes zero sense. Device malloc gets you a pointer or handle to device memory, UMA has zero relation to that. The result can be unified, but there is no need for it to be.
  Yeah, OpenGL does not do malloc. I'm flexible, I don't necessarily need malloc. What I want is a trivial way to allocate device memory, and Vulkan and VMA don't do that. OpenGL is also not the best example since it also uses usage flags in some cases, it's just a little less terrible than Vulkan when it comes to texture memory.
  I find it fascinating how you're giving a bad VMA example and passing that of as exemplary. Like, why is there gpu-only and device-local. That vma alloc info as a whole is completely pointless because a theoretical vkMalloc should always give me device memory. I'm not going to allocate host memory for my 3d models.
  
  8 replies →