Comment by cong-or

4 hours ago

What’s the latency on a hostcall? A PCIe round-trip for something like File::open is fine—that’s slow I/O anyway. But if a println! from GPU code blocks on a host round-trip every time, that completely changes how you’d use it for debugging.

Is there device-side buffering, or does each write actually wait for the host?
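
To make the question concrete, here is a rough sketch of the two behaviours I'm asking about. It is plain host-side Rust, not GPU code and not this project's API: `Host`, `hostcall_write`, and `BufferedWriter` are names made up for illustration, and a "round-trip" is just a counter increment.

```rust
/// Hypothetical stand-in for the host side of a hostcall.
/// It only counts how many simulated round-trips were made.
#[derive(Default)]
struct Host {
    round_trips: usize,
}

impl Host {
    /// One simulated device-to-host transfer. The payload is ignored here;
    /// a real implementation would copy it across the bus.
    fn hostcall_write(&mut self, _bytes: &[u8]) {
        self.round_trips += 1;
    }
}

/// Hypothetical device-side buffer: collects writes and only calls the
/// host when the buffer would overflow or when flushed explicitly.
struct BufferedWriter {
    buf: Vec<u8>,
    capacity: usize,
}

impl BufferedWriter {
    fn new(capacity: usize) -> Self {
        Self { buf: Vec::with_capacity(capacity), capacity }
    }

    fn write(&mut self, host: &mut Host, bytes: &[u8]) {
        // Make room first if this write would not fit.
        if self.buf.len() + bytes.len() > self.capacity {
            self.flush(host);
        }
        if bytes.len() > self.capacity {
            // Oversized writes bypass the buffer entirely.
            host.hostcall_write(bytes);
        } else {
            self.buf.extend_from_slice(bytes);
        }
    }

    fn flush(&mut self, host: &mut Host) {
        if !self.buf.is_empty() {
            host.hostcall_write(&self.buf);
            self.buf.clear();
        }
    }
}

/// Every write waits on the host: one round-trip per line.
fn unbuffered_round_trips(lines: &[&str]) -> usize {
    let mut host = Host::default();
    for line in lines {
        host.hostcall_write(line.as_bytes());
    }
    host.round_trips
}

/// Writes are batched on the device and flushed once at the end.
fn buffered_round_trips(lines: &[&str], capacity: usize) -> usize {
    let mut host = Host::default();
    let mut writer = BufferedWriter::new(capacity);
    for line in lines {
        writer.write(&mut host, line.as_bytes());
    }
    writer.flush(&mut host);
    host.round_trips
}
```

With a hundred short lines, `unbuffered_round_trips` costs a hundred transfers, while `buffered_round_trips` with a few KiB of capacity costs one. That gap is what decides whether printing in a hot loop is usable for debugging.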