
Comment by grw_

3 hours ago

I actually didn't know there was more to InfiniBand than verbs (at least at this abstraction level, above the PHY), so the answer is probably 'not much more'. The device imitates a RoCE v2 device, and the higher-level abstractions I used on top were GPU-ish libraries like NCCL and JACCL.

Good question about 'bridging into actual InfiniBand'; I don't know the answer there either. My naive understanding is that since this is host-initiated RDMA (it's still the host CPU invoking into DMA buffers, though they may be device-memory mapped), it should work fine, at least between two machines. I'm curious enough to try: I have a couple of machines with both Thunderbolt and RoCE-capable NICs, and the experiment would be to see whether this can be used across different transports simultaneously. I think that's what it already does (since the macOS FA57 and Linux native paths are already 'different transports'), but say so if you have a better scenario to demonstrate what 'bridging into actual InfiniBand' would look like!

InfiniBand is its own entire networking standard. If you have Mellanox NICs you can switch them into IB mode and... short version, it's not Ethernet anymore. It doesn't even use the same speeds/baud rates (e.g. there is an FDR rate at 14.0625 Gbaud per lane). NB: InfiniBand is indeed not RoCE; that E is Ethernet. InfiniBand had RDMA way before RoCE became a thing, which is probably why its APIs are being used for it.
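To put numbers on the baud-rate point, here is a quick sketch of the arithmetic. The per-lane signalling rates and line encodings are the standard InfiniBand ones as far as I know; the table name `IB_RATES` and the helper `data_rate_gbps` are just names made up for this illustration.

```python
# Per-lane signalling rate (Gbaud) and line encoding (payload bits, total bits).
# SDR/DDR/QDR use 8b/10b; FDR and EDR use 64b/66b.
IB_RATES = {
    "SDR": (2.5, 8, 10),
    "DDR": (5.0, 8, 10),
    "QDR": (10.0, 8, 10),
    "FDR": (14.0625, 64, 66),
    "EDR": (25.78125, 64, 66),
}


def data_rate_gbps(name, lanes=4):
    """Usable data rate in Gbit/s for a link of the given width (4x by default)."""
    baud, payload_bits, total_bits = IB_RATES[name]
    return baud * lanes * payload_bits / total_bits


# FDR 4x: 14.0625 * 4 = 56.25 Gbaud on the wire, about 54.55 Gbit/s of data.
# EDR 4x works out to exactly 100.0 Gbit/s.
```

For comparison, 10G Ethernet signals at 10.3125 Gbaud, so an FDR lane at 14.0625 Gbaud doesn't line up with any Ethernet lane rate.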

It sounds like you're really just using the IB verbs (which are kinda really RDMA verbs, not specific to InfiniBand). I don't think any kind of "bridging" other than IP routing is really possible: you'd need a chip that understands both TB and IB and can somehow translate RDMA requests between the two.
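One way to see the "same verbs, different wire" split on Linux: devices registered with the kernel's verbs stack appear under `/sys/class/infiniband` whatever the transport is, and each port has a `link_layer` file that reads `InfiniBand` or `Ethernet`. A minimal sketch, assuming that sysfs layout; the function name `link_layers` is made up here, and I don't know what (if anything) the Thunderbolt device would report.

```python
from pathlib import Path


def link_layers(root="/sys/class/infiniband"):
    """Map (device, port) to the link layer string the kernel reports."""
    result = {}
    base = Path(root)
    if not base.is_dir():
        # No RDMA devices registered (or not Linux): nothing to report.
        return result
    for f in sorted(base.glob("*/ports/*/link_layer")):
        device, port = f.parts[-4], f.parts[-2]
        result[(device, port)] = f.read_text().strip()
    return result


if __name__ == "__main__":
    for (device, port), layer in link_layers().items():
        print(f"{device} port {port}: {layer}")
```

A Mellanox NIC switched into IB mode should show `InfiniBand` here, while the same card in Ethernet (RoCE) mode should show `Ethernet`, even though the verbs calls on top are identical.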