Comment by shivanshvij

6 hours ago

We absolutely ran into these issues.

A couple notes that help quite a bit:

1. Always build the eBPF programs in a container - this is great for reproducibility of course, but it also makes DevX better on macOS for those who prefer to develop there.

2. You actually can do a full checksum! You need to limit the MTU but you can:

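  static __always_inline void tcp_checksum(const struct iphdr *ip_header, struct tcphdr *tcp_header, const __u16 tcp_len, const void *data_end) {
      __u32 sum = 0;
      __u16 *buf = (void *)tcp_header;
      /* Start with the pseudo-header (addresses, protocol, TCP length). */
      ip_header_pseudo_checksum(ip_header, tcp_len, &sum);
      /* The checksum field must be zero while the sum is computed. */
      tcp_header->check = 0;
      /* Clamp to a compile-time constant so the verifier sees a bounded loop. */
      __u16 max_packet_size = tcp_len;
      if (max_packet_size > MAX_TCP_PACKET_SIZE) {
          max_packet_size = MAX_TCP_PACKET_SIZE;
      }
      /* Sum the segment one 16-bit word at a time, re-checking data_end on every read. */
      for (int i = 0; i < max_packet_size / 2; i++) {
          if ((void *)(buf + 1) > data_end) {
              break;
          }
          sum += *buf;
          buf++;
      }
      /* Odd length: add the trailing byte (zero-padded; this form assumes a little-endian host). */
      if ((void *)((__u8 *)buf + 1) <= data_end && ((__u8 *)buf - (__u8 *)tcp_header) < max_packet_size) {
          sum += *(__u8 *)buf;
      }
      tcp_header->check = csum_fold_helper(sum);
  }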
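
The snippet above calls two helpers that are not shown in the comment, ip_header_pseudo_checksum and csum_fold_helper. What follows is only a guess at what they might look like, written as plain userspace C so it can be compiled and checked outside the kernel. The body of csum_fold_helper is an assumption, and pseudo_header_sum and tcp_checksum_ref are hypothetical names that take the addresses directly instead of a struct iphdr. Addresses are passed exactly as they sit in the packet (network byte order), and the caller is expected to have zeroed the checksum field first.

```c
#include <arpa/inet.h>   /* htons */
#include <netinet/in.h>  /* IPPROTO_TCP */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed behaviour: fold the 32-bit accumulator to 16 bits, then invert. */
static inline uint16_t csum_fold_helper(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical stand-in for ip_header_pseudo_checksum: adds the TCP
 * pseudo-header (source, destination, protocol, length) to *sum.
 * saddr and daddr are the raw 32-bit values as loaded from the packet. */
static inline void pseudo_header_sum(uint32_t saddr, uint32_t daddr,
                                     uint16_t tcp_len, uint32_t *sum)
{
    *sum += saddr & 0xffff;
    *sum += saddr >> 16;
    *sum += daddr & 0xffff;
    *sum += daddr >> 16;
    *sum += htons(IPPROTO_TCP);
    *sum += htons(tcp_len);
}

/* Userspace reference: full checksum over a TCP segment whose checksum
 * field is already zero. The result is ready to be copied into the field. */
static uint16_t tcp_checksum_ref(uint32_t saddr, uint32_t daddr,
                                 const uint8_t *seg, uint16_t tcp_len)
{
    uint32_t sum = 0;
    uint16_t word;

    pseudo_header_sum(saddr, daddr, tcp_len, &sum);
    for (size_t i = 0; i + 1 < tcp_len; i += 2) {
        memcpy(&word, seg + i, sizeof(word)); /* unaligned-safe 16-bit load */
        sum += word;
    }
    if (tcp_len & 1) {
        word = 0;                             /* pad the odd byte with zero */
        memcpy(&word, seg + tcp_len - 1, 1);
        sum += word;
    }
    return csum_fold_helper(sum);
}
```

A reference like this can be handy for comparing against what the XDP program writes into captured packets. Running it again over a segment that already carries a correct checksum should return 0.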

With that being said, it's not lost on me that XDP in general is something you should only reach for once you hit some sort of bottleneck. The original version of our network migration was actually implemented in userspace for this exact reason!

> You actually can do a full checksum

Indeed! This is what I had in mind when I wrote "cumbersome" :).

It's been too long for me to recall whether the problem was the verifier or me, and things may have improved since, but I remember the verifier choking on a static size limit too. Have you been able to use this trick successfully?

> Always build the eBPF programs in a container

That should work generally but watch out for any weirdness due to the fact that in a container you are already inside a couple of layers of networking (bridge, netns etc.).

Different kernels will be different levels of fussy about the bounded loop you're using there. Bounded loops are themselves a relatively recent feature.

Of course, checksum fixups in eBPF are idiomatically incremental.
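
For anyone who hasn't seen the incremental approach: rather than re-summing the whole segment, you adjust the existing checksum using only the old and new values of the field you rewrote (RFC 1624, HC' = ~(~HC + ~m + m')). Below is a minimal userspace sketch of that arithmetic. The names csum_update16 and csum_full are made up for illustration; inside an actual eBPF program you would more likely lean on kernel helpers such as bpf_csum_diff than hand-roll this.

```c
#include <stddef.h>
#include <stdint.h>

/* Incrementally update a 16-bit Internet checksum after one 16-bit word
 * changes from old_word to new_word (RFC 1624, equation 3).
 * All three values must be in the same byte order. */
static inline uint16_t csum_update16(uint16_t check, uint16_t old_word,
                                     uint16_t new_word)
{
    uint32_t sum = (uint16_t)~check;
    sum += (uint16_t)~old_word;
    sum += new_word;
    sum = (sum & 0xffff) + (sum >> 16);   /* fold carries back in */
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Full recomputation over n 16-bit words, used here only for comparison. */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += words[i];
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

The cost is constant no matter how large the packet is, and there is no loop for the verifier to reason about, which is a large part of why this is the usual pattern when a program only rewrites a few header fields.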

How do containers help when bpf is mostly a matter of kernel version?

  • They don't; it's just the poster wanting people to do what they prefer.

    • I figure it’s one way to keep your compiler version unchanged for eBPF work, while you might update/upgrade your dev OS packages over time for other reasons. The title of the linked issue is this:

      “Checksum code does not work for LLVM-14, but it works for LLVM-13”