
Comment by ajb

1 year ago

Wait, pcap as in wireshark packet capture?

I'm not the commenter, but yes, often trading firms record all order gateway traffic to and from brokers or exchanges at the TCP/IP packet level, in what are referred to as "pcap files". Awkwardly low-level to work with, but it means you know for sure what you sent, not what your software thought it was sending!
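For anyone who hasn't looked inside one: the classic libpcap file format is a 24-byte global header followed, for each packet, by a 16-byte record header and the captured bytes. Below is a minimal Python sketch that walks those records from an in-memory buffer. It assumes the classic format (not pcapng), and `iter_pcap`, `build_pcap` and `Record` are hypothetical names; real tooling would use libpcap or a library built on it.

```python
import struct
from typing import Iterator, List, NamedTuple, Tuple

# Classic libpcap file format (not pcapng). The magic number also tells us the
# timestamp resolution and the byte order the file was written in.
MAGIC_USEC = 0xA1B2C3D4  # record timestamps carry microseconds
MAGIC_NSEC = 0xA1B23C4D  # record timestamps carry nanoseconds


class Record(NamedTuple):
    ts_ns: int      # capture timestamp, normalised to integer nanoseconds
    orig_len: int   # packet length on the wire
    data: bytes     # captured bytes (shorter than orig_len if snaplen truncated it)


def iter_pcap(buf: bytes) -> Iterator[Record]:
    """Yield the packet records of an in-memory classic pcap file."""
    for endian in ("<", ">"):
        (magic,) = struct.unpack(endian + "I", buf[:4])
        if magic in (MAGIC_USEC, MAGIC_NSEC):
            break
    else:
        raise ValueError("not a classic pcap file")
    frac_to_ns = 1 if magic == MAGIC_NSEC else 1000
    # Bytes 4..24 hold version, timezone, sigfigs, snaplen and link type; skipped here.
    offset = 24
    while offset + 16 <= len(buf):
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack(
            endian + "IIII", buf[offset:offset + 16]
        )
        offset += 16
        yield Record(ts_sec * 1_000_000_000 + ts_frac * frac_to_ns,
                     orig_len, buf[offset:offset + incl_len])
        offset += incl_len


def build_pcap(packets: List[Tuple[int, int, bytes]]) -> bytes:
    """Hypothetical helper: write (seconds, nanoseconds, payload) tuples as a
    little-endian, nanosecond-resolution pcap file (link type 1 = Ethernet)."""
    header = struct.pack("<IHHiIII", MAGIC_NSEC, 2, 4, 0, 0, 65535, 1)
    body = b"".join(
        struct.pack("<IIII", sec, nsec, len(p), len(p)) + p for sec, nsec, p in packets
    )
    return header + body


# Usage: round-trip a single fake payload through the writer and the reader.
capture = build_pcap([(1_700_000_000, 123, b"8=FIX.4.2\x01")])
for rec in iter_pcap(capture):
    print(rec.ts_ns, rec.orig_len, rec.data)
```

The payload bytes are opaque to this reader; decoding the Ethernet, IP, TCP and FIX layers inside each record is the "awkwardly low-level" part.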

  • The ultimate source of truth about what orders you sent to the exchange is the exact set of bits sent to the exchange. This is very important because your software can have bugs (and so can theirs), so using the packet captures from that wire directly is the only real way to know what really happened.

  • That makes sense - but it's still somewhat surprising that there's nothing better. I guess that's the equivalent of the modern paper trail.

    • It’s the closest to truth you can find (the network capture, not the drop copy). If it wasn’t on the network outbound, you didn’t send it, and it’s pretty damn close to an immutable record.


  • The storage requirements of this must be impressive.

    • Storage is cheap, and the overall figures are not that outlandish. If we look at a suitable first page search result[0], and round figures up, we get to about 7.5 TB per day.

      And how did I get that figure?

      I'm going to fold pcap overhead into the per-message size estimate. Let's assume a trading day at an exchange, including after-hours activity, is 14 hours (~50k seconds). If we estimate that during the highest peaks of trading activity the exchange receives about 2M messages per second, then during more serene hours the average could be about 500k messages per second. Let's guess that the average rate applies 95% of the time and the peak rate the remaining 5% of the time. That gives us a blended rate of about 575k messages per second (0.95 × 500k + 0.05 × 2M). Round that up to 600k.

      If we assume that an average FIX message is about 200 bytes of data, and add 50 bytes of IP + pcap framing overhead, we get to ~250 bytes of transmitted data per message. At 600k messages per second that is 150 MB/s, so over 14 hours (50,400 seconds) the total amount of trading data received by an exchange would be roughly 7.5 TB per day.

      Before compression for longer-term storage. Whether you consider the aggregate storage requirements impressive or merely slightly inconvenient is a more personal matter.

      0: https://robertwray.co.uk/blog/the-anatomy-of-a-fix-message


    • Why? They're not streaming 4K video; it's either a text protocol or an efficient binary protocol.
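The storage estimate in this thread is easy to re-run with different assumptions. Here is a small Python sketch that multiplies out the figures given there (14-hour day, 2M/500k messages per second at a 5%/95% split, rounded up to 600k, ~250 bytes per message including framing); the constant and function names are made up for illustration. With these inputs the product comes to 7.56e12 bytes, i.e. roughly 7.5 TB per day before compression.

```python
# Back-of-envelope daily capture volume, using the assumptions from the thread.
# All values are estimates, not measurements from any real exchange.
TRADING_SECONDS = 14 * 60 * 60   # 14-hour day, including after-hours: 50,400 s
PEAK_RATE = 2_000_000            # messages/s at the busiest moments
AVERAGE_RATE = 500_000           # messages/s the rest of the time
PEAK_PERCENT = 5                 # share of the day spent at the peak rate
BYTES_PER_MESSAGE = 200 + 50     # ~200 B FIX message + ~50 B IP/pcap framing


def blended_rate(peak: int, average: int, peak_percent: int) -> int:
    """Time-weighted message rate; integer maths keeps the result exact."""
    return (peak * peak_percent + average * (100 - peak_percent)) // 100


def daily_bytes(rate: int, seconds: int, bytes_per_message: int) -> int:
    """Total bytes captured per day at a constant message rate."""
    return rate * seconds * bytes_per_message


rate = blended_rate(PEAK_RATE, AVERAGE_RATE, PEAK_PERCENT)   # 575,000 msg/s
rounded_rate = 600_000                                       # rounded up, as in the thread
total = daily_bytes(rounded_rate, TRADING_SECONDS, BYTES_PER_MESSAGE)
print(f"{rate:,} msg/s blended, {total / 1e12:.2f} TB/day ({total / 2**40:.2f} TiB)")
```

Because the result is a plain product, changing any single assumption scales it linearly, which makes it easy to see which guess matters most.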

Typically not a literal pcap. Not just Wireshark running persistently everywhere.

There are systems you can buy (e.g. by Pico) that you mirror all traffic to; they store it, index it, and have pre-configured parsers for a lot of protocols to make querying easier.

Think Splunk/ELK for network traffic by packet.
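To make the "Splunk/ELK for packets" idea a little more concrete, here is a toy Python sketch that indexes captured frames by TCP flow and timestamp, so a time window for one connection can be pulled back without scanning everything. It assumes untagged Ethernet II frames carrying IPv4/TCP; `FlowIndex`, `flow_key` and `make_frame` are hypothetical names, and the commercial systems mentioned here go much further, e.g. with protocol-specific parsers.

```python
import bisect
import socket
import struct
from typing import Dict, List, Optional, Tuple

FlowKey = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


def flow_key(frame: bytes) -> Optional[FlowKey]:
    """Return the TCP flow key of an untagged Ethernet II / IPv4 frame, else None."""
    if len(frame) < 14 + 20 or struct.unpack("!H", frame[12:14])[0] != 0x0800:
        return None                      # too short, or not IPv4 (VLAN tags not handled)
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4             # IPv4 header length in bytes
    if ip[9] != 6 or len(ip) < ihl + 4:  # protocol 6 = TCP; need both port fields
        return None
    sport, dport = struct.unpack("!HH", ip[ihl:ihl + 4])
    return (socket.inet_ntoa(ip[12:16]), sport, socket.inet_ntoa(ip[16:20]), dport)


class FlowIndex:
    """Hypothetical in-memory index: flow -> time-ordered (timestamp, packet id)."""

    def __init__(self) -> None:
        self._times: Dict[FlowKey, List[int]] = {}
        self._ids: Dict[FlowKey, List[int]] = {}

    def add(self, ts_ns: int, packet_id: int, frame: bytes) -> None:
        key = flow_key(frame)
        if key is None:
            return
        # Assumes packets are added in timestamp order (e.g. from one capture point).
        self._times.setdefault(key, []).append(ts_ns)
        self._ids.setdefault(key, []).append(packet_id)

    def query(self, key: FlowKey, start_ns: int, end_ns: int) -> List[int]:
        """Packet ids for `key` with start_ns <= timestamp <= end_ns."""
        times = self._times.get(key, [])
        lo = bisect.bisect_left(times, start_ns)
        hi = bisect.bisect_right(times, end_ns)
        return self._ids.get(key, [])[lo:hi]


def make_frame(src: str, sport: int, dst: str, dport: int) -> bytes:
    """Hypothetical test helper: a minimal Ethernet/IPv4/TCP frame with no payload."""
    eth = b"\x00" * 12 + struct.pack("!H", 0x0800)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 6, 0,
                     socket.inet_aton(src), socket.inet_aton(dst))
    tcp = struct.pack("!HHIIBBHHH", sport, dport, 0, 0, 0x50, 0x18, 65535, 0, 0)
    return eth + ip + tcp
```

Keeping per-flow timestamp lists sorted lets `bisect` answer a time-range query in logarithmic time; a real system would persist this in a time-series store rather than in Python dicts.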

  • Except it is literal “pcap”, as they capture all packets at layer 3. I don’t know the exact specifications of Pico appliances, but it would not surprise me if they’re running Linux + libpcap + some sort of time-series DB.

    • Well, probably. What I meant is that it's not typically someone running tcpdump everywhere and someone else analyzing the output in Wireshark, but rather a system configured to do this at scale across the desktop.


  • In my experience, the Corvil devices used by Pico have largely been replaced by Arista 7130 MetaMux platforms at the capture “edge”.

Looks like tnlnbn already answered, but the other benefit of having a raw network capture is that it is often performed on devices (Pico and Exablaze, to name just two) that provide very precise timestamping on a packet-by-packet basis, typically as some additional bytes prepended to the header.

Most modern trading systems performing competitive high-frequency or event trades have performance thresholds in the tens of nanoseconds, and the only way to get that sort of precision is to run analysis against a stable hardware clock.
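As an illustration of the kind of analysis per-packet hardware timestamps enable, here is a Python sketch that pairs each outbound order with the inbound market-data packet that triggered it and reports tick-to-trade latency percentiles. It assumes both timestamps come from the same capture device's clock and that some correlation key (the hypothetical `trigger_id`) links an order to its trigger; the timestamps below are made up.

```python
import math
from typing import Dict, List, Tuple


def tick_to_trade_ns(inbound: Dict[int, int],
                     outbound: List[Tuple[int, int]]) -> List[int]:
    """Latency (ns) from each triggering inbound packet to the order it caused.

    inbound:  trigger_id -> hardware timestamp (ns) of the market-data packet
    outbound: (trigger_id, hardware timestamp in ns) for each order on the wire
    Orders whose trigger was not captured are skipped.
    """
    # Timestamps stay as Python ints: converting epoch nanoseconds to float
    # seconds would round them to steps of about 238 ns at current dates.
    return [t_out - inbound[tid] for tid, t_out in outbound if tid in inbound]


def percentile(values: List[int], p: float) -> int:
    """Nearest-rank percentile, 0 < p <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


T0 = 1_700_000_000_000_000_000  # made-up epoch timestamp in ns
inbound = {1: T0, 2: T0 + 1_000, 3: T0 + 2_000}
outbound = [(1, T0 + 850), (2, T0 + 1_790), (3, T0 + 2_912)]
latencies = tick_to_trade_ns(inbound, outbound)
print(latencies, "p50:", percentile(latencies, 50), "p99:", percentile(latencies, 99))
```

The subtraction is only meaningful because both ends were stamped by one stable clock; mixing software timestamps from different hosts would swamp tens-of-nanosecond differences.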

I suppose Pre-Calculated Aggregated Positions, but I am not an expert in the field.