Contrary to the "highlights" section (which seems to be the only place calling it a "standard" 19-core optical fiber), this is not in fact a 'standard' fiber; rather, "standard" appears to refer to the 125 µm cladding diameter ("Sumitomo Electric was responsible for the design and manufacture of a coupled 19-core optical fiber with a standard cladding diameter (see Figure 1)"). It looks like "diameter" simply got lost in the highlights section.
(Nonetheless impressive, and multi-core fiber seems to be maturing as a technology.)
NANOG has a recurring presentation by Richard Steenbergen called "Everything You Always Wanted to Know About Optical Networking – But Were Afraid to Ask"; here's last year's:
* https://www.youtube.com/watch?v=Y-MfLsnqluM
Alright, I have a dumb question...
How come with a LAG group on Ethernet I can get "more total bandwidth", but any single TCP flow is limited to the max speed of one of the LAG components (say, a gigabit), while these guys are somehow combining multiple fibers into one overall faster stream? What gives? Even round-robin mode on LAG groups doesn't do that.
What are they doing differently and why can't we do that?
I do not know exactly what is being done here, but I can say that I am aware of two techniques for sending bit streams over parallel wires while keeping the bits in order:
1. The lengths of all wires are meticulously matched so that signals arrive at the same time. The hardware then simply assumes that the bits coming off the wires are in sequence by wire order. This is done in computers for high-speed interfaces such as memory or graphics. If you have ever seen squiggly traces on a PCB going to a high-speed device, they were routed that way to make the lengths exactly equal, so the signals arrive at the same time on every trace. This is how data transfers from dual-channel DDR4 RAM, where 64 bits are received simultaneously, occur without reordering bits.
2. The lengths of the wires are not matched and may differ up to some tolerance. Deskew buffers are then used to emulate matched lengths. In the case of twisted-pair Ethernet, the wire pairs are not of equal length because the twist rates are varied to avoid interference between pairs with identical twist rates. As a result, the Ethernet PHY must implement a deskew buffer to compensate for the mismatched lengths and present the illusion of matched wire lengths. This is part of the Ethernet standard and likely applies to Ethernet over fiber too. The IEEE has a PDF discussing this for 800 Gb/s Ethernet; a minimal sketch of the deskew idea follows the link:
https://www.ieee802.org/3/df/public/23_01/0130/ran_3df_03_23...
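A minimal sketch of the deskew idea, with made-up lane skews (nothing here is from the 802.3 spec):

    # Deskew buffer sketch: each lane delivers bits with its own fixed
    # skew (arrival delay). The receiver buffers per lane and only emits
    # a word once every lane has a bit ready, restoring the original order.
    from collections import deque

    class DeskewBuffer:
        def __init__(self, num_lanes):
            self.lanes = [deque() for _ in range(num_lanes)]

        def push(self, lane, bit):
            self.lanes[lane].append(bit)

        def pop_word(self):
            # A word is ready only when every lane has at least one bit.
            if all(self.lanes):
                return [lane.popleft() for lane in self.lanes]
            return None

    # Lane 1's bits arrive later than lane 0's; the buffer hides the skew.
    buf = DeskewBuffer(2)
    buf.push(0, 1)
    buf.push(0, 0)
    assert buf.pop_word() is None  # lane 1 hasn't delivered yet
    buf.push(1, 1)
    assert buf.pop_word() == [1, 1]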
LAG was never intended to preserve the sequence in which data is sent across links, so no effort was made to enable that in the standard.
That said, you would get a better answer from an electrical engineer, especially one who builds networking components.
I just noticed a typo in this: I should have written 128 bits, not 64. Data transfers in dual-channel DDR4 are 128 bits at a time.
> What are they doing differently and why can't we do that?
You're (incorrectly) assuming they're doing Ethernet/IP in that test setup. They aren't (implied by the results section discussing various FEC schemes, which sit below even Ethernet framing), so it's just a petabit of raw layer-1 bandwidth.
It's also important to note that many optical links don't use Ethernet as a protocol either (SDH/SONET are the common ones), although this is changing more and more.
You don't really want to, but if you configure all of the LAG participants on the path to do round-robin or similar balancing rather than hashing on addresses, you can have a single flow exceed an individual link. You'll also be pretty likely to get out-of-order data, and TCP receivers will exercise their reassembly buffers, which will kill performance; you'll rapidly wish you hadn't done all that configuration work. If you do need more than one link's worth of throughput, you'll almost always do better running multiple flows, but you may still need to configure your network so it hashes in a way that gives you diverse paths between two hosts; the defaults might not give you diversity even on different flows.
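A sketch of why hashing pins one flow to one link (illustrative only; real switches use vendor-specific hashes over the packet headers):

    # Each packet's LAG member is chosen by hashing the flow's 5-tuple,
    # so every packet of a given TCP flow takes the same link: no
    # reordering, but also no more than one link's bandwidth per flow.
    def pick_link(src_ip, dst_ip, src_port, dst_port, proto, num_links):
        return hash((src_ip, dst_ip, src_port, dst_port, proto)) % num_links

    # Same flow -> same link, every time:
    a = pick_link("10.0.0.1", "10.0.0.2", 40000, 443, "tcp", 4)
    b = pick_link("10.0.0.1", "10.0.0.2", 40000, 443, "tcp", 4)
    assert a == b
    # A second flow (different source port) may hash to a different link.
    c = pick_link("10.0.0.1", "10.0.0.2", 40001, 443, "tcp", 4)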
The out-of-order data is the key bit.
How do these guys get the data in order when we don't?
Because your switch is mapping a 4-tuple to a particular link and these people aren't, is my guess.
They're not combining anything; they're sending 19 copies of one signal down 19 strands (with some offsets so they interfere in awkward ways), applying some signal processing to correct the interference (which they say makes it realistic), and declaring that they've calculated the total capacity of the medium.
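A toy version of that signal processing, assuming a known, invertible 2x2 crosstalk matrix (made up; the real receiver estimates a much larger one adaptively):

    # Toy MIMO crosstalk cancellation: two transmitted streams leak into
    # each other's core (mixing matrix H); the receiver undoes the mixing
    # by applying the inverse of its estimate of H (zero-forcing).
    import numpy as np

    x = np.array([[1.0, -1.0,  1.0],   # symbols sent on core 0
                  [1.0,  1.0, -1.0]])  # symbols sent on core 1
    H = np.array([[1.0, 0.3],          # 30% leakage between cores (made up)
                  [0.3, 1.0]])

    y = H @ x                        # what the two cores actually receive
    x_hat = np.linalg.inv(H) @ y     # equalized: crosstalk removed
    assert np.allclose(x_hat, x)     # original streams recovered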
What you do with it at the higher layers is entirely up to you.
But Ethernet could totally do that, by essentially demuxing the parts of an individual packet, sending them in parallel across a bunch of links, and remuxing them at the other end. I'm not aware of anyone having bothered to implement it.
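For illustration, a hypothetical striping scheme (not any real standard) that splits one packet round-robin across links and reassembles it:

    # Hypothetical packet striping: deal one packet's bytes round-robin
    # across N links, then interleave them back at the far end. Per-link
    # skew would have to be deskewed first, which is the hard part.
    def stripe(packet: bytes, num_links: int):
        return [packet[i::num_links] for i in range(num_links)]

    def unstripe(chunks):
        out = bytearray()
        for i in range(len(chunks[0])):
            for c in chunks:
                if i < len(c):
                    out.append(c[i])
        return bytes(out)

    pkt = b"one packet, demuxed and remuxed"
    assert unstripe(stripe(pkt, 4)) == pkt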
I assume this is just a PHY-level test and no real switches or traffic was involved.
Flex Ethernet will allow for that but it's been a long time coming.
As others have mentioned, this is mostly a proof of concept for a high-core-count weakly-coupled fibre from Sumitomo. I also want to highlight the use of a 19-channel MIMO receiver structure, which is completely impractical. The linked article also fails to mention a figure for the MIMO gain.
Worse, it's offline MIMO processing! ;D
I would guesstimate that if you tried to run it live, the receiver [or rather its DSPs] would consume >100 W of power, maybe even >1000 W. (These things evolve and improve, though.)
(Also, a kilowatt for the receiver is entirely acceptable for a submarine cable.)
To get a ballpark power figure, we can look at comparable (for some definition thereof) commercial offerings. Take a public datasheet from Arista[1]: they quote 16 W typical for a 400 Gb/s module with 120 km of reach. You would need 2,500 such modems at 16 W (40 kW), jointly decoding (i.e. very close together), to process this data rate. GPU compute has really pushed the boundaries of thermal management, but this would be far more thermally dense.
[1] https://www.arista.com/assets/data/pdf/Datasheets/400ZR_DCI_...
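Spelling out that arithmetic (a flat petabit and the 16 W figure above; real module counts would depend on overheads):

    # Back-of-envelope: 400G modules needed to carry a petabit, and the
    # resulting power draw at 16 W per module (the Arista figure above).
    total_bps  = 1e15      # ~1 Pb/s of raw layer-1 bandwidth
    module_bps = 400e9     # one 400G coherent module
    module_w   = 16        # typical power per module (datasheet)

    modules  = total_bps / module_bps    # -> 2500 modules
    power_kw = modules * module_w / 1e3  # -> 40 kW
    print(f"{modules:.0f} modules, {power_kw:.0f} kW")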
Interesting work, but 19 cores is very much not standard. Multiples of 12 cores are the gold standard in the telecommunications industry. Ribbon fibre is typically 12, sometimes 24, fibres per ribbon, and high-count cables these days are 864 cores or more, using a more flexible ribbon structure that improves density while still using standard tooling.
You're confusing multiple cores in a single cladding with multiple strands, each with its own cladding. This is 19 cores in a single 125 µm cladding (which is quite impressive manufacturing from Sumitomo).
I wasn't confusing anything. To interoperate with industry-standard fibre optic cables it should have a multiple of 12 or 24 cores, not the complete oddball number of 19. Yes, it's cool that it's that small, but that is not the limiting factor in the deployment of long-haul fibre optic telecommunications networks.
Sumitomo sells a lot of fusion splicers at very high margins, so it is in their best interest to introduce new types of fibre that require customers to buy new and more expensive fusion splicers. Any fibre built this way will need rotational alignment, which the existing fusion splicers used in telecom do not do (they only align the cores horizontally, vertically, and by the gap between the ends). Maybe they can build ribbon fibres where the required alignment is provided by the structure of the ribbon, but I think that is unlikely.
Given that it does not interoperate with any existing cables or splicers, the only place this kind of cable is likely to see deployment in the near term is undersea cables, where the cost of the glass is completely insignificant compared to everything that goes around it and the increased capacity is useful. Terrestrial telecom networks just aren't under the kind of pressure needed to justify the incompatibility with existing fibre optic cables. Data centers are another possibility, once someone figures out how to produce the optics at a reasonable cost.
The actual figure is 1,808 km. For reference, the US is 2,800 miles (4,500 km) wide from east to west, and 1,650 miles (2,660 km) from north to south.
For us Americans, that's about 295,680 toilet paper rolls or 2,956 KDC (kilo donkey kicks).
Or about 3 MAG (mega Ariana Grandes). https://x.com/GatorsDaily/status/1504570772873904130
While fascinating, I'm still waiting for that transformative move away from electrical. Whichever optical route you take, at the beginning and the end of it there has to be an electrical conversion, which hinders speed, consumes power, and produces (sometimes tons of) heat. Wen optical switching?
There's been a ton of research on optical computing and it just isn't impressive.
yet