Comment by hbogert

6 hours ago

It stands out, because it didn't sell. Which is weird because there were some pretty big pros about using them. The latency for updating 1 byte was crazy good. Some databases or journals for something like zfs really benefited from this.

Intel did a spectacularly poor job with the ecosystem around the memory cells. They made two plays, and both were flops.

1. “Optane” in DIMM form factor. This targeted (I think) two markets. First, use as slower but cheaper and higher density volatile RAM. There was actual demand — various caching workloads, for example, wanted hundreds of GB or even multiple TB in one server, and Optane was a route to get there. But the machines and DIMMs never really became available. Then there was the idea of using Optane DIMMs as persistent storage. This was always tricky because the DDR interface wasn’t meant for this, and Intel also seems to have a lot of legacy tech in the way (their caching system and memory controller) and, for whatever reason, they seem to be barely capable of improving their own technology. They had multiple serious false starts in the space (a power-supply-early-warning scheme using NMI or MCE to idle the system, a horrible platform-specific register to poke to ask the memory controller to kindly flush itself, and the stillborn PCOMMIT instruction).

2. Very nice NVMe devices. I think this was more of a failure of marketing. If they had marketed a line of SSDs that, coupled with an appropriate filesystem, could deliver a 99th-percentile fsync latency of 5 microseconds, I bet people would have paid. But they did nothing of the sort; instead they just threw around the term “Optane” inconsistently.
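
For anyone who wants to check a tail-latency claim like that on their own hardware, here is a minimal sketch of how it could be measured. It reads the "99%" figure as the 99th percentile, and the function names, record size, and sample count are my own assumptions rather than anything Intel published. The result depends heavily on the filesystem and device, so treat it as a rough probe and not a benchmark.

```python
import math
import os
import tempfile
import time


def measure_fsync_latencies(path, writes=200, payload=b"x" * 64):
    """Append small records, fsync after each one, and return latencies in microseconds."""
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for _ in range(writes):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # ask the OS to push the data to the device before returning
            latencies.append((time.perf_counter() - start) * 1e6)
    finally:
        os.close(fd)
    return latencies


def percentile(samples, pct):
    """Nearest-rank percentile; pct is in the range (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # The temporary directory is a stand-in; point it at the device under test.
    with tempfile.TemporaryDirectory() as tmp:
        samples = measure_fsync_latencies(os.path.join(tmp, "journal.bin"))
    print(f"p50 fsync: {percentile(samples, 50):.1f} us")
    print(f"p99 fsync: {percentile(samples, 99):.1f} us")
```

Timing only the `os.fsync` call, not the write, isolates the durability cost, which is the part a journal or database commit actually waits on.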

These days one could build a PCM-backed CXL-connected memory mapped drive, and the performance might be awesome. Heck, I bet it wouldn’t be too hard to get a GPU to stream weights directly off such a device at NVLink-like speeds. Maybe Intel should try it.

  • One of the many problems was trying to limit the use of Optane to Intel devices. They should have manufactured and sold Optane memory and let other players build on top of it at a low level.

    • > Optane memory

      Which “Optane memory”? The NVMe product always worked on non-Intel. The NVDIMM products that I played with only ever worked on a very small set of rather specialized Intel platforms. I bet AMD could have supported them about as easily as Intel, and Intel barely ever managed to support them.

>Which is weird....

It isn't weird at all. I would be surprised if it had ever succeeded in the first place.

Cost was way too high. Intel didn't share the tech with anyone other than Micron. Micron wasn't committed to it either, and since unused capacity at the fab was paid for by Intel regardless, they didn't care. There was no long-term solution or strategy to bring cost down. Neither Intel nor Micron had a vision for this. No one wanted another Intel-only tech lock-in. And despite the high price, it barely made any profit per unit compared to NAND and DRAM, which were making historically high profits at the time. Once the NAND and DRAM cycle went down again, cost/performance on Optane wasn't as attractive. Samsung even made some form of SLC NAND that performs similarly to Optane but cheaper, and even they ended up stopping development on it due to lack of interest.

  • A ways back, I wrote a sort of database that was memory-mapped-file backed (a mistake, but I didn’t know that at the time), and I would have paid top dollar for even a few GB of NVDIMMs that could be put in an ordinary server and could be somewhat straightforwardly mounted as a DAX filesystem. I even tried to do some of the kernel work. But the hardware and firmware were such a mess that it was basically a lost cause. And none of the tech ever seemed to turn into an actual purchasable product. I’m a bit suspicious that Intel never found product-market fit in part because they never had a credible product on the NVDIMM side.

    Somewhere I still have some actual battery-backed DIMMs (DRAM plus FPGA interposer plus awkward little supercapacitor bundle) in a drawer. They were not made by Intel, but Intel was clearly using them as a stepping stone toward the broader NVDIMM ecosystem. They worked on exactly one SuperMicro board, kind of, and not at all if you booted using UEFI. Rebooting without doing the magic handshake over SMBUS [0] first took something like 15 minutes, which was not good for those nines of availability.

    [0] You can find my SMBUS host driver for exactly this purpose on the LKML archives. It was never merged, in part, because no one could ever get all the teams involved in the Xeon memory controller to reach any sort of agreement as to who owned the bus or how the OS was supposed to communicate without, say, defeating platform thermal management or causing the refresh interval to get out of sync with the DIMM temperature, thus causing corruption.

    I’m suspicious that everything involved in Optane development was like this.

  • I worked at Micron in the SSD division when Optane (originally called crosspoint “Xpoint”) was being made. In my mind, there was never a real serious push to productize it. But it’s not clear to me whether that was due to unattractive terms of the joint venture or lack of clear product fit.

    There was certainly a time when it seemed they were shopping for engineers' opinions on what to do with it, but I think they quickly determined it would be a much smaller market than SSDs anyway and didn’t end up pushing on it too hard. I could be wrong though; it’s a big company and my corner was manufacturing, not product development.

    • I worked at Intel for a while and might be able to explain this.

      There were/are often projects that come down from management that nobody thinks are worth pursuing. When I say nobody, it might not just be engineers but even, say, 1 or 2 people in management who just do a shit rollout. There are a lot of layers at Intel, and if even one layer in the Intel Sandwich drags its feet it can kill an entire project. I saw it happen a few times in my time there. That one specific node that Intel dropped the ball on kind of came back to 2-3 people in one specific department, as an example.

      Optane was a minute before I got there, but having been excited about it at the time and somewhat following it, that's the vibe I get from Optane. It had a lot of potential but someone screwed it up and it killed the momentum.

    • A friend was working at Micron on a rackmount network server with a lot of flash memory, I didn't ask at the time what kind of flash it used. The project was cancelled when nearly finished.

  • Cost was fantastically cheap, if you take into account that Optane is going to live >>10x longer than an SSD.

    For a lot of bulk storage, yes, you don't have frequently changing data. But for databases or caches that are under heavy load, Optane was not only far faster but, looking at life-cycle costs, way, way cheaper.

    • Optane was in the market during a time when the mainstream trend in the SSD industry was all about sacrificing endurance to get higher capacity. It's been several years, and I'm not seeing a lot of regrets from folks who moved to TLC and QLC NAND, and those products are more popular than ever.

      The niche that could actually make use of Optane's endurance was small and shrinking, and Intel had no roadmap to significantly improve Optane's $/GB which was unquestionably the technology's biggest weakness.

    • Write endurance of the drive would be measured in TBW, and TLC flash kept adding enough 3D layers to stay cheap enough, quickly enough, that Optane never really beat their pricing per TBW to make a practical product.

      I have to wonder if it isn't usable at this point for some kind of specialized AI workflow that would benefit from extremely low-latency reads but which isn't written often. Perhaps integrated in a GPU board.
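
To put rough numbers on the price-per-TBW comparison above, here is a small arithmetic sketch. Every price and endurance figure in it is an invented placeholder, not a real Optane or NAND spec, and the function names are mine. It assumes wear is the only reason a drive gets replaced, which a fixed refresh cycle would override.

```python
import math


def cost_per_tb_written(price_usd: float, endurance_tbw: float) -> float:
    """Purchase price divided by rated endurance in terabytes written."""
    if endurance_tbw <= 0:
        raise ValueError("endurance_tbw must be positive")
    return price_usd / endurance_tbw


def drives_needed(write_tb_per_year: float, years: float, endurance_tbw: float) -> int:
    """Drives consumed over the service period if wear is the only reason to replace."""
    total_written = write_tb_per_year * years
    return max(1, math.ceil(total_written / endurance_tbw))


if __name__ == "__main__":
    # Hypothetical drives: (price in USD, endurance in TBW). Placeholders only.
    drives = {"high-endurance": (1000.0, 10000.0), "cheap NAND": (200.0, 1000.0)}
    for workload in (100.0, 2000.0):  # TB written per year, also made up
        for name, (price, tbw) in drives.items():
            count = drives_needed(workload, 5, tbw)
            print(f"{workload:>6.0f} TB/yr  {name:<15} {count:>2} drive(s)  ${count * price:,.0f}")
```

With these placeholder inputs the cheaper drive wins at the light workload and loses at the heavy one, which is roughly the disagreement in this subthread: endurance only pays for itself if you actually write enough to wear out the alternative.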

    • So instead of replacing every 5 years, you replace every 5 years, because if you need that level of performance you're replacing servers every 5 years anyway.

It feels like everyone figured out what to do with them and how just about when they stopped making them.

  • Same for the Larrabee / Knights architecture. Would sure be fun to play around with a 500 core Knights CPU with a couple TB of Optane for LLM inference.

    Intel's got an amazing record of axing projects as soon as they've done the hard work of building an ecosystem.

I never understood what they're meant to do. Intel seemed to picture some future where RAM is persistent; but they were never close to fast enough to replace RAM, and the option to reboot in order to fix some weird state your system has gotten itself into is a feature of computers, not a problem to work around.

In "databases and journals" you rarely update just one byte, you do a transaction that updates data, several indexes and metadata. All of that needs to be atomic.

Power failure can happen in between any of the "1 byte updates with crazy latencies." However small the latency is, power failure is still faster. Usually there is a write-ahead log or some other log that alleviates the problem, and this log is written in streaming fashion.

What is good, though, is that the "blast radius" [1] of a failure is smaller than usual: a failed one-byte write rarely corrupts more than one byte or cache line. SQLite has to deal with possible corruptions 512 bytes long (or even longer) on most disks; with Optane it is not necessarily so. So, less data to copy, scan, etc.

[1] https://sqlite.org/psow.html
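
To make the write-ahead-log point concrete, here is a minimal sketch of a checksummed log whose replay stops at a torn record. The record layout (length, CRC32, payload) and the function names are assumptions for illustration; this is not how SQLite or any particular database formats its journal.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of payload


def encode_record(payload: bytes) -> bytes:
    """Frame one log record so a partial write can be detected on replay."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def replay(log: bytes) -> list:
    """Return the payload of every intact record, stopping at the first torn one."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, checksum = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = log[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != checksum:
            break  # incomplete or corrupted tail: treat it as never committed
        records.append(payload)
        offset = start + length
    return records


if __name__ == "__main__":
    log = encode_record(b"update row 1") + encode_record(b"update index")
    torn = log[:-3]  # simulate power failing partway through the last append
    print(replay(log))
    print(replay(torn))
```

The smaller the unit that can be torn, the less of the tail has to be checksummed and rescanned after a crash, which is the "less data to copy, scan" benefit described above.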

  • It's not. You won't be writing one byte, ever (even if you had layers that actually supported less-than-block writes), because the overhead of instruction would be massive and you'd be murdering both latency and bandwidth for anything non-trivial

When most people are running databases on AWS RDS, or on ridiculous EBS drives with insanely low throughput and high latency, it makes sense to me.

There are very few applications that benefit from such low latency, and if one has to go off the standard path of easy, but slow and expensive and automatically backed up, people will pick the ease.

Having the best technology performance is not enough to have product market fit. The execution required from the side of executives at Intel is far far beyond their capability. They developed a platform and wanted others to do the work of building all the applications. Without that starting killer app, there's not enough adoption to build an ecosystem.

  • > There are very few applications that benefit from such low latency

    Basically any RDBMS? MySQL and Postgres both benefit from high performance storage, but too many customers have moved into the cloud where you can’t get NVMe-like performance for durable storage for anything remotely close to a worthwhile price.

    • I'm saying that there are very few downstream applications that use databases that benefit from reducing latency beyond the slow performance of the cloud. Running your database on VMs or baremetal gives better performance, but almost no applications built on databases bother to do it.

IMO, the reason they didn't sell is the ideal usage for them is pairing them with some slow spinning disks. The issue Optane had is that SSD capacity grew dramatically while the price plummeted. The difference between Optane and SSDs was too small. Especially since the M.2 standard proliferated and SSDs took advantage of PCI-E performance.

I believe Optane retained a performance advantage (and I think even today it's still faster than the best SSDs) but SSDs remain good enough and fast enough while being a lot cheaper.

The ideal usage of optane was as a ZIL in ZFS.

  • That may have been the ideal usage back in the day, but ideal usage now is just for setting up swap. Write-heavy workloads are king with Optane, and thrashing to swap is the prototypical example of something that's so write-heavy it's a terrible fit for NAND. Optane might not have been "as fast as DRAM" but it was plenty close enough to be fit for purpose.

    • That would be fine if I could put it in an M.2 slot. But all my computers already have RAM in their RAM slots, and even if I had a spare RAM slot, I don't know that I'd trust the software stack to treat one RAM slot as a drive...

      And their whole deal was making RAM persistent anyway, which isn't exactly what I want.

  • > The ideal usage of optane was as a ZIL in ZFS.

    It was also the best boot drive money could buy. Still is, I think, though other comments in the thread ask how it compares against today's best, which I'd also love to see.

    • This concept was very popular back in the days when computers used to boot from HDD, but now it doesn't make much sense. I wouldn't notice if my laptop booted in 5 seconds instead of 10.

  • Not just capacity but SSD speeds also improved to the point it was good enough for many high memory workloads.

Optane didn't sell because they focused on their weird persistent DIMM sticks, which are a nightmare for enterprise, where for many ordinary purposes you want ephemeral data that disappears as soon as you cut power. They should have focused on making ordinary storage and solving the interconnect bandwidth and latency problems differently, such as with more up-to-date PCIe standards.

  • PCIe was a bottleneck in consumer boxes, but that wasn't the whole problem. Optane's low latency and write endurance looked great on paper, yet once you put it behind SSD controllers and file systems built around NAND assumptions, a lot of the upside got shaved off before users ever saw it.

    "Just make it a faster SSD" was never a business. The DIMMs were weird, sure, but the bigger issue was that Optane made the most sense when software treated storage and memory as one tier, and almost nobody was going to rewrite kernels, DBs, and apps for a product that cost more than flash and solved pain most buyers barely felt.

    • > and file systems built around NAND assumptions, a lot of the upside got shaved off before users ever saw it.

      What file systems? Most common one you'd find would be ext4 or XFS and neither of them are

  • I don't think that would be my main complaint. Sticking optane in a dimm was just awkward as hell. You now have different bits of memory with very different characteristics, & you lose a ton of bandwidth.

    If CXL was around at the time it would have been such a nice fit, allowing for much lower latency access.

    It also seems like, in spite of the bad fit, there were enough regular Optane drives, and they were indeed pretty incredible. Good endurance, reasonable price (and cheap as dirt if you consider that endurance/lifecycle cost!), some just fantastic performance figures. My conclusion is that, alas, there just aren't many people in the world who are serious about storage performance.

Optane was a victim of its own hype, such as “entirely new physics”, or “as fast as RAM, but persistent”. The reality felt like a failure afterwards even though it was still revolutionary, objectively speaking.