
Comment by xyzzy123

16 hours ago

Not a pro data guy, but I've been running something like what you're talking about for many years. These days 200TiB is "normal storage server" territory, not anything exotic. You can just do the most boring thing and it will be fine. I'm only running one box, though. The hard parts are making it efficient, quiet, and cheap, which always feels like an impossible triangle.

Yeah, resilvers will take ~24h if your pool is getting full, but with RAIDZ2 that's not that scary.
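
Back-of-envelope, if you want to sanity-check that 24h figure. A rough sketch, assuming the resilver runs at something like the drive's sustained sequential rate; a fragmented or busy pool will be slower:

  # Rough resilver-time estimate: time to rewrite one replacement drive.
  # The fill level and MB/s below are assumptions, not measurements;
  # fragmentation and concurrent load can easily stretch this.
  drive_tb = 24          # replacement drive size, TB
  used_fraction = 0.9    # how full the pool is
  rate_mb_s = 220        # assumed sustained resilver rate

  bytes_to_write = drive_tb * 1e12 * used_fraction
  hours = bytes_to_write / (rate_mb_s * 1e6) / 3600
  print(f"~{hours:.0f} hours")   # ~27 hours with these numbers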

I'm running TrueNAS SCALE. I used to just use Ubuntu (more flexible!), but over the years I had some bad upgrades where the kernel and ZFS stopped being friends. My rack is pretty nearby, so a big 4U case with 120mm front fans was a high priority for me. It has a good noise profile if you swap the stock fans for Noctuas: you get a constant "whoosh" rather than a whine.

Running 8+2 RAIDZ2 with 24TB drives. I used to run with 20 slots full of old ex-cloud SAS drives, but that's more heat / noise / power intensive, and you lose flexibility if you don't have free slots. So I eventually ponied up for the 24TB disks. It hurt my wallet but greatly reduced noise and power.
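
For reference, an 8+2 RAIDZ2 of 24TB drives works out to roughly 175TiB usable. Quick sketch, ignoring ZFS metadata and padding overhead, which eat a bit more:

  # Usable capacity of an 8 data + 2 parity RAIDZ2 vdev with 24 TB drives.
  # Drives are sold in TB (10**12 bytes); ZFS reports TiB (2**40 bytes).
  data_drives = 8
  drive_tb = 24

  raw_bytes = data_drives * drive_tb * 1e12
  print(f"~{raw_bytes / 2**40:.0f} TiB usable")   # ~175 TiB before overhead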

  Case: RM43-320-RS 4U

  CPU: Intel Xeon E3-1231 v3 @ 3.40GHz (4C/8T, 22nm, 80W TDP)
  RAM: 32GB DDR3 ECC
  Motherboard: Supermicro X10SL7-F (microATX, LGA1150 socket)
    - Onboard: Dual Intel I210 1GbE (unused)
    - Onboard: LSI SAS2308 8-port SAS2 controller (6Gbps, IT mode)
    - Onboard: Intel C220 chipset 6-port SATA controller

  Storage Controllers:
    - LSI SAS2308 (onboard) → Intel RES2SV240 expander (SFF-8087 cables)
    - Intel C220 SATA (onboard) → boot SSD

  Backplane:
    - Intel RES2SV240 24-port SAS2 expander
    - 20× 3.5" hot-swap bays (10 populated, 10 empty)
    - Connects via 5× 0.8m Mini SAS HD SFF-8643 → Mini SAS SFF-8087 cables

  Boot/Cache:
    - Intel 120GB SSD SSDSC2CW120A3 (boot drive, SATA)
    - Intel Optane 280GB SSDPED1D280GA (ZFS SLOG device, NVMe)

  Network:
    - Intel 82599ES dual-port 10GbE SFP+ NIC (PCIe x8 add-in card)

It's a super old box but it does fine: it will max out 10GbE for sequential transfers and do ~10k write IOPS / ~1k random read IOPS without problems. Not great, not terrible. You don't really need the SLOG unless you plan to run VMs or databases off it.
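
To put the "max out 10GbE for sequential" bit in numbers (rough sketch; the ~200 MB/s per drive is an assumed ballpark, not a measurement from this box):

  # Why the pool can saturate a 10GbE link on sequential reads/writes.
  data_drives = 8
  per_drive_mb_s = 200     # assumed sequential throughput per 3.5" drive
  link_gbit_s = 10

  pool_mb_s = data_drives * per_drive_mb_s      # ~1600 MB/s from the disks
  link_mb_s = link_gbit_s * 1000 / 8            # 1250 MB/s on the wire
  print(f"disks ~{pool_mb_s} MB/s vs link ~{link_mb_s:.0f} MB/s")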

I personally try to run with no more than 10 of the 20 slots used. That leaves some flexibility for expanding, auxiliary pools, and so on. Often you find you need twice as much storage as you're planning on directly using: for upgrades, snapshots, transfers, ad-hoc stuff, etc.

Re: dedup, I would personally look to dedup at the application layer rather than in the filesystem if I possibly could. If you are running custom archiving software, it's something you'd want to handle within the scope of that. It depends on the data, obviously, but it's going to be more predictable, and you understand your data best. I don't have ZFS dedup turned on: for a 200TiB pool with 128k blocks, the ZFS DDT would want something like 500GiB of RAM. Which is NOT cheap in 2026.
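
That 500GiB figure is just the usual rule of thumb (~320 bytes of DDT per unique block) applied to the whole pool. Sketch, assuming every 128k block is unique:

  # ZFS dedup table (DDT) RAM estimate, using the common ~320 bytes/entry
  # rule of thumb and assuming every 128 KiB block in the pool is unique.
  pool_tib = 200
  recordsize_kib = 128
  bytes_per_entry = 320      # rough rule of thumb, varies in practice

  blocks = pool_tib * 2**40 / (recordsize_kib * 2**10)
  print(f"~{blocks * bytes_per_entry / 2**30:.0f} GiB of DDT")   # ~500 GiB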

I also run a 7-node Ceph cluster "for funsies". I love the flexibility of it... but I don't think Ceph truly makes sense until you have multiple racks or hard 24/7 requirements.