Comment by pizlonator

10 days ago

Some programs run as fast as they do normally. That's admittedly not super common, but it happens.

Some programs have a ~4x slowdown. That's also not super common, but it happens.

Most programs are somewhere in the middle.

> for the use-cases where C/C++ are still popular

This is a myth. 99% of the C/C++ code you are using right now is not perf sensitive. It's written in C or C++ because:

- That's what it was originally written in and nobody bothered to write a better version in any other language.

- The code depends on a C/C++ library and there doesn't exist a high quality binding for that library in any other language, which forces the dev to write code in C/C++.

- C/C++ provides the best level of abstraction (memory and syscalls) for the use case.

Great examples are things like shells and text editors, where the syscalls you want to use are exposed at the highest level of fidelity in libc; if you wrote your code in any other language you'd be constrained by that language's standard library and its limited (and perpetually outdated) view of those syscalls.
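
To make "fidelity" concrete, here's a sketch of the termios raw-mode dance that shells and editors do, with each kernel flag exposed exactly as the kernel defines it. Illustrative code, not from any particular project:

```c
/* Sketch: put the terminal into raw mode, the kind of thing shells and
 * editors do constantly. libc hands you every kernel flag directly;
 * higher-level languages typically wrap or omit most of these. */
#include <termios.h>
#include <unistd.h>

int enter_raw_mode(struct termios *saved) {
    struct termios raw;
    if (tcgetattr(STDIN_FILENO, saved) < 0) return -1;  /* save old state */
    raw = *saved;
    raw.c_lflag &= ~(ECHO | ICANON | ISIG | IEXTEN);  /* no echo, no line buffering */
    raw.c_iflag &= ~(IXON | ICRNL | BRKINT | INPCK | ISTRIP);
    raw.c_oflag &= ~OPOST;                            /* no output post-processing */
    raw.c_cc[VMIN] = 0;    /* read() may return with no bytes... */
    raw.c_cc[VTIME] = 1;   /* ...after a 100 ms timeout */
    return tcsetattr(STDIN_FILENO, TCSAFLUSH, &raw);
}
```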

While there are certainly other reasons C/C++ get used in new projects, I think 99% not being performance or footprint sensitive is way overstating it. There's tons of embedded use cases where a GC is not going to fly just from a code size perspective, let alone latency. That's where I've most often seen C (not C++) for new programs. Also, if Chrome gets 2x slower I'll finally switch back to Firefox. That's tens of millions of lines of performance-sensitive C++ right there.

That actually brings up another question: how would trying to run a JIT like V8 inside Fil-C go? I assume there would have to be some bypass/exit before jumping to generated code - would there need to be other adjustments?

  • > While there are certainly other reasons C/C++ get used in new projects, I think 99% not being performance or footprint sensitive is way overstating it.

    Here’s my source. I’m porting Linux From Scratch to Fil-C

    There is load-bearing stuff in there that I’d never think of off the top of my head, and I can assure you it works just as well even with the Fil-C tax. Like, I can’t tell the difference and don’t care that it is technically using more CPU and memory.

    So then you’ve got to wonder: why aren’t those things written in JavaScript, or Python, or Java, or Haskell? And if you look inside you just see really complex syscall usage. Not for perf but for correctness. It’s code that would be zero fun to try to write in anything other than C or C++.
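
    To give a hypothetical flavor of that correctness-driven syscall usage (my sketch, not lifted from any LFS package): the classic crash-safe file update, which is write a temp file, fsync it, rename it over the target, then fsync the directory.

    ```c
    /* Sketch: durable, atomic file replacement. Dropping any one of these
     * syscalls is a correctness bug, not a perf issue. */
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int save_atomically(const char *dir, const char *path, const char *tmp,
                        const char *data) {
        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) return -1;
        if (write(fd, data, strlen(data)) < 0 || fsync(fd) < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        close(fd);
        if (rename(tmp, path) < 0) {   /* atomically swap in the new contents */
            unlink(tmp);
            return -1;
        }
        int dfd = open(dir, O_RDONLY | O_DIRECTORY);
        if (dfd < 0) return -1;
        int r = fsync(dfd);            /* persist the rename itself */
        close(dfd);
        return r;
    }
    ```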

    • I have no credentials here, but I'd be interested in knowing what environmental impact relatively high-overhead things like Fil-C, VMs, and containers have, as opposed to running optimised, well-designed code. I don't mean in regular projects, but in things specifically like the Linux kernel that run on potentially millions? billions? of computers.

    • I wonder if something like LuaJIT would be an option. Certainly Objective-C would work.

    • My source is that Google spent a bunch of engineer time to write, test, and tweak complicated outlining passes for LLVM to get broad 1% performance gains in C++ software, and everybody hailed it as a masterstroke when it shipped. Was that performance art? 1% of C++ developers drowning out the apparent 99% of ones that didn’t (or shouldn’t) care?

      I never said there was no place for taking a 2x performance hit for C or C++ code. I think Fil-C is a really interesting direction and definitely well executed. I just don’t see how you can claim that C++ code that can’t take a 2x performance hit is some bizarre, 1% edge case for C++.

  • Books like Zen of Assembly Programming exist exactly because, once upon a time, putting performance-sensitive and C or C++ in the same sentence did not make any sense.

    It is decades of back-end optimisation work, some of it exploring UB-based optimizations, that has made that urban myth possible.

    As the .NET team discovered, and points out with each release since .NET 5 in lengthy blog posts capable of killing most browsers' buffers: if the team puts as much work into the JIT and AOT compilers as the Visual C++ team does, then performance looks quite different from what everyone else expects it naturally to be.

    • You got me curious and I visited one of these .NET performance posts and indeed, it crashed my browser tab!

    • What is in theory possible in a language/runtime is often less important than historically contingent factors like which languages it’s easy to hire developers for that can achieve certain performance envelopes and which ecosystems have good tooling for micro-optimization.

      In JS, for example, if you can write your code as a tight loop operating on ArrayBuffer views you can achieve near-C performance. But that’s only if you know what optimizations JS engines are good at and have a mental model of how processors respond to memory access patterns, which very few JS developers will have. It’s still valid to note that idiomatic JS code for an arbitrary CPU-bound task is usually at least tens of times slower than idiomatic C.

  • Chrome is a bad example. It uses a tracing GC in its most performance-sensitive parts explicitly to reduce the number of memory safety bugs (it's called Oilpan). And much of the rest is written in C++ simply because that's the language Chrome standardized on; they are comfortable relying on kernel sandboxes and IPC rather than switching to a more secure language.

    • The only thing I intimated about Chrome is that if it got 2x slower, many users would in fact care. I have no doubt that they very well might not write it in C++ if they started today (well, if they decided not to start with a fork of the WebKit HTML engine). I’m not sure what Oilpan has to do with anything I said - I suspect that it would do memory operations too opaque for Fil-C’s model and V8 certainly would but I don’t see how that makes it a bad example of performance-sensitive C++ software.

  • I feel like code size, toolchain availability, and the universality of the C ABI are further good reasons why code is written in C, beyond runtime performance. I’d be curious how much overhead Fil-C adds from a code size perspective, though!

    • Code size overhead is really bad right now, but I wouldn't read anything into that other than "Fil didn't optimize it yet".

      Reasons why it's stupidly bad:

      - So many missing compiler optimizations (obviously those will also improve perf too).

      - When the compiler emits metadata for functions and globals, like to support accurate GC and the stack traces you get on Fil-C panic, I use a totally naive representation using LLVM structs. Zero attempt to compress anything. I'm not doing any of the tricks that DWARF would do, for example.

      - In many cases it means that strings, like names of functions, appear twice (once for the purposes of the linker and a second time for the purposes of my metadata).

      - Lastly, an industrially optimized version of Fil-C would ditch ELF and just have a Fil-C-optimized linker format. That would obviate the need for a lot of the cruft I emit that allows me to sneakily make ELF into a memory safe linker. Then code size would go down by a ton

      I wish I had data handy on just how much I bloat code. My totally unscientific guess is like 5x

  • Latency is the killer, I think. A GC allocation can be on the order of 100 instructions.

    • In the fast case, allocations can be vastly cheaper than malloc: usually just a pointer decrement and compare. You'll need to ensure that your fast path never needs to collect the minor heap, which can be done if you're careful. I hate that this comparison is always done as if malloc/free were completely cost-free primitives.
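
      As a sketch (hypothetical layout; real collectors add headers, alignment, and per-thread buffers), the minor-heap fast path looks something like this:

      ```c
      #include <stddef.h>
      #include <stdint.h>

      static uint8_t  nursery[1 << 20];                 /* minor heap */
      static uint8_t *bump = nursery + sizeof nursery;  /* allocate downward */

      static void *slow_path_alloc(size_t n) {
          (void)n;      /* a real GC collects the nursery and retries here */
          return 0;
      }

      static inline void *gc_alloc(size_t n) {
          uint8_t *p = bump - n;          /* pointer decrement... */
          if (p < nursery)                /* ...and compare */
              return slow_path_alloc(n);  /* fast path failed: collect */
          bump = p;
          return p;
      }
      ```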

      1 reply →

  • Chrome is not a good counterexample a priori. It is a project that has hundreds of engineers assigned to it, some of them world-class security engineers, so they can potentially accept the burden of hardening their code and handling security issues with a regular toolchain. They may well have evaluated such solutions already.

    I think an important issue is that for performance-sensitive C++ stuff and related domains, it's somewhat all or nothing with a lot of these tools. Like, a CAD program is ideally highly performant, but I also don't want it to own my machine if I load a malicious file. That's the hardest thing, and I don't believe there's any easy lift-and-shift solution for it.

    I think some C++ projects probably could actually accept a 2x slowdown, honestly. Like I'm not sure if LibrePCB taking 2x as long in cycles would really matter. Maybe it would.

  • Most C/C++ code for old or new programs runs on a desktop or server OS where you have lots of perf breathing room. That’s my experience. And that’s frankly your experience too, if you use Linux, Windows, or Apple’s OSes

    > how would trying to run a JIT like V8 inside Fil-C go?

    You’d get a Fil-C panic. Fil-C wouldn’t allow you to PROT_EXEC lol
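
    Concretely, it's the standard JIT handoff that would trip the panic. A sketch (the mmap asking for PROT_EXEC is the part Fil-C refuses):

    ```c
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void) {
        /* x86-64 machine code for: mov eax, 42; ret */
        static const unsigned char code[] = {0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3};

        /* Plain C gives you a page you can write and then execute.
         * Under Fil-C, this mmap is where you'd get the panic. */
        void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED) { perror("mmap"); return 1; }

        memcpy(page, code, sizeof code);
        int (*fn)(void) = (int (*)(void))page;  /* the jump into generated code */
        printf("%d\n", fn());
        return 0;
    }
    ```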

    • Thanks for telling me what my experience is, but I can think of plenty of C/C++ code on my machine that would draw ire from ~all its users if it got 2x slower. I already mentioned browsers, but I would also be pretty miffed if any of these CPU-bound programs got 2x slower:

      * Compilers (including clang)

      * Most interpreters (Python, Ruby, etc.)

      * Any simulation-heavy video game (and some others)

      * VSCode (guess I should've stuck with Sublime)

      * Any scientific computing tools/libraries

      Sure, I probably wouldn't notice if zsh or bash got 2x slower, and cp will be IO bound anyway. But if someone made a magic clang pass that made most programs 2x faster they'd be hailed as a once-in-a-generation genius, not blown off with "who really cares about C/C++ performance anyway?". I'm not saying there's no place for trading these overheads for making C/C++ safer, but treating it as a niche use-case for C/C++ is ludicrous.

      6 replies →

    • > Most C/C++ code for old or new programs runs on a desktop or server OS where you have lots of perf breathing room. That’s my experience. And that’s frankly your experience too, if you use Linux, Windows, or Apple’s OSes

      What if I also use cars, and airplanes, and dishwashers, and garage doors, and dozens of other systems? At what point does most of the code I interact with /not/ have lots of breathing room? Or does the embedded code that makes the modern world run not count as "programs"?

      3 replies →

> Some programs have a ~4x slowdown

How does it compare to something like RLBox?

> This is a myth. 99% of the C/C++ code you are using right now is not perf sensitive.

I don't think that's true, or at best it's a very contorted definition of "perf sensitive". Most code is performance sensitive in my opinion - even shitty code written in Python or Ruby. I would like it to be faster. Take Asciidoctor for example. Is that "performance sensitive"? Hell yes!

  • > How does it compare to something like RLBox?

    I don’t know and it doesn’t matter because RLBox doesn’t make your C code memory safe. It only containerizes it.

    Like, if you put a database in RLBox then a hacker could still use a memory safety bug to corrupt or exfiltrate sensitive data.

    If you put a browser engine in RLBox then a hacker could still pwn your whole machine:

    - If your engine has no other sandbox other than RLBox then they’d probably own your kernel by corrupting a buffer in memory that is being passed to a GPU driver (or something along those lines). RLBox will allow that corruption because the buffer is indeed in the program’s memory.

    - If the engine has some other sandbox on top of RLBox then the bad guys will corrupt a buffer used for sending messages to brokers, so they can then pop those brokers. Just as they would without RLBox.

    Fil-C prevents all of that because it uses a pointer capability model and enforces it rigorously.

    So, RLBox could be infinitely faster than Fil-C and I still wouldn’t care.
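
    Here’s a sketch of the difference (hypothetical code, with the bug exaggerated for clarity):

    ```c
    #include <stdlib.h>
    #include <string.h>

    void handle(const char *input, size_t len) {
        char *payload = malloc(16);  /* attacker-influenced buffer */
        char *secret  = malloc(32);  /* sensitive data elsewhere in the heap */
        if (!payload || !secret) return;

        /* If len > 16 this scribbles past payload's allocation. RLBox-style
         * isolation permits it, because the write never leaves the sandboxed
         * heap; secret (or a broker message buffer) is fair game. Under
         * Fil-C the pointer carries payload's bounds, so this panics instead
         * of corrupting a neighbor. */
        memcpy(payload, input, len);

        free(payload);
        free(secret);
    }
    ```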

    • That feels like a very binary view of security. There are certainly cases where something like RLBox takes you from "horrific anything-goes C security" to "probably fine". Image parsing for example, which is a common source of vulnerabilities.

      So the question of performance is still relevant, even if RLBox's security properties are less tight.

      4 replies →

All code is perf-sensitive.

Also, literally every language claims "only a x2 slowdown compared to C".

We still end up coding in C++, because see the first point.

  • I’m not claiming only 2x slowdown. It’s 4x for some of the programs I’ve measured. 4x > 2x. I’m not here to exaggerate the perf of Fil-C. I actually think that figuring out the true perf cost is super interesting!

    > All code is perf-sensitive.

    That can’t possibly be true. Meta runs on PHP/Hack, which are ridiculously slow. Code running in your browser is JS, which is like 40x slower than Yolo-C++ and yet it’s fine. So many other examples of folks running code that is just hella slow, way slower than “4x slower than C”

    • FWIW, I just tested it on a random program I wrote recently, and it went from 2.085 seconds with Clang+jemalloc to 18.465 seconds with Fil-C. (No errors were reported, thank goodness!) So that's a 9x new worst case for you :-) It's basically a STL performance torture test, though. TBH I'm impressed that Fil-C just worked on the first try for this.

      1 reply →

  • > All code is perf-sensitive.

    I'm doing some for loops in bash right now that could use 1000x more CPU cycles without me noticing.

    Many programs use negligible cycles over their entire runtime. And even for programs that spend a lot of CPU and need tons of optimizations in certain spots, most of their code barely ever runs.

    > Also, literally every language claims "only a x2 slowdown compared to C".

    I've never seen anyone claim that a language like python (using the normal implementation) is generally within the same speed band as C.

    The benchmark game is an extremely rough measure but you can pretty much split languages into two groups: 1x-5x slowdown versus C, and 50x-200x slowdown versus C. Plenty of popular languages are in each group.

    • > I've never seen anyone claim that a language like python (using the normal implementation) is generally within the same speed band as C.

      Live long enough and you will. People claimed it about PyPy back in the day when it was still hype.

      1 reply →

There’s loads of good shells and text editors written in other languages.

I’m the author of a shell written in Go and it’s more capable than Zsh.

Very good answer, and I agree. There seems to be a psychological component wrapped up in the mythology versus the reality of what's necessary.

> Great examples are things like shells and text editors

I regularly get annoyed that my shell and text editor are slow :-)

I do agree on principle, though.

Super cool. What is your goal wrt performance? Is low 1.x-ish on average attainable, in your opinion?

  • I think that worst case 2x, average case 1.5x is attainable.

    - Code that uses SIMD or that is mostly dealing with primitive data in large arrays will get close to 1x.

    - Code that walks trees and graphs, like interpreters and compilers do, might end up north of 2x unless I am very successful at implementing all of the optimizations I am envisioning.

    - Code that is IO bound or interactive is already close to 1x.

You are making a lot of assumptions about my code.

  • I'm not meaning to. I've ported a lot of programs to Fil-C and I'm reacting to what I learn.

    I am curious though. What assumptions do you think I'm making that you think are invalid?

      - That 4x would not impact user experience.

      - That my code is on a Unix time-sharing system.

      - That I only use C or C++ because I inherited it.

      - That Unix tools do not benefit from efficient programming because of syscalls.

      - That multi-threaded garbage collection would be good for perf (assuming I’m not sharing the system).

      8 replies →