Comment by GuB-42

1 month ago

What I find particularly ironic is that the title makes it feel like Rust gives a 5x performance improvement when it actually slows things down.

The problem is that they have software written in Rust, and they need to use the libpg_query library, which is written in C. Because they can't use the C library directly, they had to use a Rust-to-C binding library, that uses Protobuf for portability reasons. Problem is that it is slow.

So what they did is write their own non-portable but much more optimized Rust-to-C bindings, with the help of an LLM.

But had they written their software in C, they wouldn't have needed to do any conversion at all. It means they could have titled the article "How we lowered the performance penalty of using Rust".

I don't know much about Rust or libpg_query, but they probably could have gone even faster by getting rid of the conversion entirely. It would most likely have involved major adaptations and some unsafe Rust though. Writing a converter has many advantages: portability, convenience, security, etc... but it has a cost, and ultimately, I think it is a big reason why computers are so fast and apps are so slow. Our machines keep copying, converting, serializing and deserializing things.

Note: I have nothing against what they did, quite the opposite, I always appreciate those who care about performance, and what they did is reasonable and effective, good job!

20 comments

Aurornis 1 month ago

> What I find particularly ironic is that the title makes it feel like Rust gives a 5x performance improvement when it actually slows things down.

Rust didn't slow them down. The inefficient design of the external library did.

Calling into C libraries from Rust is extremely easy. It takes some work to create a safer wrapper around C libraries, but it's been done for many popular libraries.
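To illustrate the point about calling C from Rust, here is a minimal sketch. It declares the C standard library's `strlen` by hand and wraps it in a safe function. The wrapper name `c_strlen` is made up for this example and has nothing to do with libpg_query or PgDog's actual bindings.

```rust
use std::ffi::{c_char, CString};

// Hand-written declaration of libc's `strlen`. It resolves at link time
// against the platform C library, which Rust's std already links.
// (`unsafe extern` needs Rust 1.82+; older toolchains use plain `extern "C"`.)
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Hypothetical safe wrapper: returns None if `s` contains an interior NUL,
/// because such a string cannot be represented as a C string.
fn c_strlen(s: &str) -> Option<usize> {
    let c = CString::new(s).ok()?;
    // SAFETY: `c` is a valid NUL-terminated buffer that outlives the call.
    Some(unsafe { strlen(c.as_ptr()) })
}

fn main() {
    println!("{:?}", c_strlen("SELECT 1"));
}
```

The raw call is one line. The `CString` allocation and the NUL check are where the "safer wrapper" work starts.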

This is the first and only time I've seen an external library connected via a Rube Goldberg-like contraption with protobufs in the middle. That's the problem.

Sadly they went with the "rewrite to Rust" meme in the headline for more clickability.

GuB-42 1 month ago
> Calling into C libraries from Rust is extremely easy
Calling the C function is not the problem here. It is dealing with the big data structure this function returns in a Rust-friendly manner.
This is something Protobuf does very well, at the cost of performance.
- wizzwizz4 1 month ago
  
  Writing Rust bindings for arbitrary C data structures is not hard. You just need to make sure every part of your safe Rust API code upholds the necessary invariants. (Sometimes that's non-trivial, but a little thinking will always yield a solution: if C code can do it, then it can be done, and if it can be done, then it can be done in Rust.)
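A rough sketch of what "upholding the invariants" can look like for a pointer-based C structure. Everything here is hypothetical: `CNode` stands in for a C struct, and the nodes are allocated with `Box` so the example runs on its own. With a real C library the nodes would come from, and be freed by, the library's own functions.

```rust
use std::marker::PhantomData;
use std::ptr;

/// Hypothetical C-style list node, laid out the way a C struct would be.
#[repr(C)]
struct CNode {
    value: i32,
    next: *mut CNode,
}

/// Owns the whole chain. Raw pointers never escape this type.
struct List {
    head: *mut CNode,
}

impl List {
    /// Stand-in for "the C library returned a list".
    fn from_slice(values: &[i32]) -> List {
        let mut head = ptr::null_mut();
        for &v in values.iter().rev() {
            head = Box::into_raw(Box::new(CNode { value: v, next: head }));
        }
        List { head }
    }

    /// The iterator borrows the list, so the nodes cannot be freed
    /// while it is in use.
    fn iter(&self) -> Iter<'_> {
        Iter { cur: self.head, _list: PhantomData }
    }
}

impl Drop for List {
    fn drop(&mut self) {
        let mut cur = self.head;
        while !cur.is_null() {
            // SAFETY: each node came from Box::into_raw and is freed once.
            let node = unsafe { Box::from_raw(cur) };
            cur = node.next;
        }
    }
}

struct Iter<'a> {
    cur: *const CNode,
    _list: PhantomData<&'a List>,
}

impl<'a> Iterator for Iter<'a> {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        // SAFETY: `cur` is null or points at a node kept alive by the
        // borrowed `List`.
        let node = unsafe { self.cur.as_ref()? };
        self.cur = node.next;
        Some(node.value)
    }
}

fn main() {
    let list = List::from_slice(&[1, 2, 3]);
    let values: Vec<i32> = list.iter().collect();
    println!("{:?}", values);
}
```

The invariants are carried by the types: ownership (`Drop`) frees each node exactly once, and the lifetime on `Iter` rules out use-after-free in safe code.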
driftwood4537 1 month ago

What about the other way around? I recently had a use case where I needed a C shared library that persists complex C data structures into an RDBMS. Given that my team had minimal C experience and this needed to be production grade, I ended up writing a thin C lib that offloads the heavy lifting to a sidecar Go process. They interacted via protobuf over a local Unix socket.
Would love to hear if I could've come up with a better design.

phkahler 1 month ago

>> But had they written their software in C, they wouldn't have needed to do any conversion at all. It means they could have titled the article "How we lowered the performance penalty of using Rust".

That's not really fair. The library was doing serialization/deserialization, which was a poor design choice from a performance perspective. They just made a more sane API that doesn't do all that extra work. It might best be titled "replacing protobuf with a normal API to go 5 times faster."

BTW what makes you think writing their end in C would yield even higher performance?

GuB-42 1 month ago
> BTW what makes you think writing their end in C would yield even higher performance?
C is not inherently faster, you are right about that.
But what I understand is that the library they use works with data structures that are designed to be used in a C-like language, and are presumably full of raw pointers. These are not ideal for working in Rust. Instead, presumably, they wrote their own data model in Rust fashion, which means that now, they need to make a conversion, which is obviously slower than doing nothing.
They probably could have worked with the C structures directly, resulting in code that could be as fast as C, but that wouldn't make for great Rust code. In the end, they chose the compromise of speeding up conversion.
Also, the use of Protobuf may be a poor choice from a performance perspective, but it is a good choice for portability, it allows them to support plenty of languages for cheaper, and Rust was just one among others. The PgDog team gave Rust and their specific application special treatment.
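The trade-off between using the C structures directly and converting them can be sketched like this. `CToken` and `Token` are hypothetical types, not libpg_query's real ones. `view` borrows the C string in place with no copy, while `convert` allocates and copies into an owned Rust value.

```rust
use std::ffi::{c_char, c_int, CStr, CString};

/// Hypothetical C-side struct holding a raw pointer to a C string.
#[repr(C)]
struct CToken {
    text: *const c_char,
    location: c_int,
}

/// Hypothetical Rust-side data model with owned data and no raw pointers.
struct Token {
    text: String,
    location: i32,
}

/// Direct use: borrow the C string without copying it.
///
/// # Safety
/// `tok.text` must point to a valid NUL-terminated string that stays
/// alive and unmodified for as long as the returned reference is used.
unsafe fn view(tok: &CToken) -> &CStr {
    unsafe { CStr::from_ptr(tok.text) }
}

/// Conversion: allocate a String and copy the bytes into it.
///
/// # Safety
/// Same requirement on `tok.text`, but only for the duration of the call.
unsafe fn convert(tok: &CToken) -> Token {
    let text = unsafe { CStr::from_ptr(tok.text) };
    Token {
        text: text.to_string_lossy().into_owned(),
        location: tok.location as i32,
    }
}

fn main() {
    // Stand-in for memory that a C library would own.
    let backing = CString::new("SELECT").unwrap();
    let tok = CToken { text: backing.as_ptr(), location: 0 };

    let borrowed = unsafe { view(&tok) };
    let owned = unsafe { convert(&tok) };
    println!("{:?} at {}, owned copy: {}", borrowed, tok.location, owned.text);
}
```

The borrowed view is as cheap as the C code would be, but every caller inherits the safety contract. The owned `Token` costs an allocation and a copy per string, and is ordinary safe Rust from then on.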
- timschmidt 1 month ago
  
  > which means that now, they need to make a conversion, which is obviously slower than doing nothing.
  One would think. But since caches have grown so large, and memory speed and latency haven't scaled with compute, so long as the conversion fits in the cache and is operating on data already in the cache from previous operations, which admittedly takes some care, there's often an embarrassing amount of compute sitting idle waiting for the next response from memory. So if your workload is memory or disk or network bound, conversions can oftentimes be "free" in terms of wall clock time. At the cost of slightly more wattage burnt by the CPU(s). Much depends on the size and complexity of the data structure.
hn_go_brrrrr 1 month ago

Because then they never would have needed the poorly-designed intermediate library.

the__alchemist 1 month ago

I wonder why they didn't immediately FFI it: C is the easiest lang to write Rust bindings for. It can get tedious if using many parts of a large API, but otherwise it is straightforward.

I write most of my applications and libraries in Rust, and lament that most of the libraries I wish I could FFI are in C++ or Python, which are more difficult.

Protobuf sounds like the wrong tool. It has applications for wire serialization and similar, but is still kind of a mess there. I would not apply it to something that stays in memory.

vlovich123 1 month ago

It’s trivial to expose the raw C bindings (eg a -sys crate) because you just run bindgen on the header. The difficult part can be creating safe, high-performance abstractions.
kleton 1 month ago
> Protobuf sounds like the wrong tool
This sort of use for proto is quite common at Google
- kccqzy 1 month ago
  
  No it’s not common for two pieces of code within a single process to communicate by serializing the protobuf into the wire format and deserializing it.
  It’s however somewhat common to pass in-memory protobuf objects between code, because the author didn’t want to define a custom struct but preferred to use an existing protobuf definition.
  

dchuk 1 month ago

Given they heavily used LLMs for this optimization, it makes you wonder why they didn't use them to just port the C library to Rust entirely. I think the volume of library ports to more languages/the most performant languages is going to explode, especially given it's a relatively deterministic effort so long as you have good tests and API contracts, etc.

cfors 1 month ago
The underlying C library interacts directly with the Postgres query parser (and therefore the Postgres source). So unless you rewrite Postgres in Rust, you wouldn't be able to do that.
- vineyardmike 1 month ago
  
  Well then why didn’t they just get the LLM to rewrite all of Postgres too /s
  I agree that LLMs will make clients/interfaces in every language combination much more common, but I wonder about the impact it'll have on these big software projects if more people stop learning C.

logicchains 1 month ago

> they had to use a Rust-to-C binding library, that uses Protobuf for portability reasons.

That sounds like a performance nightmare, putting Protobuf of all things between the language and Postgres. I'm surprised such a library ever got popular.

formerly_proven 1 month ago
> I'm surprised such a library ever got popular.
Because it is not popular.
pg_query (TFA) has ~1 million downloads, the postgres crate has 11 million downloads and the related tokio-postgres crate has over 33 million downloads. The two postgres crates currently see around 50x as much traffic as the (special-purpose) crate from the article.
edit: There is also pq-sys with over 12 million downloads, used by diesel, and sqlx-postgres with over 16 million downloads, used by sqlx.
- yencabulator 1 month ago
  
  Notably though, I believe neither tokio nor tokio-postgres parse SQL queries, they just pass them on the wire to the server. Generally the client side doesn't need to parse the query.
  https://crates.io/crates/sqlparser has 48 million downloads, though. It's not exactly 100% compatible (yet!) but it's pretty darn great.