Comment by GuB-42

6 months ago

ugrep, which is C++ and similar in scope to ripgrep, is 0.9 MB on my machine, ripgrep is 4.4 MB and GNU grep is 0.2 MB. They all depend on libc and libpcre2.

ugrep, however, depends on libstdc++ and a bunch of libraries for compressed file support (libz, ...).

So yeah, a bit bloated, but we are not at Electron level yet.

It's not clear to me that you're accounting for the difference in size that results from static vs dynamic linking. For example, if I build `ugrep` with `./build.sh --enable-static --without-brotli --without-lzma --without-zstd --without-lz4 --without-bzlib`, then I get a `ugrep` binary that is 4.5MB. (I added all of those `--without-*` flags because I couldn't get the build to work otherwise.) If I add `--without-pcre2`, I get a 3.9MB binary.

ripgrep is only a little bigger here when you do an apples-to-apples comparison. To get a static build without PCRE2, run `cargo build --profile release-lto --target x86_64-unknown-linux-musl`. That gets me a 4.6MB `rg` binary. Running `PCRE2_SYS_STATIC=1 cargo build --profile release-lto --target x86_64-unknown-linux-musl --features pcre2` gets me a fully static `rg` binary with PCRE2 at 5.4MB.

Popping up a level, a fair criticism is that it is difficult to get ripgrep to dynamically link most of its dependencies. You can make it dynamically link libc and PCRE2 (that's just `cargo build --profile release-lto --features pcre2`) and get a 4.1MB binary, but getting it to dynamically link all of its Rust crate dependencies is an unsupported build configuration for ripgrep. But I don't know how much tools like ugrep or GNU grep rely on that level of granular dynamic linking anyway. GNU grep doesn't seem to do so on my system (only dynamically linking with libc and PCRE2).
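For anyone who wants to check which libraries a given binary links dynamically, here is a minimal Python sketch that parses `ldd` output and sums the on-disk sizes of the resolved libraries. The helper names (`parse_ldd`, `dependency_bytes`) and the sample output are hypothetical and only for illustration; real `ldd` output varies by system.

```python
import os
import re
import subprocess

# Matches lines such as "libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x...)".
# Lines without "=> /path" (the vDSO, the dynamic loader, "not found") are skipped.
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(/\S+)")


def parse_ldd(output: str) -> dict[str, str]:
    """Map each shared library name to the path ldd resolved it to."""
    libs = {}
    for line in output.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            libs[match.group(1)] = match.group(2)
    return libs


def dependency_bytes(binary: str) -> int:
    """Sum the file sizes of the libraries that ldd resolves for a binary.

    ldd exits non-zero for a fully static binary, so check=True raises there.
    """
    out = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=True
    ).stdout
    # os.path.getsize follows symlinks, so versioned .so links are handled.
    return sum(os.path.getsize(path) for path in parse_ldd(out).values())


# Hypothetical ldd output for a grep that links only libc and PCRE2.
SAMPLE = """\
\tlinux-vdso.so.1 (0x00007ffd5a5f2000)
\tlibpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007f1c2a000000)
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c29e00000)
\t/lib64/ld-linux-x86-64.so.2 (0x00007f1c2a2b0000)
"""

print(sorted(parse_ldd(SAMPLE)))
```

Calling `dependency_bytes` on the path of an installed binary would give the dependency total for that binary on that system, which is one way to compare dynamic builds on equal footing.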

Additionally, the difference in binary size may be at least partially attributable to a difference in Unicode support:

    $ echo ♥ | rg '\p{Emoji}'
    ♥
    $ echo ♥ | ugrep-7.5.0 '\p{Emoji}'
    ugrep: error: error at position 6
    (?m)\p{Emoji}
          \___invalid character class

  • These are grep, ripgrep and ugrep installed on my Debian (bookworm). The mentioned sizes are the executables only, because I think that not taking advantage of dynamic libraries if you can is a downside, though there are arguments going the other way.

    Anyways, I still took it into account when calling ripgrep "bloated". Using ldd, I counted 3.6 MB of dependencies for ripgrep and 7.1 MB for ugrep, which coincidentally comes to about 8 MB in total for both ugrep and ripgrep. But the ugrep figure counts the entire libstdc++ and other libraries, which include code that ugrep doesn't need (such as compression), so I would have expected ugrep to be smaller. GNU grep has 2.5 MB of dependencies btw: 1.9 MB for libc and 0.6 MB for libpcre2.

    And to make things clear, I don't put ugrep in the lightweight category either. C++ (modern C++ in particular) suffers from some of the same problems as Rust: lots of code generation leading to bloat and slow compile times, but (as you pointed out) it tends to play along better with dynamic libraries with a C interface.

    I don't know how much a size-optimized grep with the same features as ripgrep would take. 4 MB looks like a lot, but sometimes bloat comes from unexpected places. For example, some compression algorithms may include predefined dictionaries, coloring may involve terminal databases, and Unicode support may involve databases too.
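The totals quoted in this comment can be reproduced with a few lines of Python. This is only a tally of the figures reported above (executable sizes plus the dependency sizes counted with ldd), not a new measurement.

```python
# Sizes in MB as reported in the comments above (Debian bookworm packages).
executable_mb = {"ripgrep": 4.4, "ugrep": 0.9, "GNU grep": 0.2}
dependency_mb = {"ripgrep": 3.6, "ugrep": 7.1, "GNU grep": 2.5}

# Executable plus dynamically linked dependencies, rounded to 0.1 MB.
totals = {
    name: round(executable_mb[name] + dependency_mb[name], 1)
    for name in executable_mb
}

for name, total in totals.items():
    print(f"{name}: {total} MB")
```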

    • > The mentioned sizes are the executables only, because I think that not taking advantage of dynamic libraries if you can is a downside, though there are arguments going the other way.

      I addressed and accounted for this in my comment. All three of GNU grep, ugrep and ripgrep can dynamically link libc and PCRE2.

      ripgrep doesn't bundle any compression code or terminal databases. ripgrep does bundle a significant number of Unicode tables.

    • On my system grep is 136 kB, dynamically linked to libc. Frankly, I don't use all its features and all its regex engines, and I can't remember when I last searched for something more complex than fixed text. It supports coloring and references the TERM variable. fgrep is 80 kB and apparently supports all features except for coloring and PCRE.
