Pipe Viewer – A Unix Utility You Should Know About

18 hours ago (catonmat.net)

Pipe viewer is excellent. I use it all the time.

As of version 1.8.10[1], which includes my merge request[2] to add an '--output' option, it has even completely replaced my use of 'dd' for writing disk images: 'sudo pv -Yo /dev/mmcblk0 whatever.img' is nicer, has much better progress indication, automatically selects a more sensible buffer size, and begets fewer groans from UNIX neckbeards, than the old 'sudo dd of=/dev/mmcblk0 if=whatever.img'. (The '-Y' causes pv to sync after each write, which greatly improves progress indication in Linux.)

It's useful for much more, of course. I use it for progress when compressing files ('pv blah | gzip ...'), when uploading files to the web ('pv blah | curl --upload-file - ...'; curl doesn't show progress when uploading for whatever reason), or just when I wanna see that something is happening with an operation which would otherwise take a while (even things like a slow 'du -h /some/path | sort -h' benefit from a 'pv' squeezed in the middle just to indicate that something is happening).

[1] https://codeberg.org/a-j-wood/pv/releases/tag/v1.8.10

[2] https://codeberg.org/a-j-wood/pv/pulls/90

There's also `progress`, which works for tools mainly operating on a single file, but unlike `pv`, you don't have to start the tool differently. It'd work nicely for the `gzip` example, for instance. Just call `progress` in a different terminal while `gzip` is running.

  • I was curious about how that’s supposed to work, so I took a quick look: it scans /proc for known commands, then looks up file descriptor information via their fd and fdinfo directories to get sizes and seek positions, and then shows a percentage for the largest file.

  • pv also allows this, but you have to look up the process id manually, and pass it to the -d flag.
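The /proc lookup described above can be approximated by hand. This is only a rough, Linux-only sketch of the idea, not how `progress` or `pv -d` are actually implemented; `fd_progress` is a made-up helper name, and it assumes GNU `stat` and a regular file behind the descriptor.

```shell
# Linux-only sketch: how far has a process read into one of its open files?
# fd_progress is a hypothetical helper, not part of progress or pv.
fd_progress() {
    pid=$1
    fd=$2
    # Current offset of the descriptor, as reported by the kernel in fdinfo.
    pos=$(awk '/^pos:/ { print $2 }' "/proc/$pid/fdinfo/$fd") || return 1
    # Size of the file behind the descriptor (-L follows the /proc symlink).
    size=$(stat -L -c %s "/proc/$pid/fd/$fd") || return 1
    # Pipes, sockets and empty files have no meaningful percentage.
    [ "$size" -gt 0 ] || return 1
    echo "$(( pos * 100 / size ))%"
}
```

Something like `fd_progress "$(pgrep -n gzip)" 3` would then print a percentage, assuming the input file happens to be open on descriptor 3 of that process.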

pv is great.

It has a limit parameter so you can limit the speed. Great if you don't want to saturate some link or have additional costs for uploading above a certain rate per hour/day.

Also useful for testing behaviour on slow filesystems or connections.
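For reference, the limit parameter is `-L` (`--rate-limit`), which takes a rate in bytes per second with an optional suffix such as `k` or `m`. A minimal sketch, where `throttled_copy` is just an illustrative wrapper name:

```shell
# Copy a file through pv at a capped rate.
# throttled_copy is a hypothetical helper; -L is pv's --rate-limit option.
throttled_copy() {
    rate=$1
    src=$2
    dst=$3
    # pv reads src, throttles to the given rate, and writes the data to stdout.
    pv -L "$rate" "$src" > "$dst"
}
```

For example, `throttled_copy 500k big.iso /mnt/slow/big.iso` would cap the copy at roughly 500 KiB/s. The same `pv -L 500k` works in the middle of any pipeline.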

It can take a pid argument too, -d IIRC, which will get it to display progress info for all the open file descriptors of a running process.

Really useful as a quick way to check what an I/O process is doing if it appears to be stuck.

Pipe viewer? What's that? Let me check the post...oh, it's good old pv! Never noticed it had a full name, damn Unix utilities with their short names!

I love pv, but how much overhead does adding the pipe introduce? I feel like most of the big jobs I want to measure are ones where you want the program to have direct access to the underlying file or storage. `pv somefile | dd` is going to be slower than `dd if=somefile`. At least I think so? I have no idea what modern Linux I/O can optimize.

Also does pv necessitate doing single threaded I/O?

I like to use pv as a quick and dirty operations per second counter. Sometimes I will write a program or script that does a bunch of things in parallel (e.g. RPCs to a service I'm working on), and prints one line of output for every operation completed. Then I pipe that output to pv using the --line-mode (-l) option to count lines instead of bytes. It shows how many lines are being printed per second, which roughly counts operations per second. (IIRC, you also need to redirect pv's stdout to /dev/null so the tool's output doesn't get mixed up with pv's fancy display.)

Fun stuff! Especially when combined with GNU parallel, in cases where the thing I'm measuring isn't already parallelized, and I want to be lazy.
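A minimal sketch of that counter, assuming the job prints exactly one line per completed operation. `count_ops` is a made-up wrapper name; the flags are standard pv display switches.

```shell
# Count lines per second arriving on stdin.
# count_ops is a hypothetical helper name.
count_ops() {
    # -l: count lines instead of bytes
    # -r: current rate, -a: average rate, -b: total count, -t: elapsed time
    # pv draws its display on stderr; the passed-through lines are discarded.
    pv -l -r -a -b -t > /dev/null
}
```

Usage would be something like `./my_parallel_job | count_ops`, where `my_parallel_job` stands in for whatever script is being measured.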

> The obvious way to do it is:

> $ gzip -c access.log > access.log.gz

Is it?

  • I mean, if you’re the type of person who considers using tar and nc to be the obvious way to transfer a directory between two computers…

    • I might be weird, but for me the most obvious way to transfer a small directory is to do

          tar -cz dir | base64
      

      Copy output into clipboard

          base64 -d | tar -xz
      

      Paste from clipboard into input

      Works flawlessly to move configs and stuff between servers.

      I actually love the blend between terminal and GUI. For this example I'm using CLI tools to produce text and I'm using GUI to scroll, select and copy&paste the text between two terminal tabs. I wish developers put more emphasis on empowering terminal with GUI capabilities.


Yes! My `,pv` is approximately: (probably a better way to make the k, but I stop once something is adequate; maybe I just need to make a `,,kof`)

    tar cpS "$@" --sort=name | pv -bratpes $(du -cks "$@"|sed -n '/.total$/ s/.total$//p')k

Which gives me progress bars for big copies like:

    ,pv files/ | tar xp -C /destination

    ,pv files/ | ssh overthere tar xp -C /destination
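For anyone who finds the one-liner hard to read, here is a spelled-out sketch of the same idea. The names `total_kib` and `tar_pv` are made up for illustration, and it assumes GNU tar (for `--sort=name`) and pv are installed. The size is only an estimate: du reports disk usage in KiB, while the tar stream adds headers and padding, so the bar may finish slightly off 100%.

```shell
# total_kib and tar_pv are hypothetical names for this sketch.
total_kib() {
    # du -c appends a grand-total line; -k reports KiB, -s one line per argument.
    du -cks "$@" | tail -n 1 | cut -f 1
}

tar_pv() {
    # -s tells pv the expected size so it can draw a progress bar and an ETA.
    # -b bytes, -r rate, -a average rate, -t timer, -p progress bar, -e ETA.
    tar -cpS --sort=name "$@" | pv -bratpe -s "$(total_kib "$@")k"
}
```

It would be used the same way as above, e.g. `tar_pv files/ | tar xp -C /destination`.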

A little more typing, but I find dd present on most systems already, so I tend to do this:

    tar ... | dd status=progress | ...

  • I've used pv for longer than dd has had this option, but that's fair! I also don't use find options, for example, since piping find into the tool everyone already knows anyway (grep) is much easier.

    Sadly, dd will not give you an estimated time or allow you to limit the transfer rate, which are two features I use a lot in pv.

  • One problem I've noticed with status=progress is that systems can sometimes have gigabytes of buffer space waiting to be filled, so the transfer spends most of its time in a "nearly done" state while (for instance) the SD card gets slowly filled at its real speed.

  • That's slowish and bottlenecks disk-based I/O. (Yes, you can improve it with dd options, if you are versed in the language...)
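On the "nearly done" problem with status=progress: one possible workaround, assuming GNU dd, is to make each write synchronous so the counter tracks what has actually reached the device rather than what sits in the page cache. This is a sketch, not a tuned recipe; `synced_dd` is a made-up name and the 4M block size is an arbitrary choice. Synchronous writes can lower throughput.

```shell
# Write src to dst with synchronous data writes and a live progress line.
# synced_dd is a hypothetical helper; requires GNU coreutils dd.
synced_dd() {
    src=$1
    dst=$2
    # oflag=dsync: each block is flushed before the next one is written,
    # so status=progress reflects the real speed of the target.
    dd if="$src" of="$dst" bs=4M oflag=dsync status=progress
}
```

For a disk image this would look like `sudo sh -c '. ./synced_dd.sh; synced_dd whatever.img /dev/mmcblk0'`, with `synced_dd.sh` being wherever the function was saved.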