
Comment by ndsipa_pomu

14 hours ago

You're not wrong, but there are fairly easy ways to deal with filenames containing spaces: usually just enclosing any variable use within double quotes will be sufficient. It's trickier to deal with filenames that contain things such as line breaks, as that usually involves using null-terminated filenames (null being the only character that can never appear in a path), e.g. find . -type f -print0
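
A minimal sketch of the null-terminated approach, written in Python for illustration. The helper name split_nul and the sample bytes are hypothetical; the sample stands in for what find . -type f -print0 would emit, so names containing spaces or line breaks come through intact.

```python
import os
from typing import List


def split_nul(data: bytes) -> List[str]:
    """Split NUL-terminated output (e.g. from `find -print0`) into filenames."""
    # Every name is followed by b"\0", so splitting leaves one empty chunk
    # at the end; the `if chunk` filter drops it.
    return [os.fsdecode(chunk) for chunk in data.split(b"\0") if chunk]


# Sample bytes standing in for real `find . -type f -print0` output.
sample = b"./plain.txt\0./with space.txt\0./line\nbreak.txt\0"
names = split_nul(sample)
```

Splitting on newlines instead would have cut the third name in two, which is the failure mode the parent comment is warning about.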

You're not wrong, but at my place, our main repository does not permit cloning into a directory with spaces in it.

Three factors conspire to make a bug:

  1. Someone decides to use a space
  2. We use Python
  3. macOS

Say you clone into a directory with a space in it. We use Python, so our scripts are scripts in the Unix sense. (Python here is replaceable with any scripting language that uses a shebang, so long as the rest of what follows holds.) Some of our Python dependencies install executables; those necessarily start with a shebang:

  #!/usr/bin/env python3

Note that space.

Since we use Python virtualenvs, the shebang actually points into the venv:

  #!/home/bob/src/repo/.venv/bin/python3

But … now what if the dir has a space?

  #!/home/bob/src/repo with a space/.venv/bin/python3

Those look like arguments, now, to a shebang. Shebangs have no escaping mechanism.
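
To make the splitting concrete, here is a rough Python model of how a Linux kernel reads a shebang line. It is only a sketch: the function name split_shebang is made up for illustration, and other kernels differ in how they treat the text after the interpreter. The interpreter path ends at the first space or tab, and nothing can quote or escape that.

```python
from typing import Optional, Tuple


def split_shebang(line: str) -> Tuple[str, Optional[str]]:
    """Approximate Linux shebang parsing: (interpreter, single optional argument)."""
    if not line.startswith("#!"):
        raise ValueError("not a shebang line")
    rest = line[2:].strip(" \t\n")
    # The interpreter ends at the first run of whitespace; whatever follows
    # is handed over as ONE argument, with no quoting rules applied.
    parts = rest.split(None, 1)
    if not parts:
        raise ValueError("shebang names no interpreter")
    interpreter = parts[0]
    argument = parts[1] if len(parts) == 2 else None
    return interpreter, argument


broken = split_shebang("#!/home/bob/src/repo with a space/.venv/bin/python3")
```

Here broken[0] is "/home/bob/src/repo", which is not an interpreter at all, so the exec fails.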

As I also discovered while debugging this, the Python tooling checks for this! It will instead emit a polyglot!

  #!/bin/bash

  # <what follows in a bash/python polyglot>
  # the bash will find the right Python interpreter, and then re-exec this
  # script using that interpreter. The Python will skip the bash portion,
  # b/c of cleverness in the polyglot.
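
For the curious, a sketch of the kind of check such tooling could make. Everything here is an assumption for illustration rather than the real tool's code: the name build_shebang, the 127-byte limit (Linux has historically truncated longer shebang lines), and the use of /bin/sh where the header I saw used bash.

```python
import shlex

# Assumed limit; the real tooling may use a different, platform-specific value.
MAX_SHEBANG_LENGTH = 127


def build_shebang(interpreter: str) -> str:
    """Return a plain shebang when it is safe, else an sh/Python polyglot header."""
    plain = "#!" + interpreter + "\n"
    if " " not in interpreter and len(plain.encode()) <= MAX_SHEBANG_LENGTH:
        return plain
    # sh reads line 2 as: exec '<interpreter>' "$0" "$@"
    #   (the leading '' is an empty string glued onto 'exec').
    # Python reads lines 2-3 as one triple-quoted string literal and discards it.
    return (
        "#!/bin/sh\n"
        "'''exec' " + shlex.quote(interpreter) + ' "$0" "$@"\n'
        "' '''\n"
    )


simple = build_shebang("/home/bob/src/repo/.venv/bin/python3")
polyglot = build_shebang("/home/bob/src/repo with a space/.venv/bin/python3")
```

The shell never reaches the third line because exec replaces the process first; Python never runs the exec because to Python it is just text inside a string.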

Which is really quite clever, IMO. But … it hits (3.). It execs bash, and worse, it is macOS's bash, and macOS's bash will corrupt^W remove for your safety! certain environment variables from the environment.

Took me forever to figure out what was going on. So yeah … spaces in paths. Can't recommend them. Stuff breaks, and it breaks in weird and hard to debug ways.

  • If all of your scripts run in the same venv (for a given user), can you inject that into the PATH and rely on env just finding the right interpreter?

    I suppose it would also need env to be able to handle paths that have spaces in them.

  • What a headache!

    My practical view is to avoid spaces in directories and filenames, but to write scripts that handle them just fine (using BASH - I'm guilty of using it when more sane people would be using a proper language).

    My ideological view is that unix/POSIX filenames are allowed to use any character except for NUL (and / within a single name), so tools should respect that and handle files/dirs correctly.

    I suppose for your usage, it'd be better to put the virtualenv directory into your path and then use #!/usr/bin/env python

    • For the BSDs and Linux, I believe that shebangs are interpreted by the kernel directly and not by the shell. /usr/bin/env and /bin/sh exist on practically every Unix-like system (though POSIX does not strictly mandate those exact paths), so your solution is the correct one. Anything else is fragile.
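
On the question of whether a PATH lookup copes with spaces: a small Python sketch suggests it should, since PATH entries are separated by colons rather than whitespace. shutil.which is used here as a rough stand-in for the search env performs, and the directory layout is a throwaway created just for the demonstration.

```python
import os
import shutil
import stat
import tempfile

# Build a throwaway "venv" whose path contains spaces (hypothetical layout).
root = tempfile.mkdtemp(prefix="repo with a space ")
bin_dir = os.path.join(root, ".venv", "bin")
os.makedirs(bin_dir)

# A stand-in executable named python3; its contents do not matter for the lookup.
fake_python = os.path.join(bin_dir, "python3")
with open(fake_python, "w") as fh:
    fh.write("#!/bin/sh\necho ok\n")
os.chmod(fake_python, os.stat(fake_python).st_mode | stat.S_IXUSR)

# PATH entries are joined with os.pathsep (":" on Unix), so the spaces
# inside bin_dir are not treated as separators.
search_path = os.pathsep.join([bin_dir, "/usr/bin", "/bin"])
found = shutil.which("python3", path=search_path)
```

With the venv's bin directory first on PATH, #!/usr/bin/env python3 contains no troublesome space itself, and the lookup resolves to the interpreter inside the spaced directory.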

  • These are part of the rituals of learning how a system works, in the same way interns get tripped up at first when they discover ^S will hang an xterm, until ^Q frees it. If you're aware of the history of it, it makes perfect sense. Unix has a personality, and in this case the kernel needs to decide what executable to run before any shell is involved, so it deliberately avoids the complexity of quoting rules.

    I'd give this a try, works with any language:

      #!/usr/bin/env -S "/path/with spaces/my interpreter" --flag1 --flag2
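
A rough Python illustration of why -S helps, assuming Linux behaviour where the kernel hands env everything after the interpreter as a single argument. The helper name approx_env_s is made up, and shlex follows POSIX shell quoting, which is close to but not identical to the rules env -S implements.

```python
import shlex
from typing import List


def approx_env_s(single_arg: str) -> List[str]:
    """Roughly emulate `env -S`: split one string into an argument vector."""
    if not single_arg.startswith("-S"):
        raise ValueError("expected an argument starting with -S")
    # Double quotes keep the spaced path together as one word.
    return shlex.split(single_arg[2:])


# The single argument env would receive for the shebang above (on Linux).
kernel_arg = '-S "/path/with spaces/my interpreter" --flag1 --flag2'
argv = approx_env_s(kernel_arg)
```

The quoting that the kernel refuses to do is done by env instead, so the interpreter path arrives as one word and the flags as separate ones.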
    

    Only if my env didn't have -S support would I consider a separate launch script like:

      #!/bin/sh
      exec "/path/with spaces/my interpreter" "$0" "$@"
    

    But most decent languages seem to have some way around the issue.

    Python

      #!/bin/sh
      """:"
      exec "/path/with spaces/my interpreter" "$0" "$@"
      ":"""
      # Python starts here
      print("ok")
    

    Ruby

      #!/bin/sh
      exec "/path/with spaces/ruby" -x "$0" "$@"
      #!ruby
      puts "ok"
    

    Node.js

      #!/bin/sh
      /* 2>/dev/null
      exec "/path/with spaces/node" "$0" "$@"
      */
      console.log("ok");
    

    Perl

      #!/bin/sh
      exec "/path/with spaces/perl" -x "$0" "$@"
      #!perl
      print "ok\n";
    

    Common Lisp (SBCL) / Scheme (e.g. Guile)

      #!/bin/sh
      #|
      exec "/path/with spaces/sbcl" --script "$0" "$@"
      |#
      (format t "ok~%")
    

    C

      #!/bin/sh
      #if 0
      exec "/path/with spaces/tcc" -run "$0" "$@"
      #endif
      
      #include <stdio.h>
      
      int main(int argc, char **argv)
      {
          puts("ok");
          return 0;
      }
    

    Racket

      #!/bin/sh
      #|
      exec "/path/with spaces/racket" "$0" "$@"
      |#
      #lang racket
      (displayln "ok")
    

    Haskell

      #!/bin/sh
      #if 0
      exec "/path/with spaces/runghc" -cpp "$0" "$@"
      #endif
      
      main :: IO ()
      main = putStrLn "ok"
    

    OCaml (needs bash process substitution)

      #!/usr/bin/env bash
      exec "/path/with spaces/ocaml" -no-version /dev/fd/3 "$@" 3< <(tail -n +3 "$0")
      print_endline "ok";;