Thursday, September 25, 2014

Why spaces in filenames are dangerous

There seem to be more files with spaces in their names on Windows machines than on Unix ones. Both Windows and Linux have allowed spaces in filenames for decades now, so why the difference?

I think part of the answer is that more people work at the command line in Unix than in Windows. And spaces in filenames make it easier to lose files on Unix. Lose, as in, gone forever.

Really? Yes, ask anyone who has used Unix for more than a few years. The famed UNIX-HATERS Handbook at http://richard.esplins.org/static/downloads/unix-haters-handbook.pdf even has a note about this problem on page 19.

Step by Simple Step

When you do things on the command line you rarely type a file's full name. Instead you type the first few letters and hit the Tab key, and the command line shell fills in the rest of the name of the file. So

ls operTAB 

and the line changes to

ls operations.txt

Neat and fast. If there's more than one file that starts with "oper" then the choices are shown and you simply type a few more characters to choose the exact one you want.

If I want to remove all the files that match some name I can type

rm operTAB

which expands as far as the matching names have in common, and then add a *. The shell expands the * before rm runs, so rm receives operations.txt, operations2.txt and any other file whose name starts the same way.

But let's see what happens when we have spaces in the filenames.

-rw-r--r--   1 mdoar  staff     0 Aug 29 15:27 your_lifes_work
-rw-r--r--   1 mdoar  staff     0 Aug 29 15:21 some file
-rw-r--r--   1 mdoar  staff     0 Aug 29 15:21 some music

Here we have two files that both start with "some ", and one file that contains all your life's work.

When I try to work with them I type 

ls soTAB

and the line changes to

ls some\  

Keep watching that trailing space that you can't see in the line above. 

When I want to remove the two files that start with "some", I type the same thing by habit:

rm someTAB*

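And this completes to 

rm some\  *

Note the two spaces after the backslash. The first is the escaped space that belongs to the filename, and the second is the unescaped trailing space. The shell splits the line at that unescaped space, so rm receives two separate arguments and the line behaves like two commands. The first is

rm "some "

which fails because there is no such file. The second, and more interesting, is

rm *

which succeeds all too well. The shell expands the unquoted * to every non-hidden file in that directory and rm removes them all, including the ominously-named file your_lifes_work.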
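To watch the shell split a line like this without risking anything, you can swap rm for a harmless command. This is a minimal sketch, assuming a POSIX-style shell and a throwaway directory from mktemp. show_args is a hypothetical helper written for this illustration, not a standard command.

```shell
# Work in a scratch directory so nothing real is at risk.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch your_lifes_work "some file" "some music"

# Hypothetical helper: print each argument on its own line, in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# One space after the backslash: a single pattern that matches only the
# two files whose names start with "some ".
show_args some\ *

# Two spaces: the line splits into the literal word "some " and a bare *,
# and the bare * matches every non-hidden file in the directory.
show_args some\  *
```

The first call prints only the two "some" files. The second prints [some ] followed by every file in the directory, your_lifes_work included, which is exactly the argument list rm would have received.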

What Can I Do?

The first thing is to double check every destructive command you type. You know what you think you typed, but maybe you fat-fingered it. Anyway, look again at what is in front of you.
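One way to turn that double check into a habit is to run the same pattern through a harmless command first, and to quote the part of the name that contains the space. This is a sketch in a scratch directory, reusing the file names from the example above. It is one possible routine, not the only safe one.

```shell
# Work in a scratch directory so nothing real is at risk.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch your_lifes_work "some file" "some music"

# Preview: list exactly what the pattern matches before deleting anything.
ls -d -- "some "*

# Quoting the prefix keeps the space inside the pattern. The * stays outside
# the quotes so the shell still expands it, and -- ends option parsing.
rm -- "some "*

# Show what is left in the directory.
ls
```

Because the space is inside the quotes, a stray extra space can no longer turn the * into a separate argument.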

You can alias rm to rm -i, which will prompt you before removing any file. But this rapidly becomes tedious when removing more than a few files.
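For reference, the alias is a one-liner that would typically go in a shell startup file such as ~/.bashrc. The commented alternative assumes GNU rm, whose -I option asks once before removing more than three files instead of once per file. The file name in the last comment is made up for illustration.

```shell
# Ask for confirmation before every single removal.
alias rm='rm -i'

# Alternative, assuming GNU rm: prompt only once when more than three
# files (or a recursive removal) are involved.
#   alias rm='rm -I'

# To skip the alias for one command, put a backslash in front of it:
#   \rm old_build.log
```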

You can alias rm to a move command (mv), so the files are moved to a Trash bin from where they can later be recovered. This is not a bad idea, but there are edge cases if a file or directory of the same name already exists in this custom Trash directory.
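A rough sketch of such a move-based replacement follows, written as a shell function rather than an alias so it can handle the name collision. The function name trash, the TRASH_DIR variable and the ~/.my_trash location are all assumptions made for this example. When a name is already taken it appends a timestamp and a counter, which is only one of several reasonable policies.

```shell
# Hypothetical location for the custom trash directory.
TRASH_DIR="${TRASH_DIR:-$HOME/.my_trash}"

# Move each argument into $TRASH_DIR instead of deleting it.
trash() {
    mkdir -p "$TRASH_DIR" || return 1
    for item in "$@"; do
        # Strip a trailing slash and any leading directories to get the name.
        name=${item%/}
        name=${name##*/}
        dest="$TRASH_DIR/$name"
        n=0
        # If the name is taken, add a timestamp and a counter until it is free.
        while [ -e "$dest" ] || [ -L "$dest" ]; do
            n=$((n + 1))
            dest="$TRASH_DIR/$name.$(date +%Y%m%d%H%M%S).$n"
        done
        mv -- "$item" "$dest" || return 1
    done
}
```

Typing trash "some file" then moves the file aside instead of destroying it. Keeping the function under its own name, rather than overriding rm, also means you never rely on the safety net on a machine where it is not installed.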

You could use the OS trash bin, which is designed for just this case. That's true, but it's slower and not as convenient to use from the command line. 

Perhaps the best approach is to work out how to invoke the OS trash feature from the command line. 
  • http://apple.stackexchange.com/questions/50844/how-to-move-files-to-trash-from-command-line refers to some OS X apps that do this. This app looks good: http://hasseg.org/blog/post/406/trash-files-from-the-os-x-command-line/ but I haven't tried it.
  • http://askubuntu.com/questions/6698/can-files-directories-deleted-from-terminal-be-restored/6703#6703 has a suggestion for the same idea in Linux
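As a sketch of how this could be wrapped up, the function below hands files to whichever trash helper happens to be installed. It assumes one of three commands is on the PATH: gio (the GLib tool, whose trash subcommand is used here), trash-put (from the trash-cli package), or a command named trash such as the OS X tool linked above. The name to_trash is made up for this example, and I have not verified the behaviour of every helper on every system.

```shell
# Send files to the desktop trash using whichever helper is installed.
# It never falls back to rm, so nothing is deleted permanently by accident.
to_trash() {
    if command -v gio >/dev/null 2>&1; then
        gio trash "$@"
    elif command -v trash-put >/dev/null 2>&1; then
        trash-put "$@"
    elif command -v trash >/dev/null 2>&1; then
        trash "$@"
    else
        echo "to_trash: no trash helper found; nothing removed" >&2
        return 1
    fi
}
```

With this in place, to_trash "some file" sends the file to the same bin the desktop uses, so it can be restored from there.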