June 18, 2021 • Reading time: 9 minutes
Every now and then, I find myself facing a giant pile of files that need renaming in some way. There's always a tradeoff between the number of files and the complexity of the rename that decides whether reaching for an automated tool is worth the time, but having a few tricks memorized can tip the balance in favour of automation and save a lot of time.
First, for demonstration purposes, I'll make a whole lot of files.
$ mkdir test && cd test
$ seq 1 100 | xargs --verbose touch
touch 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
$ ls
1
10
100
11
12
...
(I'll talk about that little piece of magic shortly.)
The smoothest experience for simple tasks is rename. rename is a Perl script that you'll find preinstalled on some but not all *nix machines, and naturally it speaks in Perl regular expressions.
First off, I forgot to add leading zeroes to those files I created before, so they're alphabetizing in a weird order.
$ rename --verbose 's/^/00/;s/^.*(...)$/$1/' *
1 renamed as 001
10 renamed as 010
11 renamed as 011
12 renamed as 012
13 renamed as 013
14 renamed as 014
...
(Here I use a hack involving two regular expressions chained together with a semicolon: one to prepend two zeroes to every filename, the second to strip all but the last three characters. So 74 gets renamed to 0074 and then truncated to 074, which is what I wanted all along.)
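For machines without rename, the same zero-padding can be sketched with a plain loop and printf. This is an alternative to the regex hack, not a copy of it, and it assumes the filenames are plain decimal integers without leading zeroes, as in the demo directory. The scratch directory and the variable names (workdir, f, new) are made up for illustration.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 100 | xargs touch

for f in *; do
    # %03d pads to three digits. Caution: printf reads a name with a
    # leading zero (such as 010) as octal, so run this only on unpadded names.
    new=$(printf '%03d' "$f")
    # Names that are already three digits wide (100) need no rename.
    [ "$f" = "$new" ] || mv -- "$f" "$new"
done

ls | head -n 3
```

The glob is expanded once before the loop starts, so freshly renamed files are not visited a second time.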
As mentioned, rename doesn't come installed on every machine, but you can pretty well count on xargs to be there when you need it. My usage above in setting up this environment is a classic case for xargs: my inputs were known and not crazy (no spaces), and the command could take as many parameters as I wanted to throw at it (subject to length limits that xargs also respects). My friend Julia has a great zine/comic covering more uses of xargs, but here I'm just going to talk about using it for renaming files.
Now, mv is a bit of a special case. I can only rename one file per invocation of mv, and I need to provide a variation on the filename twice to each command, as in mv 123 123.bak. You can do that with xargs, but it needs a bit of persuasion.
xargs does have the advantage over rename that it can run commands that are like mv, but aren't mv. A good example of this is git mv, which allows you to move a file that Git is tracking without breaking its version history. Furthermore, xargs works if your input is a list of files, like from git diff --name-only, and it can be chained with other commands like grep.
I spoke of persuasion to convince xargs to construct valid mv syntax. The workaround I use is to invoke bash -c as a subprocess. I can then write out my command just like I would if I were writing a very small shell script, i.e. accepting one (or more!) parameters via the magic variables $0, $1, etc.
(It's worth noting that in a shell script, $0 is normally the invocation of the script, so something like ./myscript.sh. Here, it contains my first and only argument.)
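To make the $0 behaviour concrete, here is a small sketch of how bash -c assigns the words that follow the command string. The words alpha, beta and the placeholder _ are arbitrary; the placeholder idiom is a common convention, not something this post relies on.

```shell
# The first word after the command string lands in $0, the next in $1.
first=$(bash -c 'printf "%s" "$0"' alpha beta)
second=$(bash -c 'printf "%s" "$1"' alpha beta)

# A common idiom passes a throwaway word as $0 so that the real
# arguments start at $1, as they would in an ordinary script.
shifted=$(bash -c 'printf "%s" "$1"' _ alpha beta)

printf '%s\n' "$first" "$second" "$shifted"
```

This prints alpha, beta, alpha on separate lines.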
I also need to tell xargs to run this command once for each input word, which I do by adding -n 1. Otherwise, xargs will happily keep glomming arguments onto the end of a single command until it runs out of room.
$ ls | xargs --verbose -n 1 bash -c 'mv "$0" "File $0"'
bash -c 'mv "$0" "File $0"' 001
bash -c 'mv "$0" "File $0"' 002
bash -c 'mv "$0" "File $0"' 003
bash -c 'mv "$0" "File $0"' 004
bash -c 'mv "$0" "File $0"' 005
...
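The effect of -n is easiest to see with a harmless command. This sketch swaps mv for echo and feeds in five numbers, so the only thing that changes between runs is how many arguments xargs hands to each invocation. The variable names are made up for illustration.

```shell
# No -n: every input word is appended to a single echo invocation.
all_at_once=$(seq 1 5 | xargs echo)

# -n 1: one invocation per input word, so one output line each.
one_each=$(seq 1 5 | xargs -n 1 echo)

# -n 2: at most two words per invocation; the last one gets the leftover.
in_pairs=$(seq 1 5 | xargs -n 2 echo)

printf '%s\n' "--- default" "$all_at_once" "--- -n 1" "$one_each" "--- -n 2" "$in_pairs"
```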
Now, this syntax already has quite a few pieces to remember/get wrong, and it's groaning under the load of a use case that's simpler than my rename example above. If you compare it to the much simpler seq 1 100 | xargs touch expression I used to create the files in the first place, it becomes evident that I'm really not using this tool in the way it was intended to be used.
There's another case that will break xargs: whitespace in filenames. I just introduced some in my previous example, so let's see what happens if I run exactly the same command again. In theory I should get "File File 001", but instead I see:
$ ls | xargs --verbose -n 1 bash -c 'mv "$0" "File $0"'
bash -c 'mv "$0" "File $0"' File
mv: cannot stat 'File': No such file or directory
bash -c 'mv "$0" "File $0"' 001
mv: cannot stat '001': No such file or directory
bash -c 'mv "$0" "File $0"' File
mv: cannot stat 'File': No such file or directory
bash -c 'mv "$0" "File $0"' 002
mv: cannot stat '002': No such file or directory
...
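The failure comes down to how xargs tokenizes its input. As a sketch, the two pipelines below feed the same name to xargs, first newline-terminated and then NUL-terminated with the -0 flag (supported by GNU and BSD xargs, an assumption about your platform). printf stands in for mv so nothing gets renamed.

```shell
# Default mode: whitespace separates words, so one name becomes two arguments.
split=$(printf 'File 001\n' | xargs -n 1 printf '[%s]\n')

# -0 mode: only a NUL byte ends an argument, so the space survives.
whole=$(printf 'File 001\0' | xargs -0 -n 1 printf '[%s]\n')

printf '%s\n' "$split" "$whole"
```

The first pipeline prints [File] and [001] on separate lines; the second prints [File 001].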
Introducing spaces into the mix caused xargs to treat "File" and "001" as two separate files. I could probably do some dark magic involving find -print0 | xargs -0, but even that would struggle with the | bash -c. Instead, my favourite trick is to use sed to craft exactly the commands I want. Unlike xargs, which works per word separated by whitespace, sed works exclusively and explicitly per line, which is exactly the sort of input I expect from ls.
$ ls | sed -E 's/File 0*([1-9][0-9]*)/mv -v "\0" "I was \1 and now I am \0"/'
mv -v "File 001" "I was 1 and now I am File 001"
mv -v "File 002" "I was 2 and now I am File 002"
mv -v "File 003" "I was 3 and now I am File 003"
mv -v "File 004" "I was 4 and now I am File 004"
mv -v "File 005" "I was 5 and now I am File 005"
...
Here I've dropped down to the low level of straight-up writing a shell script programmatically. It has the advantage of giving me complete control over the transformation, subject only to the limitations of sed, and does a much better job of handling whitespace and special characters than xargs.
The other thing I like about the sed approach is that it's easy to reason about. The output from the script is exactly what bash will be running. The sed command above doesn't actually do anything; it just outputs a bunch of lines containing the commands that bash would run. Actually running the commands is as simple as piping the output straight into bash.
$ ls | sed -E 's/File 0*([1-9][0-9]*)/mv -v "\0" "I was \1 and now I am \0"/' | bash
renamed 'File 001' -> 'I was 1 and now I am File 001'
renamed 'File 002' -> 'I was 2 and now I am File 002'
renamed 'File 003' -> 'I was 3 and now I am File 003'
renamed 'File 004' -> 'I was 4 and now I am File 004'
renamed 'File 005' -> 'I was 5 and now I am File 005'
...
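When the list is too long to eyeball in the terminal, one variation is to park the generated commands in a file, review them, and only then run them. The sketch below assumes a scratch directory with three sample files. It uses sed -E so the parentheses act as a capture group, and the portable & for the whole match. The names workdir and script are made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 'File 001' 'File 002' 'File 100'

# Keep the generated script outside the directory being listed,
# otherwise ls would pick it up as one more file to rename.
script=$(mktemp)
ls | sed -E 's/File 0*([1-9][0-9]*)/mv "&" "I was \1 and now I am &"/' > "$script"

cat "$script"                          # review the planned commands
bash -n "$script" && bash "$script"    # syntax-check first, then run for real
rm -f "$script"
```

bash -n only parses the file, so a mangled quote shows up as a syntax error before any file has been moved.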
This is where I stop if I'm doing a quick-and-dirty one-off. sed is easy to reason about, gives me the full power of regex, and handles most of the files that I'm likely to encounter in a given case. If there are a few special cases, they're easy enough to handle manually.
However, renaming arbitrary files in a shell script requires more care, because my input could be anything. The shellcheck-approved approach to this sort of thing would be:
$ touch 'I was " and now I am here to mess with your `sed`'
$ for file in *; do rm -v "$file"; mkdir -v "${file/File/Directory}"; done
removed 'I was 100 and now I am File 100'
mkdir: created directory 'I was 100 and now I am Directory 100'
removed 'I was 10 and now I am File 010'
mkdir: created directory 'I was 10 and now I am Directory 010'
removed 'I was 11 and now I am File 011'
mkdir: created directory 'I was 11 and now I am Directory 011'
...
removed 'I was " and now I am here to mess with your `sed`'
mkdir: created directory 'I was " and now I am here to mess with your `sed`'
I threw a curveball here, introducing a file with a double quote and backticks, which my sed expression above would have completely barfed on, but for file in * doesn't care.
One huge drawback of this last approach is that there's no easy way to dry run, that is, to show me what's going to happen before it happens. rename has a --nono flag that will print its planned operations without actually running them, and xargs --interactive will at least ask before acting. sed is the best of the bunch because it will dry run by default until I add the | bash to the end of the expression.
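A small wrapper function can get the loop approach part of the way to a dry run. This is only a sketch: run, rename_all and DRY_RUN are hypothetical names, and the bracketed output is for human review only, not something to paste back into a shell.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 'File 001' 'odd "name" with `ticks`'

# run: print the command when DRY_RUN=1, execute it otherwise.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'would run:'
        printf ' [%s]' "$@"    # brackets show where each argument begins and ends
        printf '\n'
    else
        "$@"
    fi
}

rename_all() {
    for file in *; do
        run mv -- "$file" "renamed $file"
    done
}

# Dry run in a subshell so the DRY_RUN setting does not leak out.
plan=$(DRY_RUN=1; rename_all)
printf '%s\n' "$plan"

# Nothing has moved yet; now do it for real.
DRY_RUN=0
rename_all
```

Because the arguments are passed through "$@" rather than being re-parsed as text, quotes and backticks in filenames stay inert in both modes.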