Three ways to bulk rename files

June 18, 2021 • Reading time: 9 minutes

Every now and then, I find myself facing a giant pile of files that need renaming in some way. Now, there's always a tradeoff between the number of files and the complexity of the renaming that decides whether reaching for an automated tool is worth the time, but learning a few tricks off the top of your head can tip the balance in favour of automation and save a lot of time.

First, for demonstration purposes, I'll make a whole lot of files.

$ mkdir test && cd test
$ seq 1 100 | xargs --verbose touch
touch 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
$ ls
1
10
100
11
12
...

(I'll talk about that little piece of magic shortly.)

rename

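The smoothest experience for simple tasks is rename. rename is a Perl script that you'll find preinstalled on some but not all *nix machines, and naturally it speaks in Perl regular expressions. (Watch out: util-linux ships an unrelated rename with a simpler, non-regex syntax, and on some distributions that's the one you'll get.)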

First off, I forgot to add leading zeroes to those files I created before, so they're alphabetizing in a weird order.

$ rename --verbose 's/^/00/;s/^.*(...)$/$1/' *
1 renamed as 001
10 renamed as 010
11 renamed as 011
12 renamed as 012
13 renamed as 013
14 renamed as 014
...

(Here I use a hack involving two regular expressions chained together with a semicolon: one to prepend two zeroes to every filename, the second to strip all but the last three characters. So 74 gets renamed to 0074 and then truncated to 074, which is what I wanted all along.)
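Since rename isn't always installed, here is a sketch of the same zero-padding in plain bash, using printf's %03d instead of the prepend-and-truncate trick. The function name pad_numeric_names is made up for this example, and it assumes purely numeric names like the ones created above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: zero-pad purely numeric filenames in the current
# directory to three digits (1 -> 001, 74 -> 074, 100 stays 100).
pad_numeric_names() {
  local file new
  for file in *; do
    # Only touch names made entirely of digits (also skips a literal '*').
    [[ $file =~ ^[0-9]+$ ]] || continue
    # 10# forces base 10, so a name like 010 isn't read as octal.
    new=$(printf '%03d' "$((10#$file))")
    # Already padded: nothing to do.
    [[ $new == "$file" ]] && continue
    # Refuse to overwrite an existing file.
    if [[ -e $new ]]; then
      echo "skipping $file: $new already exists" >&2
      continue
    fi
    mv -v -- "$file" "$new"
  done
}

# Usage: cd into the directory, then run pad_numeric_names
```

In hindsight, the padding could also have happened at creation time: seq -w 1 100 | xargs touch makes seq print every number at the same width, so it produces 001 through 100.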

xargs

As mentioned, rename doesn't come installed on every machine, but you can pretty well count on xargs to be there when you need it. My usage above in setting up this environment is a classic case for xargs: my inputs were known and not crazy (no spaces), and I was using a command that could take as many parameters as I wanted to throw at it (subject to the system's limit on command-line length, which xargs also respects). My friend Julia has a great zine/comic covering more uses of xargs, but here I'm just going to talk about using it for renaming files.

Now, mv is a bit of a special case. I can only rename one file per invocation of mv, and I need to give each command the filename twice, once as-is and once modified, as in mv 123 123.bak. You can do that with xargs, but it needs a bit of persuasion.

It does have the advantage over rename that it can run commands that are like mv, but aren't mv. A good example of this is git mv, which allows you to move a file that Git is tracking without breaking its version history. Furthermore, xargs works if your input is a list of files, like from git diff --name-only, and it can be chained with other commands like grep.

I spoke of persuasion to convince xargs to construct valid mv syntax. The workaround I use is to invoke bash -c as a subprocess. I can then write out my command just like I would if I were writing a very small shell script, i.e., accepting one (or more!) parameters via the magic variables $0, $1, etc. (It's worth noting that in a shell script, $0 normally holds the name the script was invoked as, something like ./myscript.sh. Here, it contains my first and only argument.)

I also need to tell xargs to run this command once for each input word, which I do by adding -n 1. Otherwise, xargs will happily keep glomming arguments onto the end of a single command until it runs out of room.
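Both behaviours are easy to check in isolation. This sketch, using made-up inputs (first, second, a, b), shows that the first word after the script string lands in $0, and that without -n 1 the second input word goes to the same bash as $1, where this script never looks at it:

```shell
#!/usr/bin/env bash
# The first word after the script string becomes $0, the next one $1.
bash -c 'echo "0=$0 1=$1"' first second
# prints: 0=first 1=second

# With -n 1, xargs starts one bash per input word.
printf '%s\n' a b | xargs -n 1 bash -c 'echo "got $0"'
# prints: got a
#         got b

# Without it, both words go to a single bash; b becomes $1 and is ignored.
printf '%s\n' a b | xargs bash -c 'echo "got $0"'
# prints: got a
```

A common variant passes a throwaway first argument, as in xargs -n 1 bash -c 'mv "$1" "File $1"' _, so that $0 holds the placeholder _ and the real filename lands in $1, the way it would in an ordinary script.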

$ ls | xargs --verbose -n 1 bash -c 'mv "$0" "File $0"'
bash -c 'mv "$0" "File $0"' 001
bash -c 'mv "$0" "File $0"' 002
bash -c 'mv "$0" "File $0"' 003
bash -c 'mv "$0" "File $0"' 004
bash -c 'mv "$0" "File $0"' 005
...

Now, this syntax already has quite a few pieces to remember (or get wrong), and it's groaning under the load of a use case that's simpler than my rename example above. Compare it to the much simpler seq 1 100 | xargs touch expression I used to create the files in the first place, and it becomes evident that I'm really not using this tool the way it was intended to be used.
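For what it's worth, xargs has a more direct form of persuasion: -I, which runs the command once per input line and substitutes that line wherever a placeholder appears ({} is just a conventional choice of placeholder). Here is a sketch of the mv 123 123.bak case, wrapped in a made-up function name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename every file in the current directory to <name>.bak.
# -I {} runs one mv per input line and replaces each {} with that line.
backup_all() {
  ls | xargs -I {} mv -v -- {} {}.bak
}

# Usage: cd into the directory, then run backup_all
```

With -I, xargs splits its input on newlines rather than blanks, so a name like File 002 survives as a single argument. Quotes and backslashes in names are still interpreted by xargs, though, so it's no cure-all.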

sed

There's another case that will break xargs: whitespace in filenames. I just introduced some in my previous example, so let's see what happens if I run exactly the same command again. In theory I should get "File File 001", but instead I see:

$ ls | xargs --verbose -n 1 bash -c 'mv "$0" "File $0"'
bash -c 'mv "$0" "File $0"' File 
mv: cannot stat 'File': No such file or directory
bash -c 'mv "$0" "File $0"' 001 
mv: cannot stat '001': No such file or directory
bash -c 'mv "$0" "File $0"' File 
mv: cannot stat 'File': No such file or directory
bash -c 'mv "$0" "File $0"' 002 
mv: cannot stat '002': No such file or directory
...

Introducing spaces into the mix caused xargs to treat "File" and "001" as two separate files. I could probably do some dark magic involving find -print0 | xargs -0, but that piles yet more flags on top of the bash -c workaround. Instead, my favourite trick is to use sed to craft exactly the commands I want. Unlike xargs, which works per word separated by whitespace, sed works exclusively and explicitly per line, which is exactly the sort of input I expect from ls.
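For comparison, here is a sketch of that NUL-delimited variant, assuming GNU find, sort and xargs (for -print0, -z and -0). Each name is terminated by a NUL byte instead of whitespace, so the space in File 001 survives. The function name is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefix every regular file in the current directory
# with "File ", passing names NUL-terminated so whitespace is preserved.
prefix_files_nul() {
  # sort -z has to read the whole list before printing anything, so find has
  # finished scanning the directory before the first mv runs and can't pick
  # up a freshly renamed file a second time.
  find . -maxdepth 1 -type f -print0 |
    sort -z |
    xargs -0 -n 1 bash -c 'mv -v -- "$0" "File ${0#./}"'
}

# Usage: cd into the directory, then run prefix_files_nul
```

find reports names as ./File 001, so ${0#./} strips that prefix before "File " is prepended. It works, but it's yet another set of flags to keep in your head.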

$ ls | sed -E 's/File 0*([1-9][0-9]*)/mv -v "&" "I was \1 and now I am &"/'
mv -v "File 001" "I was 1 and now I am File 001"
mv -v "File 002" "I was 2 and now I am File 002"
mv -v "File 003" "I was 3 and now I am File 003"
mv -v "File 004" "I was 4 and now I am File 004"
mv -v "File 005" "I was 5 and now I am File 005"
...

Here I've dropped down to the low level of straight-up writing a shell script programmatically. It has the advantage of giving me complete control over the transformation, subject only to the limitations of sed, and does a much better job of handling whitespace and special characters than xargs.

The other thing I like about the sed approach is that it's easy to reason about. The output from the script is exactly what bash will be running. The sed command above doesn't actually do anything; it just prints a bunch of lines containing the commands that bash will run. Actually running the commands is as simple as piping the output straight into bash.

$ ls | sed -E 's/File 0*([1-9][0-9]*)/mv -v "&" "I was \1 and now I am &"/' | bash
renamed 'File 001' -> 'I was 1 and now I am File 001'
renamed 'File 002' -> 'I was 2 and now I am File 002'
renamed 'File 003' -> 'I was 3 and now I am File 003'
renamed 'File 004' -> 'I was 4 and now I am File 004'
renamed 'File 005' -> 'I was 5 and now I am File 005'
...

This is where I stop if I'm doing a quick-and-dirty one-off. sed is easy to reason about, gives me the full power of regex, and handles most of the files that I'm likely to encounter in a given case. If there are a few special cases, they're easy enough to handle manually.
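One cheap safety net for the sed approach, sketched below: by default sed passes non-matching lines through untouched, and bash would then try to run each stray filename as a command of its own. Adding -n and the p flag on the substitution prints only lines where a replacement actually happened, and anchoring the pattern with ^ and $ keeps near-misses out of the plan. The function name is hypothetical, and -E assumes GNU or BSD sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an mv command only for names that fully match
# "File <digits>"; every other line is suppressed by -n.
plan_sed_renames() {
  ls | sed -E -n 's/^File 0*([1-9][0-9]*)$/mv -v "&" "I was \1 and now I am &"/p'
}

# Review first:  plan_sed_renames
# Then run it:   plan_sed_renames | bash
```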

Bonus: for

However, renaming arbitrary files in a shell script requires more care, because my input could be anything. The ShellCheck-approved approach to this sort of thing would be:

$ touch 'I was " and now I am here to mess with your `sed`'
$ for file in *; do rm -v "$file"; mkdir -v "${file/File/Directory}"; done
removed 'I was 100 and now I am File 100'
mkdir: created directory 'I was 100 and now I am Directory 100'
removed 'I was 10 and now I am File 010'
mkdir: created directory 'I was 10 and now I am Directory 010'
removed 'I was 11 and now I am File 011'
mkdir: created directory 'I was 11 and now I am Directory 011'
...
removed 'I was " and now I am here to mess with your `sed`'
mkdir: created directory 'I was " and now I am here to mess with your `sed`'

I threw a curveball here, introducing a file with a double quote and backticks, which my sed expression above would have completely barfed on, but for file in * doesn't care.

One huge drawback of this last approach is that there's no easy way to dry run, that is, to show me what's going to happen before it happens. rename has a --nono flag that will print its planned operations without actually running them, and xargs --interactive will at least ask before acting. sed is the best of the bunch because it will dry run by default until I add the | bash to the end of the expression.
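If I want the for loop's robustness together with sed's review-before-running workflow, one option is to have the loop print commands instead of running them, quoting each name with bash's printf %q so spaces, quotes and backticks come out safe to feed back into bash. A sketch, assuming the goal is a plain mv with the same File-to-Directory substitution as the loop above; plan_renames is a made-up name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one shell-quoted mv command per file whose name
# contains "File", without renaming anything.
plan_renames() {
  local file new
  for file in *; do
    [[ -e $file ]] || continue            # empty directory: skip the literal '*'
    new=${file/File/Directory}            # same substitution as the loop above
    [[ $new == "$file" ]] && continue     # name unchanged, nothing to do
    # %q escapes spaces, quotes and backticks so bash reads the name back intact.
    printf 'mv -v -- %q %q\n' "$file" "$new"
  done
}

# Dry run:  plan_renames
# For real: plan_renames | bash
```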