The Art of Naming Files: Mechanical Considerations

There are two hard problems in computer science: cache invalidation, naming things, and off-by-one errors.
Anonymous

It’s easy to laugh at programmers when they claim one of the hardest problems is naming things, especially in such an obviously tongue-in-cheek epigram. But while variable names in code might not matter to the computer, good names can absolutely make the difference between easy comprehension and complete bewilderment when you return to your old code months later.

Filenames have exactly the same effect. Have consistently good file and folder names in combination with a well-designed hierarchy and finding things is a cinch. Have bad file and folder names and the best you can do is hope to stumble on the right file by scanning through scores of them or using a search function which may or may not look in the right place. The principles in this post and the following post will assist you in coming up with names that help you rather than hinder you.

In this post, we’ll focus on mechanics: what characters and phrases you should use in filenames and how you can manipulate them to get useful effects. In the next post, we’ll look at developing high-level naming conventions.

Individual characters

First, let’s talk about individual characters. Most filesystems define some special characters you cannot use in filenames. In addition, you should not use certain other characters in filenames under any circumstances. Finally, you may prefer not to use some characters. Let’s look at these in turn.

Cannot-use

  • Linux fundamentally cannot handle file or folder names containing a forward slash (/), as forward slashes are reserved for separating folder names in paths. (The null character is also off limits, but you are unlikely to type one by accident.)
  • HFS+, the standard Mac OS filesystem, cannot handle file or folder names containing a colon (:), as colons are reserved for separating folder names in paths. (Mac OS typically displays paths with a slash separating folders, but underneath they’re actually stored as colons.)
  • Windows cannot handle any of the following characters: <>:"/\|?*. In addition, it cannot handle unprintable characters (more on that in a minute).

Under the hood: Windows also can’t handle files with any of the following specific names, or with any of these names in a different case, or with any of these names followed by a filename extension (e.g., CON, coN, con.pdf): CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9, CONIN$, CONOUT$. These are names of hardware devices which pretend to be files. In the DOS days, one would use these special files to do fancy things like dir >prn (get a list of all files in the current folder and send it directly to the printer) or copy con file.txt (write whatever is typed on the keyboard to a new file called file.txt). It’s unlikely that you’ll ever want to give a file one of these names in particular, but this is too wonderful a piece of obscure Windows trivia for me to gloss over it entirely!
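If you ever want to check names automatically, the Windows rules above are simple enough to encode. Here is a minimal Python sketch; the function name windows_problems and the wording of its messages are my own inventions for illustration, and it covers only the rules listed in this section, not every corner of Windows naming.

```python
# Characters Windows rejects in file names, per the list above.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')

# Reserved device names from the list above (compared case-insensitively).
WINDOWS_RESERVED = (
    {"CON", "PRN", "AUX", "NUL", "CONIN$", "CONOUT$"}
    | {f"COM{n}" for n in range(1, 10)}
    | {f"LPT{n}" for n in range(1, 10)}
)


def windows_problems(name: str) -> list[str]:
    """Return the reasons (if any) this name breaks the rules listed above."""
    problems = []
    bad = sorted(set(name) & WINDOWS_FORBIDDEN)
    if bad:
        problems.append("forbidden characters: " + " ".join(bad))
    # Unprintable characters are the ones with ASCII values 0-31.
    if any(ord(ch) < 32 for ch in name):
        problems.append("unprintable characters")
    # "con.pdf" still counts as reserved, so only look at the part before the first dot.
    stem = name.split(".", 1)[0]
    if stem.upper() in WINDOWS_RESERVED:
        problems.append("reserved device name: " + stem)
    return problems


print(windows_problems("report.pdf"))  # []
print(windows_problems("con.pdf"))     # ['reserved device name: con']
print(windows_problems("what?.txt"))   # ['forbidden characters: ?']
```

An empty list means the name passed these particular checks, nothing more.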

Should-not-use

Windows forbids many of these horrible offenses against reason, while Mac OS and Linux do not. No matter what operating system you’re using, including any of these characters in a filename is a dreadful idea. You will probably never try to use one of these characters, but it’s good to be aware they’re bad ideas in case you’re tempted to try:

  • Whitespace other than normal spaces (tabs, carriage returns, or their less-well-known siblings, vertical tabs and form feeds). Filenames containing these characters can actually cause security vulnerabilities!
  • Untrimmed whitespace – that is, spaces at the very beginning or end of a filename. Most programs remove this space before saving a file, so it’s difficult to get a filename with untrimmed whitespace, but if you do end up with one, programs that don’t handle this whitespace correctly may end up crashing or encountering another security vulnerability.
  • Unprintable characters (with ASCII values 0-31). These include backspaces, escape characters, and so on. Like untrimmed whitespace, it’s difficult to get a filename containing these, but if you end up with one, it will be difficult to identify or correctly print the name of the file. Unprintable characters in filenames can also cause security vulnerabilities (gee, are you seeing a theme here?).
  • A hyphen (-) at the very beginning of a filename: Certain programs can end up interpreting the filename as an option rather than a filename. For instance, in Linux or Mac OS, if you try to delete all the files in a folder at the command line and there’s a file called -rf in the folder, it’s possible to end up deleting the folders in the same location which you didn’t select! Don’t expose yourself to bugs in other people’s programs.
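The four warning signs above are easy to test for mechanically. This is a rough Python sketch, assuming you have the name as a string; risky_traits and its labels are hypothetical names I made up for the example.

```python
def risky_traits(name: str) -> list[str]:
    """Flag the should-not-use patterns from the list above."""
    traits = []
    # ASCII 0-31 covers the unprintable characters, including tab, carriage
    # return, newline, vertical tab, and form feed.
    if any(ord(ch) < 32 for ch in name):
        traits.append("control or unusual whitespace character")
    # Untrimmed whitespace: a space at the very beginning or end.
    if name != name.strip(" "):
        traits.append("leading or trailing space")
    if name.startswith("-"):
        traits.append("leading hyphen")
    return traits


print(risky_traits("-rf"))           # ['leading hyphen']
print(risky_traits("my notes.txt"))  # []
```

If you do get stuck with a file called -rf, referring to it as ./-rf at the command line keeps programs from reading it as an option.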

Maybe-don’t-use

So we can’t use certain characters in filenames. Is that a huge burden?

Actually, no – in most cases it’s best to use an even smaller set of characters. The simpler your filenames are, the easier it is to be consistent and the easier it is to write scripts that work with filenames. The best plan depends on how strict you want to be with yourself and how much you care about scriptability. Here are three suggested plans, with the least restrictive first:

  1. Don’t use any characters forbidden by Windows – even if you’re not using Windows. Why? At some point in your life, you’ll probably want to open one of your files on Windows, send a file to someone who’s using Windows, or upload a file to a web service that uses Windows behind the scenes. It’s easy to get into trouble – or get someone else into trouble without their consent – if your filenames use characters Windows forbids. A lot of these characters can also get confusing, particularly in scripts.
  2. Don’t use any special characters except ., -, and _. That is, use only the letters of your language, numbers, spaces, and those three characters. This makes your files much easier to work with on the command line or in scripts. It also keeps them compatible with any operating system or program you might toss them at, without your having to memorize the string of funny characters that Windows doesn’t support, and it keeps your filenames clean and consistent.
  3. In addition to #2, don’t use any spaces. Some Windows and Mac users will probably shout at me, “What!? How could I not use spaces in my filenames!?” But most Linux users have been customarily avoiding spaces in their filenames since the beginning of time and have been getting along just fine! I’ll admit that eliminating spaces is almost entirely good for computers rather than for humans. That said, when you design a system to work well for the computer, it often makes life easier for humans as well because the system ends up working better.
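To make the strictest plan concrete, here is a small Python sketch that rewrites an arbitrary name so it follows plan 3. The function name strict_name is hypothetical, and turning spaces into hyphens is just one possible choice; it also does nothing about names that end up empty or that start with a hyphen.

```python
import re


def strict_name(name: str) -> str:
    """Rewrite a name so it uses only letters, digits, '.', '-', and '_'."""
    name = name.strip()
    # Runs of whitespace become a single hyphen (plan 3: no spaces).
    name = re.sub(r"\s+", "-", name)
    # Drop everything else that is not a letter, digit, '.', '-', or '_'.
    # \w matches letters and digits in any language, plus the underscore.
    return re.sub(r"[^\w.\-]", "", name)


print(strict_name("Tax Return (2016) final!.pdf"))  # Tax-Return-2016-final.pdf
```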

Under the hood: What’s with the hate for spaces? The shell languages that underlie operating systems separate parameters by spaces. For instance, the Linux/Mac OS shell command to move a file from the source location to the destination location is mv source destination. What happens if we have spaces in the filenames themselves? We have to do extra work by quoting or escaping the names and give ourselves more opportunities to make mistakes: mv "source file" "destination file" or mv source\ file destination\ file. If we don’t do this, the shell will consider all four words separate parameters and try to move the three files source, file, and destination to the destination location file. While this requirement is most annoying when you’re typing shell commands yourself or writing scripts that work with them, it’s not uncommon for even popular programs used by millions of people to contain bugs related to handling spaces in filenames.
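You can watch this word-splitting happen without touching any files. Python’s standard shlex module splits a string using shell-like rules, so this sketch only approximates what a real shell does, but it shows the same effect as the mv example.

```python
import shlex

# Unquoted, the splitter sees five separate words.
unquoted = shlex.split("mv source file destination file")
print(unquoted)  # ['mv', 'source', 'file', 'destination', 'file']

# Quoted or escaped, each filename stays in one piece.
quoted = shlex.split('mv "source file" "destination file"')
escaped = shlex.split(r"mv source\ file destination\ file")
print(quoted)   # ['mv', 'source file', 'destination file']
print(escaped)  # ['mv', 'source file', 'destination file']

# shlex.quote adds quoting only when a name needs it.
command = "mv " + " ".join(shlex.quote(n) for n in ["source file", "destination-file"])
print(command)  # mv 'source file' destination-file
```

Notice that the name without spaces needed no special treatment at all.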

Entire filenames

Now we know what characters we should use. But we probably care most about how we should string them together into meaningful words and phrases.

Separating words in filenames

Let’s suppose that you’ve decided not to use spaces in your filenames (which I generally recommend). How do you separate words? You have three main options:

  • PascalCase: This convention takes its name from an early programming language that frequently used this style for variable names. You capitalize the first letter of each word and leave the remainder lowercase. (The related camelCase, named after the hump created in the middle, is like PascalCase but leaves the first letter of the first word lowercase.)
  • hyphen-separation: Use a hyphen between words. This method is usually paired with leaving all the letters lowercase, but it doesn’t have to be.
  • underscore_separation: Use an underscore between words. Again, the name is usually written entirely in lowercase, but it doesn’t have to be.

For the most part, which method to use is a matter of personal preference. Consistency is much more important than the convention you choose.
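If it helps to see the conventions side by side, here is a short Python sketch that builds each style from the same list of words. The function names are my own, and each function assumes it is given at least one word.

```python
def pascal_case(words: list[str]) -> str:
    # Capitalize the first letter of each word and lowercase the rest.
    return "".join(w[:1].upper() + w[1:].lower() for w in words)


def camel_case(words: list[str]) -> str:
    # Like PascalCase, but the first word stays entirely lowercase.
    first, *rest = words
    return first.lower() + pascal_case(rest)


def hyphenated(words: list[str]) -> str:
    return "-".join(w.lower() for w in words)


def underscored(words: list[str]) -> str:
    return "_".join(w.lower() for w in words)


words = "quarterly sales report".split()
print(pascal_case(words))  # QuarterlySalesReport
print(camel_case(words))   # quarterlySalesReport
print(hyphenated(words))   # quarterly-sales-report
print(underscored(words))  # quarterly_sales_report
```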

Tip: Most veteran Linux users prefer hyphens to underscores, for the simple reason that hyphens can be typed without pressing the shift key! This may sound trivial and silly, but most people type a lot of filenames in their lifetimes, so I think it’s worth considering.

Separating parts of a filename

Sometimes you want a filename to contain multiple pieces of information. For instance, you might want the names of your digital photos to include both the date they were taken and a sequential number (e.g., the 61st photo taken on May 26, 2016). It’s helpful to have a standard way of separating the information; once again, the separator is particularly helpful for scripts, but it can make filenames easier to read for you, too.

Personally, I find the best method is to use hyphens to separate words and underscores to separate sections, so our filename might look something like 2016-05-26_61.jpg. This does a good job of both visually and mechanically separating the components.
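Here is why the separator matters to scripts: with one underscore between the sections, a name can be built and taken apart again in a few lines. This Python sketch assumes exactly the date-plus-number layout from the example above; photo_name and parse_photo_name are hypothetical helpers.

```python
from datetime import date


def photo_name(taken: date, sequence: int, extension: str = "jpg") -> str:
    # isoformat() gives YEAR-MONTH-DAY with hyphens, e.g. 2016-05-26.
    return f"{taken.isoformat()}_{sequence}.{extension}"


def parse_photo_name(name: str) -> tuple[date, int]:
    # Drop the extension, then split the two sections on the underscore.
    stem = name.rsplit(".", 1)[0]
    date_part, sequence_part = stem.split("_")
    return date.fromisoformat(date_part), int(sequence_part)


print(photo_name(date(2016, 5, 26), 61))      # 2016-05-26_61.jpg
print(parse_photo_name("2016-05-26_61.jpg"))  # (datetime.date(2016, 5, 26), 61)
```

Because hyphens appear only inside the date and the underscore only between sections, the split never has to guess where one part ends and the next begins.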

Designing for filename completion

In Windows Explorer or Finder, you can begin typing the name of a file or folder in a list to jump to it (try it!). Similarly, in most command-line interfaces and programming tools, you can begin typing a filename and press Tab or Ctrl-Space to automatically complete the rest of the name. These features introduce a couple of additional considerations for naming, more about efficiency than about organization. These tips particularly apply to the names of your main high-level folders, which you’ll end up navigating through frequently.

  • Avoid names which begin with the same letters. Ideally, when you have a limited number of items, have every item begin with a different letter. For instance, rather than computer, composting, and compositions, you might prefer to use PC, composting, and essays. This way you only have to type one letter rather than up to seven to unambiguously identify the item you want.
  • Particularly avoid names which form part of other names. If you have two folders named fine and finessed, for instance, it’s easy to type fi, have the computer select fine, see a folder was selected, and immediately press Enter out of habit thinking you got finessed, when you actually got stopped at fine and are now in the wrong folder. Part-of-another-name conflicts occur rarely but can be seriously obnoxious, particularly if you don’t recognize why you keep landing in the wrong folder.
  • If you’re not worried about sort order or this tip would lead to a good sort order anyway, consider putting the most distinctive part of the filename first. For instance, instead of current-projects and current-financial-documents, consider financial-documents-current and projects-current.

Sorting magic

Nearly all programs sort filenames alphabetically. This suggests the following tricks.

Forcing filenames to the top

This trick isn’t guaranteed to work everywhere because programs can choose how to sort special characters, but it has a high success rate. If you want to put a particular file or folder (or a small group of files or folders) at the top of a long list, start its name with an underscore (_). If the underscore doesn’t work, you can also try an exclamation point (!).

You may have to hit refresh to re-sort the folder and confirm that it worked.

If you want to put a file at the bottom, try using a tilde (~).
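The reason these tricks are not guaranteed is that every program picks its own sorting rules. As one illustration, Python’s default sort compares raw character codes, which is only an assumption about how any particular file manager behaves. Under that rule the exclamation point and tilde land where you would hope, but the underscore slips below names that start with a capital letter unless the sort ignores case.

```python
names = ["notes", "_inbox", "archive", "!urgent", "~old"]
print(sorted(names))
# ['!urgent', '_inbox', 'archive', 'notes', '~old']

# With a capitalized name in the mix, the underscore no longer comes first...
mixed = ["Budget", "_inbox", "notes"]
print(sorted(mixed))
# ['Budget', '_inbox', 'notes']

# ...unless the comparison ignores case.
print(sorted(mixed, key=str.casefold))
# ['_inbox', 'Budget', 'notes']
```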

Chronological sorting

Ever tried to find something in a folder where the contents were named with dates in the format DAY-MONTH-YEAR, or, even worse, MONTH-DAY-YEAR?

5-8-2011
7-1-2015
9-15-2013
11-2-2012
12-3-2018

Yuck! That’s not sorted at all! Fortunately, the international date format standard, ISO 8601, comes to the rescue: if you write the date YEAR-MONTH-DAY, alphabetical order and chronological order are identical. Just make sure you include the zero if the month or day is less than 10: 2018-01-05, not 2018-1-5.

2011-05-08
2012-11-02
2013-09-15
2015-07-01
2018-12-03

Now that’s better. I’ve gotten so used to YMD format that I even date my paper notes using it. As xkcd says, YMD is the correct way:

(Image: xkcd 1179, “ISO 8601”)

Tip: If you need times as well, add them after the date in 24-hour HOUR-MINUTE format, where HOUR goes from 00 to 23 and MINUTE from 00 to 59.
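If you have a pile of names in the old format, converting them is mostly a matter of parsing the date and printing it back out. Here is a minimal Python sketch using the MONTH-DAY-YEAR list from above. The helper name to_iso is hypothetical, and joining the date and time with an underscore is my own choice rather than anything the standard requires for filenames.

```python
from datetime import datetime

old_names = ["5-8-2011", "7-1-2015", "9-15-2013", "11-2-2012", "12-3-2018"]


def to_iso(month_day_year: str) -> str:
    # Parse MONTH-DAY-YEAR, then write it back as zero-padded YEAR-MONTH-DAY.
    return datetime.strptime(month_day_year, "%m-%d-%Y").strftime("%Y-%m-%d")


# A plain alphabetical sort is now also a chronological sort.
iso_names = sorted(to_iso(name) for name in old_names)
print(iso_names)
# ['2011-05-08', '2012-11-02', '2013-09-15', '2015-07-01', '2018-12-03']

# Date plus 24-hour time, for 9:30 in the morning on January 5, 2018.
stamp = datetime(2018, 1, 5, 9, 30).strftime("%Y-%m-%d_%H-%M")
print(stamp)  # 2018-01-05_09-30
```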

Numerical sorting

How many times have you seen a list like this on a computer?

10.jpg
11.jpg
12.jpg
19.jpg
2.jpg
21.jpg
254.jpg
26.jpg
3.jpg

Obnoxious, but there’s an easy fix: left-pad the numbers with zeroes so every number is the same length. Here’s the result:

002.jpg
003.jpg
010.jpg
011.jpg
012.jpg
019.jpg
021.jpg
026.jpg
254.jpg
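To convince yourself that padding is all it takes, here is a tiny Python sketch that pads the names from the first list and sorts them alphabetically. It only works on a list of strings and renames nothing on disk; pad_leading_number is a made-up helper name.

```python
import re

names = ["10.jpg", "11.jpg", "12.jpg", "19.jpg", "2.jpg",
         "21.jpg", "254.jpg", "26.jpg", "3.jpg"]


def pad_leading_number(name: str, width: int = 3) -> str:
    """Left-pad a number at the start of a name with zeroes."""
    match = re.match(r"([0-9]+)(.*)", name, re.DOTALL)
    if match is None:
        return name  # No leading number, so leave the name alone.
    return match.group(1).zfill(width) + match.group(2)


padded = sorted(pad_leading_number(name) for name in names)
print(padded)
# ['002.jpg', '003.jpg', '010.jpg', '011.jpg', '012.jpg',
#  '019.jpg', '021.jpg', '026.jpg', '254.jpg']
```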

If you end up with a bunch of files that need leading zeroes added and there are enough that renaming them all would be cumbersome, you can change them with a script. Here’s an example in PowerShell – it will add zeroes to the beginning of filenames in the current folder that start with a number until the number is 3 digits wide (or any setting of $numDigits).

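# Pad leading numbers to this many digits.
$numDigits = 3
# Left-pad a string of digits with zeroes; longer strings pass through unchanged.
function LeftPad ([string]$num, [int]$places) {
    return $num.PadLeft($places, '0')
}
Get-ChildItem -File | ForEach-Object {
    # Match names that start with digits: group 1 is the number, group 2 is the rest.
    if ($_.Name -match '^([0-9]+)(.*)') {
        $newName = (LeftPad $Matches[1] $numDigits) + $Matches[2]
        # Skip names that are already wide enough, and pass the full literal
        # path so names containing [ or ] are not treated as wildcards.
        if ($newName -ne $_.Name) {
            Rename-Item -LiteralPath $_.FullName -NewName $newName
        }
    }
}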

Warning: If you want to use this snippet, please back up the folder before running the code. I have tested it on my computer, but there is no warranty!

Manual sorting

It’s best to avoid sorting files manually (by adding ordered numbers or letters at the beginning of the names). It’s labor-intensive to adjust the sort order when you change the files, and it’s often not as helpful as you might hope. However, if you have a specific requirement that your files or folders be listed in a particular order, here’s a trick that can help reduce the amount of effort required for changes: begin by numbering by tens, being sure to use leading zeroes as discussed in the previous section, “Numerical sorting.” Then if you need to add a new file between existing files, you can simply use a number between the existing numbers.

Here’s an example of how that would work out:

010 First File.docx
020 Second File.docx
025 Inserted File.docx
030 Third File.docx
040 Fourth File.docx
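Picking the in-between number is simple arithmetic, and the midpoint leaves the most room on either side for later insertions. Here is a small Python sketch; number_between is a hypothetical helper, and the three-digit width matches the example above.

```python
def number_between(before: int, after: int, width: int = 3) -> str:
    """Pick a zero-padded number halfway between two existing prefixes."""
    if after - before < 2:
        # Neighbors like 025 and 026 leave no gap; time to renumber by tens.
        raise ValueError("no whole number left between these two prefixes")
    return str((before + after) // 2).zfill(width)


print(number_between(20, 30))  # 025
print(number_between(25, 30))  # 027
```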