In the previous previous instalment we looked at using egrep to search for a particular piece of text in a stream or file. egrep is often a great tool for finding a file you are looking for, but only if the file is a plain text file, and only if you are searching for that file based on its content. What if you want to search for files based on other criteria, like the last time the file was edited, or the name of the file, or the size of the file, or the type of the file etc.? For that you need a different command, for that you need find.

Listen Along: Taming the Terminal Podcast Episode 20

The Basics of the find Command

Regardless of the criteria you wish to use, the basic form of the find command is always the same, you first need to tell it where to look, then you tell it what criteria to use when searching:

The path is searched recursively (by default), so if you give a path of ~/Documents, it will search your documents folder, and all folders within your documents folder. To search your entire computer, and all mounted drives, use a path of just /. To use the current folder as the base of your search use a path of . (which always means ‘the current folder’ as we learned in instalment 4).

Defining Search Criteria

To see a full list of all possible search criteria, you can of course read the manual entry for find with the command man find, but we’ll look at some of the more common criteria you’ll be most likely to need here.

Search by File Name

You can do simple file-name searches with the -name flag followed by a simple pattern. Note that these simple patterns are NOT regular expressions, they use the same syntax as wild card expansion in BASH, i.e. * means any number of any characters, and ? means exactly one of any character.

A lot of the time you really don’t need the added power and complexity of regular expressions, because a lot of the time all you really want is the good old fashioned DOS pattern *.extension.

IMPORTANT – remember that * and ? have meanings in BASH, so you need to escape them in some way to get reliable results. It’s ugly and hard to read \* all over the place, so my suggestion is to get into the good habit of ALWAYS quoting your patterns when using -name.

Let’s start with a really simple example, you know the full name of the file you’re looking for, but, you have no idea where it is. This is something you often come across when someone asks you to take a look at their server. You know you need to edit, say, php.ini, but you have no idea where their version of PHP is installed (this is very common when using web server packages for OS X like MAMP or XAMPP), the command below will find all files called php.ini anywhere in your computer:

NOTE – if you’re going to search your whole computer (like we did above), you’ll see a lot of ‘permission denied’ errors. To avoid this, run the command with sudo, or, if you want to just ignore the errors, redirect STDERR to /dev/null with 2>/dev/null like we learned in instalment 16.

Something else you very often want to do is find all files of a given extension in a given location. For example, the command below will list all text files in your home directory:

You can get more creative by looking for all text files with names starting with a b:

Or all text files with a four letter name with a as the second letter:

JPEG files are an interesting case, you’ll very often see them with two different extensions, .jpg or .jpeg, how can we search for those without using regular expressions? The key is the -or flag. So, to look for files ending in either .jpg or .jpeg in our home directory, we could use the following:

It looks like we’ve cracked it, but actually, we haven’t fully. Some cameras, for reasons I simply cannot fathom, use the extension .JPG instead of .jpg or .jpeg. To get around this we could either add two sets of -or -name criteria, or, we could do a case-insensitive name-search by replacing -name with -iname. This gives us a final JPEG-finding command of:

Easily 90% of the time -name and -iname will be all you need to achieve your goals, but, sometimes, you really do need the power of full regular expressions. When this is the case, you can use the -regex or -iregex flags (-iregex being the case-insensitive version of -regex). There are two very important caveats when using regular expressions with find.

Firstly, unlike with -name, -regex and -iregex do not match against just the file name, they match against the entire file path. It’s important that you remember this when constructing your patterns, or you’ll get unexpected results, either false positives or false negatives.

Secondly, by default -regex and -iregex use Basic POSIX Regular Expressions (BREs), rather than the newer Extended POSIX Regular Expressions (EREs) we learned about in instalments 17 and 18. Don’t worry, you can make find use EREs, you just need to add a -E flag to the start of the command (before the path).

Given our knowledge of regular expressions, we could re-write our JPEG search as follows:

Again, like with egrep in the previous instalment, notice that we are quoting the RE to stop BASH wrongly interpreting any of the special characters we may need like * and ? in the above example. Also notice the position of the -E flag, and that we don’t need to use ^ or $ because the ENTIRE path has to match for the result to validate. This also means that without the .* at the start of the pattern no files will be returned.

Searching Based on Modified Time

Very often, we need to find something we were working on recently, and the key to finding such files is to search based on the time elapsed since a file was last modified. We can do just that with the -ctime flag (for changed time).

By default -ctime works in units of days, however, we can explicitly specify the units of time we’d like to use by appending one of the following after the number:

s
seconds
m
minutes
h
hours
d
days
w
weeks

Unless you specify a sign in front of the number, only files modified EXACTLY the specified amount of time in the past will be returned. That’s not usually useful. Instead, what you generally want are all files modified less than a certain amount of time ago, and to do that you add a minus sign before the number.

So, to find all files in your Documents folder that have been updated less than an hour ago you could use:

Searching Based on File Size

Another criteria we may want to search on is file size. We can do this using the -size flag. The default units used by -size are utterly unintuitive – 512k blocks! Thankfully, like -ctime, -size allows you to specify different units by appending a letter to the number. The following units are supported:

c
Characters (8-bit bytes)
k
KiB = 1024 bytes
M
MiB = 1024KiB (notice the case – must be upper!)
G
GiB = 1024MiB (notice the case – must be upper!)
T
TiB = 1024GiB (notice the case – must be upper!)
P
PiB = 1024TiB (notice the case – must be upper!)

Note that this command uses the old 1024-based sizes, not the 1,000 based SI units used by OS X and hard drive manufacturers (and scientists and engineers and anyone who understands what kilo and mega etc. actually mean).

Also Like with -ctime, if you don’t pre-fix the number with a symbol, only files EXACTLY the size specified will be returned.

For example, the following command shows all files in your downloads folder that are bigger than 200MiB in size:

Similarly, the following command shows all files in your documents folder smaller than 1MiB in size:

Filtering on File ‘type’

When I say file type, I mean that in the POSIX sense of the word, not the file extension sense of the word. In other words, I mean whether something is a regular file, a folder, a link, or some kind of special file.

The type of a file can be filtered using the -type flag followed by a valid file type abbreviation. The list below is not exhaustive, but it covers everything you’re likely to need:

f
a regular file
d
a directory (AKA folder)
l
a symbolic link

This flag will almost always be used in conjunction with one or more other search flags. For example, the following command finds all directories in your documents folder that contain the word temp in their name in any case:

Inverting Search Parameters

In most situations it’s easiest to express what it is you want to search for, but sometimes it’s easier to specify what you don’t want. In situations like this it can be very useful to be able to invert the effect of a single search parameter. You can do this with the -not flag.

For example, you may have a folder where you keep your music, and it should only contain MP3 files and folders. To be sure that’s true you could search for all regular files that do not end in .mp3 and are not hidden (like those ever-present .DS_Store files) with a command like:

Limiting Recursion

By default the find command will drill down into every folder contained in the specified path, but, you can limit the depth of the search with the -maxdepth flag. To search only the specified folder and no deeper use -maxdepth 1.

Note that limiting the depth can really speed up searches of large folders if you know what you want is not deep down in the hierarchy. For example, if you have a lot of documents in your documents folder it can take ages to search it, but, if you are only interested in finding stuff at the top level you can really speed things up. Lets say we are the kind of person who makes lots of temp folders at the top level of their Documents folder (guilty as charged), and you want to find them all so you can do a bit of house keeping, you could search your entire Documents folder with:

When I do this it takes literally minutes to return because I have over a TB of files in my Documents folder. I can get that down to fractions of a seconds by telling find that I’m only interested in the top level stuff with:

Combining Search Criteria (Boolean Algebra)

We’ve already seen that we can use the -or and -not flags, but there is also a -and flag. In fact, if you don’t separate your criteria with a -or flag, a -and flag is implied.

The following example from above:

Is actually interpreted as:

We can even take things a step further and add sub-expressions using ( and ) to start and end each sub expression (they can even be nested). Note that ( and ) have meaning in BASH, so they need to be either escaped or quoted. Since I find escaping makes everything hard to read and understand, I recommend always quoting these operators.

As a final example, the following command will find large powerpoint presentations in your Documents folder, i.e. all files bigger than 100MiB in size that end in .ppt or .pptx.

Conclusions

In this instalment we’ve seen that we can use the find command to search for files based on all sorts of criteria, and that we can combine those criteria using boolean algebra to generate very powerful search queries. In the next instalment we’ll discover that you can use the find command not only to search for files, but to apply an action to every file it finds.

The find command is common to all POSIX operating systems, so it works on Linux, Unix, and OS X. OS X maintains an index of your files allowing quick searching in the Finder and via Spotlight. Because this index is kept up to date by the OS, it makes searching with Spotlight much quicker than searching with find. In the next instalment we’ll also discover that OS X ships with a terminal command that allows you to use the power of Spotlight from the command line!