Sep
7
TTT Part 20 of n – File Searches
Filed Under Computers & Tech, System Administration on September 7, 2014 at 1:04 pm
- TTT Part 1 of n – Command Shells
- TTT Part 2 of n – Commands
- TTT Part 3 of n – File Systems
- TTT Part 4 of n – Navigation
- TTT Part 5 of n – File Permissions
- TTT Part 6 of n – More File Permissions
- TTT Part 7 of n – Managing Files
- TTT Part 8 of n – Processes
- TTT Part 9 of n – Controlling Processes
- TTT Part 10 of n – man
- TTT Part 11 of n – Text Files
- TTT Part 12 of n – the Environment
- TTT Part 13 of n – PATH
- TTT Part 14 of n – Aliases & Prompts
- TTT Part 15 of n – ‘Plumbing’
- TTT Part 16 of n – Crossing the Streams
- TTT Part 17 of n – Regular Expressions
- TTT Part 18 of n – More REs
- TTT Part 19 of n – Text Searches
- TTT Part 20 of n – File Searches
- TTT Part 21 of n – More Searching
- TTT Part 22 of n – Tips & Tricks
- TTT Part 23 of n – Networking Intro
- TTT Part 24 of n – Ethernet & ARP
- TTT Part 25 of n – IP Subnets
- TTT Part 26 of n – DHCP
- TTT Part 27 of n – DNS
- TTT Part 28 of n – Network Troubleshooting
- TTT Part 29 of n – Intro to SSH
- TTT Part 30 of n – SSHing More Securely
- TTT Part 31 of n – SSH File Transfers
- TTT Part 32 of n – SSH Tunnelling
- TTT Part 33 of n – SSH ‘Bookmarks’
- TTT Part 34 of n – Introducing HTTP
- TTT Part 35 of n – HTTP Commands
- TTT Part 36 of n – screen & cron
- TTT Part 37 of n – SSH Agents
- TTT Part 38 of n – TMUX (A Screen Alternative)
- TTT Part 39 of n – Advanced TMUX
In the previous previous instalment we looked at using egrep
to search for a particular piece of text in a stream or file. egrep
is often a great tool for finding a file you are looking for, but only if the file is a plain text file, and only if you are searching for that file based on its content. What if you want to search for files based on other criteria, like the last time the file was edited, or the name of the file, or the size of the file, or the type of the file etc.? For that you need a different command, for that you need find
.
Listen Along: Taming the Terminal Podcast Episode 20
The Basics of the find
Command
Regardless of the criteria you wish to use, the basic form of the find
command is always the same, you first need to tell it where to look, then you tell it what criteria to use when searching:
find path criteria 1 [criteria 2 ...]
The path is searched recursively (by default), so if you give a path of ~/Documents
, it will search your documents folder, and all folders within your documents folder. To search your entire computer, and all mounted drives, use a path of just /
. To use the current folder as the base of your search use a path of .
(which always means ‘the current folder’ as we learned in instalment 4).
Defining Search Criteria
To see a full list of all possible search criteria, you can of course read the manual entry for find with the command man find
, but we’ll look at some of the more common criteria you’ll be most likely to need here.
Search by File Name
You can do simple file-name searches with the -name
flag followed by a simple pattern. Note that these simple patterns are NOT regular expressions, they use the same syntax as wild card expansion in BASH, i.e. *
means any number of any characters, and ?
means exactly one of any character.
A lot of the time you really don’t need the added power and complexity of regular expressions, because a lot of the time all you really want is the good old fashioned DOS pattern *.extension
.
IMPORTANT – remember that *
and ?
have meanings in BASH, so you need to escape them in some way to get reliable results. It’s ugly and hard to read \*
all over the place, so my suggestion is to get into the good habit of ALWAYS quoting your patterns when using -name
.
Let’s start with a really simple example, you know the full name of the file you’re looking for, but, you have no idea where it is. This is something you often come across when someone asks you to take a look at their server. You know you need to edit, say, php.ini
, but you have no idea where their version of PHP is installed (this is very common when using web server packages for OS X like MAMP or XAMPP), the command below will find all files called php.ini
anywhere in your computer:
find / -name 'php.ini'
NOTE – if you’re going to search your whole computer (like we did above), you’ll see a lot of ‘permission denied’ errors. To avoid this, run the command with sudo
, or, if you want to just ignore the errors, redirect STDERR
to /dev/null
with 2>/dev/null
like we learned in instalment 16.
Something else you very often want to do is find all files of a given extension in a given location. For example, the command below will list all text files in your home directory:
find ~ -name '*.txt'
You can get more creative by looking for all text files with names starting with a b
:
find ~ -name 'b*.txt'
Or all text files with a four letter name with a
as the second letter:
find ~ -name '?a??.txt'
JPEG files are an interesting case, you’ll very often see them with two different extensions, .jpg
or .jpeg
, how can we search for those without using regular expressions? The key is the -or
flag. So, to look for files ending in either .jpg
or .jpeg
in our home directory, we could use the following:
find ~ -name '*.jpg' -or -name '*.jpeg'
It looks like we’ve cracked it, but actually, we haven’t fully. Some cameras, for reasons I simply cannot fathom, use the extension .JPG
instead of .jpg
or .jpeg
. To get around this we could either add two sets of -or -name
criteria, or, we could do a case-insensitive name-search by replacing -name
with -iname
. This gives us a final JPEG-finding command of:
find ~ -iname '*.jpg' -or -iname '*.jpeg'
Easily 90% of the time -name
and -iname
will be all you need to achieve your goals, but, sometimes, you really do need the power of full regular expressions. When this is the case, you can use the -regex
or -iregex
flags (-iregex
being the case-insensitive version of -regex
). There are two very important caveats when using regular expressions with find
.
Firstly, unlike with -name
, -regex
and -iregex
do not match against just the file name, they match against the entire file path. It’s important that you remember this when constructing your patterns, or you’ll get unexpected results, either false positives or false negatives.
Secondly, by default -regex
and -iregex
use Basic POSIX Regular Expressions (BREs), rather than the newer Extended POSIX Regular Expressions (EREs) we learned about in instalments 17 and 18. Don’t worry, you can make find use EREs, you just need to add a -E
flag to the start of the command (before the path).
Given our knowledge of regular expressions, we could re-write our JPEG search as follows:
find -E ~ -iregex '.*[.]jpe?g'
Again, like with egrep
in the previous instalment, notice that we are quoting the RE to stop BASH wrongly interpreting any of the special characters we may need like *
and ?
in the above example. Also notice the position of the -E
flag, and that we don’t need to use ^
or $
because the ENTIRE path has to match for the result to validate. This also means that without the .*
at the start of the pattern no files will be returned.
Searching Based on Modified Time
Very often, we need to find something we were working on recently, and the key to finding such files is to search based on the time elapsed since a file was last modified. We can do just that with the -ctime
flag (for changed time).
By default -ctime
works in units of days, however, we can explicitly specify the units of time we’d like to use by appending one of the following after the number:
s
- seconds
m
- minutes
h
- hours
d
- days
w
- weeks
Unless you specify a sign in front of the number, only files modified EXACTLY the specified amount of time in the past will be returned. That’s not usually useful. Instead, what you generally want are all files modified less than a certain amount of time ago, and to do that you add a minus sign before the number.
So, to find all files in your Documents folder that have been updated less than an hour ago you could use:
find ~/Documents -ctime -1h
Searching Based on File Size
Another criteria we may want to search on is file size. We can do this using the -size
flag. The default units used by -size
are utterly unintuitive – 512k blocks! Thankfully, like -ctime
, -size
allows you to specify different units by appending a letter to the number. The following units are supported:
c
- Characters (8-bit bytes)
k
- KiB = 1024 bytes
M
- MiB = 1024KiB (notice the case – must be upper!)
G
- GiB = 1024MiB (notice the case – must be upper!)
T
- TiB = 1024GiB (notice the case – must be upper!)
P
- PiB = 1024TiB (notice the case – must be upper!)
Note that this command uses the old 1024-based sizes, not the 1,000 based SI units used by OS X and hard drive manufacturers (and scientists and engineers and anyone who understands what kilo and mega etc. actually mean).
Also Like with -ctime
, if you don’t pre-fix the number with a symbol, only files EXACTLY the size specified will be returned.
For example, the following command shows all files in your downloads folder that are bigger than 200MiB in size:
find ~/Downloads -size +200M
Similarly, the following command shows all files in your documents folder smaller than 1MiB in size:
find ~/Downloads -size -1M
Filtering on File ‘type’
When I say file type, I mean that in the POSIX sense of the word, not the file extension sense of the word. In other words, I mean whether something is a regular file, a folder, a link, or some kind of special file.
The type of a file can be filtered using the -type
flag followed by a valid file type abbreviation. The list below is not exhaustive, but it covers everything you’re likely to need:
f
- a regular file
d
- a directory (AKA folder)
l
- a symbolic link
This flag will almost always be used in conjunction with one or more other search flags. For example, the following command finds all directories in your documents folder that contain the word temp
in their name in any case:
find ~/Documents -type d -iname '*temp*'
Inverting Search Parameters
In most situations it’s easiest to express what it is you want to search for, but sometimes it’s easier to specify what you don’t want. In situations like this it can be very useful to be able to invert the effect of a single search parameter. You can do this with the -not
flag.
For example, you may have a folder where you keep your music, and it should only contain MP3 files and folders. To be sure that’s true you could search for all regular files that do not end in .mp3
and are not hidden (like those ever-present .DS_Store
files) with a command like:
find ~/Music/MyMP3s -type f -not -iname '*.mp3' -not -name '.*'
Limiting Recursion
By default the find command will drill down into every folder contained in the specified path, but, you can limit the depth of the search with the -maxdepth
flag. To search only the specified folder and no deeper use -maxdepth 1
.
Note that limiting the depth can really speed up searches of large folders if you know what you want is not deep down in the hierarchy. For example, if you have a lot of documents in your documents folder it can take ages to search it, but, if you are only interested in finding stuff at the top level you can really speed things up. Lets say we are the kind of person who makes lots of temp folders at the top level of their Documents folder (guilty as charged), and you want to find them all so you can do a bit of house keeping, you could search your entire Documents folder with:
find ~/Documents -type d -iname '*temp*'
When I do this it takes literally minutes to return because I have over a TB of files in my Documents folder. I can get that down to fractions of a seconds by telling find
that I’m only interested in the top level stuff with:
find ~/Documents -type d -iname '*temp*' -maxdepth 1
Combining Search Criteria (Boolean Algebra)
We’ve already seen that we can use the -or
and -not
flags, but there is also a -and
flag. In fact, if you don’t separate your criteria with a -or
flag, a -and
flag is implied.
The following example from above:
find ~/Music/MyMP3s -type f -not -iname '*.mp3' -not -name '.*'
Is actually interpreted as:
find ~/Music/MyMP3s -type f -and -not -iname '*.mp3' -and -not -name '.*'
We can even take things a step further and add sub-expressions using (
and )
to start and end each sub expression (they can even be nested). Note that (
and )
have meaning in BASH, so they need to be either escaped or quoted. Since I find escaping makes everything hard to read and understand, I recommend always quoting these operators.
As a final example, the following command will find large powerpoint presentations in your Documents folder, i.e. all files bigger than 100MiB in size that end in .ppt
or .pptx
.
find ~/Documents -size +100M '(' -iname '*.ppt' -or -iname '*.pptx' ')'
Conclusions
In this instalment we’ve seen that we can use the find
command to search for files based on all sorts of criteria, and that we can combine those criteria using boolean algebra to generate very powerful search queries. In the next instalment we’ll discover that you can use the find
command not only to search for files, but to apply an action to every file it finds.
The find
command is common to all POSIX operating systems, so it works on Linux, Unix, and OS X. OS X maintains an index of your files allowing quick searching in the Finder and via Spotlight. Because this index is kept up to date by the OS, it makes searching with Spotlight much quicker than searching with find
. In the next instalment we’ll also discover that OS X ships with a terminal command that allows you to use the power of Spotlight from the command line!
[…] opened the NosillaCast News (via Mailchimp). In Chit Chat Across the Pond Bart takes us through Taming the Terminal Part 20 of n where he teaches us how to find files by different […]
[…] In the first instalment we learned how to search for text within files and streams using egrep. In the second we learned to search for files based on all sorts of criteria with the find command. In this final […]