Roland's homepage

My random knot in the Web

Finding your own files, fast

Inspired by this blog post by Julia Evans, I set up a quick way to find files in my home directory. This howto is based on FreeBSD, but should work (perhaps with some tweaking) on Linux and OS X as well.

Introduction

Since my home directory isn’t world-readable, FreeBSD’s locate(1) doesn’t work in it. It works fine for the rest of the system, though.

The above-mentioned post motivated me to build an alternative. When I’m looking for files, it is most often for my own stuff, so it makes sense to have a search function that is restricted to the names of files in my own home directory.

Note that this is about searching for filenames. For searching through file contents, I heartily recommend ag.

Direct method

The first method I tried was to use good old find directly. I should point out that my computer is several years old and does not have an SSD.

> time find /home/$USER -type f -name '*foo*' > /dev/null
0.125u 0.881s 0:01.00 100.0%        61+174k 0+0io 0pf+0w

Given that my home directory contains on the order of 36,000 files, this doesn’t seem too bad to me. But it’s a lot to type.

The same can be done with ag and its -g option, which matches the pattern against file names instead of file contents.

> time ag --nocolor -g foo ${HOME} > /dev/null
0.332u 0.253s 0:00.58 100.0%        99+191k 0+0io 0pf+0w

This is faster than find partly because it ignores things like .git directories, but still not instantaneous like locate.

Both of these methods search through everything, including dotfiles and dot directories (although ag ignores some), temporary directories and caches. In my case, ${HOME}/tmp/ and ${HOME}/.cache/ contain a ton of crap, because I don’t clean them very often. Generally, I’m not interested in finding stuff there.

Using a file list

Now following Julia’s idea, let’s build a “database” for searching. The easiest way is just to make a list of all files in my $HOME, excluding the things that I don’t care about.

> time find $HOME -type f -not -path '*/.git/*' \
-not -path '*/tmp/*' -not -path '*/.*' > ${HOME}/.allfiles
0.722u 0.863s 0:01.58 100.0%        61+173k 0+17io 0pf+0w

This takes only about 1½ seconds to run. On my $HOME, it yields a 2.2 MiB file containing approximately 36,000 lines.
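For readers on an sh-like shell, the list-building step could be wrapped in a small function. This is only a sketch and not the command used above: the name build_file_list is hypothetical, it swaps the -path filters for -prune so that find never descends into the excluded directories, and it writes to a temporary file first so that a search never sees a half-written list. It assumes the root is given as an absolute path such as $HOME.

```shell
#!/bin/sh
# build_file_list ROOT OUTFILE
# Hypothetical helper, not from the original post.
# Lists regular files under ROOT, skipping directories named "tmp" and
# every hidden file or directory (which also covers .git and .cache).
# Assumes ROOT is an absolute path whose last component is not hidden.
build_file_list() {
    root=$1
    out=$2
    # The temporary name starts with the same dot as OUTFILE, so it is
    # excluded from the listing when OUTFILE lives inside ROOT.
    tmpfile="${out}.new.$$"
    find "$root" \( -type d -name tmp -o -name '.*' \) -prune \
        -o -type f -print > "$tmpfile" &&
        mv "$tmpfile" "$out"   # replace the old list only if find succeeded
}

# Example use:
#   build_file_list "$HOME" "$HOME/.allfiles"
```

Pruning tends to be quicker than filtering on the full path, because find skips the excluded trees entirely instead of walking them and discarding the results.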

Let’s now search in this file with egrep.

> time egrep foo ~/.allfiles >/dev/null
0.034u 0.000s 0:00.03 100.0%        73+280k 0+0io 0pf+0w

This is pretty much instantaneous, so I decided to stick with this system.

To make sure that the “database” stays up to date, I put the above-mentioned find command (minus time) in my personal crontab file, to be run every hour. So unlike the locate database, which is only updated weekly on FreeBSD, my personal file list is updated hourly. That’s enough for me. If I’m looking for a file, chances are that it was created more than an hour ago! :-)
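As an illustration, an hourly entry in a personal crontab (edited with crontab -e) might look like the sketch below. Running at minute 0 is an assumption; the text only says “every hour”. The command has to fit on a single line, and cron runs it with /bin/sh and sets HOME itself, so the tcsh prompt and line continuation from the interactive example are dropped.

```
# minute hour day-of-month month day-of-week command
0 * * * * find "$HOME" -type f -not -path '*/.git/*' -not -path '*/tmp/*' -not -path '*/.*' > "$HOME/.allfiles"
```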

To make life easier, I’ve defined an alias in tcsh for the egrep command shown above.

alias ffind egrep -i \!:1 ${HOME}/.allfiles
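The alias above is tcsh syntax. For sh-like shells (sh, bash, zsh), a roughly equivalent function could look like the following sketch. The FFIND_LIST variable is a hypothetical addition that lets the list location be overridden; it defaults to the ~/.allfiles list built earlier. grep -E is used as the equivalent of egrep.

```shell
#!/bin/sh
# ffind PATTERN
# Case-insensitive extended-regex search through the file list.
# FFIND_LIST is a hypothetical override, not part of the original setup.
ffind() {
    # "--" keeps a pattern that starts with "-" from being read as an option.
    grep -E -i -- "$1" "${FFIND_LIST:-$HOME/.allfiles}"
}

# Example use:
#   ffind 'report.*\.pdf$'
```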

On FreeBSD, find and egrep are part of the base system, so they’re always available.

