Sometimes, it’s useful to look back on your filesystem history.
For example, after installing some new software, you might want to know which files have changed on your hard drive. Or, if you’re a programmer getting started on a new project, you may need to follow a complex and unfamiliar build process. A list of recently modified files can reveal a lot about how that build process works.
Here’s a short Python script to create such a list. It lists the contents of a folder recursively, sorted by modification time.
As a simple example, I ran it after setting up a fresh copy of my random number sequence project. Here’s the output (with some lines deleted to save space):
2013-01-14 21:44:29 5564 .\build\Testing\Temporary\LastTest.log
2013-01-14 21:44:29 29 .\build\Testing\Temporary\CTestCostData.txt
------------------------------
2013-01-14 21:28:38 91 .\build\Win32\Release\ALL_BUILD\ALL_BUILD.lastbuildstate
2013-01-14 21:28:38 1560 .\build\Win32\Release\ALL_BUILD\custombuild.command.1.tlog
2013-01-14 21:28:38 6386 .\build\Win32\Release\ALL_BUILD\custombuild.read.1.tlog
2013-01-14 21:28:38 674 .\build\Win32\Release\ALL_BUILD\custombuild.write.1.tlog
2013-01-14 21:28:38 51 .\build\CMakeFiles\generate.stamp
2013-01-14 21:28:37 91 .\build\RandomSequence.dir\Release\RandomSequence.lastbuildstate
2013-01-14 21:28:37 678 .\build\RandomSequence.dir\Release\mt.command.1.tlog
2013-01-14 21:28:37 818 .\build\RandomSequence.dir\Release\mt.read.1.tlog
2013-01-14 21:28:37 446 .\build\RandomSequence.dir\Release\mt.write.1.tlog
2013-01-14 21:28:37 7680 .\build\Release\RandomSequence.exe
...
------------------------------
2013-01-14 21:28:21 86 .\build\CMakeFiles\cmake.check_cache
2013-01-14 21:28:21 12856 .\build\CMakeCache.txt
2013-01-14 21:28:21 3712 .\build\RandomSequence.sln
2013-01-14 21:28:21 270 .\build\CMakeFiles\TargetDirectories.txt
2013-01-14 21:28:21 391 .\build\CTestTestfile.cmake
2013-01-14 21:28:21 1586 .\build\cmake_install.cmake
2013-01-14 21:28:21 4204 .\build\CMakeFiles\generate.stamp.depend
2013-01-14 21:28:21 25207 .\build\ZERO_CHECK.vcxproj
2013-01-14 21:28:21 832 .\build\ZERO_CHECK.vcxproj.filters
...
------------------------------
2013-01-14 21:27:40 959 .\randomsequence.h
2013-01-14 21:27:40 416 .\.git\index
2013-01-14 21:27:40 1255 .\main.cpp
2013-01-14 21:27:40 714 .\README.md
2013-01-14 21:27:40 246 .\CMakeLists.txt
2013-01-14 21:27:40 12 .\.gitignore
2013-01-14 21:27:40 336 .\.git\config
2013-01-14 21:27:40 201 .\.git\logs\refs\heads\master
2013-01-14 21:27:40 201 .\.git\logs\HEAD
...
The horizontal dashes separate modifications greater than 10 seconds apart, which helps organize the files visually into groups. In reverse order, you can see the groups of files created by git clone, project files generated by cmake, the build output from cmake --build, and a couple of files written by ctest.
I’ve used this kind of script to help make sense of the filesystem on Ubuntu, and to figure out where files were written on MacOS X using the App Store.
Command-Line Options
Running with no options or with --help displays the following help message:
Usage: list_modifications.py [options] path [path2 ...]
Options:
-h, --help show this help message and exit
-g SECS set threshold for grouping files
-f EXC_FILES exclude files matching a wildcard pattern
-d EXC_DIRS exclude directories matching a wildcard pattern
You can filter the output using -f and -d. For example:
list_modifications.py -d obj* -f *.log -f *.bin -g 30 .git build\CMakeFiles
The above command lists the contents of the .git and build\CMakeFiles folders, excluding the objects subfolder and any files ending in .log or .bin. It also groups files modified within 30 seconds of each other, instead of the default 10.
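To get a feel for how those wildcard patterns behave, here is a minimal sketch using Python's fnmatch module, which the script relies on for matching. The helper name is_excluded and the sample names are made up for illustration; they are not part of the script.

```python
import fnmatch

def is_excluded(name, patterns):
    """Return True if name matches at least one wildcard pattern."""
    return any(fnmatch.fnmatch(name, w) for w in patterns)

# Patterns taken from the example command line above
exc_dirs = ['obj*']
exc_files = ['*.log', '*.bin']

# Hypothetical folder and file names, chosen only to show hits and misses
dir_results = {d: is_excluded(d, exc_dirs) for d in ['objects', 'refs', 'logs']}
file_results = {f: is_excluded(f, exc_files) for f in ['LastTest.log', 'index', 'data.bin']}

print(dir_results)
print(file_results)
```

Patterns are matched against the bare file or folder name, not the full path. On a Unix-like shell, it is probably wise to quote the patterns (for example -f '*.log') so the shell does not expand them before the script sees them.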
A Quick Look at the Code
This script is a pretty good example of the kind of problem Python can solve quickly using very little code. Here’s a quick run-through.
import optparse

parser = optparse.OptionParser(usage='Usage: %prog [options] path [path2 ...]')
# -g takes a single value; -f and -d may be repeated and accumulate into lists
parser.add_option('-g', action='store', type='long', dest='secs', default=10,
                  help='set threshold for grouping files')
parser.add_option('-f', action='append', type='string', dest='exc_files', default=[],
                  help='exclude files matching a wildcard pattern')
parser.add_option('-d', action='append', type='string', dest='exc_dirs', default=[],
                  help='exclude directories matching a wildcard pattern')
options, roots = parser.parse_args()
This block of code takes care of all command-line option parsing using the built-in optparse module. optparse is deprecated as of Python 2.7, but it’s handy and has been available since Python 2.3. The --help option is handled automatically.
The -f option uses the 'append' action with a default of [], which means the user can specify -f multiple times, creating a list. In the previous example, we end up with options.exc_files set to ['*.log', '*.bin']. Any leftover positional arguments are assigned to roots as another list; in the previous example, roots becomes ['.git', 'build\\CMakeFiles'].
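As a quick check of that behavior, the following sketch feeds the example command line to a parser built the same way. The build_parser helper is hypothetical, and it uses type='int' for -g rather than the script's 'long', purely to keep the sketch simple.

```python
import optparse

def build_parser():
    """Build a parser with the same three options as the script (hypothetical helper)."""
    parser = optparse.OptionParser(usage='Usage: %prog [options] path [path2 ...]')
    # Assumption: 'int' stands in for the script's 'long' type here
    parser.add_option('-g', action='store', type='int', dest='secs', default=10)
    parser.add_option('-f', action='append', type='string', dest='exc_files', default=[])
    parser.add_option('-d', action='append', type='string', dest='exc_dirs', default=[])
    return parser

# The arguments from the example command, as the script would receive them
argv = ['-d', 'obj*', '-f', '*.log', '-f', '*.bin', '-g', '30',
        '.git', 'build\\CMakeFiles']
options, roots = build_parser().parse_args(argv)

print(options.exc_dirs, options.exc_files, options.secs)
print(roots)
```

Passing an explicit argument list to parse_args, instead of letting it read sys.argv, makes it easy to experiment with option handling from an interactive session.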
import fnmatch
import os

def iterFiles(options, roots):
    """
    A generator to enumerate the contents of directories recursively.
    """
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Match exclusion patterns against the folder's own name only
            name = os.path.split(dirpath)[1]
            if any(fnmatch.fnmatch(name, w) for w in options.exc_dirs):
                del dirnames[:]  # Don't recurse here
                continue
            for fn in filenames:
                if any(fnmatch.fnmatch(fn, w) for w in options.exc_files):
                    continue
                path = os.path.join(dirpath, fn)
                stat = os.lstat(path)
                # Use whichever timestamp is more recent
                mtime = max(stat.st_mtime, stat.st_ctime)
                yield mtime, stat.st_size, path
iterFiles looks like a function definition, but the presence of the yield statement in the body means it actually defines a generator. As such, calling iterFiles() does not execute the function body right away. It returns an iterator, which you can then use in a for loop, as we’ll see later.
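The deferred execution is easy to observe with a toy generator. This is only an illustrative sketch; count_up and its log argument are invented names that have nothing to do with the script itself.

```python
def count_up(limit, log):
    """A toy generator that records when its body actually starts running."""
    log.append('started')
    for i in range(limit):
        yield i

log = []
gen = count_up(3, log)

# Nothing in the body has run yet, so the log is still empty
log_before = list(log)

# Iterating drives the generator: the body runs and values are produced
values = [v for v in gen]
log_after = list(log)

print(log_before, values, log_after)
```

The body only starts running once something asks the iterator for its first value, which is why a generator can be handed straight to a consumer such as a for loop.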
iterFiles uses the os.walk generator, which lets us modify the contents of dirnames in-place during iteration. In particular, we clear the contents of the list using del dirnames[:] to avoid descending into certain subdirectories.
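Here is a self-contained sketch of that pruning trick, run against a throwaway directory tree. The walk_pruned and demo helpers and the file names are hypothetical, and tempfile.TemporaryDirectory assumes Python 3.2 or later.

```python
import fnmatch
import os
import tempfile

def walk_pruned(root, exclude_patterns):
    """Return relative paths of files under root, skipping excluded folders."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        name = os.path.basename(dirpath)
        if any(fnmatch.fnmatch(name, w) for w in exclude_patterns):
            del dirnames[:]  # Emptying the list stops os.walk from descending
            continue
        for fn in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, fn), root))
    return sorted(found)

def demo():
    """Build a small temporary tree and walk it with 'obj*' excluded."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, 'src'))
        os.makedirs(os.path.join(root, 'objects', 'pack'))
        for rel in (os.path.join('src', 'main.cpp'),
                    os.path.join('objects', 'a.bin'),
                    os.path.join('objects', 'pack', 'b.bin')):
            with open(os.path.join(root, rel), 'w') as f:
                f.write('x')
        return walk_pruned(root, ['obj*'])

pruned_result = demo()
print(pruned_result)
```

This only works because os.walk visits directories top-down by default; with topdown=False, the subdirectories have already been visited by the time dirnames could be modified.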
In the above code, the argument in any(fnmatch.fnmatch(name, w) for w in options.exc_dirs) is known as a generator expression. It’s a lot like a list comprehension, but it produces its items one at a time instead of building a list, and we’re allowed to omit the surrounding parentheses because it is the only argument to the function. In this case, the any function will return True as soon as fnmatch.fnmatch(name, w) returns True for some pattern w in options.exc_dirs, and False if no pattern matches.
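One practical consequence is that any can stop early. The sketch below wraps fnmatch.fnmatch in a hypothetical noisy_match helper that records every pattern it is asked about, so you can see how many patterns were actually tested.

```python
import fnmatch

checked = []

def noisy_match(name, pattern):
    """Like fnmatch.fnmatch, but remembers each pattern it was asked to test."""
    checked.append(pattern)
    return fnmatch.fnmatch(name, pattern)

patterns = ['obj*', '*.tmp', 'build']

# Generator expression: patterns are tested one at a time, on demand
lazy_result = any(noisy_match('objects', w) for w in patterns)
lazy_checked = list(checked)

# List comprehension: every pattern is tested before any() sees the list
del checked[:]
eager_result = any([noisy_match('objects', w) for w in patterns])
eager_checked = list(checked)

print(lazy_result, lazy_checked)
print(eager_result, eager_checked)
```

Both forms give the same answer; the generator expression simply avoids work once a match has been found.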
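import time

ptime = 0
# Newest files first; tuples sort by their first item, the modification time
for mtime, size, path in sorted(iterFiles(options, roots), reverse=True):
    # Draw a separator when the gap to the previous file reaches the threshold
    if ptime - mtime >= options.secs:
        print('-' * 30)
    timeStr = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(mtime))
    print('%s %10d %s' % (timeStr, size, path))
    ptime = mtime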
Here, we feed the iterFiles generator to sorted, resulting in a sorted list of 3-tuples. Tuples compare item by item, so the list is ordered by the first item in each tuple (the modification time), which is exactly what we want, and reverse=True puts the most recently modified files first. We loop through, writing one line of formatted output for each tuple. Since Python lets us multiply a string by an integer, '-' * 30 is used as a shortcut for drawing horizontal lines.
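To see the sorting and grouping logic in isolation, here is a small sketch that applies it to a handful of made-up tuples instead of a real directory. The format_groups helper and the sample data are hypothetical, and the timestamp column is left out so the result does not depend on the local time zone.

```python
def format_groups(entries, secs=10):
    """Sort (mtime, size, path) tuples newest-first and insert group separators."""
    lines = []
    ptime = 0
    for mtime, size, path in sorted(entries, reverse=True):
        if ptime - mtime >= secs:
            lines.append('-' * 30)
        lines.append('%10d %s' % (size, path))
        ptime = mtime
    return lines

# Made-up entries: two files written 3 seconds apart, one written 42 seconds later
sample = [
    (1000, 12, 'a.txt'),
    (1045, 7680, 'c.exe'),
    (1003, 246, 'b.txt'),
]

grouped = format_groups(sample)
for line in grouped:
    print(line)
```

Because ptime starts at 0, the first comparison is always negative, so no separator is printed above the newest file.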
That’s all there is to it! Hopefully, some readers have managed to pick up a few nuggets of Pythonic goodness along the way.