Sometimes, it’s useful to look back on your filesystem history.
For example, after installing some new software, you might want to know which files have changed on your hard drive. Or, if you’re a programmer getting started on a new project, you may need to follow a complex and unfamiliar build process. A list of recently modified files can reveal a lot about how that build process works.
Here’s a short Python script to create such a list. It lists the contents of a folder recursively, sorted by modification time.
As a simple example, I ran it after setting up a fresh copy of my random number sequence project. Here’s the output (with some lines deleted to save space):
2013-01-14 21:44:29 5564 .\build\Testing\Temporary\LastTest.log
2013-01-14 21:44:29 29 .\build\Testing\Temporary\CTestCostData.txt
------------------------------
2013-01-14 21:28:38 91 .\build\Win32\Release\ALL_BUILD\ALL_BUILD.lastbuildstate
2013-01-14 21:28:38 1560 .\build\Win32\Release\ALL_BUILD\custombuild.command.1.tlog
2013-01-14 21:28:38 6386 .\build\Win32\Release\ALL_BUILD\custombuild.read.1.tlog
2013-01-14 21:28:38 674 .\build\Win32\Release\ALL_BUILD\custombuild.write.1.tlog
2013-01-14 21:28:38 51 .\build\CMakeFiles\generate.stamp
2013-01-14 21:28:37 91 .\build\RandomSequence.dir\Release\RandomSequence.lastbuildstate
2013-01-14 21:28:37 678 .\build\RandomSequence.dir\Release\mt.command.1.tlog
2013-01-14 21:28:37 818 .\build\RandomSequence.dir\Release\mt.read.1.tlog
2013-01-14 21:28:37 446 .\build\RandomSequence.dir\Release\mt.write.1.tlog
2013-01-14 21:28:37 7680 .\build\Release\RandomSequence.exe
...
------------------------------
2013-01-14 21:28:21 86 .\build\CMakeFiles\cmake.check_cache
2013-01-14 21:28:21 12856 .\build\CMakeCache.txt
2013-01-14 21:28:21 3712 .\build\RandomSequence.sln
2013-01-14 21:28:21 270 .\build\CMakeFiles\TargetDirectories.txt
2013-01-14 21:28:21 391 .\build\CTestTestfile.cmake
2013-01-14 21:28:21 1586 .\build\cmake_install.cmake
2013-01-14 21:28:21 4204 .\build\CMakeFiles\generate.stamp.depend
2013-01-14 21:28:21 25207 .\build\ZERO_CHECK.vcxproj
2013-01-14 21:28:21 832 .\build\ZERO_CHECK.vcxproj.filters
...
------------------------------
2013-01-14 21:27:40 959 .\randomsequence.h
2013-01-14 21:27:40 416 .\.git\index
2013-01-14 21:27:40 1255 .\main.cpp
2013-01-14 21:27:40 714 .\README.md
2013-01-14 21:27:40 246 .\CMakeLists.txt
2013-01-14 21:27:40 12 .\.gitignore
2013-01-14 21:27:40 336 .\.git\config
2013-01-14 21:27:40 201 .\.git\logs\refs\heads\master
2013-01-14 21:27:40 201 .\.git\logs\HEAD
...
The lines of dashes separate files modified 10 or more seconds apart, which helps organize the files visually into groups. Reading from the bottom up, you can see the group of files created by git clone, the project files generated by cmake, the build output from cmake --build, and a couple of files written by ctest.
I’ve used this kind of script to help make sense of the filesystem on Ubuntu, and to figure out where files get written when installing software through the App Store on Mac OS X.
Command-Line Options
Running with no options or with --help displays the following help message:
Usage: list_modifications.py [options] path [path2 ...]

Options:
  -h, --help    show this help message and exit
  -g SECS       set threshold for grouping files
  -f EXC_FILES  exclude files matching a wildcard pattern
  -d EXC_DIRS   exclude directories matching a wildcard pattern
You can filter the output using -f and -d. For example:
list_modifications.py -d obj* -f *.log -f *.bin -g 30 .git build\CMakeFiles
The above command lists the contents of the .git and build\CMakeFiles folders, excluding the objects subfolder and any files ending in .log or .bin. It also groups files modified within 30 seconds of each other, instead of the default 10.
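To get a feel for how those wildcard patterns behave, here's a small sketch using the standard fnmatch module, which is what the script uses for matching. The is_excluded helper and the sample names are made up for illustration; they aren't part of the script.

```python
import fnmatch

def is_excluded(name, patterns):
    """Return True if name matches any of the wildcard patterns."""
    return any(fnmatch.fnmatch(name, p) for p in patterns)

# Patterns from the example command line above.
dir_patterns = ['obj*']
file_patterns = ['*.log', '*.bin']

# Hypothetical directory and file names, chosen only to show the rules.
excluded_dirs = [d for d in ['objects', 'refs', 'logs']
                 if is_excluded(d, dir_patterns)]
kept_files = [f for f in ['index', 'build.log', 'data.bin', 'config']
              if not is_excluded(f, file_patterns)]

print(excluded_dirs)  # ['objects']
print(kept_files)     # ['index', 'config']
```

Note that a pattern is tested against a single file or directory name, not against the full path.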
A Quick Look at the Code
This script is a pretty good example of the kind of problem Python can solve quickly using very little code. Here’s a quick run-through.
# Standard library modules used throughout the script
import fnmatch
import optparse
import os
import time

parser = optparse.OptionParser(usage='Usage: %prog [options] path [path2 ...]')
# -g takes a single value; -f and -d may each be repeated
parser.add_option('-g', action='store', type='long', dest='secs', default=10,
                  help='set threshold for grouping files')
parser.add_option('-f', action='append', type='string', dest='exc_files', default=[],
                  help='exclude files matching a wildcard pattern')
parser.add_option('-d', action='append', type='string', dest='exc_dirs', default=[],
                  help='exclude directories matching a wildcard pattern')
# options holds the parsed flags; roots holds the leftover positional arguments
options, roots = parser.parse_args()
This block of code takes care of all command-line option parsing using the built-in optparse module. optparse is deprecated as of Python 2.7 in favor of argparse, but it’s handy and has been available since Python 2.3. The --help option is handled automatically.
The -f option uses the 'append' action with a default of [], which means the user can specify -f multiple times, creating a list. In the previous example, we end up with options.exc_files set to ['*.log', '*.bin']. Any leftover positional arguments are assigned to roots as another list; in the previous example, roots becomes ['.git', 'build\\CMakeFiles'].
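If you'd like to check that behavior without touching a real command line, here's a minimal sketch that hands the example arguments to parse_args as a list. The argv variable is just a stand-in for sys.argv[1:], and I've used type='int' for -g purely as a portable choice for this sketch.

```python
import optparse

parser = optparse.OptionParser(usage='Usage: %prog [options] path [path2 ...]')
parser.add_option('-g', action='store', type='int', dest='secs', default=10)
parser.add_option('-f', action='append', type='string', dest='exc_files', default=[])
parser.add_option('-d', action='append', type='string', dest='exc_dirs', default=[])

# Pass the arguments explicitly instead of reading them from sys.argv.
argv = ['-d', 'obj*', '-f', '*.log', '-f', '*.bin', '-g', '30',
        '.git', 'build\\CMakeFiles']
options, roots = parser.parse_args(argv)

print(options.exc_dirs)   # ['obj*']
print(options.exc_files)  # ['*.log', '*.bin']
print(options.secs)       # 30
print(roots)              # ['.git', 'build\\CMakeFiles']
```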
def iterFiles(options, roots):
    """ A generator to enumerate the contents of directories recursively. """
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Test the last path component against the excluded directory patterns
            name = os.path.split(dirpath)[1]
            if any(fnmatch.fnmatch(name, w) for w in options.exc_dirs):
                del dirnames[:]  # Don't recurse here
                continue
            for fn in filenames:
                # Skip files matching any excluded file pattern
                if any(fnmatch.fnmatch(fn, w) for w in options.exc_files):
                    continue
                path = os.path.join(dirpath, fn)
                mtime = os.path.getmtime(path)
                size = os.path.getsize(path)
                yield mtime, size, path
iterFiles looks like an ordinary function definition, but the presence of the yield statement in the body means it actually defines a generator. As such, calling iterFiles() does not run the function body right away. It returns a generator object, which is an iterator; the body runs piece by piece as you loop over it, for example in a for loop, as we’ll see later.
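To see that deferred execution in isolation, here's a toy generator that records when its body starts running. The names count_up and log are invented for this sketch and have nothing to do with the script itself.

```python
log = []

def count_up(limit):
    """A tiny generator that notes when its body actually starts."""
    log.append('started')
    for i in range(limit):
        yield i

gen = count_up(3)        # Creates the generator; the body has not run yet.
ran_before = list(log)   # Snapshot of the log before iterating.

values = list(gen)       # Iterating drives the body to completion.

print(ran_before)  # []
print(log)         # ['started']
print(values)      # [0, 1, 2]
```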
iterFiles uses the os.walk generator, which lets us modify the contents of dirnames in-place during iteration. In particular, we clear the contents of the list using del dirnames[:] to avoid descending into certain subdirectories.
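The "in-place" part matters more than it might seem. Here's a rough sketch, built on a throwaway temporary directory, that compares del dirnames[:] with simply rebinding the name. The list_files helper, its prune_in_place flag, and the folder names are all assumptions made for the demonstration.

```python
import os
import shutil
import tempfile

def list_files(root, prune_in_place):
    """Collect files under root, skipping any directory named 'skip'."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if os.path.basename(dirpath) == 'skip':
            if prune_in_place:
                del dirnames[:]   # Empties the list that os.walk consults next.
            else:
                dirnames = []     # Only rebinds the local name; os.walk is unaffected.
            continue
        for fn in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, fn), root))
    return sorted(found)

# Build a small tree: keep/a.txt, skip/b.txt, skip/deep/c.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'keep'))
os.makedirs(os.path.join(root, 'skip', 'deep'))
for parts in [('keep', 'a.txt'), ('skip', 'b.txt'), ('skip', 'deep', 'c.txt')]:
    with open(os.path.join(root, *parts), 'w') as f:
        f.write('x')

pruned = list_files(root, prune_in_place=True)
unpruned = list_files(root, prune_in_place=False)
shutil.rmtree(root)  # Clean up the temporary tree.

print(pruned)    # Only the file under keep
print(unpruned)  # Also includes the file under skip/deep
```

With the rebinding version, os.walk still descends into skip/deep, because it never sees the new empty list. This only works in the default top-down mode, where os.walk yields a directory before visiting its subdirectories.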
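In the above code, the argument to any in any(fnmatch.fnmatch(name, w) for w in options.exc_dirs) is known as a generator expression. It looks a lot like a list comprehension without the square brackets, but it is lazy: instead of building a whole list up front, it produces values one at a time as they are requested. We’re allowed to omit its own parentheses because it is the sole argument to a function call. In this case, the any function returns True as soon as fnmatch.fnmatch(name, w) returns True for some pattern w in options.exc_dirs, and False if no pattern matches.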
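One way to observe the difference between a generator expression and a list comprehension is to count how many patterns actually get tested. In this sketch, matches is a hypothetical wrapper around fnmatch.fnmatch that records each call, and the patterns are invented.

```python
import fnmatch

calls = []

def matches(name, pattern):
    """Wrapper around fnmatch.fnmatch that records each pattern it tests."""
    calls.append(pattern)
    return fnmatch.fnmatch(name, pattern)

patterns = ['obj*', '*.tmp', 'cache']

# Generator expression: any() stops asking for values at the first match.
hit_lazy = any(matches('objects', w) for w in patterns)
lazy_calls = list(calls)

del calls[:]

# List comprehension: every pattern is tested before any() sees the list.
hit_eager = any([matches('objects', w) for w in patterns])
eager_calls = list(calls)

print(lazy_calls)   # ['obj*']
print(eager_calls)  # ['obj*', '*.tmp', 'cache']
```

Both forms give the same answer; the generator expression just avoids the work that can't change it.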
ptime = 0
for mtime, size, path in sorted(iterFiles(options, roots), reverse=True):
    if ptime - mtime >= options.secs:
        print('-' * 30)
    timeStr = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(mtime))
    print('%s %10d %s' % (timeStr, size, path))
    ptime = mtime
Here, we feed the iterFiles generator to sorted, resulting in a sorted list of 3-tuples. Tuples compare item by item, so the list is ordered by the first item in each tuple, the modification time, which is exactly what we want; reverse=True puts the most recently modified files first. We loop through, writing one line of formatted output for each tuple, and print a separator whenever the gap between one file and the next reaches the threshold. Since Python lets us multiply a string by an integer, '-' * 30 is used as a shortcut for drawing horizontal lines.
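Here's the same sort-and-group logic run on a handful of made-up records, so you can trace it by hand. The timestamps are plain numbers rather than real modification times, and the output is collected in a list called lines instead of being printed directly; both are simplifications for this sketch.

```python
# (mtime, size, path) tuples with invented values.
records = [
    (100.0, 20, 'b.txt'),
    (130.0, 5, 'c.txt'),
    (100.0, 10, 'a.txt'),
    (95.0, 7, 'd.txt'),
]
threshold = 10  # Plays the role of options.secs.

lines = []
ptime = 0
for mtime, size, path in sorted(records, reverse=True):
    # A gap of at least `threshold` seconds since the previous (newer) file
    # starts a new group.
    if ptime - mtime >= threshold:
        lines.append('-' * 30)
    lines.append('%d %10d %s' % (mtime, size, path))
    ptime = mtime

print('\n'.join(lines))
```

The order comes out as c.txt, a separator, then b.txt, a.txt and d.txt. The two files stamped 100.0 fall back to comparing size, and d.txt stays in the same group because it is only 5 seconds older than its neighbor. Since each file is compared with the one before it, a long chain of closely spaced modifications ends up in a single group.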
That’s all there is to it! Hopefully, some readers have managed to pick up a few nuggets of Pythonic goodness along the way.
Why not use mtree?
Important thing about generator expressions is that they are lazy, not that they just don’t require square brackets.
Yes, I did find a nugget I did not know yet: any(). I’m going to re-write some code now
I can’t get it past the help message. I try to give it a filepath and get no result. Am I using the wrong path or something? How should it be formatted to get a result?
© 2011-2012 Jeff Preshing. Powered by WordPress.