Roland's homepage

My random knot in the Web

Keyword expansion with git

One of the things I liked about the old rcs revision control system was that it supported keyword expansion in files. Unlike rcs, cvs and subversion, the git revision control system does not provide this kind of keyword expansion out of the box. The reason is that you can’t modify a file with information about the commit after you’ve committed, because git checksums the file first.

Git will let you inject text in a file when it is checked out, and remove it when it is checked in. There are two ways of doing this. First, you can use the ident attribute. For any file type that has the ident attribute set (in .gitattributes), git will look for the string $Id$ on checkout and add the SHA-1 of the blob to it like this: $Id: daf7affdeadc31cbcf8689f2ac5fcb6ecb6fd85e $. While this unambiguously identifies the contents of the file (the blob, not the commit), it is not all that practical.

  • It cannot tell you the relative order of two commits.
  • It doesn’t tell you the commit date.
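To make concrete what the ident attribute inserts, here is a minimal sketch of how a blob's SHA-1 is computed. The function names are my own, and it assumes a repository that uses the default SHA-1 object format. The hash depends only on the file's contents, which is why it says nothing about commit order or date.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the SHA-1 object name git gives to a blob with this content.

    Assumes the default SHA-1 object format. Git hashes the header
    "blob <size>\\0" followed by the raw file contents.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def ident_string(content: bytes) -> str:
    """Build the expanded form that the ident attribute puts in the file."""
    return "$Id: " + git_blob_sha1(content) + " $"
```

Two files with identical contents get the same $Id$ string, no matter when or in which commit they were added.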

Luckily, there is a second way: keyword expansion can be done with git using filter attributes.

In my global git configuration file (~/.gitconfig) I have defined a filter called “kw”:

[filter "kw"]
   clean = kwclean
   smudge = kwset

This configuration uses two programs (which should be in your $PATH) called kwset and kwclean to expand and contract keywords. These are two scripts written in Python 3.
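Git runs a filter by feeding the file contents to its standard input and reading the result from its standard output. The following is only a rough sketch of what a smudge/clean pair could look like, not the actual kwset and kwclean scripts. The keyword names ($Date$ and $Revision$) and the function names are assumptions made for illustration.

```python
import re
import sys

# Hypothetical keyword names; the real scripts may use different ones.
KEYWORDS = ("Date", "Revision")


def _pattern(key):
    # Matches both the bare form $Key$ and the expanded form $Key: value $.
    return re.compile(r"\$" + key + r"(?::[^$\n]*)?\$")


def expand_keywords(text, values):
    """Smudge direction: turn $Key$ into $Key: value $."""
    for key in KEYWORDS:
        if key in values:
            expanded = "$" + key + ": " + values[key] + " $"
            text = _pattern(key).sub(lambda m: expanded, text)
    return text


def contract_keywords(text):
    """Clean direction: turn $Key: value $ back into $Key$."""
    for key in KEYWORDS:
        bare = "$" + key + "$"
        text = _pattern(key).sub(lambda m: bare, text)
    return text


def run_clean_filter(stream_in=sys.stdin, stream_out=sys.stdout):
    """Behave like a clean filter: read stdin, write contracted text to stdout."""
    stream_out.write(contract_keywords(stream_in.read()))
```

Because the clean step restores the bare keywords, the repository only ever stores the contracted form, and the expanded values never end up in a diff.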

Note

Creative Commons Public Domain marker

To the extent possible under law, Roland Smith has waived all copyright and related or neighboring rights to kwset.py and kwclean.py. These works are published from the Netherlands.

To enable these substitutions, you have to use git attributes. E.g. to have keyword substitutions in all files in a repository, you need to add the following to the .gitattributes file in that repository:

* filter=kw

Such a general use of filters can be problematic with e.g. binary files like pictures. As a rule, modifying the contents of a binary file (especially adding or removing bytes) tends to break it.

It is therefore better to be explicit and specific as to what types of file the filter should apply to:

*.py filter=kw
*.txt filter=kw
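To verify which files actually get the filter, you can ask git with the check-attr command. Below is a small sketch that wraps it; the helper names are made up for this example, and it assumes it is run from inside the repository.

```python
import subprocess


def parse_check_attr(line):
    """Split one line of `git check-attr` output into (path, attribute, value).

    The output format is "<path>: <attribute>: <value>". Splitting from the
    right keeps paths that themselves contain ": " intact.
    """
    path, attribute, value = line.rstrip("\n").rsplit(": ", 2)
    return path, attribute, value


def filter_for(path):
    """Return the filter name git would apply to path, or "unspecified"."""
    result = subprocess.run(
        ["git", "check-attr", "filter", "--", path],
        check=True, capture_output=True, text=True,
    )
    return parse_check_attr(result.stdout)[2]
```

With the .gitattributes above, filter_for("kwset.py") should report kw, while a picture such as a .png file should report unspecified.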

With this filter setup, file types that contain keywords and which are listed as such in the .gitattributes file will have them expanded on checkout.

To make these updated keywords visible in the working directory, changed objects will have to be checked out after their changes have been committed. To accomplish this, we can use the post-commit hook. There are several possible choices here. You can e.g.:

  • Check out the files which have changed since the previous commit.
  • Check out all files.

The first one is probably the most common case. I wrote the script update-modified-keywords.py for it. After a check-in it checks out all the files that were modified in the last commit.
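As an illustration of the idea (a sketch, not the actual update-modified-keywords.py), a post-commit hook could ask git which files the last commit touched, remove them, and check them out again so the smudge filter runs. The function names are assumptions, and the sketch relies on hooks being started from the top of the working tree.

```python
import os
import subprocess


def split_paths(nul_separated):
    """Split NUL-separated git output (from the -z option) into a list of paths."""
    return [p for p in nul_separated.split("\0") if p]


def recheckout_last_commit():
    """Check out again every file that was changed in the last commit."""
    result = subprocess.run(
        ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", "--root",
         "-z", "--diff-filter=ACMRT", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    # Deleted files are already excluded by the diff filter; the isfile
    # test is an extra safeguard.
    paths = [p for p in split_paths(result.stdout) if os.path.isfile(p)]
    for path in paths:
        # Removing the file first forces git to write it out again,
        # which sends it through the smudge filter.
        os.remove(path)
    if paths:
        subprocess.run(["git", "checkout", "-f", "--"] + paths, check=True)
    return paths
```

Using -z avoids the quoting that git otherwise applies to unusual file names.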

But if all the files in a repository are part of one project, you probably want all files to carry the same date/revision. This is what the update-all-keywords.py script is for. After a check-in it checks out all the files that are under git’s control.
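A comparable sketch for this second case (again not the actual update-all-keywords.py, and with invented function names) lists every tracked file with git ls-files and checks them all out again. Be aware that this simple version throws away uncommitted changes in tracked files.

```python
import os
import subprocess


def tracked_paths(ls_files_output):
    """Split the NUL-separated output of `git ls-files -z` into paths."""
    return [p for p in ls_files_output.split("\0") if p]


def recheckout_all():
    """Check out again every file that is under git's control.

    Caveat: this discards any uncommitted modifications to tracked files.
    """
    result = subprocess.run(
        ["git", "ls-files", "-z"],
        check=True, capture_output=True, text=True,
    )
    paths = [p for p in tracked_paths(result.stdout) if os.path.isfile(p)]
    for path in paths:
        os.remove(path)  # force git to rewrite the file through the smudge filter
    if paths:
        subprocess.run(["git", "checkout", "-f", "--"] + paths, check=True)
    return paths
```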

Put both these scripts in a location in your $PATH, and then make symbolic links from .git/hooks/post-commit to the appropriate script.
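Creating the link can itself be scripted. This is a minimal sketch with hypothetical helper names; it assumes the chosen script is executable and can be found in your $PATH, and it refuses to overwrite an existing hook.

```python
import os
import shutil


def hook_path(repo_dir="."):
    """Return the location of the post-commit hook inside a repository."""
    return os.path.join(repo_dir, ".git", "hooks", "post-commit")


def install_hook(script_name, repo_dir="."):
    """Symlink .git/hooks/post-commit to a script found in $PATH."""
    target = shutil.which(script_name)
    if target is None:
        raise FileNotFoundError(script_name + " not found in $PATH")
    hook = hook_path(repo_dir)
    if os.path.lexists(hook):
        raise FileExistsError(hook + " already exists")
    os.symlink(target, hook)
    return hook
```

For example, install_hook("update-modified-keywords.py") run from the top of a repository would set up the first of the two variants.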


For comments, please send me an e-mail.

