(Note: I talk a lot about Git in this essay, but the same thing can probably be said about Mercurial, Bazaar, et al. I simply use Git because that’s the system that I’m personally familiar with.)
I’ve been a Git user for a few years now. I love it. Being able to branch-n-merge freely is a very powerful tool for anyone who needs to manage a codebase. But I find that there is one area where Git falls short; in this essay I will make a humble first attempt at designing a system that tries to solve this problem. I hope that someone will pick up the gauntlet and create this project. If anyone implements this well, I would personally consider it to be the next revolution after Git; i.e., the difference in productivity between this system and Git would be on the same scale as the difference between Git and SVN.
The missing feature is: Templating.
In a codebase you often have code snippets that repeat themselves a few times in different places. For example, you have a code file chair.py that defines a class Chair, and another file bar_stool.py that defines a class BarStool, and they share a lot of code, but they have a few small differences between them. You don’t want your source control to store two copies of the same code; not because you’re cheap on hard-disk space, but because every time you want to change the shared code, you have to change it in two (or more) places. Avoiding this is called the DRY principle, “Don’t repeat yourself”, and it’s a very good principle.
I know what you’re thinking. “You should refactor the Chair and BarStool classes so that the shared code will be put in a base class!” Sure, if you can. I’m personally a refactoring fanatic. I refactor everything that moves. But sometimes you just can’t refactor. Sometimes you’re completely unable to refactor, and sometimes you are able to refactor but it’s just not practical, because it will make your code too dynamic and introduce too much indirection.
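For the easy case, where refactoring does work, the base-class approach could look like the minimal sketch below. The Seat class and its attributes are hypothetical, made up purely for illustration; they are not taken from any real codebase.

```python
class Seat:
    """Hypothetical base class holding the code Chair and BarStool share."""

    # Defaults that subclasses may override (illustrative values only).
    leg_count = 4
    height_cm = 45

    def __init__(self, material):
        self.material = material

    def describe(self):
        # Shared logic lives in one place, so a change here reaches every subclass.
        return (
            f"{self.material} {type(self).__name__} "
            f"with {self.leg_count} legs, {self.height_cm} cm high"
        )


class Chair(Seat):
    """Uses the shared defaults unchanged."""


class BarStool(Seat):
    """Differs from Chair only in a couple of small details."""

    leg_count = 3
    height_cm = 75
```

This works nicely when the differences are small attribute overrides. The cases this essay is about are the ones where the shared part cannot be pulled out this cleanly.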
GarlicSim has several examples of situations in which you have repeating code that would be impractical to refactor. A good one is the setup.py files. GarlicSim consists of three Python packages: garlicsim, garlicsim_lib and garlicsim_wx. Like all Python packages, they each have a setup.py file: garlicsim/setup.py, garlicsim_lib/setup.py and garlicsim_wx/setup.py. These files have many lines in common with each other. But there is no practical way to refactor those identical parts away.
The solution is templating. Instead of manipulating the code files directly, you manipulate succinct descriptions of the code files. So instead of maintaining three setup.py files, I would maintain one generic setup.py template, and then describe each of the three actual setup.py files as deviations from that template.
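To make the idea concrete, here is a minimal sketch that uses Python's standard-library string.Template. The template body, the placeholder names and the description strings are all assumptions for illustration; the real GarlicSim setup.py files contain much more than this.

```python
from string import Template

# Hypothetical shared template; the fields shown are illustrative only.
SETUP_TEMPLATE = Template('''\
from setuptools import setup

setup(
    name="$name",
    description="$description",
    packages=["$name"],
)
''')

# Each actual setup.py is described only by how it deviates from the template.
# The descriptions are placeholders, not the real package descriptions.
DEVIATIONS = {
    "garlicsim": {"description": "placeholder description for garlicsim"},
    "garlicsim_lib": {"description": "placeholder description for garlicsim_lib"},
    "garlicsim_wx": {"description": "placeholder description for garlicsim_wx"},
}


def render_all():
    """Return a mapping of relative path -> generated setup.py text."""
    return {
        f"{name}/setup.py": SETUP_TEMPLATE.substitute(name=name, **fields)
        for name, fields in DEVIATIONS.items()
    }
```

A change to the shared part is now made once, in SETUP_TEMPLATE, and shows up in all three generated files.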
Any programmer worth his salt would feel in his guts that templating is “the correct solution,” that it’s the way things should be. But then, almost no open-source project uses such a templating scheme to maintain its code. Why?
There’s a big difference between solving a problem in the abstract and solving it in practice. A templating system is indeed “the correct solution,” but that’s not enough to make it a solution worthy of being used.
Imagine if I tried to use a templating system, say Jinja, to produce my setup.py files. I would have a general template, and then three descriptions of how each setup.py file deviates from the template. And then I would have to generate the actual setup.py files from the template.
That would be annoying. One of the big issues is that you can’t edit the generated code. I’ve often seen messages like:
    # This file was created automatically by the templating system.
    # Don't modify this file, modify the templating interface instead.
That really sucks. It sucks to be forced to edit your template files instead of the code files. Your IDE just can’t grok the template as a source file, so it loses all intelligence features. The same thing can be said about the developer himself, actually… You always want to be able to view and edit your source files directly.
Here’s a design that might work. Let’s codename it “Leapfrog”, just for the sake of easy reference.
This is a design that works on top of Git, or on top of any other source-control system. Now, when you have a Git repo, Git saves all its data in a hidden .git folder in your project’s root folder. We are going to do something similar; we’ll save all of our data in a .leapfrog folder that will sit in the root folder alongside the .git folder. The .leapfrog folder will contain all the templates and all the data needed to generate your project’s source files. The .leapfrog folder will be the only thing tracked by Git; your actual source files will be git-ignored.
So the contents of your .leapfrog folder may look like this:
    .leapfrog/
        setup.py.abstract
        garlicsim/
            setup.py
        garlicsim_lib/
            setup.py
        garlicsim_wx/
            setup.py
setup.py.abstract is a template, and each setup.py file extends that template. This is all the information that is needed to generate the actual setup.py files in the so-called “working directory.” (The template format can be in Jinja or whatever.)
Now, in order to synchronize the working directory with the .leapfrog directory, you’ll have an interface quite similar to Git. A leapfrog checkout action would generate the code files from the templates. But the powerful feature would be that you could go the other way too; you could edit the code files and then “stage” that to the templates.
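The checkout direction is the easy half, and could be sketched roughly as below. Everything about the on-disk format is an assumption made for this sketch and not part of the design above: templates are string.Template files named `<name>.abstract` at the top of .leapfrog, and each description file holds a JSON object mapping placeholder names to values. The checkout function itself is hypothetical.

```python
import json
from pathlib import Path
from string import Template


def checkout(project_root):
    """Regenerate working-directory files from the .leapfrog folder.

    Assumed format (hypothetical): '<name>.abstract' templates at the top
    of .leapfrog, and JSON description files that fill in the placeholders.
    """
    root = Path(project_root)
    store = root / ".leapfrog"
    written = []
    for description in sorted(store.rglob("*")):
        # Skip directories and the templates themselves.
        if not description.is_file() or description.suffix == ".abstract":
            continue
        # garlicsim/setup.py is assumed to extend setup.py.abstract.
        template_path = store / (description.name + ".abstract")
        if not template_path.exists():
            continue
        template = Template(template_path.read_text())
        values = json.loads(description.read_text())
        # Mirror the path inside .leapfrog into the working directory.
        target = root / description.relative_to(store)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(template.substitute(values))
        written.append(target)
    return written
```

Calling `checkout(".")` in the project root would then write garlicsim/setup.py, garlicsim_lib/setup.py and garlicsim_wx/setup.py next to the .leapfrog folder.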
For example, you could edit garlicsim_lib/setup.py and add a bunch of lines. Then you would run leapfrog add . to stage, just like you do in Git. The changes will be added to the .leapfrog/garlicsim_lib/setup.py template. Then you could use git add . and git commit to stage and commit the changes to the template to Git.
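The reverse direction is the hard part. One naive way to approach it, sketched below, is to diff the edited file against what the template renders to and store only the differing lines as that file's deviations. This is just one possible approach under stated assumptions, not a worked-out design: the stage and apply_deviations functions are hypothetical, and the sketch says nothing about how to decide that an edit belongs in the shared template instead.

```python
import difflib


def stage(rendered, edited):
    """Record how the edited file deviates from the rendered template.

    Returns a list of (start, end, replacement_lines) tuples, where
    start:end is a line range in the rendered text.
    """
    old = rendered.splitlines(keepends=True)
    new = edited.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_deviations(rendered, deviations):
    """Replay recorded deviations on top of the rendered template."""
    lines = rendered.splitlines(keepends=True)
    # Apply from the bottom up so earlier line indices stay valid.
    for start, end, replacement in reversed(deviations):
        lines[start:end] = replacement
    return "".join(lines)
```

The property that matters is the round trip: applying the staged deviations to the rendered template must reproduce the edited file exactly, so nothing the developer typed is lost.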
Probably some people will say, “there are many existing templating systems out there that you can use!” To which I can give the Drew Houston reply:
You: There are a million templating systems out there!
Me: Do you use any of them to manage your codebase?
There are probably mistakes in my design. There are probably problems I didn’t think of. Chances are good that the eventual implementation will look nothing like the design I described. That’s okay, this is how these things work. I just hope to get the ball rolling for people to start thinking about this problem and come up with solutions.
I really hope that some open-source developer out there will read this essay and implement it, or implement his own take on it, so we can all benefit from this tool.
 I personally remember one contract Django job in which I had a few repeating code segments in a few classes. I refactored them into one base class. But that architecture was just not meant to be. It was too dynamic; I was creating class attributes dynamically and doing other nasty stuff. Maybe in LISP one would be able to come up with an elegant refactoring for that specific problem, but not in Python. That refactoring was a mistake on my part that cost my client a bit of money.
Thanks to Amir Rachum for helping me brainstorm on this idea.