My name is Ram Rachum, and I'm a Python software developer based in Israel.

This is my personal blog. I write about technology, Python, programming and a bunch of other things.

5th April 2011

The next revolution after Git

(Note: I talk a lot about Git in this essay, but the same thing can probably be said about Mercurial, Bazaar and the like. I use Git simply because it’s the system I’m personally familiar with.)

I’ve been a Git user for a few years now. I love it. Being able to branch-n-merge freely is a very powerful tool for anyone who needs to manage a codebase. But I find that there is one area where Git falls short; in this essay I will make a humble first attempt at designing a system that solves this problem. I hope that someone will pick up the gauntlet and create this project. If anyone implements it well, I would personally consider it the next revolution after Git; that is, the difference in productivity between this system and Git would be on the same scale as the difference between Git and SVN.

The missing feature is: Templating.

The problem

In a codebase you often have code snippets that repeat themselves in several places. For example, you have a code file chair.py that defines a class Chair, and another file bar_stool.py that defines a class BarStool; they share a lot of code but have a few small differences. You don’t want your source control to store two copies of the same code, not because you’re cheap on hard-disk space, but because every time you want to change the shared code, you’ll have to change it in two (or more) places. The principle behind this is called DRY, “Don’t repeat yourself,” and it’s a very good one.

I know what you’re thinking: “You should refactor the Chair and BarStool classes so that the shared code lives in a base class!” Sure, if you can. I’m personally a refactoring fanatic; I refactor everything that moves. But sometimes you just can’t refactor. Sometimes it’s outright impossible, and sometimes it’s possible but impractical, because it would make your code too dynamic and introduce too much indirection.[1]
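
To make the objection concrete, here is a minimal sketch of the refactoring it has in mind. The essay doesn’t show what Chair and BarStool actually contain, so the Seat base class and its attributes below are hypothetical.

```python
class Seat:
    """Hypothetical base class holding the code Chair and BarStool share."""

    legs = 4
    seat_height_cm = 45

    def describe(self) -> str:
        # Shared logic: written once, inherited by every subclass.
        return (f"{type(self).__name__} with {self.legs} legs, "
                f"seat at {self.seat_height_cm} cm")


class Chair(Seat):
    """Uses every default from Seat."""


class BarStool(Seat):
    """Differs from Chair only in seat height (hypothetical value)."""

    seat_height_cm = 75


print(Chair().describe())     # Chair with 4 legs, seat at 45 cm
print(BarStool().describe())  # BarStool with 4 legs, seat at 75 cm
```

This works nicely when the differences fit into a few overridable attributes or methods. The cases this essay is about are the ones where they don’t.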

GarlicSim has several examples of repeating code that would be impractical to refactor. A good one is the setup.py files. GarlicSim consists of three Python packages: garlicsim, garlicsim_lib and garlicsim_wx. Like all Python packages, each has a setup.py file: garlicsim/setup.py, garlicsim_lib/setup.py and garlicsim_wx/setup.py. These files have many lines in common with each other, but there is no practical way to refactor those identical parts away.

The solution, in the abstract

The solution is templating. Instead of manipulating the code files directly, you manipulate succinct descriptions of the code files. So instead of maintaining three setup.py files, I would have to maintain one generic setup.py template, and then describe each of the three actual setup.py files as deviations from that template.
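
Here is a minimal sketch of that idea using Python’s built-in string.Template. The template contents and the per-package values are illustrative assumptions, not GarlicSim’s real setup.py files; the point is only that each file is described by its deviations and everything shared lives in one place.

```python
from string import Template

# One generic template for all three packages. The content is illustrative,
# not GarlicSim's real setup.py; $name and $requires mark where copies differ.
SETUP_PY_TEMPLATE = Template(
    "from setuptools import setup\n"
    "\n"
    "setup(\n"
    "    name='$name',\n"
    "    packages=['$name'],\n"
    "    install_requires=$requires,\n"
    ")\n"
)

# Each actual setup.py is described only by its deviations from the template.
DEVIATIONS: dict[str, dict[str, str]] = {
    "garlicsim/setup.py": {"name": "garlicsim", "requires": "[]"},
    "garlicsim_lib/setup.py": {"name": "garlicsim_lib", "requires": "['garlicsim']"},
    "garlicsim_wx/setup.py": {"name": "garlicsim_wx", "requires": "['garlicsim']"},
}


def render_all(template: Template, deviations: dict[str, dict[str, str]]) -> dict[str, str]:
    """Expand the template once per file.

    Template.substitute raises KeyError if a file's description omits a
    placeholder, so a missing deviation fails loudly instead of silently.
    """
    return {path: template.substitute(values) for path, values in deviations.items()}


for path, text in render_all(SETUP_PY_TEMPLATE, DEVIATIONS).items():
    print(f"--- {path}")
    print(text)
```

Changing a shared line now means editing SETUP_PY_TEMPLATE once. The catch is that the generated files are no longer the thing you edit.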

Any programmer worth his salt would feel in his guts that templating is “the correct solution,” that it’s the way things should be. But then, almost no open-source project uses such a templating scheme to maintain its code. Why?

The solution needs to be comfortable to work with

There’s a big difference between solving a problem in the abstract and solving it in practice. A templating system is indeed “the correct solution,” but that’s not enough to make it a solution worthy of being used.

Imagine I tried to use a templating system, say Jinja, to produce my setup.py files. I would have a general template and three descriptions of how each setup.py file deviates from it. Then I would have to generate the actual setup.py files from those.

That would be annoying. One of the big issues is that you can’t edit the generated code. I’ve often seen messages like:

# This file was created automatically by the templating system.
# Don't modify this file, modify the templating interface instead.

That really sucks. It sucks to be forced to edit template files instead of the code files. Your IDE can’t grok a template as a source file, so it loses all of its code-intelligence features. Actually, the same can be said about the developer himself… You always want to be able to view and edit your source files directly.

A design that might work

Here’s a design that might work. Let’s codename it “Leapfrog”, just for the sake of easy reference.

This is a design that works on top of Git, or on top of any other source-control system. Now, when you have a Git repo, Git saves all its data in a hidden .git folder in your project’s root folder. We are going to do something similar; we’ll save all of our data in a .leapfrog folder that will sit in the root folder alongside the .git folder.

The .leapfrog folder will contain all the templates and all the data needed to generate your project’s source files. The .leapfrog folder will be the only thing tracked by Git; your actual source files will be git-ignored.

So the contents of your .leapfrog folder may look like this:

.leapfrog/
    setup.py.abstract
    garlicsim/
        setup.py
    garlicsim_lib/
        setup.py
    garlicsim_wx/
        setup.py

Where setup.py.abstract is a template, and each setup.py file extends that template. This is all the information that is needed to generate the actual setup.py files in the so-called “working directory.” (The template format can be in Jinja or whatever.)

Now, to synchronize the working directory with the .leapfrog directory, you’ll have an interface quite similar to Git’s. A leapfrog checkout command would generate the code files from the templates. But the powerful feature is that you could go the other way too: you could edit the code files directly and then “stage” those changes back into the templates.
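
Here is a rough sketch of what leapfrog checkout might do. It assumes a concrete on-disk format that the design deliberately leaves open: each .abstract file is a Python string.Template, and each child file (like .leapfrog/garlicsim/setup.py) is a small JSON object holding the values it plugs into that template. Both the format and the checkout function are hypothetical.

```python
import json
from pathlib import Path
from string import Template


def checkout(repo_root: Path) -> list[Path]:
    """Regenerate working-directory files from a hypothetical .leapfrog layout.

    Assumed format (not part of the original design):
      .leapfrog/<name>.abstract   a string.Template shared by all copies
      .leapfrog/<dir>/<name>      JSON values for that one copy
    """
    leapfrog_dir = repo_root / ".leapfrog"
    written = []
    for child in sorted(leapfrog_dir.rglob("*")):
        # Skip directories and the abstract templates themselves.
        if not child.is_file() or child.suffix == ".abstract":
            continue
        abstract = leapfrog_dir / (child.name + ".abstract")
        template = Template(abstract.read_text())
        values = json.loads(child.read_text())
        # Mirror the child's path relative to .leapfrog into the working tree.
        target = repo_root / child.relative_to(leapfrog_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(template.substitute(values))
        written.append(target)
    return written
```

The generated files are ordinary source files, so your IDE (and you) can open and edit them directly. Note that this sketch overwrites whatever is in the working directory; a real tool would need to check for unstaged edits before doing that.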

For example, you could edit garlicsim_lib/setup.py and add a bunch of lines. Then you would run leapfrog add . to stage them, just like you do in Git, and the changes would be added to the .leapfrog/garlicsim_lib/setup.py template. Then you could use git add . and git commit to stage and commit the template changes to Git.
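
The reverse direction, leapfrog add, is where the real difficulty lives. One possible approach, sketched below with the same hypothetical string.Template format: turn the abstract template into a regular expression, match the edited working file against it, and write the captured values back into the child file. To let a file gain extra lines, this sketch assumes the template ends in a hypothetical $extra placeholder. The function names are illustrative, not part of any existing tool.

```python
import re
from string import Template

# Hypothetical abstract template: per-file additions go into a trailing
# $extra placeholder, which is an assumption of this sketch.
SETUP_PY = Template(
    "from setuptools import setup\n"
    "\n"
    "setup(name='$name', packages=['$name'])\n"
    "$extra"
)


def template_to_regex(template: Template) -> re.Pattern:
    """Compile a regex matching any text that `template` could have produced.

    Each placeholder becomes a named group; a placeholder used twice must
    match the same text both times (via a backreference).
    """
    source = template.template
    parts: list[str] = []
    seen: set[str] = set()
    position = 0
    for match in template.pattern.finditer(source):
        parts.append(re.escape(source[position:match.start()]))  # literal text
        name = match.group("named") or match.group("braced")
        if name is None:
            if match.group("escaped") is None:
                raise ValueError(f"invalid placeholder at index {match.start()}")
            parts.append(re.escape(template.delimiter))  # "$$" means a literal "$"
        elif name in seen:
            parts.append(f"(?P={name})")
        else:
            seen.add(name)
            parts.append(f"(?P<{name}>.*?)")
        position = match.end()
    parts.append(re.escape(source[position:]))
    return re.compile("".join(parts), re.DOTALL)


def stage(template: Template, edited_text: str) -> dict[str, str]:
    """Recover the per-file values from an edited working-directory file."""
    match = template_to_regex(template).fullmatch(edited_text)
    if match is None:
        raise ValueError("edit touches shared lines; the abstract template must change instead")
    return match.groupdict()


# A hand-edited garlicsim_lib/setup.py with one extra line appended.
edited = (
    "from setuptools import setup\n"
    "\n"
    "setup(name='garlicsim_lib', packages=['garlicsim_lib'])\n"
    "print('a line added by hand')\n"
)
values = stage(SETUP_PY, edited)
assert SETUP_PY.substitute(values) == edited  # the round trip is lossless
```

Edits confined to placeholders round-trip losslessly: re-rendering the recovered values reproduces the edited file. An edit to a shared line makes the match fail, and that is the genuinely ambiguous case. The tool can’t tell whether the change should go into the abstract template (and thus into all three packages) or only into this one file, so a real implementation would need a policy for it, such as asking the user.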

Pre-rebuttal

Probably some people will say, “there are many existing templating systems out there that you can use!” To which I can give the Drew Houston reply:

You: There are a million templating systems out there!
Me: Do you use any of them to manage your codebase?
You: No.
Me: …

Final words

There are probably mistakes in my design, and probably problems I didn’t think of. Chances are the eventual implementation will look nothing like the design I described. That’s okay; this is how these things work. I just hope to get the ball rolling so that people start thinking about this problem and come up with solutions.

I really hope that some open-source developer out there will read this essay and implement it, or implement his own take on it, so we could all benefit from this tool.

————————————————

Notes

[1] I remember one Django contract job in which I had a few repeating code segments in a few classes. I refactored them into one base class. But that architecture was just not meant to be. It was too dynamic; I was creating class attributes dynamically and doing other nasty stuff. Maybe in LISP one would be able to come up with an elegant refactoring for that specific problem, but not in Python. That refactoring was a mistake on my part that cost my client a bit of money.

Thanks to Amir Rachum for helping me brainstorm on this idea.

All content in this website is copyright © 1986-2011 Ram Rachum.
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License, with attribution to "Ram Rachum at ram.rachum.com" including link to ram.rachum.com.
To view a copy of this license, visit: http://creativecommons.org/licenses/by-sa/3.0/