Lately there’s been a lot of discussion about whether Python 3 is working out or not, with many projects reluctant to move to Python 3, especially big, mature projects that are in the “if it’s not broken don’t touch it” phase.
I still fully believe in Python 3, but this blog post is not about discussing 2-vs-3; I’d like to make my own modest contribution to the Python 3 cause by sharing with you my method of supporting both Python 2 and Python 3, which I use in my open-source project.
When I originally read about the different ways to support both Python 2 and 3, I was appalled. There seemed to be 3 ways, and all 3 had properties that made me not want to even consider them.
The 3 approaches seem to be:
six. Pros: Not having to maintain two different codebases, or autogenerate code. Cons: Your code is ugly as shit because it has to support a wide range of Python versions.
I spent quite some time thinking about which approach to take, and I settled on the first approach: maintaining two separate codebases. I implemented it a few months ago, and it’s been working really well.
With six, instead of using metaclass=MyType you need to specify six.with_metaclass(MyType), and instead of using str you need to use six.text_type. That’s not what Python is about. It’s critical for me to have the code be as succinct as possible.
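To make the comparison concrete, here is a minimal sketch of the Python 3 spelling, with the six spelling shown only in comments. The MyType metaclass body and the Widget class are hypothetical names made up for this illustration.

```python
class MyType(type):
    """A toy metaclass that records the name of every class it creates."""

    created = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        MyType.created.append(name)
        return cls


# Python 3 only: the metaclass is a keyword argument of the class statement.
class Widget(metaclass=MyType):
    pass


# The cross-version spelling with six would instead read:
#     class Widget(six.with_metaclass(MyType)):
#         pass
# and the text type would be spelled six.text_type rather than str.

label = 'caf\u00e9'  # On Python 3, a plain literal is already of type str.
```

On the Python 3 codebase the class statement says exactly what it means, which is the succinctness I’m after.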
Having two separate codebases is the only solution that gives you full control of both codebases. You can tweak each codebase to fit the Python version it’s serving, and use its features in the most idiomatic way.
Now the big question is, how do you deal with having two separate codebases? I gave this question some thought. The main problem seems to be this: If I’m adding a feature in the Python 2 version of the library, I want to have that feature in the Python 3 version too (or vice versa), but I don’t want to type the code again, nor to copy-paste. That’s the crux of the problem, and if that’s solved, having 2 codebases becomes less of an issue. (It’s not like we’re trying to save on disk space.)
So, when developing a feature for the Py2 version and having it appear in the Py3 version, I have to do something like a merge between the two codebases, because the two codebases are different. Normally I would use git merge, but I can’t do that in this case because both codebases are in the same repo. (I considered using git submodules and having each codebase on a different submodule, but the path leading up to submodules is littered with the corpses of desperate developers who regretted ever touching them.)
I came up with a solution that works great. All you’ll need is to get a merge program that supports 3-way merging (I use the excellent but proprietary Araxis Merge, but open source alternatives are available), and follow the instructions below. They’re a bit lengthy, but after you get used to them, you can do them quickly enough that it’s not a big toll on the development cycle.
Create a folder structure similar to mine:
    python_toolbox/                  <--- Repo root
        source_py2/
            python_toolbox/
                __init__.py
                (All the source files, in their Python 2 version.)
        source_py3/
            python_toolbox/
                __init__.py
                (All the source files, in their Python 3 version.)
        setup.py
        README.markdown
        (All the usual files...)
The setup.py file contains this simple snippet:

    import sys

    # sys.version_info is a tuple, so compare its major component.
    if sys.version_info[0] == 3:
        source_folder = 'source_py3'
    else:
        source_folder = 'source_py2'
Then, the rest of the code in setup.py refers to source_folder instead of a hardcoded folder. This way a Python 2 user gets the Python 2 version installed, while a Python 3 user gets the Python 3 version installed. So far so good.
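As a rough sketch of what “the rest of the code refers to source_folder” can look like, the helper below builds the path-related arguments for setup(). The function names pick_source_folder and build_setup_kwargs are made up for this example, and the hardcoded package list is a simplification; this is one possible arrangement, not necessarily how my own setup.py is laid out.

```python
import sys


def pick_source_folder(version_info=sys.version_info):
    """Return the folder holding the codebase for the given interpreter."""
    return 'source_py3' if version_info[0] == 3 else 'source_py2'


def build_setup_kwargs(version_info=sys.version_info):
    """Build the path-related keyword arguments for setup() (hypothetical helper)."""
    source_folder = pick_source_folder(version_info)
    return {
        'name': 'python_toolbox',
        # Map the root of the package namespace onto the chosen folder.
        'package_dir': {'': source_folder},
        # Simplification: a real project would likely list subpackages too.
        'packages': ['python_toolbox'],
    }


# In setup.py this would be used roughly as:
#     from setuptools import setup
#     setup(**build_setup_kwargs())
```

If the package has subpackages, setuptools’ find_packages(where=source_folder) can probably replace the hardcoded list.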
Now you’re asking, how do you deal with the in-repo merge problem?
First, before making the split to support Python 3, ensure that you’re starting from a commit where all the code works great and the test suite passes. Then, use 2to3 just one time to create a copy of your code that supports Python 3. Put that in source_py3, and put the original code in source_py2. Debug the test suite on the Python 3 version and edit it until all the tests pass. Fix your setup.py file to pick the correct source folder using the snippet I gave above, and confirm that it works by creating a source distribution and installing it on empty virtualenvs of both Python 2 and Python 3.
So far so good; you now have a working version of your code that works for both Python versions. What you do at this point is create a Git branch called parity pointing to this commit. You push it to your Git remote, of course. You make the following rule, either with yourself in case of a single developer or with your fellow developers: You merge code to parity only if the Python 2 codebase and the Python 3 codebase are equivalent. Equivalent means that if a feature has been implemented in one, it was merged (more about how later) to the other. If a bug was fixed in one codebase, it was merged to the other. Never let anyone push code to the parity branch if that code doesn’t have parity between Python versions.
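The parity rule is a human agreement, but a crude automated sanity check can catch the most obvious violation: a file that exists in one codebase and has no counterpart in the other. The sketch below is a hypothetical helper, not part of my workflow, and matching file lists are of course no proof that the two codebases are equivalent.

```python
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for folder, _subfolders, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(folder, filename)
            found.add(os.path.relpath(full_path, root))
    return found


def files_without_counterpart(py2_root, py3_root):
    """Return (only_in_py2, only_in_py3) as sorted lists of relative paths."""
    py2_files = relative_files(py2_root)
    py3_files = relative_files(py3_root)
    return sorted(py2_files - py3_files), sorted(py3_files - py2_files)
```

You would call it as files_without_counterpart('source_py2', 'source_py3') from the repo root. Generated files such as bytecode caches will show up as false positives unless you clean them out first.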
Now, how do you actually do the merge? Say that on your development branch you’ve developed a new feature in the Python 3 codebase, and you want to merge it into the Python 2 version. (If you want to go the other way, just flip 2 and 3 in my explanation below.) What you do is this: First you ensure that you committed your change. Then, you create a local clone of your Git repo, with the parity branch checked out. (Do a git pull to be sure that you have the latest version.) Fire up your merge program and do the following three-way folder merge:
Column 1: The source_py3 folder in the clone, which has the parity branch checked out, without your new feature.
Column 2: The source_py3 folder in the original git repo, which has the development branch checked out, and does include your new feature.
Column 3: The source_py2 folder in the original git repo, which has the development branch checked out, but does not include your new feature because it’s the Python 2 folder.
The merge you’re doing can be verbally described as: “Take the difference between the old Python 3 codebase and the new Python 3 codebase, and apply it to the Python 2 codebase.” In practice, it’s done like so: You go over the list of files, looking for files which changed between column 1 and column 2. For each file like that, you open it for file merging. Your merge program will show the 3 different versions of the file, with differences between each two columns clearly marked. You put the caret on the middle column, and page through the differences. (Preferably using a keyboard shortcut like alt-down; consult your merge program’s documentation.)
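Your merge program normally highlights the changed files for you, but the step of finding files that changed between column 1 and column 2 can also be sketched in a few lines of Python. The function name changed_between and the example paths are assumptions made for this illustration.

```python
import filecmp
import os


def changed_between(old_root, new_root):
    """List relative paths of files that were added or modified in new_root."""
    changed = []
    for folder, _subfolders, filenames in os.walk(new_root):
        for filename in filenames:
            new_path = os.path.join(folder, filename)
            relative = os.path.relpath(new_path, new_root)
            old_path = os.path.join(old_root, relative)
            # shallow=False compares file contents rather than os.stat data.
            if not (os.path.isfile(old_path)
                    and filecmp.cmp(old_path, new_path, shallow=False)):
                changed.append(relative)
    return sorted(changed)
```

With a hypothetical layout, changed_between('clone/source_py3', 'repo/source_py3') gives the list of files to open for merging against source_py2. Note that this sketch does not report files that were deleted in the new version.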
As you go down the file, you’ll see three kinds of differences: differences between column 1 and 2, between column 2 and 3, and between all columns. A difference between column 1 and 2 is your new change; copy it from the middle column into column 3, preferably with a keyboard shortcut like ctrl-right. (This takes new code from the Python 3 codebase and copies it over to the Python 2 version.) Do take a brief look at the code you’re merging to ensure it’s Python 2 compatible.
Keep going over all files like that, until you’ve finished with all of them. Save all the files. Then run the test suite on both Python versions, and if there are any bugs, fix them until the suite passes.
Congratulations! You’ve achieved parity again. Commit your changes and push them to the parity branch. If you wish to make a PyPI release at this point, you’re good to go and your code will work on both Python versions.
You don’t have to do this process on every feature; you can do it once in a while, or every time before you merge changes to master. (If you’re working on a side branch, say a fix-foo-bug branch, you can create a temporary fix-foo-bug-parity branch to use as your parity branch, so you won’t have to use the same parity branch for all branches.)
That’s it. The process is a bit complex, but in my opinion the results are worth it; you have 2 completely separate codebases, you don’t depend on either code generation or compatibility libraries, and you can enjoy writing Python 3 idiomatic code on the Python 3 codebase.