My name is Ram Rachum, and I'm a software developer based in Israel, specializing in the Python programming language.

This is my personal blog. I write about technology, programming, Python, and any other thoughts that come to my mind.

I'm sometimes available for freelance work in Python and Django. My expertise is in developing a product from scratch.

11th January 2014

Support Py2+3 in two separate codebases: How to do it and why it’s great

Lately there’s been a lot of discussion about whether Python 3 is working out or not, with many projects reluctant to move to Python 3, especially big, mature projects that are in the “if it’s not broken don’t touch it” phase.

I still fully believe in Python 3, but this blog post is not about discussing 2-vs-3; I’d like to make my own modest contribution to the Python 3 cause by sharing with you my method of supporting both Python 2 and Python 3 which I use in my open-source project python_toolbox.

When I originally read about the different ways to support both Python 2 and 3, I was appalled. There seemed to be 3 ways, and all 3 had properties that made me not want to even consider them.

The 3 approaches seem to be:

  1. Maintain 2 completely separate codebases. Pros: Complete control over each copy of the code. Cons: You have to maintain 2 codebases.
  2. Maintain one codebase, targeting Python 2, and use 2to3 to automatically generate a second codebase that supports Python 3. (Or vice versa.) Pros: You need to maintain only one codebase. Cons: You’re now dealing with autogenerated code, which is hard to edit or debug.
  3. Support both Python 2 and Python 3 in the same codebase (like Django does) by using compatibility libraries like six. Pros: Not having to maintain two different codebases, or autogenerate code. Cons: Your code is ugly as shit because it has to support a wide range of Python versions.

I’ve spent quite some time thinking about which approach to take, and I’ve settled on the first approach. I implemented it a few months ago, and it’s been working really well.

Why is two codebases the best approach?

  • Autogenerated code sucks too much. I like my code to be an actual text file that I can always edit, especially when debugging it, not an ephemeral file created by autogeneration from a different file and using a set of algorithms.
  • A codebase that supports Python 2 and Python 3 forces you to only use features that are in both versions of Python, and makes your code ugly. I don’t know about you, but one of the big perks of programming in Python has always been the elegance and clarity of the code. If you’re using compatibility libraries, then instead of specifying metaclass=MyType you need to inherit from six.with_metaclass(MyType), and instead of using str you need to use six.text_type. That’s not what Python is about. It’s critical for me to have the code be as succinct as possible.
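
To make the contrast concrete, here is a minimal sketch of the Python 3 spelling, with the six equivalents shown only as comments so the block runs without six installed. MyType is the name used above; its body, and the Widget class, are hypothetical and exist purely for illustration.

```python
class MyType(type):
    """Hypothetical metaclass that records the name of every class it creates."""

    created = []

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        mcs.created.append(name)
        return cls


# Python 3 only: the metaclass is a keyword argument in the class statement,
# and plain `str` is already the text type.
class Widget(metaclass=MyType):
    label = str("spam")


# Single-codebase equivalent using six (shown as comments, not executed):
#
#     import six
#
#     class Widget(six.with_metaclass(MyType)):
#         label = six.text_type("spam")
```

Neither version is wrong; the point is only that the second one is what every class and every string annotation ends up looking like in a shared codebase.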

Having two separate codebases is the only solution that gives you full control of both codebases. You can tweak each codebase to fit the Python version it’s serving, and use its features in the most idiomatic way.

How to make a dual codebase approach painless?

Now the big question is, how do you deal with having two separate codebases? I gave this question some thought. The main problem seems to be this: If I’m adding a feature in the Python 2 version of the library, I want to have that feature in the Python 3 version (or vice versa), but I don’t want to type the code again or copy-paste it. That’s the crux of the problem, and if that’s solved, having 2 codebases becomes less of an issue. (It’s not like we’re trying to save on disk space.)

So, to develop a feature for the Py2 version and have it appear in the Py3 version, I have to do something like a merge between the two codebases, because the two codebases are different. Normally I would use git merge, but I can’t do that in this case because both codebases live in the same repo, as sibling folders rather than as separate branches. (I considered using git submodules and having each codebase on a different submodule, but the path leading up to submodules is littered with the corpses of desperate developers who regretted ever touching them.)

I came up with a solution that works great. All you’ll need is a merge program that supports 3-way merging (I use the excellent but proprietary Araxis Merge, but open source alternatives are available), and to follow the instructions below. They’re a bit lengthy, but after you get used to them, you can do them quickly enough that it’s not a big toll on the development cycle.

Create a folder structure similar to mine:

    python_toolbox/  <--- Repo root
        source_py2/
            python_toolbox/
                __init__.py
                (All the source files, in their Python 2 version.)
        source_py3/
            python_toolbox/
                __init__.py
                (All the source files, in their Python 3 version.)
        setup.py
        README.markdown
        (All the usual files...)

My setup.py file contains this simple snippet:

    import sys

    # Pick the source folder that matches the running interpreter.
    if sys.version_info[0] == 3:
        source_folder = 'source_py3'
    else:
        source_folder = 'source_py2'

Then, the rest of the code in setup.py refers to source_folder instead of a hardcoded folder. This way a Python 2 user gets the Python 2 version installed, while a Python 3 user gets the Python 3 version installed. So far so good.
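
The rest of my setup.py isn’t shown here, so the following is only a sketch of one way the remaining code could refer to source_folder, assuming setuptools and its package_dir option. The helper names pick_source_folder and build_setup_kwargs are hypothetical, and the setup() call itself is left as a comment so the block can be run without installing anything.

```python
import sys


def pick_source_folder(version_info=sys.version_info):
    """Return the source folder for the given interpreter version."""
    return 'source_py3' if version_info[0] == 3 else 'source_py2'


def build_setup_kwargs(source_folder):
    """Build the keyword arguments that setup() would receive.

    `package_dir` tells setuptools that the root of the package namespace
    lives inside `source_folder` rather than at the repo root.
    """
    return {
        'name': 'python_toolbox',
        'package_dir': {'': source_folder},
        # A real setup.py would more likely compute this list with
        # setuptools.find_packages(source_folder).
        'packages': ['python_toolbox'],
    }


setup_kwargs = build_setup_kwargs(pick_source_folder())
# In an actual setup.py the last line would be:
#     setuptools.setup(**setup_kwargs)
```

Keeping the version check in one small function means nothing else in the file needs to know that two source trees exist.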

Now you’re asking, how do you deal with the in-repo merge problem?

How to deal with merges

First, before making the split to support Python 3, ensure that you’re starting from a commit where all the code works great and the test suite passes. Then, use 2to3 just one time to create a copy of your code that supports Python 3. Put that in source_py3, and put the original code in source_py2. Debug the test suite on the Python 3 version and edit it until all the tests pass. Fix your setup.py file to pick the correct source folder using the snippet I gave above, and confirm that it works by creating a source distribution and installing it in fresh virtualenvs of both Python 2 and Python 3.

So far so good; you now have a working version of your code that works for both Python versions. What you do at this point is create a Git branch called parity pointing to this commit. You push it to your Git remote, of course. You make the following rule, either with yourself in case of a single developer or with your fellow developers: You merge code to parity only if the Python 2 codebase and the Python 3 codebase are equivalent. Equivalent means that if a feature has been implemented in one, it was merged (more about how later) to the other. If a bug was fixed in one codebase, it was merged to the other. Never let anyone push code to the parity branch if that code doesn’t have parity between Python versions.
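
The parity rule is a human agreement, and nothing in the process above enforces it automatically. As a cheap extra guard (my own assumption, not a required step of the workflow), a script could at least confirm that both source folders contain the same set of .py files before anything goes to parity. The function names here are hypothetical.

```python
import os


def relative_files(root, suffix='.py'):
    """Return the set of file paths under `root`, relative to `root`."""
    found = set()
    for folder, _subfolders, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(suffix):
                full_path = os.path.join(folder, filename)
                found.add(os.path.relpath(full_path, root))
    return found


def parity_report(py2_root, py3_root):
    """Return (only_in_py2, only_in_py3) as sorted lists of relative paths."""
    py2_files = relative_files(py2_root)
    py3_files = relative_files(py3_root)
    return sorted(py2_files - py3_files), sorted(py3_files - py2_files)
```

Calling parity_report('source_py2', 'source_py3') from the repo root returns two empty lists when the file sets match. It only catches files that exist on one side and not the other; a feature missing from an existing file still needs the merge process and the test suite to catch it.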

Now, how do you actually do the merge? Say that on your development branch you’ve developed a new feature in the Python 3 codebase, and you want to merge it into the Python 2 version. (If you want to go the other way, just flip 2 and 3 in my explanation below.) What you do is this: First you ensure that you committed your change. Then, you create a local clone of your Git repo, with the parity branch checked out. (Do a git pull to be sure that you have the latest version.) Fire up your merge program and do the following three-way folder merge:

  • Set the first column to the source_py3 folder in the clone, which has the parity branch checked out, without your new feature.
  • Set the second column to the source_py3 folder in the original git repo, which has the development branch checked out, and does include your new feature.
  • Set the third column to the source_py2 folder in the original git repo, which has the development branch checked out, but does not include your new feature because it’s the Python 2 folder.

The merge you’re doing can be verbally described as: “Take the difference between the old Python 3 codebase and the new Python 3 codebase, and apply it to the Python 2 codebase.” In practice, it’s done like so: You go over the list of files, looking for files which changed between column 1 and column 2. For each file like that, you open it for file merging. Your merge program will show the 3 different versions of the file, with differences between each two columns clearly marked. You put the caret on the middle column, and page through the differences. (Preferably using a keyboard shortcut like Alt-Down; consult your merge program’s documentation.)
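
A merge program normally highlights the changed files for you. If you’d like the list up front, the “which files changed between column 1 and column 2” step could be sketched as below. This is an illustration using only the standard library, not a feature of any particular merge tool, and list_files and changed_files are hypothetical names.

```python
import filecmp
import os


def list_files(root, suffix='.py'):
    """Return the set of `suffix` files under `root`, relative to `root`."""
    found = set()
    for folder, _subfolders, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(suffix):
                found.add(os.path.relpath(os.path.join(folder, filename), root))
    return found


def changed_files(old_root, new_root):
    """Return sorted relative paths that differ between the two trees.

    Files that exist in only one of the trees count as changed.
    """
    old_files = list_files(old_root)
    new_files = list_files(new_root)
    changed = old_files ^ new_files  # added or removed files
    for path in old_files & new_files:
        same = filecmp.cmp(os.path.join(old_root, path),
                           os.path.join(new_root, path),
                           shallow=False)  # compare contents, not just stat()
        if not same:
            changed.add(path)
    return sorted(changed)
```

Here old_root would be the source_py3 folder in the parity clone (column 1) and new_root the source_py3 folder in your working repo (column 2).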

As you go down the file, you’ll see three kinds of differences: between column 1 and 2, between column 2 and 3, and between all three columns.

  • When you see differences between column 1 and 2, merge that snippet from column 2 to column 3, probably by using a keyboard shortcut like ctrl-right. (This takes new code from the Python 3 codebase and copies it over to the Python 2 version.) Do take a brief look at the code you’re merging to ensure it’s Python 2 compatible.
  • When you see differences between column 2 and 3, ignore them and move on. These are existing differences between your two codebases which you’ve already approved before.
  • When you see differences between all three columns, it’s time to wake up from your merge-induced coma. You’ve hit upon a sensitive line, which is different in Python 2 and Python 3, and was modified. You’ll probably want to manually edit the Python 2 version to add the same functionality in a Python 2 compatible way.
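
The three cases above can be written down as a small decision function. This is my reading of them in three-way-merge terms (column 1 acting as the common base), sketched for a single already-aligned snippet; a real merge program also has to do the alignment, which this does not attempt. The name classify_hunk and its return labels are hypothetical.

```python
def classify_hunk(old_py3, new_py3, py2):
    """Classify one aligned snippet as seen in the three merge columns.

    Column 1 is `old_py3` (parity clone), column 2 is `new_py3`
    (development branch), column 3 is `py2` (not yet updated).
    """
    if old_py3 == new_py3:
        # Nothing changed on the Python 3 side. Any mismatch with column 3
        # is a pre-existing, already-approved 2-vs-3 difference.
        return 'same' if new_py3 == py2 else 'ignore'
    if new_py3 == py2:
        return 'same'  # the change is already present in the Python 2 code
    if old_py3 == py2:
        return 'copy'  # new Python 3 code on a line the codebases shared
    return 'manual'  # a 2-vs-3 sensitive line was modified: edit by hand


# A new line of shared code: copy it across.
print(classify_hunk('x = 1', 'x = 2', 'x = 1'))
# An old print-statement difference that was not touched: skip it.
print(classify_hunk("print('a')", "print('a')", "print 'a'"))
# The print line itself was modified: time to wake up.
print(classify_hunk("print('a')", "print('b')", "print 'a'"))
```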

Keep going over all files like that, until you’ve finished with all of them. Save all the files. Then run the test suite on both Python versions, and if there are any bugs, fix them until the suite passes.

Congratulations! You’ve achieved parity again. Commit your changes and push them to the parity branch. If you wish to make a PyPI release at this point, you’re good to go and your code will work on both Python versions.

You don’t have to do this process on every feature; you can do it once in a while, or every time before you merge changes to master.

Notes:

  • You can also create branch-specific parity branches. For example, if you have a fix-foo-bug branch, you can create a temporary fix-foo-bug-parity branch to use as its parity branch, so you won’t have to use the same parity branch for all branches.
  • If you’re using an IDE, it’s recommended you create two separate IDE projects, one for each Python version, and in each one exclude the files belonging to the other Python version. That way you’ll be sure you’re never editing files of the wrong Python version.

———-

That’s it. The process is a bit complex, but in my opinion the results are worth it; you have 2 completely separate codebases, you don’t depend on either code generation or compatibility libraries, and you can enjoy writing Python 3 idiomatic code on the Python 3 codebase.

Tagged: python, planetpython

All content in this website is copyright © 1986-2015 Ram Rachum.
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License, with attribution to "Ram Rachum at ram.rachum.com" including link to ram.rachum.com.
To view a copy of this license, visit: http://creativecommons.org/licenses/by-sa/3.0/