My name is Ram Rachum, and I'm a Python software developer based in Israel.

This is my personal blog. I write about technology, Python, programming and a bunch of other things.

9th April 2011

2 out of 3 GitHub forks are completely empty

About 2 years ago I started the GarlicSim project on GitHub. GitHub’s most famous feature is “forking”: when someone forks your code, they get their own copy of the repository, which they can work on and change. Later they can show you their changes (a.k.a. a “pull request”) so you can merge those changes into your own repo. True open-source spirit.

GarlicSim was forked 4 times in the 2 years of its existence. I remember the first time someone forked GarlicSim. I was very excited; I thought someone was going to help me work on it. I asked that person what he was planning to add. He didn’t respond to my messages, and he didn’t add any code to his fork. Neither did the owners of the 3 GarlicSim forks that followed.

Then it became clear to me that GitHub forks don’t mean much. I guess most people just click “fork” because they think it’s cool, or because they would like to imagine themselves working on the project. But they don’t actually do any development on their forks.

I wanted to check whether this phenomenon happens on other people’s repos too; maybe I was just unlucky with GarlicSim? Apparently not. My small survey showed that 2 out of 3 GitHub forks are completely empty.

My modest research

I checked 12 random GitHub projects that have at least one fork, not counting the original repo. I used GitHub’s own “Random repository” link to find them, skipping all the fork-less repos (more than a hundred) until I had my 12. Then I counted how many forks each of these repos had: the smallest had just one fork, the biggest had 8 (again, not counting the original repo). The median number of forks was 1, and the total number of forks across the 12 projects was 27.

Then I used GitHub’s excellent “Network” view to see how many of these forks actually had any commits not present in the original repo. Only 8 of the 27 forks had any content of their own.

To put it simply: 19 of the 27 forks (about 70%) were completely empty, so roughly 2 out of every 3 forks on GitHub are empty. Their owners did not add even a single trivial line of code to the project, not even just to play around in their own fork.
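The criterion used above (a fork counts as non-empty only if it has commits not present in the original repo) can be expressed as a set difference over commit SHAs. The sketch below is a hypothetical reimplementation of that criterion, not how GitHub’s Network view works internally. It assumes you already have the set of commit SHAs for each fork and for the original repo; the function names `unique_commits`, `is_empty_fork` and `summarize`, and the example data, are made up for illustration.

```python
from collections.abc import Iterable, Mapping


def unique_commits(fork_commits: Iterable[str], upstream_commits: Iterable[str]) -> set[str]:
    """Return the SHAs that exist in the fork but not in the original repo."""
    return set(fork_commits) - set(upstream_commits)


def is_empty_fork(fork_commits: Iterable[str], upstream_commits: Iterable[str]) -> bool:
    """A fork is 'empty' if it has no commits of its own.

    A fork that is merely behind the original (missing newer upstream
    commits) still counts as empty, since its commits are a subset of upstream's.
    """
    return not unique_commits(fork_commits, upstream_commits)


def summarize(forks: Mapping[str, Iterable[str]], upstream_commits: Iterable[str]) -> tuple[int, int, float]:
    """Return (empty_forks, total_forks, empty_share) for one original repo."""
    if not forks:
        raise ValueError("need at least one fork")
    upstream = set(upstream_commits)
    empty = sum(1 for commits in forks.values() if is_empty_fork(commits, upstream))
    return empty, len(forks), empty / len(forks)


# Hypothetical example data: short fake SHAs, not real repositories.
upstream = {"a1", "b2", "c3"}
forks = {
    "alice/project": {"a1", "b2"},            # behind upstream, nothing new
    "bob/project": {"a1", "b2", "c3", "d4"},  # one commit of his own
}
print(summarize(forks, upstream))  # (1, 2, 0.5)

# Applying the same share to the survey's totals: 27 - 8 = 19 empty forks.
print(round(19 / 27, 2))  # 0.7
```

One limitation of this set-difference approach: if the original repo’s history was rewritten (for example by a force-push), old commits still sitting in a fork would wrongly count as the fork’s own content.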

And of course, for the ~30% of forks that do contain code, I did not check how many of them have trivial changes and how many have substantial ones. It’s quite possible that a big share of them just add a few trivial lines that never get merged into the original project.

So keep that in mind next time you browse GitHub and see a project with 20 forks. There aren’t really 20 active forks; there are perhaps 7, and even that is optimistic.

(And I’d like to note: I love GitHub with all my heart. It’s a great tool and I use it both in my OSS work and my job. I just want people to have a realistic view of what a GitHub fork means.)

All content in this website is copyright © 1986-2011 Ram Rachum.
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License, with attribution to "Ram Rachum at ram.rachum.com" including link to ram.rachum.com.
To view a copy of this license, visit: http://creativecommons.org/licenses/by-sa/3.0/