We all know how foolish it is to evaluate a programming project by counting how many lines of code it has. Counting KLOCs encourages complexity instead of discouraging it. It discriminates against programmers who get more done in fewer lines of code. It discriminates against programming languages that let the programmer do that. As someone once said, “Measuring programming progress by lines of code is like measuring aircraft building progress by weight.”
But many people still count lines of code, and use that as a metric for progress. These people are not fools. There are some advantages to counting lines of code, and in some situations they can sort of outweigh the disadvantages. I guess the main advantage is how objective, intuitive and no-nonsense it is. You put the codebase through a LOC counter, and you’ve got a number. That’s it. It’s like standing on the scale and seeing how much you weigh. You get a number, and that’s it, there are no excuses or explanations. (This is especially useful for non-programmers who need to evaluate a software project: they may be the managers, or the clients, or whoever.) Having such an objective measure is helpful sometimes, even when it’s as unreliable as KLOC count.
So KLOC count is still a terrible measure, but there are situations where it’s useful. With that in mind, I propose an alternative to KLOC count: code file count.
It’s simple: Instead of counting the lines of code, you count the number of code files you have in your codebase. And with that you can estimate how big your project is. For example, GarlicSim has around 350 code files.
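To make the idea concrete, here is a rough sketch of what such a counter could look like, with a plain line counter next to it for comparison. The function names, and the rule that only `.py` files count as code, are assumptions made for illustration; a real project would probably want its own list of extensions and a way to skip vendored or generated folders.

```python
from pathlib import Path

# Assumption: a "code file" is any file whose extension is in this set.
CODE_EXTENSIONS = {".py"}


def iter_code_files(root, extensions=CODE_EXTENSIONS):
    """Yield every file under `root` (recursively) with a matching extension."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in extensions:
            yield path


def count_code_files(root, extensions=CODE_EXTENSIONS):
    """Return the code-file count: how many code files live under `root`."""
    return sum(1 for _ in iter_code_files(root, extensions))


def count_lines(root, extensions=CODE_EXTENSIONS):
    """Return the classic LOC count: total lines across all code files."""
    total = 0
    for path in iter_code_files(root, extensions):
        # Undecodable bytes are replaced so one odd file doesn't abort the count.
        with path.open(encoding="utf-8", errors="replace") as f:
            total += sum(1 for _ in f)
    return total
```

Calling `count_code_files(".")` from the root of a project gives the code-file count, and `count_lines(".")` gives the raw line count to compare it against. Note that this counts every line, blank lines and comments included, which is about as crude as LOC counting gets.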
Before people jump in to attack me: I know that counting code files is a terrible measure as well. My goal here isn’t to make a good measure for estimating programming progress. My goal is to make a measure which has the same advantages as KLOC count, but whose disadvantages are slightly less severe.
Why does code-file count suck less than KLOC count?
For one, I believe that breaking down your codebase into many files is a good thing! Especially if you have a big hierarchy of files and folders. (Or as we call them in Python, modules and packages.) I think that having a well-thought-out folder hierarchy in your codebase is a great way to describe your project’s architecture.
Small files, big hierarchy: that’s a good way to organize code, in my humble opinion. When you have big files with lots of code in them, they’re hard to browse. The flat structure of a file makes it hard to understand which lines are the most important and which are less so. I’ve often found myself drowning in files containing thousands of lines of code. (Yes, somebody else’s code, and GUI code at that.) I think a folder hierarchy helps separate the important stuff from the less important stuff.
So counting code files rewards this kind of organization, while counting lines of code rewards big, flat files.
I would say though, that the main reason is this: When you look through lines of code in a file, you see implementation. You see a description of “how the program is working.” A big number of lines means: “The implementation is complex.”
When you look through files in a codebase, you see architecture. You see a description of “what the program is trying to do.” A big number of files means: “The architecture is complex.”
What I’m saying is, having a complex architecture is a more telling sign of progress than having a complex implementation. So I think this is a good reason to use code-file count instead of KLOC count.
Of course, the best way to measure progress is to actually understand the program and its objectives, and to analyze its approach, but unfortunately that’s not always easy or possible.
I’d be happy to hear any comments or opinions about this.