Saturday, May 06, 2006

The Problem with Metrics

From time to time I am asked what I think about evaluating software systems using metrics. The issue with most metrics is their close association with implementation artifacts. One of the most widespread benchmarks is LOC. Measuring a software system by lines of code does not make much sense in many cases. What does it mean that a subsystem contains a specific number of lines of code? What about lines generated by model-driven software development tools, middleware generators, GUI builders, or IDEs? That does not mean, however, that LOC counts are completely worthless. At least they can give you hints about where quality problems might lurk. For example, if a method contains several hundred lines of code, then you definitely have a problem with inappropriate modularization.

Another example is Cyclomatic Complexity (CC), introduced by McCabe. The CC of your system's runtime graph can be calculated as CC = E - N + 2P (see the Wikipedia article on cyclomatic complexity). Here, E denotes the number of edges in the graph, N the number of nodes, and P the number of connected components. According to McCabe, a CC value greater than 50 means your system (or the part of it you measured) is too complex and carries high risk. When applied to the Observer pattern with 50 observers, the CC will be larger than 50! Unfortunately, we all know that the Observer pattern is everything but complex and risky, even when used with large numbers of observers. The problem here is that CC simply counts every connection, even if they are all of the same type.

What does all that mean? My point here is that metrics are of limited value for a software architect. For each metric used, a software architect should be aware of its strengths and limitations. Architecture quality, or the lack of it, can be evaluated better using other means. I will write about these qualities in a future posting. Of course, in the meantime I am very interested in what you think about metrics.
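To make the Observer arithmetic concrete, here is a minimal sketch in Python. It is only an illustration, not taken from the post: it assumes a runtime graph in which every subject-observer connection contributes two edges (the notify call and its return), and the edge list and function names are hypothetical.

# Sketch: cyclomatic complexity CC = E - N + 2P over a directed graph.
# Assumption (not from the post): each subject-observer connection
# contributes two edges, one for the notify() call and one for its return.

def cyclomatic_complexity(edges, components=1):
    nodes = {n for edge in edges for n in edge}       # N: nodes seen in edges
    return len(edges) - len(nodes) + 2 * components   # CC = E - N + 2P

# Subject "S" and observers "O0" .. "O49".
edges  = [("S", "O%d" % i) for i in range(50)]        # notify calls
edges += [("O%d" % i, "S") for i in range(50)]        # returns

print(cyclomatic_complexity(edges))  # 100 - 51 + 2 = 51, i.e. "high risk"

The value 51 is past the risk threshold mentioned above, yet the code producing this graph is nothing more than a single notify loop; that mismatch is exactly the point the post makes about CC counting connections of the same type over and over.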

5 comments:

mixi said...

We wouldn't have problems with complexity if there were metrics for it. If there were, we could simply work along them and everything would be fine.
This problem isn't unique to our profession. Psychology, analytical and clinical, has been seeking a way to measure the success of therapies for as long as it has existed. In pharmacy you have the double-blind test to verify whether, and in which dimensions, drugs are working. You can't do that in psychoanalysis.
Don't you think it's curious that a non-technical profession has the same trouble?
P.S.: If you are able to solve the same problem in fewer LOCs (or SLOCs), that's definitely a good metric. (Policy: about 2 operations per line.)

Anonymous said...

I think we generate too much metric data, which only wastes CPU cycles. Almost no metric is ever read, because if you have a system with several hundred modules and 20 or 30 different metric numbers for each of them, you are soon lost. And if you look into the data and find a deviation, you have to drill down, only to find out most of the time that it was an artefact.

Yours,
Alois Kraus

Michael said...

Mixi said that the problems would go away if there were metrics. I don't share that opinion. Metrics only work with things that can be measured; otherwise, they won't work at all. Take a developmental quality such as extensibility or flexibility as an example. How would you measure that? How would you detect flaws in a design using metrics? Even if metrics could cover most aspects, you completely depend on all requirements of your system being complete and consistent, and on having understood them correctly. That alone is impossible to guarantee. And if you refer to other disciplines: what about designing buildings? There are things you can measure, such as stability and robustness. But how would you measure aspects such as architectural beauty?

mixi said...

I just came across the book Code Quality Management. It seems to deal extensively with metrics. Perhaps it solves "The Problem with Metrics".

Michael said...

Yes, I guess quality is the right answer. I only scanned through the sample chapters, but I believe the book focuses more on quality issues with respect to the code than on metrics in general. Code and architecture are two sides of the same coin, but the same architecture can be implemented in a myriad of ways. For example, an architecture (done by the architect) might reveal good quality, while its implementation (done by the developers) might have bad quality. And vice versa. Thus, we need two sets of quality functions: one for the architecture itself and one for the implementation. The book is about the latter.