Hitchhiker's Guide to Software Architecture and Everything Else - by Michael Stal

Thursday, March 05, 2015

Technical Debt - The Downside of Metaphors

The term "debt" is a metaphor from economy. Its use for software development seems to be very reasonable. Whenever developers fail to address quality issues in their software, they will have to pay this debt back. That is quite simple, isn't it?

However, we can easily find weaknesses of the term "technical debt":

When a system reaches its end of lifecycle, all debt will be gone. Try this for financial debt.

Financial debt is a separate entity, i.e. two debts do not intertwine, while software debt sometimes can't be easily located, isolated or separated, because it is woven into the system.

Technical debt may stay unpaid, if software engineering comes to the conclusion that the cost for refactoring would be higher than keeping the debt untouched.

Financial debt is created intentionally. This is certainly true for some technical debt issues as well such as temporary workarounds. However, many kinds of technical debt are accidental without architects even recognizing it.

Technical debt is destructive. If possible, you would like all of it to be eliminated. In contrast, financial debt often is more constructive - such as increased cash flow.

Financial debt is independent of other system parts, while technical dept in one place can be affected by other parts, and vice versa. An example would be modification of the system that makes the code parts containing the technical debt obsolete.

Interest rates are mostly determined by banks and usually stay fixed over the specified time. In the context of design debt the interest rates increase the longer the debt is not treated. The pay back time is usually not predetermined. And the same kind of debt may be more expensive to pay back in one system part than in other parts.

While almost all forms of financial debt are paid back continuously until the debt plus the interest rates are fully covered, each single technical debt in most cases must be paid back at once.

There is one kind of debt in the financial domain, while technical debt has many different shapes.

If you think about it, you'll find other metaphors that are more convincing. For example, a tumor or infection has many similarities with what we call technical debt: It can occur accidentally, or intentionally if you don't take care of your health. It must be paid back at once - you are either ill or not, which isn't quite true in practice, but very close. The longer it is not treated, the worse the health condition will become - think of viruses that spread and damage other organs. Infections and technical debt are mostly destructive. Both can be monitored in imaging devices or labor diagnostics. There are tumors that are difficult to get rid of and that cause severe damage, while others are harmless if treated early. The worst ones can even become lethal. So my first guess would be to call it software infection instead of software debt.

Another possibility is to use environmental pollution as a metaphor. Like technical debt it may occur incidentally or (most of the time) accidentally. You must pay it back, but you can do this part by part. If left untreated, the situation worsens. It is purely destructive. Often, it can't be easily separated from the environment. Thus, another possibility would be to call it software pollution. Software Architecture Sustainability is a metaphor that is related to environmental issues. And so is the term "Software Ecosystem".

Other metaphors could be considered as well such as terms for similar issues in hardware or building architecture (consider design erosion).

My personal favourite is Software Infection (respectively technical infection, code infection, design infection). Of course, this metaphor has its own liabilities, but it is much closer to its cousin in software development than the debt metaphor is.

And it sounds appropriate to speak about a sick component or architecture. Or to visualize an infection using a modality like in medicine.

I know that it is too late to get rid of the expression "technical debt", but at least we should handle it with care. Some metaphors that sound good in the beginning may turn out to be not well aligned with what we like to visualize.


Saturday, February 28, 2015

Are Patterns like Mummies?

In 1994, the Gang of Four, Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides published their groundbreaking book on design patterns. At almost the same time we were creating the first POSA book (Pattern-Oriented Software Architecture) which was eventually published in 1996. The software engineering community got flooded by the Pattern Wave. And soon there arrived an inflation of pattern books, some of them excellent, but many of them of mediocre quality.

I remember, that almost every software development magazine addressed patterns regularly and enthusiastically in the Nineties. The GoF became the Beatles of architects and developers. And most experts anticipated a myriad of new patterns rising at the horizon.

Actually, several pattern books have been published until now, but without the impact the GoF had. If you ask software engineers about the patterns they know, almost all will mention GoF patterns, many may illustrate some POSA patterns as well, and only a few will come up with other patterns such as those in Martin Fowlers book on Enterprise patterns.

This could suggest, that no other important patterns can be found in software habitats anymore, because GoF already got them all. But even if this were the case for general purpose design patterns, shouldn't there be some excellent patterns lurking in more specific domains? Pattern experts have tried to come up with complete pattern languages for their domains. In theory, such pattern languages are beneficial. In practice, it is impossible to cover a medium to large-sized domain with a pattern language because of the inherent complexity involved. Thus, it is not surprising that existing languages cover only tiny domains or fail to completely cover larger domains.

Does the whole universe only know the GoF patterns and that's it? In this case the seminal GoF book would be the holy grail of software development. If not, where do all the unknown patterns hide?

Let us assume we are asked to improve the GoF book, what exactly would we change. Patterns like the Null Object Pattern or the Extension Interface pattern (see POSA Vol. 2) could be added, the Singleton pattern removed and other patterns such as Abstract Factory improved. Patterns are not carved in stone but subject to future changes, albeit not with a high evolution speed.

If you look at patterns from 30000 feet, you'll recognize that there are some benefits of patterns not related to coding.

Pattern forms are a good mental tool to document design. For example, they comprise a name, a context, a problem with forces, and a solution. This helps document all kinds of architecture decisions in a structured way

Patterns are also applicable to document best practices for transformations, data representations, and many other topics. For example, each refactoring can be considered a transformation pattern. Best Practices are ideal candidates for patterns

Patterns introduce an idiomatic viewpoint, as they define a language. Effective usage of software platforms in terms of best practices for frameworks, libraries, APIs and protocols are idiomatic as well. Experience shows that idiomatic approaches help better understand structures and concepts. In software engineering all structures are idiomatic. Languages change and so do Patterns.

The value of patterns is not their content, but also their usage as good mental tools, with the capability of addressing all activities. Patterns help share best practices with others. They unfold a language of idioms that provide understandability, maintainability.

Patterns are dead. Long live patterns.

- Posted using BlogPress from my iPad


Tuesday, February 24, 2015


Recently, I discussed architecture design with some colleagues who are involved in Product Line Engineering projects. When the term "Reference Architecture" came up, it became obvious after a while that they used the term specifically for Product Line Architecture, while I had Reference Architectures in mind, which are typically created to steer and guide standardization. They might also present architecture styles that are commonly accepted within a domain. Think of compilers which are in most cases structured as a Pipes & Filters architecture which is composed of a lexer, a parser, a semantic evaluator, an optimizer, and a code generator.

A Reference Architecture (RA), on the other hand, is a coarse grained architecture template different organizations use for guiding their design activities. You may remember the OSI 7 Layers model or the OMG Reference Architecture for CORBA, both invented in the Stone Age of software engineering. A RA is a blueprint which doesn't include any further assets but documents. In fact, it is not very specific but rather abstract.

A Product Line Architecture (PLA) is an architecture for the product line of an organization. It captures the commonalities of a set of similar products and defines variation points using feature models or meta models. It contains different core assets, some of which are ready for use. A PLA is much more specific than a RA and defines a framework with (partially) implemented artifacts and variation mechanisms.

An RA can be used as the core of a PLA. For this purpose, the RA is concretized in Domain Engineering by addressing requirements and constraints of an organization. In the process, engineers may provide additional variation points or bind existing variation points. In the latter case the PLA transforms a variation point to a commonality. If many organizations do this for the same variation point in similar ways, a new extended RA is born.

A PLA can become the base of a RA if it is subject to abstraction and considered as an established practice in the domain, i.e. if most PLAs will end up in the same RA.

- Posted using BlogPress from my iPad


Sunday, February 01, 2015

I am back

After a long time absence due to health issues I am finally back. I will add new content in the following months with the publication frequency increasing over time.

At the OOP 2015 conference which ended last friday I was in charge of the architecture track. In my own talks I covered internal & external quality as welll as ways to ruin a project by (wrong) architecting. The latter tutorial introduces failure patterns for harming software development from the perspective of architects. Unfortunately, there are uncountable ways for failure, but only few for success.

A software system may be considered as a living organism. If the organism is not robust enough, even small illnesses may cause large damage. Architecture is basically the infrastructure necessary for metabolism, neural transmission or structural integrity (such as the bones). If parts are damaged, e.g., organs, we may either put in new or artificial organs, or maybe only repair a small part.
The software organism is created in an evolutionary process with biological patterns and is subject to design erosion during its whole lifecycle. Obviously, it is important to check the organism regularly to identify potential health problems as soon as possible. With Architecture Tomographs these checks can be fast and intense. In some cases design erosion or cancer dissemination is high enough to make the system impossible to repair. In these situations we may better create a new organism. However, for less harmful kinds of design erosion we may refactor using patterns. In this context, external quality is every property we can observe from outside such as speed, reliability, ... Internal qualities comprise heart rate, blood pressure, insuline level, bone strength, ....

Of course, the model has its limitations, but for me it turned out to be really useful.

- Posted using BlogPress from my iPad

Monday, July 14, 2014

Micro Management for Micro Brains - why Micro Services suck

It sounds like a great idea. Instead of building a monolithic enterprise application, just split functionality into small components and run them independently in their own processes. These micro services need to cooperate with each other to provide more advanced functionality. For that purpose, they are going to communicate with other micro services.

Such an approach promises better flexibility in changing and deploying systems. If only a small part of the functionality is changed, this will affect only a small number of micro services. Or, as we like to say, "small is beautiful".

What a wonderful new architecture style. Eventually, we found the silver bullet. Agreed, micro services may not be a silver bullet, but, at least, they are silver shotgun shells.

This idea is not new, though. For example,  "actors" are providing the same kind of solution. They encapsulate fine grained services behind message-based interfaces that enforce argument and result types to be value types, so that no side effects may occur. In order to achieve a common goal, actors interact with each other.

"Micro services" contains the term "services" for a reason. Micro services basically introduce a stripped down resurrection of SOA.

However,  the delicious looking and tempting apple may be poisoned. There are several "challenges" when using this architecture style:

  • Complexity:  If we are building a non-trivial application, it will reveal inherent complexity.  For the moment, let us assume, there is no accidental complexity involved. When this application is based on micro services, where does the complexity go? Unfortunately, it does not go away, but manifests itself  in complex connection and cooperation patterns between tiny services. 

  • Modifiability: If our enterprise application is partitioned into small micro services, we only need to touch and redeploy small services for changing the system, instead of facing the mess of application monoliths. In theory, this is a perfect solution. In practice, it is not, because for non-trivial changes of micro services, we may need to rewrite  a lot of other micro services that are connected with the modified micro services. Remember, the complexity does not disappear but shows up in the topology of the micro services network. This does not only affect design but also evolution, assessment and refactoring activities.

  • Internal  Architecture Quality: Architecture is about strategic design. Its main purpose is to map the problem domain to the solution domain. If micro services are used as the primary concept for structuring functionality, the problem domain will be mixed up with the solution domain, especially, if the usage of micro services is not transparent. Thus, the architecture will very likely be technology-based instead of problem-based. Note, that this is another variant of Conway's Law: "show me your architecture and I will know the technologies it depends on".

  • External Architecture Quality: As we all have learned the painful way, main reason for architecture failure typically are not functionality aspects.  Main reason are quality attributes like performance, extensibility, fault tolerance, and so forth. For each of these quality attributes, well known design strategies and design tactics are available that present alternative solutions how to introduce the respective quality into the architecture. Architecture design uses utility trees and scenario diagrams to specify external quality requirements. These quality attributes are often crosscutting, which is why complex networks of services make it hard and sometimes even impossible to design and implement  quality attributes.

  • Infrastructure: We may use a technology stack for leveraging the micro services architecture pattern or we may build it ourselves. The latter option is a no go, because we are not in the middleware business, at least most of us aren't.

Sunday, July 22, 2012

Reviewing systems - the sum is more than its parts

The profession of a software or system architect is not only about creating new systems, it is also about assessing and improving existing systems. If change is the rule and not the exception, architectures grow and change continuously. To keep them sustainable, review and assessment techniques are of uttermost importance.

An architecture review consists of different phases:

  • In the clarification or scoping phase we need to define the goal of the architecture  review as well as the one to three key questions the review is supposed to answer. For example, a goal could be to validate that our new system architecture implements the desired dependability quality attributes appropriately. One of the key questions could be: “can the software architecture achieve five nines of availability?”
  • In the analysis or information phase we need to read documents, interview stakeholders, check code and test cases, watch demos, among other things, so that we are able to answer the key questions.
  • In the evaluation phase, reviewers investigate the strengths, weaknesses, opportunities and threats imposed by the current system and related to the review goal. If there are risks in the system, reviewers should define recommendations to mitigate these risks.
  • In the feedback phase, all information (findings, recommendations) provided by the review is returned back to the team responsible for system development.   

These phases roughly correspond to the model specified by RUP (IBM Rational Unified Process): Inception, Elaboration, Construction, Transition. 

Unfortunately, there are many different types of reviews and review techniques, qualitative and quantitative reviews, scenario-based and experience-based reviews, code, design and architecture reviews. So which one should we choose for which purpose?

In my experience, it is often best to conduct an experience-based review as outlined  in the description of the phases, but integrate other review techniques as well, for example:

  • If quality attributes are in the main scope of the review, ATAM (Architecture Tradeoff Analysis Method) could be added to concretize quality attributes using scenarios and compare them with the actual architecture, thus determining the risk themes.
  • For obtaining information from stakeholders on architecture issues ADRs (Active Design Reviews) are a complimentary means instead of relying on interviews only.
  • Quantitative assessments such as metrics, prototypes, simulators help to obtain more detailed information about the system under review and its capabilities and limitations.
  • Code and design reviews help reviewers gain more insights about the details of the system. Of course, the code and design reviews are constrained to the parts relevant for the overall review goal.

The toolbox of the software or system architect should contain the relevant review techniques. Whenever architects are involved in a review, they should determine in the clarification phase which techniques they will use in the subsequent phases.

Note, that reviews do not need to follow a waterfall model. You may and should use an agile approach with answering the most relevant key question or its most important aspects first using time-boxed increments. 

Such reviews may require weeks, but can also be conducted as flash reviews on one day. The more you are able to narrow the review scope, the less effort it takes. If you conduct regular reviews after each increment, the architectural deltas are typically small which lets you narrow the scope of your regular reviews which can then be done in a day. The findings are used to refactor and improve the system architecture.

Reviewers should be experienced in software/system architecture as well as in review techniques. For building up a review culture in your organization, start enabling the lead architect to conduct reviews. Then, use the master-apprentice model to constantly increase the review skills in the rest of the team.

But as mentioned earlier: do not rely on one single review method but establish a toolbox of review and assessment techniques that can be used in combination to enforce architecture sustainability.

Saturday, January 07, 2012


DSLs are currently being promoted by a large number of activists. If you ever read the seminal Pragmatic Programmers book, you'll also find such a recommendation there. So, will the software engineering universe soon turn into a paradise? Let me tell you two examples from industrial practice: Once upon a time, in the logistics domain, a smart and lazy software engineer introduced a new language just for himself, in a very ad-hoc manner. When they saw the result, all his colleagues got very excited about the concept and started extending the language for their own purposes. The language inevitably grew and grew. Soon, an increasing number of system parts were depending on the language. Unortunately, when our smart inventor reached a high age, he had to leave and let his colleagues alone. But now, there was no one left in the company who really understood the language concept and its realization. It is claimed that the language keeps still growing. Other theories assume the system had to be completely rewritten in the meantime. In another universe and another time, another software engineer was convincing his fellows to introduce DSLs. All the IT dwarfs went crazy inventing their own languages. Yes, they even made competitions who could come up with the most fasinating or useful language. After a while, the DWARF software system was nothing but an ocean of languages. Only wizards were able to generate a working solution. When a furious dragon (= customer) attacked their town, everything imploded. No one has a clue where the remains of the dwarf town are located. What can we learn from these failures? DSL design is (like) architecture design. An ad-hoc language will lead to design erosion, and cause more harm than good in the long run. DSLs represent strategic design tools that should only be in the hands of experienced designers. There are purposes for which DSLs are a perfect fit. But there are also circumstances where they shouldn't be applied. Overdoing it increases accidental complexity. It is like component-based re-use. Each DSL should only have one responsibility. Plan to grow the DSLs systematically, refactor it and verify its correctness. Communicate with all stakeholders that are affected by the DSL. Consider the design choices for the language syntax. Should it be a text-based language, a graphical language, or both? Mind the XML hell - long, long ago almost everyone was striving for XML based languages. But in some rare contexts XML made writing configurations and documents ineffective, uncomprehensible, and tedious. I've seen systems wirh thousands of XML files. Let experts build and grow languages. It is amazing how (often) our discipline gets addicted to all kinds of panacea. Once you are lost in the buzzword jungle, all systems and stakeholders will suffer fom DSL paranoia. DSLs are an awesome means for boosting productivity. However, used by the wrong persons or with the wrong motivations, it is easy to shoot yourself in the foot.