Hitchhiker's Guide to Software Architecture and Everything Else - by Michael Stal

Tuesday, August 08, 2006

Blind Spots

Did you ever encounter the following problem? Suppose you have written a text document. There are some very obvious typos in it, but no matter how often you read and check your text, you just can't detect those errors. They somehow resemble blind spots. The same holds for writing source code, although there the compiler will at least point you to all syntax errors. These blind spots are even harder to find if they appear on the semantic or logical level. For example, you are absolutely sure that you should do a depth-first search, while in fact a breadth-first search is more appropriate, as in a chess program. What is the reason for these blind spots? I don't know whether there is a biological or psychological explanation. However, I assume that the brain, due to its filtering capabilities, simply lets you ignore the problem because it doesn't focus on the details but on the whole picture.
What does this imply for software engineering? From my viewpoint, this is a great example of why pair programming and the four-eyes principle are important, and also why code and architecture reviews are so essential. Whenever I am writing a document or creating a design, I will always ask someone else to crosscheck. Blind spots are a principal problem of humans. Thus, it is not a good idea to pretend they do not exist.

Monday, August 07, 2006

Hammer and Nail

It is probably human that people always try the same solutions and tools they have used before, even in situations where this is not suitable. This often leads to the Hammer and Nail syndrome: Do you want to put a painting on the wall? Take a hammer and a nail. Do you want to connect two pieces with each other? Take the hammer and the nail. While this approach makes you look incredibly stupid in real life, it is often applied in software engineering. That's the reason why some projects are doomed to fail. Still not convinced? Try the following experiment: Go to a team of architects and developers who have used CORBA and C++ in all their previous projects. Tell them about a new project where a distributed system needs to be built. Guess what technologies they will recommend?
I was often involved in projects where I heard that technology X was a must. The responsible members didn't even know what problem they were going to solve, but they were sure that a specific technology should be part of the solution. That's exactly what I mean by the Hammer and Nail syndrome in software engineering. Another example is all those distributed systems where people used synchronous RPC-style communication even for event-based asynchronous applications such as network management or control systems. Here, synchronous RPCs are the last thing you should use. Note that these issues are not always obvious. For instance, James Gosling never stopped telling me that 90% of all enterprise Java projects used EJB even if there was no need for a component container.
Having said all this, I must mention that the other extreme is the best-of-breed syndrome. People divide their problem space into a large number of sub-problems and choose for each sub-problem the best technology or tool available. This approach may lead to a nightmare, as no one will ever be able to handle dozens of different tools and technologies, especially when it is difficult to combine them. Sometimes it is better to resort to sub-optimal solutions instead, thus minimizing the number of technologies and tools.
What I often do as a consultant when asked what technology to use: I ask the stakeholders for a list of detailed and prioritized requirements. Then I ask independent technology experts how well they believe their favourite technology or tool can meet these requirements, and I ask them to do the same for the other technology options (as a crosscheck). Of course, a detailed study of reports and articles is another possibility. Mostly, this decision matrix shows very clearly which technology fits best to the problem context. If multiple options fit, I ask management to make the decision (as a consultant, never let them push the decision onto you, because that would be like shooting yourself in the foot). If there are related problems and decisions to be made, build groups of solutions.
Of course, the Hammer and Nail syndrome is not only about tools and technologies but also about software architecture. People tend to apply the same architectural principles again and again, even if they are not suitable. Look at all those architectures where you'll find an observer or strategy almost everywhere. If architects or engineers discover a new toy, they want to play with it all the time. This problem can only be addressed by code and design reviews (or by pair programming). If you face such a problem, tell the engineers exactly why using that kind of pattern or architectural solution isn't smart in that particular context.
For instance, the observer pattern makes no sense if there is a bi-directional 1:1 dependency between two components. The problem is that every one of us (me too) may fall victim to the Hammer and Nail syndrome from time to time. I often found that in some cases I had to come up with a quick solution, so I used what I already knew. That is human, but it limits creativity. Creativity also means finding new, innovative solutions instead of trying to apply the same old solutions again and again.
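To make the observer example concrete, here is a minimal Java sketch (the class names and the temperature scenario are made up for illustration). With a fixed 1:1 relationship, the observer machinery only adds indirection; a plain direct reference expresses the same thing with less code:

    import java.util.ArrayList;
    import java.util.List;

    // Overkill: full observer machinery for what is in fact a fixed 1:1 link.
    interface TemperatureListener {
        void onTemperatureChanged(double celsius);
    }

    class Sensor {
        private final List<TemperatureListener> listeners = new ArrayList<>();

        void addListener(TemperatureListener listener) { listeners.add(listener); }

        void measure(double celsius) {
            for (TemperatureListener listener : listeners) {
                listener.onTemperatureChanged(celsius);
            }
        }
    }

    // Simpler: if there is exactly one known counterpart, wire it directly.
    class Display {
        void show(double celsius) { System.out.println(celsius + " degrees"); }
    }

    class SimpleSensor {
        private final Display display; // the one and only dependent component

        SimpleSensor(Display display) { this.display = display; }

        void measure(double celsius) { display.show(celsius); }
    }

The observer version only pays off once the number or identity of listeners can actually vary at run-time; otherwise it just obscures a dependency that is fixed anyway.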

Sunday, August 06, 2006

Death by Flexibility

I remember a project where engineers were using CORBA as their favourite middleware solution. Those of you familiar with CORBA know that, as in most other middleware solutions, CORBA provides generic data types such as (Dyn)Any (which may dynamically represent any other data type) and unions (a set of alternative data types). Engineers in that particular project found these generic data types very useful. When they faced the problem of implementing a tree structure which should then be transmitted over the network, they came up with a solution that relied on anys and unions. Once finished, the performance behavior was incredibly bad. It turned out that the middleware must interpret and (de)marshal tree structures consisting of unions and anys at run-time. While being very flexible, the solution was useless due to performance penalties.
In another project we saw engineers implementing a mediator component as the central part of their architecture. Every other subsystem depended on that mediator in the middle. Thus, the mediator became the centre of this application's universe. After a (short) while, design erosion started to badly impact the architecture. What the designers intended to create was a flexible solution, but what they got was a maintenance nightmare.
Other projects get addicted to the strategy syndrome. Every piece of functionality is hidden behind a strategy implementation. In a large system this flexible solution leads to a configuration problem because, after all, someone needs to configure all those strategies at start-up time. Using strategies everywhere basically means: "I don't know now what kind of implementation should be used here. Thus, I leave that decision open for later generations of developers." But how should they know?
Using a centralized or decentralized approach for finding peers is also a flexibility issue. A dynamic lookup approach such as the one available in Jini or in Peer-to-Peer solutions offers high flexibility, but also more network load at the same time.
So, what can we learn from such experiences? Over-flexibility is doomed to fail. For instance, in the first example with all the any and union types, it is much smarter to either use CORBA value types or a concrete constructed type that the middleware does not need to parse at run-time. In the "overuse of strategies" example, the solution would be to open only those places of the architecture for extension or change where it is really required (the Open/Close principle; a small sketch follows below). Overuse of strategies is often a sign of missing requirements, experience, or knowledge. In the god-like mediator example, it seems as if the developers had forgotten to perform a use case / scenario analysis. By considering the relevant main workflows, the architectural dependencies often show up very soon. For places where workflows need to be flexible, use solutions such as rules engines, observers, and/or dependency injection.
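To illustrate the Open/Close idea, here is a hedged Java sketch (the invoice/discount domain and all names are invented, not taken from the projects above): only the one place where variation is really required gets a strategy; everything else stays deliberately concrete:

    // The discount calculation is a known point of variation, so only
    // this single place is opened for extension.
    interface DiscountPolicy {
        double discountFor(double orderTotal);
    }

    class NoDiscount implements DiscountPolicy {
        public double discountFor(double orderTotal) { return 0.0; }
    }

    class VolumeDiscount implements DiscountPolicy {
        public double discountFor(double orderTotal) {
            return orderTotal > 1000.0 ? orderTotal * 0.05 : 0.0;
        }
    }

    // Everything else stays concrete: no strategies for taxing or rounding,
    // because no requirement asks for variation there.
    class Invoice {
        private final DiscountPolicy discountPolicy;

        Invoice(DiscountPolicy discountPolicy) { this.discountPolicy = discountPolicy; }

        double total(double orderTotal) {
            return (orderTotal - discountPolicy.discountFor(orderTotal)) * 1.19;
        }
    }

Whoever creates the Invoice decides once which policy applies. There is no start-up configuration problem, because only one decision is left open instead of dozens.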
What we also see is that performance and flexibility are difficult to achieve at the same time. Very flexible solutions often lead to performance penalties. Very performant solutions often get their performance boost from directly accessing critical hardware and software layers, which doesn't leave much room for flexibility.
However, don't take this as a general rule. For example, the BEA JRockit Java VM increases performance through flexibility. At start-up time, the VM detects what kind of environment it is running on and then automatically adapts its configuration accordingly, such as its choice of multithreading strategy or cache sizes. In other words, flexible patterns for resource management are very valuable, especially when they adapt to their runtime environment at start-up or maintenance time.
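The same idea in miniature, as a Java sketch (the sizing rules are invented for illustration and have nothing to do with JRockit's actual heuristics): probe the environment once at start-up and derive the configuration from it, instead of hard-coding it:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    class AdaptiveRuntimeConfig {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();

            // Probe the environment once at start-up...
            int cores = rt.availableProcessors();
            long maxHeapBytes = rt.maxMemory();

            // ...and derive the configuration from it (illustrative rules only).
            int workerThreads = Math.max(2, cores);
            long cacheEntries = Math.min(1_000_000L, maxHeapBytes / 10_000);

            ExecutorService workers = Executors.newFixedThreadPool(workerThreads);
            System.out.println("threads=" + workerThreads
                    + ", cacheEntries=" + cacheEntries);
            workers.shutdown();
        }
    }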
To sum up, my main recommendation, shortened to one single sentence, is: Be as concrete as possible and as flexible as really necessary.