Hitchhiker's Guide to Software Architecture and Everything Else - by Michael Stal

Monday, February 25, 2008

DSLs revisited

DSLs are everywhere. At least, we can read and hear a lot of good things about Domain Specific Languages. Unfortunately, many experts I meet are not always that sure what a DSL really is.

As a lazy person, I will not bore you with my own definition but just cite what Markus Voelter once said. In a former posting Markus defined a DSL as follows:

A DSL is a concise, precise and processable description of a viewpoint, concern or aspect of a system, given in a notation that suits the people who specify that particular viewpoint, concern or aspect.

Given that definition, UML is a vehicle to define DSLs. Take use case diagrams as an example. However, UML might better suit software engineers.

An ADL (Architecture Description Language) like Wright is a formal language to describe architectural designs.

BPEL or ARIS are DSLs to describe business processes. While BPEL is much more system centric, ARIS can also be understood by business analysts.

Thought questions:

  • Is a programming language such as Java also a DSL?

  • And what about natural language?

DSLs are useful to describe a viewpoint, concern or aspect of a system. But, what means do we have to introduce our own DSLs?

  • Meta data annotations (Java) or attributes (.NET) can serve as the core elements of a DSL. Take the different attributes WCF (Windows Communication Foundation) introduces to help describe contracts.

  • UML provides a meta model for introducing DSLs as well as specific DSLs (the different diagrams to specify viewpoints).

  • XML can serve as a foundation for introducing DSLs. Example: BPEL, WS-*, XAML.

  • Programming languages like Ruby or Smalltalk let you intercept the processing of program statements, thus enabling DSLs. Example: Ruby on Rails.

  • Integrated DSLs like LINQ are embedded into their surrounding language.

  • APIs define a kind of language.

  • The hard way might be designing your own language using plain-vanilla tools such as ANTLR. Note: It is not only the compiler or interpreter but also the run-time system we have to cope with.
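To make the annotation-based route a little more concrete, here is a minimal sketch in Python, where decorators play roughly the role that metadata annotations play in Java or attributes in .NET. All names (`CONTRACTS`, `service_contract`, `operation`, `OrderService`) are invented for illustration; this is not the WCF API, just an assumption about how a tiny contract vocabulary could look.

```python
# Hypothetical mini-DSL: decorators describe a service contract.
CONTRACTS = {}


def operation(one_way=False):
    """Mark a method as part of the contract and attach metadata to it."""
    def mark(func):
        func._operation = {"one_way": one_way}
        return func
    return mark


def service_contract(name):
    """Register a class under a contract name and collect its marked operations."""
    def register(cls):
        ops = {
            attr: member._operation
            for attr, member in vars(cls).items()
            if callable(member) and hasattr(member, "_operation")
        }
        CONTRACTS[name] = {"implementation": cls, "operations": ops}
        return cls
    return register


@service_contract("Ordering")
class OrderService:
    @operation()
    def place_order(self, item, quantity):
        return f"{quantity} x {item} ordered"

    @operation(one_way=True)
    def cancel_order(self, order_id):
        pass

    def _audit(self):  # not marked, so not part of the contract
        pass


if __name__ == "__main__":
    for op, meta in CONTRACTS["Ordering"]["operations"].items():
        print(op, meta)
```

The class body reads like a small contract description, while a separate tool (here just the `CONTRACTS` registry) can process it.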

According to the Pragmatic Programmers designing a DSL can be very effective.

  • It may help you visualize and communicate the viewpoint, aspect, or concern the DSL addresses.

  • It may enable you to provide a generator that produces significant parts of the system instead of forcing you to hand-craft them. This is not restricted to code; you could also generate tests or documents.

Stefan Tilkov provided an excellent example. Suppose you are developing a Java application. Writing the unit tests in JUnit would be the normal way of operation. But maybe JRuby would be a much better way to write unit tests. As both run on top of the JVM, this can be achieved easily. In this context, we use JRuby+xUnit APIs as our DSL.

I believe DSLs help us climb over the one-size-fits-all barrier that traditional programming languages introduce. Creating a DSL does not need to be as complex as defining a programming language. XML Schema, metadata annotations, and UML Profiles are nice examples.

Saturday, February 23, 2008

Generative versus Generic Programming

Andrey Nechypurenko, a smart colleague of mine, recently brought up an excellent thought. If we already know a domain very well, why should we bother building model-driven generators? Couldn't we just provide an application framework instead? Indeed, each framework introduces a kind of language. It implements commonalities but also hooks to take care of variability. So, when should we use model-driven techniques and when should we better rely on frameworks?

My first assumptions were:

  • A DSL is much closer to the problem domain, which helps different stakeholders participate in establishing the concrete model. If stakeholder participation is essential, then a DSL might definitely be the better way.
  • DSLs are (often) more productive. Programming against a framework is much more verbose than designing with a DSL. Needless to say, we need to take the effort of providing the generator into account.

At least in my opinion the answer could also be to use both, a framework that captures the domain's core abstractions and a generator that generates applications on top of this framework.
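The combination of both approaches can be sketched in a few lines of Python. The framework class captures the commonality and exposes hooks; a generator turns a small declarative model into a concrete class on top of it. Everything here (`Importer`, `generate_importer`, `CUSTOMER_MODEL`) is a hypothetical illustration, and the "generator" builds the class at run time instead of emitting source code, which a real model-driven tool chain would more likely do.

```python
from abc import ABC, abstractmethod


class Importer(ABC):
    """Framework part: fixed processing loop plus hooks for variability."""

    def run(self, rows):
        return [self.convert(row) for row in rows if self.accept(row)]

    @abstractmethod
    def accept(self, row):
        """Hook: decide whether a row is processed at all."""

    @abstractmethod
    def convert(self, row):
        """Hook: transform an accepted row."""


def generate_importer(model):
    """Generator part: derive a framework subclass from a declarative model."""
    required = model["required_field"]
    renames = model["rename"]

    class GeneratedImporter(Importer):
        def accept(self, row):
            return required in row

        def convert(self, row):
            return {renames.get(key, key): value for key, value in row.items()}

    GeneratedImporter.__name__ = model["name"]
    return GeneratedImporter


# The model is the only thing a domain expert would have to touch.
CUSTOMER_MODEL = {
    "name": "CustomerImporter",
    "required_field": "id",
    "rename": {"nm": "name"},
}

if __name__ == "__main__":
    importer = generate_importer(CUSTOMER_MODEL)()
    print(importer.run([{"id": 1, "nm": "Ada"}, {"nm": "no id"}]))
```

Programming directly against `Importer` remains possible; the model merely saves the repetitive subclassing.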

Any thoughts about these issues? 

Monday, February 18, 2008


Today, I had an inspiration. My spouse told me it will take 5 minutes to get ready for leaving home, but actually it took her over 20 minutes. As those "events" constantly happen, my assumption is that women are moving with much faster speed than men. This way,  5 minutes on their spaceship equals 20 minutes of our time.

I also remember a professor at my university asking a student about the "evaluator problem".  The student did not really understand the question. Do you? Actually, the right answer turned out to be that he was using C style malloc for allocating storage in his compiler's attribute evaluator.

What can we learn from this? People might have completely different perspectives on reality depending on their background and context. Whenever we are dealing with architecture design, we'll have to communicate with all those stakeholders. In order to prevent misunderstandings like the ones illustrated above, we need to introduce a kind of formal language and a common scale - in other words: a common understanding.

One of the most critical points in design is the consideration and elicitation of requirements. This also holds for requirement priorities. For example, a project manager might be more interested in partitioning the architecture so that the different teams can develop in the most effective way, while an architect will follow a completely different rationale for modularisation.

In many projects I was also surprised by how important a common project glossary is. Otherwise, people will interpret the same terms differently, which is not a big boost for effective communication. For example, in one SOA-based project senior managers considered process monitoring to reveal the business perspective (Business Process Monitoring). They were interested in observing the number of successful business transactions or cash flow. What they actually got from development was system monitoring functionality such as showing load, bandwidth or faults. Guess how long it took development to correct this?

It is very likely that you have also experienced the architecture/implementation gap in your career. This inevitably occurs whenever architects throw their design specification over the fence and then leave the development team alone. Did you ever read a specification and understand it without any problems? Me neither! Some parts might be missing, others might not be implementable, and still others might be contradictory. Even formal standards such as CORBA, SOAP, and HTTP, which were developed by expert groups over long time periods, reveal those problems. How can you then expect your own specification to be perfect? What is not perfect will then be corrected or interpreted in unanticipated ways.

Thus, it is essential that such clarifications happen as soon as possible in a project. It is your responsibility as an architect to tackle this problem. Remember how often I already mentioned that effective communication is the most important skill - one of the few moments where I do not abide by the DRY principle. There are lots of methods and tools that explicitly address this issue. Think of agile methods, domain driven design, use case analysis, to name just a few.

Reality is relative. Thus introduce a common frame of reference for the stakeholders. This holds for software engineering, but also for real life :-)

Saturday, February 16, 2008


Workflow engines such as Microsoft's WF (Workflow Foundation) are only rarely used these days. Most developers totally underestimate their power. However, in an integrated world where processes, or rather workflows, increasingly need to connect disparate heterogeneous applications, just hard coding these workflows in traditional languages such as Java does not make any sense. First of all, coding in Java or C# requires programming skills, while often business analysts are the ones in charge of business processes. And secondly, hard coding in Java, C#, or C++ implies that all changes must happen in source code, with the code base becoming a beast that contains both business logic and workflow related content.

Before covering the meat of this posting, I need to define what I understand by a workflow. By the way, I will use the terms workflow and business process interchangeably as I don't consider the differences stated by other authors very convincing.

If we just refer to Wikipedia we will find the following definition:

  • A workflow is a reliably repeatable pattern of activity enabled by a systematic organization of resources, defined roles and mass, energy and information flows, into a work process that can be documented and learned. Workflows are always designed to achieve processing intents of some sort, such as physical transformation, service provision, or information processing.

Sounds rather confusing, right? Let me try my own definition.

  • A workflow defines a set of correlated activities, each of which represents a (composite) component that takes input, transforms it, and sends the result via its output ports to subsequent activities. Transitions to an activity may be triggered implicitly by events (human or machine initiated) or by explicit transfer of control
  • Workflows can be considered as a Pipes & Filters architecture with the filters being the activities and the pipes being the communication channels between them
  • All activities within a workflow share the same context such as common resources
  • Workflows are (composite) activities themselves
  • A workflow language is a DSL that allows you to express workflows declaratively. Examples include BPEL, ARIS, or XAML as used in WF
  • A workflow runtime manages the (local) execution and lifetime of workflow instances

In fact, a workflow closely resembles traditional program code with the statements being the activities. 

A typical real-life example is visiting a restaurant.

  1. The workflow starts when you enter the restaurant and arrive at the "wait to be seated" sign.
  2. The waitress will now ask you whether you have a reservation.
  3. If yes, she will look up the reservation and escort you to your table
  4. If no, she will try to determine whether a table is available
  5. If yes, she will escort you to the table
  6. If no, she may ask you to come back later => stop
  7. The waitress will provide you with a menu
  8. At the same time, another waitress might offer you water
  9. ...
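The first steps of the restaurant example can be sketched as a workflow description plus a tiny runtime. This is only an assumption about how such a description might look in Python, not how WF or BPEL represent workflows: each activity receives the shared context and returns the name of the next activity (or `None` to stop). The parallel "offer water" step is left out to keep the sketch short, and all names are hypothetical.

```python
def ask_for_reservation(ctx):
    return "escort_to_table" if ctx["has_reservation"] else "check_free_table"


def check_free_table(ctx):
    return "escort_to_table" if ctx["free_tables"] > 0 else "ask_to_come_back"


def escort_to_table(ctx):
    ctx["seated"] = True
    return "hand_over_menu"


def ask_to_come_back(ctx):
    ctx["seated"] = False
    return None  # => stop


def hand_over_menu(ctx):
    ctx["has_menu"] = True
    return None  # the sketch ends here; the real workflow continues


# The workflow itself is plain data: a start activity and a table of activities.
RESTAURANT_WORKFLOW = {
    "start": "ask_for_reservation",
    "activities": {
        "ask_for_reservation": ask_for_reservation,
        "check_free_table": check_free_table,
        "escort_to_table": escort_to_table,
        "ask_to_come_back": ask_to_come_back,
        "hand_over_menu": hand_over_menu,
    },
}


def run_workflow(workflow, ctx):
    """Tiny runtime: follow transitions until an activity returns None."""
    trace = []
    current = workflow["start"]
    while current is not None:
        trace.append(current)
        current = workflow["activities"][current](ctx)
    return trace


if __name__ == "__main__":
    guest = {"has_reservation": False, "free_tables": 2}
    print(run_workflow(RESTAURANT_WORKFLOW, guest), guest)
```

Because `run_workflow` has the same shape as an activity (it takes a context and does some work), a whole workflow could be wrapped and used as a composite activity inside another one.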

You will recognize easily that UML activity diagrams could be used to design workflows. Activities are either triggered by events or by active commands (e.g., you ask the waitress to bring you another beer which is like an event or she could ask you if you need anything else which is more like an "invocation"). Thus, you could either use such a diagram to express a workflow or maybe a more domain-specific kind of language/model such as BPEL. This language could be graphical, textual, audio-based, and so forth.

The advantage of a (domain-specific) language is obvious. It is much easier for domain experts to express the workflows themselves. Moreover, the workflow descriptions could be either used to generate the actual code or to execute the workflow in a runtime engine.

Sometimes workflow transitions depend on conditions such as "if customer is a special guest, give him a window table." These conditions however might also be subject to change. Coding such conditions in activities thus reveals the same problems as coding workflows in C# or Java. As a consequence, we should also specify such rules in a declarative fashion. This is where Business Rules Engines enter the stage. Business rules might be explicitly expressible in a workflow language or just be hidden in special activity types.
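A declarative rule can be as simple as a table that a generic function evaluates. The following sketch assumes a first-match-wins strategy. Only the "special guest gets a window table" rule comes from the text above; the other rules and all names are made up for illustration, and a real Business Rules Engine offers far more than this.

```python
# Rules are data: (description, condition, resulting table kind).
SEATING_RULES = [
    ("special guests get a window table",
     lambda guest: guest.get("special", False), "window"),
    ("large parties get the long table",          # hypothetical rule
     lambda guest: guest.get("party_size", 1) >= 6, "long"),
    ("everyone else gets a standard table",       # hypothetical default
     lambda guest: True, "standard"),
]


def choose_table(guest, rules=SEATING_RULES):
    """Return (table kind, rule description) for the first matching rule."""
    for description, condition, table in rules:
        if condition(guest):
            return table, description
    raise LookupError("no rule matched")


if __name__ == "__main__":
    print(choose_table({"special": True}))
    print(choose_table({"party_size": 8}))
```

Changing the seating policy now means editing the rule table, not the activity that escorts the guest.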

How can we identify the workflows in the act of architecture creation? User stories, use cases, and scenarios help determine the workflows in a system. Which of these should be hard coded and which ones should better be implemented using a workflow language depends on requirements, frequency of change, required variability, and other factors. These are questions a software architect needs to address.

Friday, February 15, 2008

Removing unnecessary abstractions

A good architecture design should always only contain a minimal set of abstractions and dependencies. According to Blaise Pascal a system is simple, when it is not possible to remove something without violating the requirements. Again, we can consider this from an architecture entropy viewpoint. Minimalism and simplicity imply the lowest possible entropy.

Let me give you an example of how a design should not look. Suppose you are developing a graphical editor. In the first design sketch you introduce an abstraction called Shape with functionality such as Draw(). From this base type you then derive subtypes such as PolygonicShape and RoundedShape. Finally, the abstractions Rectangle, Square, and Triangle become direct descendants of PolygonicShape, and so forth.

After a while you might recognize that it does not make any sense to introduce the intermediary layer consisting of PolygonicShape and RoundedShape. You find out that in your context all concrete classes such as Ellipse, Rectangle, Triangle, ... should be directly derived from Shape, because you don't differentiate between types of shapes in your software.

By removing the intermediary layer you have eliminated unnecessary abstractions and dependencies. Obviously, this also adheres to the KiSS Principle. But why, you may ask, are we not removing Shape too? The answer is very simple: Shape introduces common behavior required to handle all concrete shape subtypes uniformly in the editor. Otherwise, we would need to introduce large switch statements, type codes or reflection mechanisms in several places of our application. Thus, removing Shape would impose a much higher entropy.
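The refactored design might look like the following Python sketch. The constructor parameters and the string returned by `draw()` are assumptions made purely for illustration; the point is that the editor code in `render` handles all shapes uniformly through `Shape`, without type codes or switch statements, and that no intermediary layer is needed for that.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """The one abstraction the editor really needs."""

    @abstractmethod
    def draw(self) -> str:
        """Return a textual stand-in for the drawing operation."""


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width, self.height = width, height

    def draw(self) -> str:
        return f"Rectangle {self.width}x{self.height}"


class Ellipse(Shape):
    def __init__(self, rx, ry):
        self.rx, self.ry = rx, ry

    def draw(self) -> str:
        return f"Ellipse {self.rx}x{self.ry}"


class Triangle(Shape):
    def __init__(self, a, b, c):
        self.sides = (a, b, c)

    def draw(self) -> str:
        return "Triangle " + "/".join(str(side) for side in self.sides)


def render(shapes):
    """Editor code: no knowledge of concrete shape types required."""
    return [shape.draw() for shape in shapes]


if __name__ == "__main__":
    for line in render([Rectangle(2, 3), Ellipse(1, 2), Triangle(3, 4, 5)]):
        print(line)
```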

As you can surely see, determining the applicability of this architecture refactoring is not trivial, because you need to consider different layers and levels of abstraction in your design. For example, in another context it might be important to introduce intermediary abstractions such as PolygonicShape because you need to treat polygons differently from rounded shapes. In this sense truth is always relative.

Thursday, February 14, 2008

Work Life Balance - Revisited

Right after each of my postings related to Work Life Balance, my inbox grew due to people asking me how I personally keep the balance. That's the reason why I want to spend some sentences on this personal issue although it is not related to Software Architecture (at least at first sight).

I still consider myself a workaholic on rails. Only a few years ago I used to work the whole week without too many exceptions. After some health problems, I remembered the time when I had been totally enthusiastic about sports. And then I started to go biking and running regularly. That's where the rails come in. Ok, I did too much running and got problems with my knee, which is the reason why I am more biased to biking these days. Typically, I will run or bike almost every day from spring to autumn. For example, I enjoy biking to work (10 km). On my way back home I will often take a detour leading through a large forest and along a bike lane near the Isar river here in Munich (30-50 km). Sometimes, I leave the office in the late afternoon and continue my work at home. If possible, I also go running, which is an incredibly excellent way to relax, especially in winter when biking is not that cool (or should I rather say when it is too cool). After 10-12 km, which may take more than 60 minutes depending on my training intensity, all problems are gone and my mind is open for new ideas.

My other hobbies include composing music, reading books, listening to podcasts, audiobooks or music on my iPod, among many other interests. Yes, I definitely have more interests than time.

My experience has taught me that sports (no matter what) is the most important way to keep your balance. So is meeting people who have no interest in any IT topics. The same holds for all hobbies that help you forget about work related challenges (formerly known as problems). I have never been as creative as when keeping my work/life in balance. The more time I spend on sports and other interests, the more productive I am. This is kind of surprising as I always believed in the opposite statement (the more time spent, the more productivity).

Just relax! 

All about Cycles

Here comes a real world example. In a management application the distributed agents in charge of managed objects (such as routers, load balancers, switches, firewalls) are accessed by a centralized monitoring application. For this purpose, the agents offer functionality such as setter/getter functions. The monitoring subsystem inherently depends on the agents it monitors. After a while, the developers of the agent subsystem think about returning time stamps to the monitoring subsystem whenever it requests some information. But where could they obtain these central time stamps? The team decides to add time stamp functionality to the central monitoring subsystem. Unfortunately, the agents now depend on the monitoring subsystem. Developers have established a dependency cycle.

"So, who cares?", you might say. The problem with such circular references is that they add accidental complexity. Mind the implications such as:

  • If there is a cycle within your architecture involving two or more subsystems, you won't be able to test one of the subsystems in isolation without needing to test the other ones.
  • Re-using one of the subsystems in the cycle implies you also need to re-use the other subsystems in the cycle.

It is pretty obvious that dependency cycles are a bad thing. They represent an architectural smell. But how can we cure the problem? Can you hear the horns playing a fanfare? The answer is: by applying an architectural refactoring!

So, how could we get rid of a dependency cycle?

  • Option 1: by redirecting the dependency to somewhere else. In the example above, we could introduce a time subsystem the agents use to obtain time stamps instead of adding this functionality to the monitoring subsystem. This way, we break the cycle.
  • Option 2: by reversing one of the dependencies. For example, we could exchange a publisher/subscriber pull model (the event observer itself asks for events) for a push model (the event provider always notifies the event consumer). Another example is introducing interfaces: instead of letting a component depend on another component directly, it depends on an interface. This way, a component always uses interfaces and never component implementations directly.
  • Option 3: by removing the dependency completely if it is not necessary. For example consider Master/Detail relationships in relational databases. Only one dependency direction makes sense here. The other one is obsolete.
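Option 1, combined with the interface idea from Option 2, might look like this for the monitoring example. The class names (`TimeSource`, `SystemClock`, `Agent`, `Monitor`) and the `load` attribute are assumptions for illustration. The agent receives its clock from outside and knows nothing about the monitor, so the only remaining dependency runs from the monitor to the agents.

```python
import time
from typing import Protocol


class TimeSource(Protocol):
    """Interface the agents depend on instead of the monitoring subsystem."""

    def now(self) -> float: ...


class SystemClock:
    """Stand-alone time subsystem; depends on nothing else in the application."""

    def now(self) -> float:
        return time.time()


class Agent:
    """Depends only on the TimeSource interface, not on the monitor."""

    def __init__(self, name, clock: TimeSource):
        self.name = name
        self._clock = clock
        self._load = 0.0

    def set_load(self, value):
        self._load = value

    def get_load(self):
        return {"agent": self.name, "load": self._load,
                "timestamp": self._clock.now()}


class Monitor:
    """Depends on agents; nothing points back to it."""

    def __init__(self, agents):
        self._agents = list(agents)

    def poll(self):
        return [agent.get_load() for agent in self._agents]


if __name__ == "__main__":
    router = Agent("router-1", SystemClock())
    router.set_load(0.7)
    print(Monitor([router]).poll())
```

A side effect is that the agent can now be tested in isolation by handing it a fake clock.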

Dependency Injection as provided by all those IoC Containers is very helpful to avoid dependency cycles. Even if you have no dependency cycles in your system, you should always avoid unnecessary dependencies - because they inevitably increase entropy and thus the chance of circular references.

But how can we detect such dependency cycles within our system? This is where CQM (Code Quality Management) tools enter the stage. Of course, these tools are not restricted to problems with dependencies. They are also helpful with a lot of other architecture quality related measurements.
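At their core, such tools run a cycle check over the dependency graph. A minimal sketch of that check, using a depth-first search, is shown below. The graph format (a dictionary mapping each subsystem to the subsystems it depends on) and the function name `find_cycle` are assumptions; real CQM tools extract the graph from the code base and report far more detail.

```python
def find_cycle(dependencies):
    """Return one dependency cycle as a list of names, or None if there is none.

    `dependencies` maps each subsystem to the subsystems it depends on.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []

    def visit(node):
        state[node] = IN_PROGRESS
        path.append(node)
        for target in dependencies.get(node, ()):
            target_state = state.get(target, UNSEEN)
            if target_state == IN_PROGRESS:
                # Back edge: target is on the current path, so we found a cycle.
                return path[path.index(target):] + [target]
            if target_state == UNSEEN:
                cycle = visit(target)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in dependencies:
        if state.get(node, UNSEEN) == UNSEEN:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    before = {"monitoring": ["agents"], "agents": ["monitoring"]}
    after = {"monitoring": ["agents"], "agents": ["time"], "time": []}
    print(find_cycle(before))
    print(find_cycle(after))
```

The `before` graph models the time stamp example with its cycle; the `after` graph models the same system once a separate time subsystem has been introduced.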

Can we always avoid cycles? No, because sometimes they are inherent in the domain. For example, in Microsoft COM, due to the usage of reference counting, cycles were required in graph structures consisting of COM objects. However, one could certainly ask if the design of COM was appropriate in this respect. But that's a completely different issue.

Thus, the general guideline should always be: avoid unnecessary (accidental) dependencies, break dependency cycles!

Tuesday, February 12, 2008


Suppose you are going to develop a traffic control system for a rather large city. The goal is to optimize traffic throughput which could be measured by average miles of each cyclist and motorist. In a first approach, you may suggest using a purely centralized control subsystem.

There are various problems with this task:

  • In a rather large traffic infrastructure, traffic can never be controlled from a centralized component. First of all, this component would represent a single point of failure. Secondly, it would not scale or perform at all due to the sheer number of traffic signals involved.
  • The requirement of traffic optimization reveals a significant problem. Assume that there is a main road through the city. The best way to achieve the goal could be to just set all traffic lights on the main road to green. Participants driving on side streets would then need to wait forever. That's what is called starvation in multithreading.

How could we solve those problems?

  • Instead of using a centralized approach, we could rely on a completely decentralized approach. For example, each street crossing could permanently measure the traffic and adapt its traffic lights automatically to the current situation. If all street crossings act independently, local tactics might dominate global strategy. Obviously, this is not what we want. Strategy should always dominate tactics. Hence, let local decisions of street crossings depend on "neighboring" street crossings. If appropriate, we could also mix centralized control with decentralized autonomous adaptations. That basically means that we control the overall strategy, but allow the system to automatically and locally adapt its tactics. Is it always useful to apply decentralized approaches? No. Here is a real life experiment: Take a number of people in a row, each of them stretching out one finger. Put a large stick on their fingers. And then tell the group to systematically and gently lower the stick to the ground. They won't succeed until finally one person becomes the leader and coordinates the whole action.
  • We also learn from the example that we really need to address requirements with much more care, especially in systems with complex interactions. In the example above the target function should not only consider average flow but also take possible starvation into account. By the way, the traffic example could also be replaced by a Web shop and Web traffic. In this example, starvation would imply that some shoppers would have a fantastic experience while others could not receive any Web page in an appropriate response time. Guess how often the latter ones would re-visit the shop. A wrong understanding of requirements can seriously endanger your business goals.
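How a local decision rule can take starvation into account is sketched below for a single crossing. The numbers, the `max_wait` bound and the function names are assumptions chosen for illustration; a real controller would also look at neighboring crossings. The rule normally favors the longest queue (throughput), but any approach that has waited `max_wait` cycles gets the green light first.

```python
def choose_green(queues, waited, max_wait=3):
    """Pick the approach that gets the green light in the next cycle.

    queues: vehicles waiting per approach
    waited: number of cycles each approach has been red
    """
    starving = [a for a in queues if queues[a] > 0 and waited[a] >= max_wait]
    if starving:
        # Fairness first: serve whoever has waited longest.
        return max(starving, key=lambda a: waited[a])
    # Otherwise maximize throughput: serve the longest queue.
    return max(queues, key=lambda a: queues[a])


def simulate(arrivals, cycles, max_wait=3, served_per_green=5):
    """Run the crossing for a number of cycles and return who got green when."""
    queues = {a: 0 for a in arrivals}
    waited = {a: 0 for a in arrivals}
    greens = []
    for _ in range(cycles):
        for approach, rate in arrivals.items():
            queues[approach] += rate
        green = choose_green(queues, waited, max_wait)
        queues[green] = max(0, queues[green] - served_per_green)
        for approach in waited:
            waited[approach] = 0 if approach == green else waited[approach] + 1
        greens.append(green)
    return greens


if __name__ == "__main__":
    arrivals = {"main": 6, "side": 1}  # hypothetical vehicles per cycle
    print(simulate(arrivals, cycles=8, max_wait=3))
    print(simulate(arrivals, cycles=8, max_wait=100))
```

With these made-up arrival rates, a very large `max_wait` reproduces the starvation described above (the side street never gets green within the simulated cycles), whereas `max_wait=3` serves it at regular intervals.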

The question is how to design such decentralized systems. There are lots of options:

  • Infrastructure: P2P networks are one way to organize resources so that they can be located and used in a decentralized way. Cloud Computing formerly known as Grid Computing is another cool example.
  • Algorithms: Genetic algorithms help systems to automatically adapt themselves to their context.
  • Cooperation models: Swarm Intelligence such as ants also illustrates how emergent behavior can substitute centralized control.
  • Architecture: Patterns such as Leaders/Followers or Blackboard help introduce decentralized approaches.
  • Technologies: Rules engines and declarative languages such as Prolog are useful for providing decentralized computing.

The problem with decentralized versus centralized systems is often when to use what, and in addition, how decentralized the system should be. This is not always obvious. Let me give you some examples instead:

  • In a sensor network you might place many small robots in a specific area in order to monitor environmental data. These robots would be completely autonomous. Even if some of them fail, the overall system goal can be achieved.
  • In a Web Crawler several threads could search for URLs in parallel, but they would need to synchronize themselves (e.g., using a tuple space) in order to prevent endless loops. This resembles ant populations.
  • In a car or plane you need central control, but some parts can also operate and adapt  independently such as Airbags or ESP. This is more a hybrid approach with the global strategy implemented by a centralized approach.
  • In embedded (real-time) controllers (such as your mobile phone) everything typically is centralized  because behavior must be statically predicted and configured.

The more co-operation your components require and/or the more determinism you need, the less a decentralized approach seems appropriate, and vice versa. 

The best way to design such decentralized systems is to introduce the concept of emergent behavior. For example, define a set of simple local rules by which the system parts must abide. And then let the parts take control. By the way, this is exactly how the Internet works. But in the Internet, you will also find a hybrid approach combining centralized backbones with decentralized nodes.

Saturday, February 09, 2008

Nervous Breakdown by Communication Overkill

Whenever I meet IT experts in airline lounges or conference rooms they are often very busy synchronizing their memory with the continuous information flow from their company, customers, and other sources. Typically, they are armed with a quite impressive weaponry that includes blackberries, (>= 1) mobile phones, notebooks, and PDAs. When asked about the effectiveness and efficiency of their work, they mostly complain about the impossibility of performing real work in their office in addition to the burden of all these travels and meetings. If we did a cost/benefit analysis relating the quantity of information exchange to work effectiveness, we would certainly be (negatively) surprised.

Let me just provide a theoretical example.

  • On average, meetings will require 2 hours per day. Given how ineffectively meetings are planned and performed, I assume that at least two thirds of the time spent is a complete waste.
  • If you receive 100 mails per day, then throwing away 50 of them might only take 5 minutes. However, reading 30 mails will require an additional 30 minutes. And answering the remaining 20 will require 60 minutes. Thus, you've spent at least 1 1/2 hours handling email.
  • Travel might also require 1 hour on average per day.
  • From time to time, you will have some personal face to face communication with colleagues, some of them just flocking to your office.
  • Telephone calls will require at least another hour per day.

Each media break (such as the arrival of e-mail, SMS, phone calls, meetings, etc.) also interrupts your work and causes continuous context switches. As you know, context switches are unproductive time, not only in a CPU but also in real life.

If you summarize all of this, it is not very difficult to recognize the problem.
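Summing up the figures from the list above takes only a few lines. The face-to-face conversations are left out because no number is given for them, so the total is a lower bound; the variable names are of course just for illustration.

```python
# Minutes per day, taken from the theoretical example above.
MINUTES_PER_DAY = {
    "meetings": 2 * 60,
    "email": 5 + 30 + 60,   # discard 50 mails, read 30, answer 20
    "travel": 60,
    "phone": 60,
}

total_minutes = sum(MINUTES_PER_DAY.values())
wasted_in_meetings = MINUTES_PER_DAY["meetings"] * 2 // 3  # "at least two thirds"

if __name__ == "__main__":
    hours, minutes = divmod(total_minutes, 60)
    print(f"communication overhead: {hours} h {minutes} min per day")
    print(f"of which wasted in meetings: {wasted_in_meetings} min")
```

That is more than five and a half hours per day before any context switch has been accounted for.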

How does this relate to software architects? As you definitely know, a software architect is in charge of designing a software architecture which requires a lot of communication activities. The more ineffective communication proves to be, the less time can be spent for the actual job.

We are all part of a vicious cycle that involves ineffective communication, unnecessary breaks and context switches. The only way to solve the problem is not to stop communicating but to spend your time much more effectively.

Personally, I consider the following means very helpful:

  • If you need to work on a document or review, use your home office, given that you can work there without any major interrupts.
  • When you feel tired or exhausted, or need to relax, go to the gym, or do whatever you feel appropriate to improve your mood. It is simple math. If you need to spend n hours on your work, it is not necessary to spend these hours in a row. For your company, mixing work time and recreation time is much more productive and less error-prone, especially when your job requires creativity.
  • Time box all meetings. I found out that m one-hour meetings are much more effective than one m-hour meeting. Needless to mention, meetings should be effective themselves (agenda known in advance, specific and clear objectives of the meeting available, have a moderator, and so forth).
  • Prioritize mails and phone calls. Tell people, and train them to accept, that you are not able to answer all mails or phone calls all the time. Consider voice-mail and mailboxes as friends. They allow you to answer whenever you are prepared, not when the callers are. Switch off mobile phones and blackberries when you are in meetings or need to relax. Not communicating sometimes is not a luxury but a necessity. By the way, if you are currently busy, it is not rude to tell colleagues that you are not ready to talk to them but that you will come back to them later.
  • In your spare-time or holidays don't accept any business-related communication except for emergencies.
  • Travel is important as face-to-face meetings are the most effective form of communication, but don't underestimate the impact of traveling on your effectiveness. Thus, only travel when really necessary.
  • Plan your time thoroughly (time management). And also plan sufficient spare time for recreation to keep your work/life balance and maybe even your relationships healthy.
  • Diversify your work day. Three meetings or telephone conferences in a row are boring, exhausting and ineffective. The same holds for sitting in front of your PC or notebook (or TV) for several hours.   

There is the common misconception that knowledge workers such as software architects should always be available for communication and work 12 hours a day without any break. After all, the job of architects is to build high quality software architectures, and to do this successfully they need motivation, fun, and focus. In the end, multitasking such as communicating in one thread while designing in the other is not feasible for humans. Sufficient recreation time (for example due to avoidance of communication overkill) is an essential means to do our jobs successfully.

Sometimes relaxing is more productive than working.

Architecture Governance

I consider cross-cutting concerns such as error handling as one of the major challenges in any software development project. These cross-cutting issues should be subject to what I call architecture governance. 

  • We need to establish rules for how to handle these issues uniformly throughout the project. Otherwise, every developer or architect will provide her/his own solution, which significantly reduces symmetry and orthogonality. I've seen many projects where error reporting was provided by some subsystems using return values while others used exceptions instead. Even in projects where exception handling had been common practice, developers introduced their own exception types for similar exceptions. Guess how well such an architecture is balanced.
  • We need to define roles and persons in charge of supervising software development and enforcing the rules. In one project architects had created documents for all of these concerns. Unfortunately, developers didn't care about the documentation. Policy enforcement tools such as FxCop for .NET can provide help. Trust is good but control is better.

The best way to guarantee proper handling of these concerns is to provide guidelines from day 0. Architects are in charge of establishing a strategic architecture document as well as the design guidelines and programming conventions in the beginning, not as an afterthought. All places where those rules are then violated within the project are subject to refactoring.
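For error handling, such a guideline could boil down to one shared exception base class plus an automated check. The sketch below is an assumption about what a project might agree on, not a prescribed scheme: `AppError`, the error codes and the `violations` helper are all hypothetical, and the check is a toy stand-in for what a policy enforcement tool would do on the real code base.

```python
class AppError(Exception):
    """Hypothetical project-wide base class for all reported errors."""

    code = "APP-000"

    def __init__(self, message, **details):
        super().__init__(message)
        self.details = details

    def report(self):
        """Uniform error report, identical in shape for every subsystem."""
        return {"code": self.code, "message": str(self), "details": self.details}


class ConfigurationError(AppError):
    code = "APP-100"


class ResourceUnavailableError(AppError):
    code = "APP-200"


def violations(exception_types):
    """Toy conformance check: list exception types that bypass the base class."""
    return [t.__name__ for t in exception_types if not issubclass(t, AppError)]


if __name__ == "__main__":
    class MyOwnTimeout(Exception):  # a developer's private exception type
        pass

    try:
        raise ResourceUnavailableError("database not reachable", host="db1")
    except AppError as error:
        print(error.report())
    print(violations([ConfigurationError, MyOwnTimeout]))
```

Every place that shows up in such a check is a candidate for the refactoring mentioned above.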

What are typical cross-cutting concerns in this context? Examples include:

  • Error Management
  • Security Checks (like most cross-cutting non-functional qualities)
  • Naming Conventions
  • Mandatory Patterns

Why is that important? Because allowing everyone to do whatever she/he considers best will lead to less expressiveness and readability, high accidental complexity, less developer usability, missing orthogonality and symmetry. In other words, it will have a negative impact on all architecture qualities.

This is why Architecture Governance is essential!