Hitchhiker's Guide to AI, Software Architecture, and Everything Else: The Complexity Of Interface Extraction

From day one, when remoting middleware appeared on the horizon, one of the main challenges for application developers has been extracting interfaces from application functionality so that applications can be decomposed into a network of objects. The same issue is still relevant if you want to appease the gods of Service-Oriented Architecture. Unfortunately, many developers have been fooled by middleware proponents constantly praising the transparency provided by middleware. This transparency makes developers believe that building a distributed system is exactly the same as building a non-distributed one. Just take your application classes and turn them into service interfaces. That's all you need when migrating from a traditional to a distributed universe. Believe this and your project is doomed to fail. What the hell is the right approach to extracting interfaces for a distributed software system, and what are the common pitfalls? Want some examples? Here we go:

Granularity: Suppose you have developed a Java class that implements a graph-like structure such as a tree. You have classes such as Tree and Node. As you are a smart programmer, you have already extracted interfaces such as ITree and INode instead of bloating the class interfaces with oceans of methods. When you navigate to a subtree, a method such as navigate(theChildINeed) returns the right child tree. If you now naively convert all of your classes, i.e. the aforementioned interfaces, into remote interfaces, things will rapidly get messy. Each invocation of navigate will now return another remote object, and every further call on that object is another network round trip. All of a sudden, network traffic will explode. You can't simply develop remote classes as POJOs/POJIs or PONOs/PONIs. For the same reason, it does not make sense to transfer fine-grained data in service invocations. Sequences such as node.setName("Michael"); node.setStreet("Marketstreet"); node.setCity("SFO"); are DON'Ts in any SOA or remoting context. Firstly, they imply a lot of traffic. Secondly, the calls are related and might force the server to keep session state across method calls.
Impedance Mismatch: In heterogeneous networked environments (such as SOA) you'll always face the problem of mapping between the object model of your language and the object model of the remoting technology, and vice versa. Whenever you try to transmit a Hashtable between a .NET application and a Java EE application over a SOAP connection, things will turn into hell. This does not work without significant workarounds. As a relational database programmer or O/R victim you will know what I am talking about. Most remoting solutions only provide a least-common-denominator approach in their object model. As soon as you try to bring the richness of your preferred programming language into that middleware, you are asking for trouble.
Operational Qualities: Add to all this the complexity of operational qualities such as security or performance. Basically, you are now forced to combine the security infrastructure of your application with the security infrastructure of the middleware. And you have to take into account not only the performance of your application objects, but also the costs of communication. And you have to integrate distributed transaction monitors with local transaction APIs. And ...
Contracts: An interface is not just a matter of syntax. Nor is it just a set of signatures. It is also about semantics such as preconditions and postconditions. A finite state automaton could specify which methods may be called in which order. The contract might also deal with local policies and define the protocols with which clients can bind to the service. Note that this is much more than what you would expect from an interface in a non-networked environment.
Architectural Balance: Let us suppose we have magically solved all these issues. How should we decompose our application into remote components or services? We might end up in an ocean of components. Or we might end up in an ocean of service interfaces. Architectural patterns such as Layers or Extension Interface are helpful in this context. I must confess that no rule of thumb exists for this architectural structuring. How to address this issue really depends on the concrete problem context. One really helpful way is to think in terms of roles and responsibilities.
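
The Contracts point above mentions a finite state automaton that specifies which methods may be called in which order. Here is a minimal Java sketch of that idea. The states, the operations, and the class name ServiceContract are all hypothetical and only serve as an illustration; a real middleware would express such a contract in its own way.

```java
import java.util.Map;

// Hypothetical protocol states of a client session (not taken from any real middleware).
enum SessionState { UNBOUND, BOUND, CLOSED }

// Hypothetical operations a client may invoke on the service.
enum Operation { BIND, QUERY, CLOSE }

// Finite state automaton: which operation is legal in which state, and where it leads.
final class ServiceContract {
    private static final Map<SessionState, Map<Operation, SessionState>> TRANSITIONS = Map.of(
        SessionState.UNBOUND, Map.of(Operation.BIND, SessionState.BOUND),
        SessionState.BOUND, Map.of(Operation.QUERY, SessionState.BOUND,
                                   Operation.CLOSE, SessionState.CLOSED),
        SessionState.CLOSED, Map.<Operation, SessionState>of());

    private SessionState state = SessionState.UNBOUND;

    SessionState state() {
        return state;
    }

    // Precondition check: is this operation allowed in the current state?
    boolean allows(Operation op) {
        return TRANSITIONS.get(state).containsKey(op);
    }

    // Performs the transition or rejects the call if it violates the protocol.
    void apply(Operation op) {
        SessionState next = TRANSITIONS.get(state).get(op);
        if (next == null) {
            throw new IllegalStateException(op + " is not allowed in state " + state);
        }
        state = next;
    }
}
```

A service implementation could call apply at the start of each remote method, so that a client issuing QUERY before BIND gets a clear protocol error instead of undefined behavior.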

Basically, what all this means is that when you start to develop your distributed system from scratch, you need to thoroughly consider the issues explained above. However, when you have to integrate legacy code, things are much more complex: you'll need a whole bunch of refactoring activities to reorganize your application in such a way that it consists of classes and interfaces that easily fit into the middleware ecosystem. In the best case, it might be sufficient to add some wrapper (facade) objects. However, if you are unlucky, you will even need re-engineering efforts.
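
To illustrate the best case, here is a minimal Java sketch of a wrapper (facade) object that puts a coarse-grained interface in front of a legacy class with fine-grained setters, in the spirit of the setName/setStreet/setCity example. LegacyNode, NodeData, NodeService, and NodeServiceFacade are hypothetical names, and the sketch assumes that only NodeService would be exposed through the middleware.

```java
// Hypothetical legacy class with fine-grained setters; it stays local and is never exposed remotely.
final class LegacyNode {
    private String name;
    private String street;
    private String city;

    void setName(String name) { this.name = name; }
    void setStreet(String street) { this.street = street; }
    void setCity(String city) { this.city = city; }

    String describe() { return name + ", " + street + ", " + city; }
}

// Immutable value object: all related data travels in a single message.
final class NodeData {
    final String name;
    final String street;
    final String city;

    NodeData(String name, String street, String city) {
        this.name = name;
        this.street = street;
        this.city = city;
    }
}

// Coarse-grained interface, the only part assumed to be exposed remotely.
interface NodeService {
    void updateNode(NodeData data);
    String describeNode();
}

// Facade: one remote invocation fans out into cheap local calls on the legacy object.
final class NodeServiceFacade implements NodeService {
    private final LegacyNode node;

    NodeServiceFacade(LegacyNode node) {
        this.node = node;
    }

    @Override
    public void updateNode(NodeData data) {
        node.setName(data.name);
        node.setStreet(data.street);
        node.setCity(data.city);
    }

    @Override
    public String describeNode() {
        return node.describe();
    }
}
```

With this shape, a client sends one updateNode request instead of three related setter calls, so the server does not have to keep session state between them.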

I have tried to shed some light on the complexities of application integration and the provisioning of remote interfaces. What appeared so obvious and simple in the beginning is in fact one of the most complex tasks engineers have to cope with. Efficiently building efficient distributed systems implies thorough treatment of all these issues. Believe me, prevention is better than cure!

Friday, April 13, 2007
