Hitchhiker's Guide to Software Architecture and Everything Else - by Michael Stal

Thursday, October 15, 2009

Data-centric Architecture

In most projects there is a strong emphasis on functionality. Use case modelling defines the responsibilities of the system as well as how it interacts with actors. Logical modelling then maps the use cases to related subsystems/components, a mapping which is, of course, functionality based. In the end we get functional interfaces between subsystems or between components. Have you noticed that approaches such as UML mostly focus on methods?



However, in many cases the data exchanged between architectural entities is rather complex. For example, in healthcare scenarios complex structures with patient information or charging information are the rule. Not treating data as a first-class citizen might lead to various issues such as versioning problems, or inadequate partitioning or aggregation of data. Thus, architects should start data modelling early if they are facing complex structures in their development projects.



Data modelling starts with the domain model. What are the main entities in your problem domain, and what kind of data do they need to persist, share, or transfer?

Then, in the next "layer", when we introduce the subsystems, we have to think about the data structures within subsystems and the data structures exchanged between subsystems. For example, what exactly are the data structures required in the functional interfaces? Obviously, these data structures can be partially derived from the domain model, with further data structures required by infrastructural entities or non-functional qualities.
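To make this a bit more tangible, here is a minimal sketch in Python of how an interface data structure might be derived from a domain entity. All names (Patient, PatientSummary, to_summary, schema_version, correlation_id) are my own assumptions for illustration and not taken from any real system. The point is only that the exchanged structure carries a subset of the domain data plus fields demanded by non-functional qualities and infrastructure.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Patient:
    """Hypothetical domain entity: what the problem domain knows about a patient."""
    patient_id: str
    name: str
    date_of_birth: date
    insurance_number: str


@dataclass(frozen=True)
class PatientSummary:
    """Hypothetical structure exchanged across a subsystem interface."""
    patient_id: str       # derived from the domain model
    name: str             # derived from the domain model
    schema_version: int   # assumed field, added for modifiability (versioning)
    correlation_id: str   # assumed field, added for infrastructure (tracing)


def to_summary(patient: Patient, correlation_id: str) -> PatientSummary:
    """Map the domain entity to the interface structure, leaving out
    everything the consuming subsystem does not need."""
    return PatientSummary(
        patient_id=patient.patient_id,
        name=patient.name,
        schema_version=1,
        correlation_id=correlation_id,
    )
```

Note that the insurance number deliberately stays inside the originating subsystem. Deciding what crosses an interface is exactly the kind of question that gets lost when only the operations are modelled.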

In the component layer we start to deal with finer-grained data types.

Last but not least we'll have to define how data is mapped to implementation artifacts.



Thus, we should also follow a top-down approach for data modelling. Likewise, we need to consider bottom-up constraints such as special data requirements of infrastructures or platforms. In other words, data modelling must go hand in hand with all the other architectural modelling activities.



Important questions to ask when modelling data:


  • Who is the originator of data?

  • Who is the consumer of data?

  • How does data move from the originator to the consumers?

  • Do we need to collect/aggregate data from different sources (transfer objects)?

  • What about quality constraints such as performance (e.g., caching data), security (e.g., encrypting data), modifiability (e.g., versioning data), concurrency (e.g., locking data), availability and reliability (e.g., replicating data, data persistence)?

  • Where and how is data stored?

  • How do different data structures depend/relate?
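A few of these questions can be sketched in code. The following Python example is only an illustration under my own assumptions: the two originators (a records subsystem and a charging subsystem), the transfer object PatientChargesTO, and the helper functions assemble and to_wire are all hypothetical. It shows a transfer object that aggregates data from two sources for one consumer and carries an explicit version number.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class PatientRecord:
    """Data originating from a hypothetical records subsystem."""
    patient_id: str
    name: str


@dataclass(frozen=True)
class ChargeItem:
    """Data originating from a hypothetical charging subsystem."""
    patient_id: str
    description: str
    amount_cents: int


@dataclass(frozen=True)
class PatientChargesTO:
    """Transfer object: one structure for the consumer, aggregated from two sources."""
    schema_version: int              # lets consumers detect format changes
    patient_id: str
    name: str
    charges: Tuple[ChargeItem, ...]
    total_cents: int


def assemble(record: PatientRecord, items: Iterable[ChargeItem]) -> PatientChargesTO:
    """Collect the charge items that belong to the patient and add them up."""
    own = tuple(i for i in items if i.patient_id == record.patient_id)
    return PatientChargesTO(
        schema_version=1,
        patient_id=record.patient_id,
        name=record.name,
        charges=own,
        total_cents=sum(i.amount_cents for i in own),
    )


def to_wire(to: PatientChargesTO) -> str:
    """Serialize the transfer object to JSON for transport to the consumer."""
    return json.dumps(asdict(to), sort_keys=True)
```

A consumer would call assemble once and receive everything in a single structure instead of querying both subsystems. Caching, encryption, locking, or replication are not shown here; they would be further decisions layered on top of such a structure.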

I think data modelling is an underestimated and undervalued activity in architecture design. What are your experiences and opinions?