IBrokering - Avoiding meatball code

The What and Why of 'IModelModifiedBroker' and 'IBrokeredDataObject'.

I justify my use of the term Brokered from the definition of the base term broker, "One that acts as an agent for others"; the explanation is as follows.

This is a pattern I have been using for a while, and it is similar to the Observer pattern in some ways.

When working with a relatively large, complex object model, there have been times when the creation, modification or deletion of an object may affect another object, directly or indirectly. This can lead to a sort of object-based spaghetti code, where relationships between loosely coupled, unrelated objects are forced on the model simply so objects can inform, or perform functions on, dependent objects. I have shamelessly named this programming technique meatball code: an extension of spaghetti code, a lumpy, bolognese-based anti-pattern. Again, I offer no apologies for my terminology.

To put it simply, the IModelModifiedBroker interface is a contract for an object instance that is shared throughout an object model and reports all changes to the model. Objects subscribe to its events to receive messages regarding changes to the model, and use it to report their own changes to the model. This allows clean separation of objects that may need to respond to changes to other objects in the model.
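As a rough illustration of that contract, here is a minimal sketch. The member names (ObjectModified, ReportModified), the ModelChangedEventArgs type and the ModelModifiedBroker class are my own assumptions for the sake of the example; the real interface may well be shaped differently.

```csharp
using System;

// Hypothetical event payload: which object changed, and which property.
public sealed class ModelChangedEventArgs : EventArgs
{
    public ModelChangedEventArgs(object source, string propertyName)
    {
        Source = source;
        PropertyName = propertyName;
    }

    public object Source { get; }
    public string PropertyName { get; }
}

// Assumed shape of the contract: one event to subscribe to,
// one method that objects call to report their own changes.
public interface IModelModifiedBroker
{
    event EventHandler<ModelChangedEventArgs> ObjectModified;

    void ReportModified(object source, string propertyName);
}

// Simplest possible implementation: relay every report to all subscribers.
public sealed class ModelModifiedBroker : IModelModifiedBroker
{
    public event EventHandler<ModelChangedEventArgs> ObjectModified;

    public void ReportModified(object source, string propertyName)
    {
        ObjectModified?.Invoke(this, new ModelChangedEventArgs(source, propertyName));
    }
}
```

A single instance of the broker would be handed to every object in the model, so that publishers and subscribers only ever need to know about the broker and never about each other.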

To give an example from my Data Warehousing days, consider a star schema that provides source data to a cube that sources a report. To build an application to allow this concept to be modelled, you may come up with the following design:

Meatball code recipe

  • A star schema, or fact, has metric data (not shown in model) that is defined by the dimensions referenced in the star.
  • The dimension, in this case having a single hierarchy, is composed of levels, such as Product Group, Product Sub Group, Brand, SKU
  • Each level is represented by some data, i.e. Key, Surrogate Key, Name, Description and the like, that is represented by database columns.
  • The star schema sources a cube, which in turn sources a report

This is quite clean and elegant. Consider the following requirements:

Requirement 1)

There is a mandatory set of columns that each level must contain. For example, Key, SurrogateKey and Name.

The naming convention for these columns is set to

  • [DimensionLevelName]_Key
  • [DimensionLevelName]_SurrogateKey
  • [DimensionLevelName]_Name

We could handle this simply in either of two ways:

  1. If we don't mind storing the instance of the parent DimensionLevel in the DataColumn, we can store a base name on certain columns and derive the column's full name whenever the Name property is called:

            // DataColumn: derive the full name from the parent level on every read.
            public string Name
            {
                get
                {
                    // Columns without a base name keep their free-form name.
                    if (string.IsNullOrEmpty(_baseName)) return _name;

                    return string.Format("{0}_{1}", ParentDimensionLevel.Name, _baseName);
                }
            }

  2. If we don't want to store the instance of the parent DimensionLevel in the DataColumn, we could loop through the Level's DataColumns and set the names individually whenever the level is renamed:

            // DimensionLevel: push the new level name down to every column that has a base name.
            public string Name
            {
                get
                {
                    return _name;
                }
                set
                {
                    _name = value;

                    foreach (DataColumn column in Columns.Where(c => !string.IsNullOrEmpty(c.BaseName)))
                    {
                        column.Name = string.Format("{0}_{1}", value, column.BaseName);
                    }
                }
            }

Both methods have their pros and cons, but as the coupling between DimensionLevel and Column is relatively strong, there is no real danger in changing the model to cater for it. The second requirement, however, may be a bit more challenging.
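To see the first option working end to end, here is a small self-contained sketch. The constructors and the stripped-down DimensionLevel class are assumptions made purely so the example compiles on its own; only the Name getter comes from the fragment above.

```csharp
using System;

// Stripped-down level: just a name that can be changed.
public class DimensionLevel
{
    public string Name { get; set; }
}

public class DataColumn
{
    private readonly string _name;
    private readonly string _baseName;

    // A free-form column: no base name, so Name returns the stored name.
    public DataColumn(string name)
    {
        _name = name;
    }

    // A mandatory column: its name is derived from the parent level.
    public DataColumn(DimensionLevel parent, string baseName)
    {
        ParentDimensionLevel = parent;
        _baseName = baseName;
    }

    public DimensionLevel ParentDimensionLevel { get; }

    public string Name
    {
        get
        {
            if (string.IsNullOrEmpty(_baseName)) return _name;

            return string.Format("{0}_{1}", ParentDimensionLevel.Name, _baseName);
        }
    }
}
```

Because the name is computed on each read, renaming the level is immediately reflected in every mandatory column without any loop, at the cost of each column holding a reference to its parent.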

Requirement 2)

Each DataColumn requires a BusinessName property. This can be used in the application to source the header fields of any item in a report that sources that field directly.

It is a loose coupling: the report may use any header text for a field title, but a property can be set on the report so that when a data row is mapped to a report from a cube, through a star schema referencing a dimension, the column's Business Name is used as the header text.

As an example it's a touch tenuous, as the report would generally reflect cube metadata, but in this application a report might be built directly off a database table as well as a cube, so for consistency's sake we want to be able to maintain this functionality regardless of how many levels of separation there are from the data to the report.

Now we have a requirement that the report object (in reality it would probably be the report object hierarchy, but let's ignore that for the present) needs to be able to react when a DataColumn's Business Name is changed. We may do that in a number of ways; some of the less satisfying ones I have seen are below.

Let the Data Column be responsible for the name change in Reports.

This could be done by having the Data Column hold a List<Report> collection of all the Reports referencing it, which would require some, indubitably inelegant, code to keep this collection up to date.

It could also be achieved by having a GetAllReportsDataColumnIsReferencedIn() style function somewhere up the hierarchy, which would require parent objects to be referenced in each object. I do not have a problem with storing parent objects in this sort of object hierarchy; it has proved useful in the past, and adds little, if any, complexity to the code. The GetAllReportsDataColumnIsReferencedIn() function, however, would require many a nested loop to ascertain which reports reference which DataColumns, which is not elegant, and is certainly not going to deliver top performance.
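To give a feel for the nesting involved, here is a sketch of what such a lookup might look like. The Model root class, the collection properties and the ReportField.SourceColumn link are all hypothetical; the real hierarchy would have more levels, and so more loops.

```csharp
using System.Collections.Generic;

public class DataColumn
{
    public string BusinessName { get; set; }
}

// Assumed link between a report field and the column it is sourced from.
public class ReportField
{
    public DataColumn SourceColumn { get; set; }
}

public class Report
{
    public string Title { get; set; }
    public List<ReportField> Fields { get; } = new List<ReportField>();
}

public class Cube
{
    public List<Report> Reports { get; } = new List<Report>();
}

public class StarSchema
{
    public List<Cube> Cubes { get; } = new List<Cube>();
}

// Hypothetical root of the model, where the lookup function lives.
public class Model
{
    public List<StarSchema> Stars { get; } = new List<StarSchema>();

    // Walks every star, cube, report and field to find references to one column.
    public List<Report> GetAllReportsDataColumnIsReferencedIn(DataColumn column)
    {
        var result = new List<Report>();

        foreach (StarSchema star in Stars)
            foreach (Cube cube in star.Cubes)
                foreach (Report report in cube.Reports)
                    foreach (ReportField field in report.Fields)
                        if (ReferenceEquals(field.SourceColumn, column) && !result.Contains(report))
                            result.Add(report);

        return result;
    }
}
```

Every Business Name change would trigger a full walk of the model, and the DataColumn would still need a reference back up to the Model in order to call the function at all.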

Conversely, the report field's header text could be derived when the property is called on the report object. This is probably the most elegant solution, but additional code needs to be written to ensure that the header field is linked to the DataColumn's Business Name.

An additional problem arises when a DataColumn, DimensionLevel, Dimension or Star is deleted. Again we have to decide how to handle this, with the same set of options available; however, it could be argued that the most elegant solution above is no longer a very good candidate, as we should probably be aware that there has been a break in the relationship between a report field and a DataColumn before it is requested 'Just In Time'.

The main problem with this sort of coding is that we very quickly set up strong coupling between objects to handle loosely coupled requirements, as well as having business code that affects one object sitting in a completely separate class (as in the Data Columns updating the Report Fields, above).

In this example, if a Broker object was shared throughout the model, a Report would be automatically informed of all DataColumn Business Name changes by subscribing to the Broker events.

It would still require some way to decide whether the DataColumn change was one it should care about, but the benefit of the Broker pattern is that all code pertaining to a Report's response to a Data Column Business Name change is now in the Report class itself. Also, as the Report is informed of the Data Column Business Name change through the Broker, no extra plumbing code has to be written to inform pertinent objects of the change.
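A minimal sketch of how that might hang together is below. The ModelModifiedBroker class is a stand-in for whatever implements IModelModifiedBroker, and its members, along with AddField and HeaderFor on the Report, are names I have assumed for illustration only.

```csharp
using System;
using System.Collections.Generic;

public sealed class ModelChangedEventArgs : EventArgs
{
    public ModelChangedEventArgs(object source, string propertyName)
    {
        Source = source;
        PropertyName = propertyName;
    }

    public object Source { get; }
    public string PropertyName { get; }
}

// Stand-in broker (assumed shape): relays every reported change to subscribers.
public sealed class ModelModifiedBroker
{
    public event EventHandler<ModelChangedEventArgs> ObjectModified;

    public void ReportModified(object source, string propertyName)
    {
        ObjectModified?.Invoke(this, new ModelChangedEventArgs(source, propertyName));
    }
}

public class DataColumn
{
    private readonly ModelModifiedBroker _broker;
    private string _businessName;

    public DataColumn(ModelModifiedBroker broker, string businessName)
    {
        _broker = broker;
        _businessName = businessName;
    }

    public string BusinessName
    {
        get { return _businessName; }
        set
        {
            _businessName = value;
            // The column knows nothing about reports; it only tells the broker.
            _broker.ReportModified(this, nameof(BusinessName));
        }
    }
}

public class Report
{
    // Header text keyed by the column each field is sourced from.
    private readonly Dictionary<DataColumn, string> _headers = new Dictionary<DataColumn, string>();

    public Report(ModelModifiedBroker broker)
    {
        broker.ObjectModified += OnObjectModified;
    }

    public void AddField(DataColumn column)
    {
        _headers[column] = column.BusinessName;
    }

    public string HeaderFor(DataColumn column)
    {
        return _headers[column];
    }

    private void OnObjectModified(object sender, ModelChangedEventArgs e)
    {
        // Decide whether this is a change the report should care about.
        var column = e.Source as DataColumn;
        if (column == null || e.PropertyName != nameof(DataColumn.BusinessName)) return;
        if (!_headers.ContainsKey(column)) return;

        _headers[column] = column.BusinessName;
    }
}
```

The filtering in OnObjectModified is the "should I care?" decision mentioned above, and it lives entirely in the Report class. Changes to columns the report does not reference are simply ignored.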

I shall discuss the structure and functionality of these two interfaces in future posts.