The other day I blurted this out on Twitter:
Then I received an interesting comment:
A repartee:
The debate picks up:
And we get to the definition of a brittle system:
Finally, my reply:
And that was the point when the discussion petered out…
What is a brittle system?
In the simplest terms, a system is brittle if it is easy to break and hard to bring back to full functionality.
For the record, some systems are easy to break and yet are not brittle, because they are easy to bring back to full functionality. Such systems are resilient. They are robust.
Most systems observed in nature belong to that category. Yes, it is very easy to break almost anything that we find in nature. But one interesting aspect of naturally evolved systems is that, despite being easily breakable, they do not remain in a broken state for long. They seem capable of healing spontaneously.
Change causes breakages
It is my observation that, generally speaking, software developers dislike change. The reason is that almost any time something changes, the system implemented as a software solution tends to break. Those breakages cause a lot of consternation because, once broken, the system needs extra care and additional work to bring it back to life. Software systems in general don’t seem able to bounce back spontaneously to a previously healthy state once they break down.
Software developers prefer that a solution proven to work well keeps working well. One way to ensure that is to try to ignore subsequent changes.
Another proposed way to keep working solutions working despite changes is a very peculiar one; people call that approach “future proofing our solution”. Now, what does “future proofing” actually imply?
To be able to “future proof” something, one would need the ability to see into the future. That means we would need some way to predict what kind of change will hit our working solution at some future time.
The problem with that approach is that none of us seems to be in possession of a crystal ball; we cannot predict the future with any reasonable degree of certainty. Change remains mysterious to us, regardless of how much effort we may put into trying to predict it.
And because change is mysterious and cannot be predicted, it is disruptive — it breaks our system.
Why is there breakage?
The next logical question is: why is there breakage when a change occurs? And what is it that breaks?
One way to describe a system is to say that it is a collection of smaller units assembled in such a way that those units are somehow coupled. It is this coupling that enables the system to function and to remain operational.
When a change occurs, if it affects previously established coupling, the functioning of the system degrades. From that we see that it is the coupling that breaks under the pressure of a change. If it weren’t for the coupling, a change would not be able to adversely affect the functioning of the system.
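To make this concrete, here is a minimal Python sketch built around a hypothetical price service and its client. All of the names are invented for illustration. The client is coupled to one field name, so when the service renames that field, it is exactly that coupling that breaks.

```python
# Hypothetical service and client, invented purely to illustrate coupling.

def price_service_v1() -> dict:
    """Version 1 reports the price under the key 'price'."""
    return {"item": "book", "price": 12.5}


def price_service_v2() -> dict:
    """Version 2 renames the field to 'amount'; this is the change."""
    return {"item": "book", "amount": 12.5}


def client_total(service, quantity: int) -> float:
    # The coupling: the client assumes the response has a 'price' key.
    return service()["price"] * quantity


print(client_total(price_service_v1, 2))  # 25.0

try:
    client_total(price_service_v2, 2)
except KeyError as missing:
    # Neither part is broken on its own; the link between them is what failed.
    print("coupling broke, missing key:", missing)
```

Each part still "works" in isolation. Only the assumption shared between them failed.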
Can we remove the coupling?
If we were to remove the coupling (so as to avoid breakages), we would in effect remove the system itself. It is not possible to stand up a system unless its constituent parts are somehow coupled and interdependent.
Can we loosen the coupling?
Since it is not possible to remove the coupling without destroying the system, the next thing to look into is the possibility of loosening the degree of coupling.
A tightly coupled system is very vulnerable to change. If any of the coupling parameters deviates even slightly, the system breaks. From that we see that tight coupling may not be desirable.
Yes, but how do we loosen the coupling? There are two ways:
Control the timing of the binding (i.e., coupling)
Control the location of the knowledge necessary to bind/couple the system’s parts
How to control the timing of the binding?
If the parts that comprise the system are bound together at design time (i.e., early on, eagerly, prematurely), we call that “early binding”. Early bound systems are very brittle and cannot easily withstand any change in their configuration.
If the parts that comprise the system are bound together at execution/run time (i.e., at the last responsible moment), we call that “late binding”. Late bound systems are less brittle and can withstand some changes in their configuration.
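In code, the difference can be sketched in Python roughly as follows. This is a loose analogy that relies on assumed names (`render_v1`, `render_v2`, and a `registry` dict). The early-bound dispatcher picks its handler once, at the moment it is built. The late-bound dispatcher looks the handler up every time a request arrives, so it picks up a change made while the system is running.

```python
# Two hypothetical handlers; v2 is the "change" that arrives later.
def render_v1(order_id: str) -> str:
    return f"order {order_id} (v1)"


def render_v2(order_id: str) -> str:
    return f"Order #{order_id} (v2)"


class EarlyBoundDispatcher:
    def __init__(self) -> None:
        self._handler = render_v1  # bound once, when the dispatcher is built

    def handle(self, order_id: str) -> str:
        return self._handler(order_id)


class LateBoundDispatcher:
    def __init__(self, registry: dict) -> None:
        self._registry = registry  # shared registry, consulted on every call

    def handle(self, order_id: str) -> str:
        # Binding happens here, at the last responsible moment.
        return self._registry["render"](order_id)


registry = {"render": render_v1}
early = EarlyBoundDispatcher()
late = LateBoundDispatcher(registry)

registry["render"] = render_v2  # the change arrives while the system is running

print(early.handle("42"))  # order 42 (v1): still wired to the old handler
print(late.handle("42"))   # Order #42 (v2): picks up the change
```

The early-bound dispatcher is not wrong. Adapting it to the change, however, means rebuilding it, which is exactly the extra work that early binding imposes.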
So, knowing that, how do we control the timing of the binding/coupling? Obviously, it is more desirable to build a system that enjoys the benefits of late binding, as such systems are more capable of withstanding unpredictable changes.
The trouble is, it seems much easier to design and build systems that are early bound, which is why we see many more early bound systems in production than late bound ones. But just because it’s easier to build early bound systems doesn’t mean it’s more desirable to do so.
The challenge, then, is to acquire skills that would enable us to design and build systems that would be late bound. Make that very late bound!
What skills would be needed? In my view, one of the most empowering sets of skills, the one that will take us to the point of being able to create late bound systems that can evolve without the danger of breaking, is an understanding of how the World Wide Web works. We need to understand native web mechanisms.
Once we fully understand how the web works, it will be easier to build systems where we control the timing of the binding.
How to control the location of the knowledge necessary to bind/couple the system’s parts?
Before we take a quick look into the way the web works, we need to examine the ways systems tend to locate the information needed to enable the binding/coupling/interaction.
There are two possible locations of that knowledge:
Out-of-band
In-band
Let’s look first into the out-of-band location of the knowledge necessary to complete the operation. Out-of-band actually denotes asynchronous interaction. For a client to be able to interact with the system in productive, non-breaking ways, the client needs to obtain the knowledge that explains how to do that. That knowledge is located outside of the current band, meaning the system itself cannot provide that knowledge to the calling client. The client has to obtain that knowledge first (asynchronously). If the client fails to do so, it will not be able to interact with the system.
How does a client obtain that knowledge? Typically, the client has to get hold of some documentation. That documentation could take various forms and could be stored in various locations. But the important thing to note is that the location of the necessary documentation is out-of-band. Nowadays, we mostly find that documentation in the form of API docs written as Swagger, RAML, or what have you. Bottom line, the client has to know ahead of time where to go to find and study the pertinent documentation, and that activity must happen before the client attempts to interact with the system.
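Here is a minimal Python sketch of out-of-band knowledge. It assumes a hypothetical URL template copied from API docs and a fake in-memory server; no real HTTP is involved. The client builds URLs from what it learned ahead of time, so when the server moves its resources, the client's knowledge goes stale.

```python
# Knowledge the client copied from (hypothetical) API docs, before any interaction.
ORDER_URL_TEMPLATE = "/api/v1/orders/{order_id}"


def order_url(order_id: str) -> str:
    # The client constructs the URL itself; no response ever told it this path.
    return ORDER_URL_TEMPLATE.format(order_id=order_id)


def fake_get_status(served_paths: set, path: str) -> int:
    """Pretend HTTP GET that returns only a status code."""
    return 200 if path in served_paths else 404


server_before = {"/api/v1/orders/42"}
server_after = {"/api/v2/orders/42"}  # the team moved orders to a new path

print(fake_get_status(server_before, order_url("42")))  # 200
print(fake_get_status(server_after, order_url("42")))   # 404: the binding was made too early
```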
With the in-band location, all knowledge necessary to successfully couple/interact with the system is delivered to the client in real time, synchronously. That knowledge is contained in the response that the system sends to the client. The client need not know anything ahead of time. Such an arrangement enables ample ambiguity, which, as we’ll see later in this article, is a prerequisite for building robust, resilient systems.
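By contrast, here is an in-band sketch in Python. The shape of the representations (a `_links` map and an `items` list) is an assumption, loosely inspired by hypermedia formats such as HAL. The servers are fake in-memory dictionaries. The client knows only the entry point `/` and what it is looking for, and every other path arrives inside the responses.

```python
def make_server(prefix: str):
    """Fake server whose representations carry their own links (in-band knowledge)."""
    representations = {
        "/": {"_links": {"orders": f"{prefix}/orders"}},
        f"{prefix}/orders": {"items": [{"id": "42", "href": f"{prefix}/orders/42"}]},
        f"{prefix}/orders/42": {"id": "42", "status": "shipped"},
    }
    return representations.__getitem__  # acts as a GET: path -> representation


def find_order(get, order_id: str) -> dict:
    root = get("/")                         # the only URL the client knows up front
    orders = get(root["_links"]["orders"])  # discovered in the response
    for item in orders["items"]:
        if item["id"] == order_id:
            return get(item["href"])        # discovered in the response
    raise LookupError(f"order {order_id} not offered by the server")


# The same client works whether the server uses /api/v1 or /api/v2 paths.
print(find_order(make_server("/api/v1"), "42"))
print(find_order(make_server("/api/v2"), "42"))
```

Because the client never hard-codes a path beyond the entry point, moving the resources does not break it.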
Why is out-of-band not a desirable model?
The out-of-band approach to specifying how the interaction/coupling between the client and the system should unfold is not desirable because it promotes early/premature binding. That in turn hardens the system and makes it extremely brittle.
Another undesirable aspect of out-of-band interaction/coupling is the real and frequently observed pattern where the documentation quickly drifts away from the implemented functionality. While teams are evolving the system, in the heat of battle the documentation often lags behind. That creates issues: clients study the out-of-date documentation (unaware that it is stale), then try to apply the absorbed knowledge, and in the process break the system.
OK, now that we’ve looked into how to control the timing of the binding (i.e., coupling) and how to control the location of the knowledge necessary to bind/couple the system’s parts, let’s go over the way the web works.
How does the web work?
The web works by following the Robustness Principle. That principle states:
“Be conservative in what you do, be liberal in what you accept from others.”
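When a client reads and writes JSON-like data, this principle is often applied as a "tolerant reader". Here is a minimal Python sketch with hypothetical field names. The reader ignores fields it does not know, accepts a couple of spellings as well as numeric strings, and returns `None` instead of crashing. The writer sends only the fields it must, in one canonical form.

```python
from typing import Optional


def read_price(doc: dict) -> Optional[float]:
    """Liberal in what we accept: unknown fields are ignored, variations tolerated."""
    for key in ("price", "amount"):  # hypothetical field names we are willing to accept
        if key in doc:
            try:
                return float(doc[key])  # accepts 12.5 as well as "12.5"
            except (TypeError, ValueError):
                return None  # present but unusable: report absence, don't crash
    return None


def write_order(item: str, quantity: int) -> dict:
    """Conservative in what we do: send only the required fields, normalized."""
    return {"item": item.strip(), "quantity": int(quantity)}


print(read_price({"price": "12.5", "currency": "EUR", "promo": True}))  # 12.5
print(read_price({"amount": 3}))                                        # 3.0
print(read_price({"price": "n/a"}))                                     # None
print(write_order("  book ", 2))  # {'item': 'book', 'quantity': 2}
```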
It is thanks to this principle that today, in the year 2023, we can still find and operate websites that were created back in 1993. We can accomplish that even when we are using the latest versions of most modern web browsers. That’s some impressive robustness!
One effective way of grasping how the web works is to view the web as a collection of resources. Those resources are implemented somewhere in the cloud (or on premises). Clients (i.e., people or other machines) are interested in utilizing/leveraging some of those resources. But the way things work on the web, it is impossible to access those resources directly. Resources are safe; no one can touch them (other than their creators/maintainers). So, what’s a client to do then?
Well, the good news is that resources know how to represent themselves. The only way to make use of those resources is to interact with resource representations. Now, the interesting part is that each resource can potentially represent itself in more than one way.
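Here is a minimal Python sketch of one resource with several representations, using a simplified take on HTTP content negotiation through the `Accept` header. The `OrderResource` class is hypothetical. Its parsing deliberately ignores quality values (`q=`) and every wildcard except `*/*`, and simply honours media types in the order the client lists them.

```python
import json


class OrderResource:
    """Hypothetical resource: clients never touch its state, only its representations."""

    def __init__(self, order_id: str, status: str) -> None:
        self._order_id = order_id
        self._status = status

    def represent(self, accept: str) -> tuple:
        """Return (media_type, body) for the first listed type we can produce."""
        for entry in accept.split(","):
            media_type = entry.split(";")[0].strip().lower()  # drop parameters such as q=0.8
            if media_type in ("application/json", "*/*"):
                body = json.dumps({"id": self._order_id, "status": self._status})
                return "application/json", body
            if media_type == "text/html":
                return "text/html", f"<p>Order {self._order_id}: {self._status}</p>"
        # Nothing acceptable: fall back to plain text (a real server might answer 406).
        return "text/plain", f"Order {self._order_id}: {self._status}"


order = OrderResource("42", "shipped")
print(order.represent("text/html"))
print(order.represent("application/json;q=0.9, text/html;q=0.5"))
print(order.represent("image/png"))
```

The same resource yields an HTML, a JSON, or a plain-text representation, and the client only ever sees one of those.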
The interaction between a client and a resource unfolds in the following fashion:
A client obtains the location on the network of some resource that the client may be interested in (the location may be obtained via consuming some advertising, or via search engines, or as a forwarded bookmark, etc.)
The client then sends a request to the resource, asking it to represent itself
The resource processes that request and prepares a suitable representation of itself, then sends that representation back as a response
The client consumes the received response and decides what to do next, based on what the client finds in the received response (in-band information)
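The four steps can be sketched as a small client loop in Python. A hypothetical in-memory dictionary stands in for the network, and the `next` link is an assumed convention. What matters is that the client's next move (step 4) is decided only by what it finds in each response.

```python
def fake_network():
    """Hypothetical resources; each representation says where to go next, if anywhere."""
    representations = {
        "/news": {"items": ["a", "b"], "links": {"next": "/news?page=2"}},
        "/news?page=2": {"items": ["c"], "links": {}},
    }
    return representations.__getitem__


def consume(get, start_url: str) -> list:
    url = start_url  # step 1: the entry point, e.g. from a bookmark or a search
    collected = []
    while url is not None:
        representation = get(url)  # steps 2 and 3: ask for, and receive, a representation
        collected.extend(representation.get("items", []))
        # Step 4: decide what to do next purely from in-band information.
        url = representation.get("links", {}).get("next")
    return collected


print(consume(fake_network(), "/news"))  # ['a', 'b', 'c']
```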
Keep in mind that in the above choreography, a client stands both for a human who operates a web browser and for a machine that programmatically interacts with the resource via its representation.
The web enables late binding/loose coupling
By following the protocol described above, the web as a computing platform enables us to design and implement loosely coupled systems that follow the very late binding operational model. Any system built that way is far less brittle and more resilient to change.
The important thing to notice is that such systems are based on ambiguity — nothing is preordained. It is ambiguous what the resource will offer as a response when the client sends a request to the resource, asking it to represent itself. The client has no idea what to expect. The client is in a very ambiguous state.
And yet, the system continues to function as expected. Nothing breaks. Loosely coupled systems enable this ambiguity, and this ambiguity enables robustness and resilience. That, in turn, frees teams to evolve their solutions safely, without fearing that their clients may break when attempting to interact with the system.
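One way to picture this ambiguity in code is the hypothetical Python client below. It does not know in advance which representation it will get back. Instead, it dispatches on the media type it actually receives and degrades gracefully when that type is unfamiliar. The handlers and their messages are purely illustrative.

```python
import json


def handle(media_type: str, body: str) -> str:
    """React to whatever representation arrived; the handlers are illustrative."""
    kind = media_type.split(";")[0].strip().lower()  # e.g. "text/html; charset=utf-8"
    if kind == "application/json":
        data = json.loads(body)
        if isinstance(data, dict):
            return "data with fields: " + ", ".join(sorted(data))
        return "data of type " + type(data).__name__
    if kind == "text/html":
        return f"page of {len(body)} characters to display"
    # Unfamiliar representation: keep going instead of breaking.
    return f"kept {len(body)} characters of {kind} for later"


print(handle("application/json", '{"status": "shipped", "id": "42"}'))
print(handle("text/html; charset=utf-8", "<p>hi</p>"))
print(handle("application/vnd.example+xml", "<order/>"))
```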
All that thanks to loose coupling, late binding, in-band information, and ambiguity.