The monolithic architecture that ruled internet-based applications is dying fast, with more and more companies adopting the so-called microservices architecture. The idea behind the microservices architecture is to enable small software development teams to build, deploy, and scale a single feature or a small set of features, independently of other features. Each of these microservices is owned by a small team that develops all aspects of the software without affecting other parts of the system.
Microservices architecture also addresses the problem of 'scaling' agile teams. A lot of work has gone into devising techniques for scaling the agile development model, the Scaled Agile Framework (SAFe) for example, but all of these have shown rather lackluster results. Microservices-based development sidesteps this problem by having small agile teams develop services independently.
Or so goes the folklore.
In practice, microservices-based development becomes complex, complicated, and uncontrollable in no time. Apart from the fact that it is extremely difficult to architect large systems with features that are truly 'independent' of each other, microservices-based systems are distributed systems. Problems associated with distributed systems had so far been restricted to system software: operating systems, core telecom backbone software, and infrastructure-level software such as RDBMSs. Now they are also surfacing in the world of enterprise applications.
Distributed systems bring in complications due to their asynchronous nature, concurrency, and latency. All these attributes make them hard to diagnose, debug, and restart. While monitoring tools are arriving fast, it is still hard to find trained people who can go through numerous lines of logs to figure out points of failure. The ability to test, diagnose, and recover must be architected into such systems right from the beginning.
The first thing to understand and accept is that microservices-based systems are indeed distributed systems, and one cannot approach them in the same way as classic monolithic applications that mostly run in a single process space and leave an easy-to-follow trail of execution. For example, in a system that consists of, say, a hundred microservices, it is almost impossible to recreate a bug unless the system has been specifically architected to allow it, using the event-sourcing model for instance. Moreover, it's extremely hard to model the state of the whole system at any point in time.
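To make the event-sourcing idea concrete, here is a minimal sketch in Python. It assumes a toy account service whose only state is a balance. The `Event` schema, the event kinds, and the `replay` helper are all hypothetical names chosen for illustration, not part of any particular framework. The point is only that when every state change is recorded as an immutable event, the state at any earlier moment can be rebuilt by replaying the log.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """An immutable fact recorded by a service (hypothetical schema)."""
    sequence: int
    kind: str      # "deposited" or "withdrawn" in this toy example
    amount: int


def apply(balance: int, event: Event) -> int:
    """Pure transition function: current state plus one event gives the next state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events: List[Event], upto: Optional[int] = None) -> int:
    """Rebuild state by folding over the log, optionally stopping at a sequence number."""
    balance = 0
    for event in sorted(events, key=lambda e: e.sequence):
        if upto is not None and event.sequence > upto:
            break
        balance = apply(balance, event)
    return balance


log = [
    Event(1, "deposited", 100),
    Event(2, "withdrawn", 30),
    Event(3, "withdrawn", 90),   # the event that drives the balance negative
]

print(replay(log))           # state after the whole log
print(replay(log, upto=2))   # state just before the suspicious event
```

Replaying up to a chosen sequence number is what makes a bug reproducible here: the same log always yields the same state. A real system would also need durable storage and a way to order events across services, which this sketch leaves out.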
All of these are classic problems of distributed systems and have long been known to system programmers (people who create operating systems, database software, and infrastructure software such as distributed queues). Even Java and C# developers routinely run into thread synchronization problems that are hard to diagnose, debug, and fix. In the case of distributed systems, these problems are amplified by an order of magnitude (if not many orders of magnitude!).
Similarly, concurrency, where two or more programs try to access the same resource, can lead to issues that drive developers crazy. Again, the recurring theme is that when concurrency failures occur, they are hard to diagnose and fix. Systems must be built ab initio with the ability to deal with the problems of concurrency.
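As a single-process analogue of this problem, the sketch below shows several threads competing to reserve items from a shared stock. The `Inventory` class and the `run_reservations` helper are hypothetical names used only for illustration. The check and the decrement are wrapped in one lock so that they happen atomically; without it, two threads could both see one item left and both succeed.

```python
import threading


class Inventory:
    """Shared resource guarded by a lock (hypothetical example class)."""

    def __init__(self, stock: int) -> None:
        self._stock = stock
        self._lock = threading.Lock()

    def reserve(self) -> bool:
        # Check and decrement inside one critical section so no other
        # thread can interleave between the two steps.
        with self._lock:
            if self._stock > 0:
                self._stock -= 1
                return True
            return False

    @property
    def stock(self) -> int:
        with self._lock:
            return self._stock


def run_reservations(inventory: Inventory, attempts_per_worker: int, workers: int) -> int:
    """Start several threads that all try to reserve; return the total number of successes."""
    successes = []
    successes_lock = threading.Lock()

    def worker() -> None:
        won = sum(1 for _ in range(attempts_per_worker) if inventory.reserve())
        with successes_lock:
            successes.append(won)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(successes)


inventory = Inventory(stock=100)
reserved = run_reservations(inventory, attempts_per_worker=50, workers=8)
print(reserved, inventory.stock)
```

A process-local lock like this does not carry over to separate services. Across microservices the same guarantee has to come from something shared, such as a database transaction or a version check on the record being updated, and that is exactly why it has to be designed in from the start.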
Distributed systems were originally investigated and formalized in the 1970s by some very smart people, such as Dr. Leslie Lamport (and others), whose work paved the way for much of the distributed computing architecture of the 1990s and thereafter. His work on logical clocks, the Paxos algorithm, and so on underpins much of the cloud infrastructure that we use today. One early application area for this line of research was two-phase commit for RDBMS transactions. Those interested may look at his profile and try to understand the core concepts created by him. As one might have guessed, Dr. Lamport is a Turing awardee. He also created a formal specification language called TLA+ that enables us to specify distributed systems precisely and check their designs for errors.
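The logical clock mentioned above is small enough to sketch in a few lines. This is a minimal Python illustration of the usual update rules (increment on every local event, and on receipt take the maximum of the local and received timestamps, plus one). The class and method names are my own choices for the example.

```python
class LamportClock:
    """Minimal logical clock: a counter that orders events without wall-clock time."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is itself an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, message_time: int) -> int:
        """On receipt, jump past both the local time and the sender's timestamp."""
        self.time = max(self.time, message_time) + 1
        return self.time


service_a = LamportClock()
service_b = LamportClock()

service_a.tick()                    # local event on A
service_a.tick()                    # another local event on A
sent_at = service_a.send()          # A sends a message to B
service_b.tick()                    # unrelated local event on B
received_at = service_b.receive(sent_at)

print(sent_at, received_at)
```

The receive step guarantees that the receipt is stamped later than the send, even though the two services never compare wall clocks. That is the property that lets logs from different services be put into a causally consistent order.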
Coming back to practical digital transformation using microservices, here's what I suggest we do:
Have a comprehensive discovery phase, identifying services or clusters of services that can be packaged as microservices.
While the temptation is there to have thousands of microservices, it is not advisable. The costs of monitoring them, diagnosing them, and recovering them from failure can be exceedingly high. The idea is to build services that are clustered around a business function. Using such models, one can limit the number of services.
Seriously invest in architecting the system before embarking on development. The idea that architectures can be refactored as we go along is absolute nonsense. While it may be easy to refactor an individual microservice, it is nearly impossible to refactor a microservices-based system.
Build a complete CI/CD chain with automated unit, integration, and functional tests. Without this, it will be infeasible to react to critical bugs in the future or even add new features that span multiple services.
If possible, choose the right language for each problem. Microservices architectures allow us to use polyglot stacks. Even cloud functions can be used in places, but since they add yet another dimension to the distributed computing problem, they must be used sparingly.
Build a robust plan for diagnostics and recovery, using appropriate event-logging systems and other methods. All programs must be written with the assumption that they will be hard to recover, and they must leave a robust audit trail.
Critical security aspects must be built in upfront, especially with IoT-based systems that can use actuators to shut down and restart devices.
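For the diagnostics and audit-trail suggestion above, one common approach is to have every service emit structured log records that share a correlation ID, so that a single request can be followed across services. The sketch below uses only the Python standard library. The field names, the `audit_record` helper, and the service names are assumptions made for illustration, not a prescribed schema.

```python
import json
import logging
import uuid
from datetime import datetime, timezone

logger = logging.getLogger("audit")


def new_correlation_id() -> str:
    """Generate an ID once at the edge of the system and pass it along with every call."""
    return uuid.uuid4().hex


def audit_record(service: str, action: str, correlation_id: str, **details) -> str:
    """Build one machine-readable audit line, log it, and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "service": service,
        "action": action,
        "correlation_id": correlation_id,
        "details": details,
    }
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line


# Two hypothetical services handling the same request share one correlation ID.
correlation_id = new_correlation_id()
first = audit_record("orders", "order_placed", correlation_id, order_id=42)
second = audit_record("billing", "invoice_created", correlation_id, order_id=42)

print(first)
print(second)
```

Because each line is JSON and carries the same `correlation_id`, a log search for that one value returns the full trail of a request across services, which is far easier than reading through free-text logs by hand.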
On top of this already complex ecosystem, distributed computing is becoming even more granular, with micro-frontends and edge computing. Life is going to be interesting, extraordinarily interesting!
If your company is embarking on a digital transformation program that requires decomposing existing large systems into microservices-based ones, please reach out to us at Extentia. We will show you the way!
About the author – Anand is a programmer and a software architect who has decades of experience in building software systems ranging from compilers, operating systems, and real-time systems to remarkably large-scale distributed systems.