The desire to improve a little bit, every day
… was our reason for taking monitoring & reporting to the next level.
We always say that integration is complex, but not difficult. It is complex because the process is extensive, reaching well beyond the technical side, and yet not difficult if you have the right knowledge. What also makes our field complex is the dependence on external applications and the technology they use, which is never the same and always evolving.
With the growth in the number of integrations, the need for predictability has increased, especially in those parts of our solution that were not available to every consultant or were more difficult to understand. In daily practice, we distinguish a number of common causes that lead to disruptions or overload of the infrastructure:
Pending exchanges
In each step of the integration flow, data is exchanged from one component to the next. If that cannot be handled within a certain time frame, a queue of “pending exchanges” arises. This has a negative effect on the speed with which the entire flow is handled, but also on the infrastructure used. We often see this with (external) endpoints that cannot process the data offered fast enough. The best comparison from daily life is a traffic jam on the highway: the number of cars using the highway at the same time determines whether we can keep driving and at what speed or, in the worst case, come to a standstill.
Within the flow manager in Dovetail you can get insight into the number of pending exchanges. The trace functionality provides information that can be used to optimise the flows. It is “best practice” to assume that the “happy flow” will not always work and to be critical of performance in the first weeks after going live.
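How a queue of pending exchanges builds up can be illustrated with a small simulation. This is only a sketch with made-up numbers: the function name and the idea of a fixed "tick" are assumptions for illustration and do not reflect how Dovetail itself schedules exchanges.

```python
def pending_over_time(arrivals_per_tick, capacity_per_tick, ticks):
    """Return the number of pending exchanges left after each tick.

    arrivals_per_tick: exchanges offered to the endpoint per tick (assumed constant)
    capacity_per_tick: exchanges the endpoint can handle per tick (assumed constant)
    """
    pending = 0
    history = []
    for _ in range(ticks):
        pending += arrivals_per_tick                 # new exchanges join the queue
        pending -= min(pending, capacity_per_tick)   # the endpoint handles what it can
        history.append(pending)
    return history


if __name__ == "__main__":
    print(pending_over_time(10, 12, 5))  # endpoint keeps up: [0, 0, 0, 0, 0]
    print(pending_over_time(10, 8, 5))   # endpoint too slow: [2, 4, 6, 8, 10]
```

As soon as the endpoint handles fewer exchanges per tick than arrive, the backlog grows steadily, just like the traffic jam.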
Failed exchanges
With a failed exchange it “just” goes wrong: a processing step cannot be completed and the flow falls back into the error flow. While a “pending exchange” basically causes a delay, a “failed exchange” most likely disrupts the functional process. Finding the cause of a “failed exchange” is therefore critical, and we consider finding preventive measures to be an iterative process, because when you put a flow live you often don’t know everything yet.
We usually see failed exchanges around (external) endpoints that provide an unforeseen answer, or no answer at all. It is therefore important to pay extra attention to that endpoint and to handle the provided answers properly. At the same time, you have to be prepared for what you don’t know yet or couldn’t know at that point.
Variables affecting the result
Besides pending and failed exchanges, how well an integration platform functions depends on how it is used. In practice, this use is shaped by the quality of the integrations built, as described above, and certainly also by the quality of the endpoints. Other factors are the volume and frequency of incoming and outgoing data, the volume of file format conversions and mappings, and the number of flows and components used. The available shared or private infrastructure, in particular the number of cores and available memory, is another factor. Scaling up is not free: it has a direct relationship to costs, which must be passed on.
The new Health Check API
In Dovetail version 4.13.1, the health check API has been further expanded. In addition to flow information (pending, failed, completed), data is now also offered about the use of CPU and memory in the test and production containers. By combining data about flows and infrastructure, it is possible to zoom in and find correlations about the “health” of flows.
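One way to look for such a correlation is to compute a Pearson coefficient over samples of flow and infrastructure data. The sketch below assumes the data has already been collected into a list of records; the field names `pending` and `cpu_percent` are hypothetical, the numbers are invented for illustration, and nothing here reflects the actual response format of the health check API.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented samples, one per measurement interval (hypothetical field names).
    samples = [
        {"pending": 2, "cpu_percent": 21.0},
        {"pending": 5, "cpu_percent": 34.0},
        {"pending": 9, "cpu_percent": 52.0},
        {"pending": 14, "cpu_percent": 71.0},
    ]
    r = pearson(
        [s["pending"] for s in samples],
        [s["cpu_percent"] for s in samples],
    )
    print(f"correlation between pending exchanges and CPU use: {r:.2f}")
```

A coefficient close to 1 would suggest that CPU use rises together with the number of pending exchanges; it does not by itself show which one causes the other.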
Error route and flow function monitoring
To get a better grip on failed exchanges and function monitoring, our consultants have developed some best practices that our partners can add to their own Dovetail implementation(s). These catch errors and process them into a database for reporting and analysis.
Error route monitoring (ERM)
The goal of error route monitoring is to collect, analyse and report failed exchanges with all response information so that the error can be prevented next time. An example is a failed authentication because a user needs to reset something in their own system. By informing the person responsible for such a process about an error that has occurred, that person can often solve the problem themselves and thus prevent the integration process from coming to a standstill (for a long time).
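The idea of collecting failed exchanges in a database and reporting on them can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the ERM implementation itself: the table name, the columns and the function names are all assumptions.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store for failed exchanges. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS failed_exchange (
               id            INTEGER PRIMARY KEY,
               flow          TEXT NOT NULL,
               endpoint      TEXT NOT NULL,
               status_code   INTEGER,
               response_body TEXT,
               occurred_at   TEXT NOT NULL
           )"""
    )
    return conn


def record_failure(conn, flow, endpoint, status_code, response_body, occurred_at):
    """Store one failed exchange together with the response information."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO failed_exchange "
            "(flow, endpoint, status_code, response_body, occurred_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (flow, endpoint, status_code, response_body, occurred_at),
        )


def failures_per_endpoint(conn):
    """Report how often each endpoint returned each status code, most frequent first."""
    return conn.execute(
        "SELECT endpoint, status_code, COUNT(*) FROM failed_exchange "
        "GROUP BY endpoint, status_code "
        "ORDER BY COUNT(*) DESC, endpoint, status_code"
    ).fetchall()
```

A report like `failures_per_endpoint` makes recurring causes visible, for example a repeated authentication failure on one endpoint, so that the person responsible can be informed.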
Flow function monitoring (FFM)
In FFM everything that does not meet the assumptions and expectations is caught. Earlier we mentioned the so-called “happy flow” in which the behaviour of endpoints is taken into account as much as possible. This does not prevent (undocumented) responses from leading to something unexpected. The result of FFM is that previously unknown responses from endpoints become known and can be handled functionally within the definition of the flow.
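The principle of catching everything that falls outside the assumptions can be sketched as a check of each endpoint response against what the “happy flow” expects. The set of expected status codes and the required fields below are hypothetical examples; in a real flow they would follow from the assumptions made when the flow was designed.

```python
# Hypothetical happy-flow assumption: only these status codes are expected.
EXPECTED_STATUS_CODES = {200, 201, 204}


def check_response(status_code, body, required_fields):
    """Return a list of deviations from the assumptions; empty means all is as expected."""
    deviations = []
    if status_code not in EXPECTED_STATUS_CODES:
        deviations.append(f"unexpected status {status_code}")
    if not isinstance(body, dict):
        # Without a parsed JSON object the field checks cannot be performed.
        deviations.append("body is not a JSON object")
        return deviations
    for field in required_fields:
        if field not in body:
            deviations.append(f"missing field '{field}'")
    return deviations


if __name__ == "__main__":
    print(check_response(200, {"id": 1}, ["id"]))           # no deviations
    print(check_response(202, {"id": 1}, ["id", "state"]))  # two deviations
```

Each reported deviation is a candidate for a new, explicitly handled case in the flow definition, which is how previously unknown responses become known over time.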