Over the last few weeks, we noticed that the system was somewhat greedy in terms of memory consumption, so we had to do some additional hand-holding from time to time. Tonight we observed similar behavior and took the opportunity to analyze the root cause.
Under certain circumstances we don’t fully understand yet, the system triggers a message loop on the MQTT bus. Once that happens, all services consuming messages from the bus start suffering badly. It seems that the more traffic the acquisition system receives, the more likely this error is to be triggered.
We traced the root cause back to a feature released with Kotori 0.20.0 and enabled on our platform in May 2017. Funnily enough, this feature is actually about error handling.
Under certain conditions, an invalid message received from the MQTT bus kicked off the message loop, and from there things spiraled out of control.
Building message loops is one of the fine arts of running bus or network systems on a shared medium, so we are happy to finally be part of the family ;]! It is really strange that this hasn’t happened before, as we have been running the system in this configuration for almost a year now. Still, we are always glad to catch such edge cases, because they let us make the system more robust, as we will do with the next software release.
We apologize for any inconvenience this may have caused.
With kind regards,
The people of Hiveeyes.