How to detect anomalies and analyze their root causes with AI in real time
Machine learning and self-learning algorithms have taken the world by storm, with breakthroughs appearing in the news almost daily. The drivers of this development are the available computing capacity, mature algorithms, and the ability to store and process large amounts of data. Streaming technologies such as Kafka make it possible to analyze large volumes of log files in real time. Software design has evolved as well: today’s architectures are as distributed as they are interconnected. We show how machine learning can be used to find and understand cascading failures in large, interconnected software and hardware landscapes. Collecting and stream-processing log files allows us to train machine learning models so that global system anomalies are detected, clustered, and displayed live in R Shiny dashboards. Furthermore, we use open-source tools to detect and eliminate the root causes. This machine-learning-assisted monitoring can complement traditional system monitoring solutions. When it comes to segmenting and grouping problems, it is even superior, filling the gaps left by traditional, strictly rule-based monitoring systems. We show what is possible with modern streaming technologies and machine learning algorithms, and we anticipate that this and more sophisticated future AI-based monitoring solutions will serve the global financial markets, oversee processes in the manufacturing industry, and ensure safe communication worldwide.
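To make the idea of flagging anomalies in a log stream concrete, here is a minimal Python sketch. It is not the model presented in the talk: it swaps in a simple rolling z-score over per-minute error counts as a stand-in for a trained machine learning model. It leaves out the Kafka consumer and the dashboard entirely. The log line format (`timestamp LEVEL message`), the function names, and the parameters `window`, `threshold`, and `min_sigma` are all assumptions made for illustration.

```python
from collections import Counter, deque
from statistics import mean, pstdev


def error_counts_per_minute(log_lines):
    """Count ERROR lines per minute.

    Assumes (hypothetically) lines of the form
    'YYYY-MM-DDTHH:MM:SS LEVEL message'; malformed lines are skipped.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split(" ", 2)
        if len(parts) < 2:
            continue  # not enough fields to read a level
        timestamp, level = parts[0], parts[1]
        if level == "ERROR":
            counts[timestamp[:16]] += 1  # truncate timestamp to the minute
    return counts


def rolling_zscore_anomalies(values, window=5, threshold=3.0, min_sigma=1.0):
    """Return indices whose value deviates strongly from the recent past.

    Each value is compared with the mean and standard deviation of the
    preceding `window` values. `min_sigma` is an assumed floor that keeps
    a perfectly flat baseline from producing a division by zero.
    """
    history = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = max(pstdev(history), min_sigma)
            if abs(value - mu) / sigma > threshold:
                anomalies.append(index)
        history.append(value)  # the deque drops the oldest value itself
    return anomalies


if __name__ == "__main__":
    sample_counts = [2, 3, 2, 3, 2, 3, 2, 40, 3, 2]
    print(rolling_zscore_anomalies(sample_counts))
```

In a streaming setup, the per-minute counts would be updated as messages arrive rather than computed from a list, and a learned model would replace the z-score rule. The sketch only shows the overall shape: aggregate the logs, compare against a baseline, flag deviations.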