A System Architecture for Real-time Anomaly Detection in Large-scale NFV Systems
Virtualization as a key IT technology has developed to a predominant model in data centers in recent years. The flexibility regarding scaling-out and migration of virtual machines for seamless maintenance has enabled a new level of continuous operation and changed service provisioning significantly. Meanwhile, services from domains striving for highest possible availability – e.g. from the telecommunications domain – are adopting this approach as well and are investing significant efforts into the development of Network Function Virtualization (NFV). However, the availability requirements for such infrastructures are much higher than typical for IT services built upon standard software with off-the-shelf hardware. They require sophisticated methods and mechanisms for fast detection and recovery of failures. This paper presents a set of methods and an implemented prototype for anomaly detection in cloud-based infrastructures with specific focus on the deployment of virtualized network functions. The framework is built upon OpenStack, which is the current de-facto standard of open-source cloud software and aims at increasing the availability and fault tolerance level by providing an extensive monitoring and analysis pipeline able to detect failures or degraded performance in real-time. The indicators for anomalies are created using supervised and non-supervised classification methods and preliminary experimental measurements showed a high percentage of correctly identified anomaly situations. After a successful failure detection, a set of pre-defined countermeasures is activated in order to mask or repair outages or situations with degraded performance.
Published in: Procedia Computer Science, 10.1016/j.procs.2016.08.076, Elsevier BV