SNR: Network-aware geo-distributed stream analytics
Emerging applications, such as those running on Internet of Things (IoT) devices, produce constant data streams that need to be processed in real time. Distributed stream processing systems (DSPs), deployed on geographically distributed clusters interconnected via wide area network (WAN) links, have recently gained interest for handling these applications. However, these applications have stringent requirements, such as low latency and high bandwidth, that must be guaranteed to ensure quality of service (QoS). These requirements raise fundamental resource management and scheduling challenges for DSPs. In this paper, we formulate the placement of worker nodes on a geo-distributed DSPs cluster network as a multi-criteria decision-making problem and propose an additive weighting-based approach to solve it. The proposed solution finds a trade-off among different network parameters and allows tasks to be executed according to the desired performance metrics. We evaluate the proposed approach using the Yahoo! streaming benchmark on a testbed and compare it against the mechanisms deployed in Apache Spark, Apache Storm, and Apache Flink. The results show that our approach improves the performance of Spark by 2.2x-7.2x, Storm by 1.2x-3.4x, and Flink by 1.4x-3.3x compared to other approaches, which makes it suitable for practical environments.
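To make the idea of additive weighting concrete, the following is a minimal sketch of generic simple additive weighting (SAW) applied to ranking candidate worker nodes. It is not the paper's implementation: the criteria names (`latency_ms`, `bandwidth_mbps`), the node names, the weights, and the function `saw_rank` are all hypothetical, and the max/min normalization is one common choice that the abstract does not specify.

```python
from typing import Dict, List

# Hypothetical criteria: True means "benefit" (higher is better),
# False means "cost" (lower is better). These are assumptions for illustration.
CRITERIA: Dict[str, bool] = {"latency_ms": False, "bandwidth_mbps": True}


def saw_rank(
    nodes: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    criteria: Dict[str, bool] = CRITERIA,
) -> List[str]:
    """Rank nodes by simple additive weighting; best node first.

    Assumes every metric value is strictly positive.
    """
    scores = {name: 0.0 for name in nodes}
    for crit, is_benefit in criteria.items():
        values = [metrics[crit] for metrics in nodes.values()]
        highest, lowest = max(values), min(values)
        for name, metrics in nodes.items():
            # Normalize to (0, 1]: the best value per criterion maps to 1.
            if is_benefit:
                norm = metrics[crit] / highest
            else:
                norm = lowest / metrics[crit]
            scores[name] += weights[crit] * norm
    return sorted(scores, key=lambda n: scores[n], reverse=True)


# Hypothetical measurements for two candidate worker nodes.
nodes = {
    "node_a": {"latency_ms": 10.0, "bandwidth_mbps": 100.0},
    "node_b": {"latency_ms": 50.0, "bandwidth_mbps": 1000.0},
}

# A latency-sensitive job weights latency heavily; a throughput-bound job does the opposite.
latency_first = saw_rank(nodes, {"latency_ms": 0.8, "bandwidth_mbps": 0.2})
bandwidth_first = saw_rank(nodes, {"latency_ms": 0.2, "bandwidth_mbps": 0.8})
```

Changing only the weights flips the ranking in this toy example, which illustrates how a weighted sum can express a trade-off among network parameters according to the desired performance metric.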
Published in: IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID), 10.1109/CCGrid51090.2021.00100, IEEE