Delay-Resistant Geo-Distributed Analytics

dc.contributor.author: Mostafaei, Habib
dc.contributor.author: Smaragdakis, Georgios
dc.contributor.author: Zinner, Thomas
dc.contributor.author: Feldmann, Anja
dc.date.accessioned: 2023-06-05T12:55:50Z
dc.date.available: 2023-06-05T12:55:50Z
dc.date.issued: 2022-12
dc.description.abstract: Big data analytics platforms have played a critical role in the unprecedented success of data-driven applications. However, real-time and streaming data applications, and recent legislation, e.g., GDPR in Europe, have posed constraints on exchanging and analyzing data, especially personal data, across geographic regions. To address such constraints, data has to be processed and analyzed in-situ, and aggregated results have to be exchanged among the different sites for further processing. This introduces additional network delays due to the geographic distribution of the sites, potentially affecting the performance of analytics platforms that are designed to operate in datacenters with low network delays. In this paper, we show that the three most popular big data analytics systems (Apache Storm, Apache Spark, and Apache Flink) fail to tolerate round-trip times of more than 30 milliseconds even when the input data rate is low. The execution time of distributed big data analytics tasks degrades substantially after this threshold, and some of the systems are more sensitive than others. A closer examination and understanding of the design of these systems shows that there is no winner in all wide-area settings. However, we show that it is possible to improve the performance of all these popular big data analytics systems significantly amid even transcontinental delays (where inter-node delay is more than 30 milliseconds) and achieve performance comparable to that within a datacenter for the same load. (en)
dc.identifier.eissn: 1932-4537
dc.identifier.uri: https://depositonce.tu-berlin.de/handle/11303/19101
dc.identifier.uri: https://doi.org/10.14279/depositonce-17898
dc.language.iso: en
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject.ddc: 600 Technology, medicine, applied sciences::620 Engineering::620 Engineering and allied operations
dc.subject.other: wide-area analytics (en)
dc.subject.other: big data analytics (en)
dc.subject.other: geo-distributed systems (en)
dc.subject.other: networked systems (en)
dc.title: Delay-Resistant Geo-Distributed Analytics (en)
dc.type: Article
dc.type.version: publishedVersion
dcterms.bibliographicCitation.doi: 10.1109/tnsm.2022.3192710
dcterms.bibliographicCitation.issue: 4
dcterms.bibliographicCitation.journaltitle: IEEE Transactions on Network and Service Management
dcterms.bibliographicCitation.originalpublishername: IEEE
dcterms.bibliographicCitation.originalpublisherplace: New York, NY
dcterms.bibliographicCitation.pageend: 4749
dcterms.bibliographicCitation.pagestart: 4734
dcterms.bibliographicCitation.volume: 19
dcterms.rightsHolder.reference: Creative Commons license
tub.accessrights.dnb: free
tub.affiliation: Fak. 4 Elektrotechnik und Informatik::Inst. Telekommunikationssysteme::FG Internet Network Architectures (INET)
tub.publisher.universityorinstitution: Technische Universität Berlin

Files

Original bundle
Name: Mostafaei_etal_Delay-Resistant_2022.pdf
Size: 3.99 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 4.23 KB
Format: Item-specific license agreed upon to submission

Collections