Automatic Data Layout Optimizations for GPUs

dc.contributor.author: Kofler, Klaus
dc.contributor.author: Cosenza, Biagio
dc.contributor.author: Fahringer, Thomas
dc.date.accessioned: 2018-06-04T14:59:51Z
dc.date.available: 2018-06-04T14:59:51Z
dc.date.issued: 2015
dc.description.abstract: Memory optimizations have become increasingly important for fully exploiting the computational power of modern GPUs. Data arrangement has a large impact on performance, and it is very hard for GPU programmers to identify a well-suited data layout. Classical data layout transformations include grouping together data fields that have similar access patterns, or transforming an Array-of-Structures (AoS) into a Structure-of-Arrays (SoA). This paper presents an optimization infrastructure that automatically determines an improved data layout for OpenCL programs written in AoS layout. Our framework consists of two separate algorithms. The first constructs a graph-based model, which is used to split the AoS input struct into several clusters of fields based on hardware-dependent parameters. The second selects a good per-cluster data layout (e.g., SoA, AoS or an intermediate layout) using a decision tree. Results show that combining both algorithms delivers higher performance than either algorithm alone. Across different AoS sample programs, the layouts proposed by our framework yield speedups of up to 2.22, 1.89 and 2.83 on an AMD FirePro S9000, an NVIDIA GeForce GTX 480 and an NVIDIA Tesla K20m, respectively. They also yield a speedup of up to 1.18 over a manually optimized program. [en]
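To make the layouts named in the abstract concrete, here is a minimal sketch in plain host-side C. It is not OpenCL and not the authors' framework. The record `ParticleAoS`, its four fields, and the sizes `N` and `TILE` are hypothetical. The tiled AoSoA layout is shown as one example of an "intermediate layout", which is an assumption about what that term covers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes chosen for illustration only. */
#define N 8     /* number of elements */
#define TILE 4  /* AoSoA tile width (assumed, e.g. tied to a hardware width) */

/* AoS: all fields of one element are adjacent in memory. */
typedef struct { float x, y, z, mass; } ParticleAoS;

/* SoA: each field is stored in its own contiguous array. */
typedef struct { float x[N], y[N], z[N], mass[N]; } ParticlesSoA;

/* AoSoA: an intermediate layout, i.e. SoA inside fixed-size tiles. */
typedef struct { float x[TILE], y[TILE], z[TILE], mass[TILE]; } ParticleTile;

/* Copy AoS data into SoA layout, field by field. */
void aos_to_soa(const ParticleAoS *in, ParticlesSoA *out) {
    for (size_t i = 0; i < N; ++i) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->z[i] = in[i].z;
        out->mass[i] = in[i].mass;
    }
}

/* Copy AoS data into AoSoA layout: element i goes to tile i / TILE,
   lane i % TILE. This sketch assumes N is a multiple of TILE. */
void aos_to_aosoa(const ParticleAoS *in, ParticleTile *out) {
    assert(N % TILE == 0);
    for (size_t i = 0; i < N; ++i) {
        ParticleTile *t = &out[i / TILE];
        size_t lane = i % TILE;
        t->x[lane] = in[i].x;
        t->y[lane] = in[i].y;
        t->z[lane] = in[i].z;
        t->mass[lane] = in[i].mass;
    }
}

/* Byte distance between field x of neighbouring elements, i.e. the stride
   seen by adjacent work-items that each read x of their own element. */
size_t stride_x_aos(void) { return sizeof(ParticleAoS); }
size_t stride_x_soa(void) { return sizeof(float); }
```

With AoS, adjacent work-items reading `x` access addresses `sizeof(ParticleAoS)` bytes apart (16 bytes with 4-byte floats). With SoA they access consecutive floats, which GPUs can coalesce into fewer memory transactions. Which layout is better depends on which fields a kernel reads together. According to the abstract, the paper's field clustering and decision tree are meant to make that choice automatically.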
dc.identifier.isbn: 978-3-662-48096-0
dc.identifier.isbn: 978-3-662-48095-3
dc.identifier.issn: 0302-9743
dc.identifier.uri: https://depositonce.tu-berlin.de//handle/11303/7910
dc.identifier.uri: http://dx.doi.org/10.14279/depositonce-7071
dc.language.iso: en [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject.ddc: 004 Data processing; Computer science
dc.subject.other: GPU [en]
dc.subject.other: OpenCL [en]
dc.subject.other: Array-of-Structures [en]
dc.subject.other: Structure-of-Arrays [en]
dc.subject.other: AoS [en]
dc.subject.other: SoA [en]
dc.subject.other: memory [en]
dc.subject.other: data layout [en]
dc.subject.other: graph-based model [en]
dc.title: Automatic Data Layout Optimizations for GPUs [en]
dc.type: Conference Object [en]
dc.type.version: acceptedVersion [en]
dcterms.bibliographicCitation.doi: 10.1007/978-3-662-48096-0_21 [en]
dcterms.bibliographicCitation.originalpublishername: Springer [en]
dcterms.bibliographicCitation.originalpublisherplace: Berlin; Heidelberg; New York, NY [en]
dcterms.bibliographicCitation.pageend: 274 [en]
dcterms.bibliographicCitation.pagestart: 263 [en]
dcterms.bibliographicCitation.proceedingstitle: Euro-Par 2015: Parallel Processing. Euro-Par 2015. (Lecture Notes in Computer Science, vol 9233) [en]
tub.accessrights.dnb: free [en]
tub.affiliation: Fak. 4 Elektrotechnik und Informatik > Inst. Technische Informatik und Mikroelektronik > FG Architektur eingebetteter Systeme [de]
tub.affiliation.faculty: Fak. 4 Elektrotechnik und Informatik [de]
tub.affiliation.group: FG Architektur eingebetteter Systeme [de]
tub.affiliation.institute: Inst. Technische Informatik und Mikroelektronik [de]
tub.publisher.universityorinstitution: Technische Universität Berlin [en]
Files
Name: KoflerEUROPAR15.pdf
Size: 1.08 MB
Format: Adobe Portable Document Format