FG Architektur eingebetteter Systeme (Embedded Systems Architecture Group)

105 Items

Recent Submissions
A Quantitative Study of Locality in GPU Caches

Lal, Sohan ; Juurlink, Ben (2020-10-07)

Traditionally, GPUs only had programmer-managed caches. The advent of hardware-managed caches accelerated the use of GPUs for general-purpose computing. However, as GPU caches are shared by thousands of threads, they often fall victim to contention and can suffer from thrashing and high miss rates, in particular for memory-divergent workloads. As data locality is crucial for performance, the...

QSLC: Quantization-Based, Low-Error Selective Approximation for GPUs

Lal, Sohan ; Lucas, Jan ; Juurlink, Ben (2020)

GPUs use a large memory access granularity (MAG) that often results in a low effective compression ratio for memory compression techniques. The low effective compression ratio is caused by a significant fraction of compressed blocks that have a few bytes above a multiple of MAG. While MAG-aware selective approximation, based on a tree structure, has been used to increase the effective compressi...
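The gap between raw and effective compression ratio described above can be illustrated with a minimal sketch. The 128-byte block size and 32-byte MAG used here are illustrative assumptions, and the function names are hypothetical; they are not taken from the paper.

```python
from typing import List, Tuple


def effective_size(compressed_bytes: int, mag: int = 32, block_bytes: int = 128) -> int:
    """Bytes actually transferred: the compressed size rounded up to a multiple of MAG.

    mag=32 and block_bytes=128 are assumed values for illustration only.
    """
    rounded = -(-compressed_bytes // mag) * mag  # integer ceiling to a MAG multiple
    return min(rounded, block_bytes)             # never transfer more than the raw block


def compression_ratios(compressed_sizes: List[int], mag: int = 32,
                       block_bytes: int = 128) -> Tuple[float, float]:
    """Return (raw ratio, effective ratio) over a list of compressed block sizes."""
    original = block_bytes * len(compressed_sizes)
    raw = original / sum(compressed_sizes)
    effective = original / sum(effective_size(c, mag, block_bytes) for c in compressed_sizes)
    return raw, effective


if __name__ == "__main__":
    # Two blocks that each compress to 36 bytes, i.e. 4 bytes above a multiple of 32.
    raw, eff = compression_ratios([36, 36])
    print(f"raw ratio: {raw:.2f}, effective ratio: {eff:.2f}")
```

Under these assumptions, a block compressed to 36 bytes still costs two 32-byte transfers, so trimming only 4 bytes would halve its transfer size. That is the opportunity selective approximation targets.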

High-throughput HEVC CABAC decoding

Habermann, Philipp (2020)

Video applications have emerged in various fields of our everyday life. They have continuously enhanced the user experience in entertainment and communication services. All this would not have been possible without the evolution of video compression standards and computer architectures over the last decades. Modern video codecs employ sophisticated algorithms to transform raw video data to an i...

Accurate Energy and Performance Prediction for Frequency-Scaled GPU Kernels

Fan, Kaijie ; Cosenza, Biagio ; Juurlink, Ben (2020-04-27)

Energy optimization is an increasingly important aspect of today’s high-performance computing applications. In particular, dynamic voltage and frequency scaling (DVFS) has become a widely adopted solution to balance performance and energy consumption, and hardware vendors provide management libraries that allow the programmer to change both memory and core frequencies manually to minimize energ...

Performance Counters based Power Modeling of Mobile GPUs using Deep Learning

Mammeri, Nadjib ; Neu, Markus ; Lal, Sohan ; Juurlink, Ben (2019-07-15)

GPUs have recently become important computational units on mobile devices, resulting in heterogeneous devices that can run a variety of parallel processing applications. While developing and optimizing such applications, estimating power consumption is of immense importance, as energy efficiency has become the key design constraint on these platforms. In this work, we apply deep ...

An Efficient Lightweight Framework for Porting Vision Algorithms on Embedded SoCs

Ashish, Apurv ; Lal, Sohan ; Juurlink, Ben (2019-09-10)

Recent advances in the field of embedded hardware and computer vision have made autonomous vehicles a tangible reality. The primary requirement of such an autonomous vehicle is an intelligent system that can process sensor inputs such as camera or lidar to perceive its surroundings. Vision algorithms are the core of camera-based Advanced Driver Assistance Systems (ADAS). H...

MEMPower: Data-Aware GPU Memory Power Model

Lucas, Jan ; Juurlink, Ben (2019-04-25)

This paper presents the MEMPower power model. MEMPower is a detailed empirical power model for GPU memory access. It models the data-dependent energy consumption as well as individual core-specific differences. We explain how the model was calibrated using special microbenchmarks as well as a high-resolution power measurement testbed. A novel technique to identify the number of memory channels...

SLC: Memory Access Granularity Aware Selective Lossy Compression for GPUs

Lal, Sohan ; Lucas, Jan ; Juurlink, Ben (2019-05-16)

Memory compression is a promising approach for reducing memory bandwidth requirements and increasing performance; however, memory compression techniques often result in a low effective compression ratio due to the large memory access granularity (MAG) exhibited by GPUs. Our analysis of the distribution of compressed blocks shows that a significant percentage of blocks are compressed to a size that ...

Power modeling and architectural techniques for energy-efficient GPUs

Lal, Sohan (2019)

Graphics Processing Units (GPUs) have evolved from fixed-function graphics processors to programmable general-purpose compute accelerators in a short time. The high theoretical performance and energy efficiency of GPUs compared to CPUs have made them indispensable for mainstream computing. However, their high power consumption and limited energy efficiency under low utilization are a challenge t...

Approximating Memory-bound Applications on Mobile GPUs

Maier, Daniel ; Mammeri, Nadjib ; Cosenza, Biagio ; Juurlink, Ben (2019)

Approximate computing techniques are often used to improve the performance of applications that can tolerate some amount of inaccuracy in the calculations or data. In the context of embedded and mobile systems, a wide range of applications have exploited approximation techniques to improve performance and overcome the limited capabilities of the hardware. On such systems, even small performance...

VComputeBench: A Vulkan Benchmark Suite for GPGPU on Mobile and Embedded GPUs

Mammeri, Nadjib ; Juurlink, Ben (2018-12-13)

GPUs have become immensely important computational units on embedded and mobile devices. However, GPGPU developers are often unable to exploit the compute power offered by GPUs on these devices, mainly due to the lack of support for traditional programming models such as CUDA and OpenCL. The recent introduction of the Vulkan API provides a new programming model that could be explored for GPGPU ...

Predictable GPUs Frequency Scaling for Energy and Performance

Fan, Kaijie ; Cosenza, Biagio ; Juurlink, Ben (2019)

Dynamic voltage and frequency scaling (DVFS) is an important solution to balance performance and energy consumption, and hardware vendors provide management libraries that allow the programmer to change both memory and core frequencies. The possibility to manually set these frequencies is a great opportunity for application tuning, which can focus on the best application-dependent setting. Howev...

Activity and Eye Movement Analysis as Basis of Vehicle Cabin Design

Rötting, Matthias ; Rösler, Dirk ; Lohse, Katrin ; Göbel, Matthias (2000)

An inventory of different methods was developed over the last couple of years to evaluate the ergonomic quality of different drivers' workplaces. The cockpits of short-haul buses, long-haul buses, streetcars and harvesting machinery were evaluated. Based on this analysis, criteria for the redesign could be developed. For example, the design of the standard German short-haul buses is based on the results of ...

Efficient HEVC Decoder for Heterogeneous CPU with GPU Systems

Wang, Biao ; Alvarez-Mesa, Mauricio ; Chi, Chi Ching ; Juurlink, Ben ; Souza, Diego F. de ; Ilic, Aleksandar ; Roma, Nuno ; Sousa, Leonel (2016)

The High Efficiency Video Coding (HEVC) standard provides higher compression efficiency than other video coding standards but at the cost of increased computational load, which makes it hard to achieve real-time encoding/decoding of high-resolution, high-quality video sequences. In this paper, we investigate how Graphics Processing Units (GPUs) can be employed to accelerate HEVC decoding. GPUs ...

Enabling GPU software developers to optimize their applications – The LPGPU2 approach

Juurlink, Ben ; Lucas, Jan ; Mammeri, Nadjib ; Keramidas, Georgios ; Pontzolkova, Katerina ; Aransay, Ignacio ; Kokkala, Chrysa ; Bliss, Martyn ; Richards, Andrew (2017)

Low-power GPUs have become ubiquitous; they can be found in domains ranging from wearable and mobile computing to automotive systems. With this ubiquity has come a wider range of applications exploiting low-power GPUs, placing ever-increasing demands on the expected performance and power efficiency of the devices. The LPGPU2 project is an EU-funded, Innovation Action, 30-month project targetin...

Highly parallel HEVC decoding for heterogeneous systems with CPU and GPU

Wang, Biao ; de Souza, Diego F. ; Álvarez-Mesa, Mauricio ; Chi, Chi Ching ; Juurlink, Ben ; Ilic, Aleksandar ; Roma, Nuno ; Sousa, Leonel (2017)

The High Efficiency Video Coding (HEVC) standard provides a higher compression efficiency than other video coding standards but at the cost of an increased computational load, which makes it hard to achieve real-time encoding/decoding for ultra-high-resolution and high-quality video sequences. Graphics Processing Units (GPUs) are known to provide massive processing capability for highly parallel and re...

Highly Parallel HEVC Decoding for Heterogeneous Systems with CPU and GPU - Research Data

Wang, Biao ; Felix de Souza, Diego ; Alvarez-Mesa, Mauricio ; Chi, Chi Ching ; Juurlink, Ben ; Ilic, Aleksandar ; Roma, Nuno ; Sousa, Leonel (2017)

The High Efficiency Video Coding (HEVC) standard provides a higher compression efficiency than other video coding standards but at the cost of an increased computational load, which makes it hard to achieve real-time encoding/decoding for ultra-high-resolution and high-quality video sequences. Graphics Processing Units (GPUs) are known to provide massive processing capability for highly parallel a...

Optimal DC/AC Data Bus Inversion Coding

Lucas, Jan ; Lal, Sohan ; Juurlink, Ben (2018-10-29)

GDDR5 and DDR4 memories use data bus inversion (DBI) coding to reduce termination power and decrease the number of output transitions. Two main strategies exist for encoding data using DBI: DBI DC minimizes the number of outputs transmitting a zero, while DBI AC minimizes the number of signal transitions. We show that neither of these strategies is optimal and reduction of interface power of up...
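The two conventional strategies named above can be sketched for a single 8-bit lane as follows. This is a simplified illustration, not the optimal coding proposed in the paper: the function names are hypothetical, the inversion threshold of more than four bits follows the common description of DBI, and the AC variant ignores transitions on the DBI signal itself.

```python
from typing import Tuple


def popcount8(x: int) -> int:
    """Number of one bits in the low 8 bits of x."""
    return bin(x & 0xFF).count("1")


def dbi_dc(byte: int) -> Tuple[int, bool]:
    """DBI DC: invert the byte when more than four of its bits are zero.

    Returns (value driven on the bus, inverted flag).
    """
    zeros = 8 - popcount8(byte)
    if zeros > 4:
        return (~byte) & 0xFF, True
    return byte & 0xFF, False


def dbi_ac(byte: int, prev_on_bus: int) -> Tuple[int, bool]:
    """DBI AC: invert the byte when more than four lines would toggle.

    prev_on_bus is the value driven in the previous transfer. Simplifying
    assumption: transitions of the DBI line itself are not counted.
    """
    toggles = popcount8(byte ^ prev_on_bus)
    if toggles > 4:
        return (~byte) & 0xFF, True
    return byte & 0xFF, False


if __name__ == "__main__":
    print(dbi_dc(0x01))        # seven zeros, so the byte is sent inverted
    print(dbi_ac(0xFF, 0x00))  # eight toggles, so the byte is sent inverted
```

Each strategy optimizes a single cost (zeros or transitions) one byte at a time, which is consistent with the abstract's point that neither alone minimizes total interface power.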

VComputeBench: A Vulkan Benchmark Suite for GPGPU on Mobile and Embedded GPUs

Mammeri, Nadjib (2018-09-30)

GPUs have become immensely important computational units on embedded and mobile devices. However, GPGPU developers are often unable to exploit the compute power offered by GPUs on these devices, mainly due to the lack of support for traditional programming models such as CUDA and OpenCL. The recent introduction of the Vulkan API provides a new programming model that could be explored for GPGPU ...

Application-Specific Cache and Prefetching for HEVC CABAC Decoding

Habermann, Philipp ; Chi, Chi Ching ; Álvarez-Mesa, Mauricio ; Juurlink, Ben (2017)

Context-based Adaptive Binary Arithmetic Coding (CABAC) is the entropy coding module in the HEVC/H.265 video coding standard. As in its predecessor, H.264/AVC, CABAC is a well-known throughput bottleneck due to its strong data dependencies. Besides other optimizations, the replacement of the context model memory by a smaller cache has been proposed for hardware decoders, resulting in an improve...