Inst. Technische Informatik und Mikroelektronik

148 Items

Recent Submissions
EPSILON-CP: using deep learning to combine information from multiple sources for protein contact prediction

Stahl, Kolja ; Schneider, Michael ; Brock, Oliver (2017-06-17)

Background: Accurately predicted contacts make it possible to compute the 3D structure of a protein. Since the solution space of native residue-residue contact pairs is very large, it is necessary to leverage information to identify relevant regions of the solution space, i.e. correct contacts. Every additional source of information can contribute to narrowing down candidate regions. Therefore, recent met...

Enabling GPU software developers to optimize their applications – The LPGPU2 approach

Juurlink, Ben ; Lucas, Jan ; Mammeri, Nadjib ; Keramidas, Georgios ; Pontzolkova, Katerina ; Aransay, Ignacio ; Kokkala, Chrysa ; Bliss, Martyn ; Richards, Andrew (2017)

Low-power GPUs have become ubiquitous; they can be found in domains ranging from wearable and mobile computing to automotive systems. With this ubiquity has come a wider range of applications exploiting low-power GPUs, placing ever increasing demands on the expected performance and power efficiency of the devices. The LPGPU2 project is an EU-funded, 30-month Innovation Action project targetin...

Highly parallel HEVC decoding for heterogeneous systems with CPU and GPU

Wang, Biao ; de Souza, Diego F. ; Álvarez-Mesa, Mauricio ; Chi, Chi Ching ; Juurlink, Ben ; Ilic, Aleksandar ; Roma, Nuno ; Sousa, Leonel (2017)

The High Efficiency Video Coding (HEVC) standard provides a higher compression efficiency than other video coding standards but at the cost of an increased computational load, which makes it hard to achieve real-time encoding/decoding for ultra high-resolution and high-quality video sequences. Graphics Processing Units (GPUs) are known to provide massive processing capability for highly parallel and re...

Highly Parallel HEVC Decoding for Heterogeneous Systems with CPU and GPU - Research Data

Wang, Biao ; Felix de Souza, Diego ; Alvarez-Mesa, Mauricio ; Chi, Chi Ching ; Juurlink, Ben ; Ilic, Aleksandar ; Roma, Nuno ; Sousa, Leonel (2017)

The High Efficiency Video Coding (HEVC) standard provides a higher compression efficiency than other video coding standards but at the cost of an increased computational load, which makes it hard to achieve real-time encoding/decoding for ultra high-resolution and high-quality video sequences. Graphics Processing Units (GPUs) are known to provide massive processing capability for highly parallel a...

Optimal DC/AC Data Bus Inversion Coding

Lucas, Jan ; Lal, Sohan ; Juurlink, Ben (2018-10-29)

GDDR5 and DDR4 memories use data bus inversion (DBI) coding to reduce termination power and decrease the number of output transitions. Two main strategies exist for encoding data using DBI: DBI DC minimizes the number of outputs transmitting a zero, while DBI AC minimizes the number of signal transitions. We show that neither of these strategies is optimal and reduction of interface power of up...
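
The two baseline strategies named in this abstract can be illustrated with a minimal sketch. It is not the optimal coding the authors propose, and it rests on simplifying assumptions that are not stated in the abstract: an 8-bit data group per DBI flag, an inversion threshold of more than four bits, and a cost count that ignores the DBI flag line itself. The function names are hypothetical.

```python
from typing import Tuple


def popcount(x: int) -> int:
    """Number of one-bits in the low 8 bits of x."""
    return bin(x & 0xFF).count("1")


def dbi_dc(byte: int) -> Tuple[int, bool]:
    """DBI DC sketch: invert when more than half of the 8 bits are zero.

    Returns (value driven on the data lines, whether it was inverted).
    Assumption: the DBI flag line itself is not counted.
    """
    zeros = 8 - popcount(byte)
    if zeros > 4:
        return (~byte) & 0xFF, True
    return byte & 0xFF, False


def dbi_ac(byte: int, prev_wire: int) -> Tuple[int, bool]:
    """DBI AC sketch: invert when more than half of the 8 lines would toggle.

    prev_wire is the value that was last driven on the data lines.
    Assumption: a toggle of the DBI flag line itself is not counted.
    """
    transitions = popcount(byte ^ prev_wire)
    if transitions > 4:
        return (~byte) & 0xFF, True
    return byte & 0xFF, False
```

For instance, under these assumptions `dbi_dc(0x01)` inverts the byte because seven of its bits are zero, while `dbi_ac(0x0F, 0x00)` leaves it unchanged because only four lines would toggle.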

VComputeBench: A Vulkan Benchmark Suite for GPGPU on Mobile and Embedded GPUs

Mammeri, Nadjib (2018-09-30)

GPUs have become immensely important computational units on embedded and mobile devices. However, GPGPU developers are often not able to exploit the compute power offered by GPUs on these devices mainly due to the lack of support of traditional programming models such as CUDA and OpenCL. The recent introduction of the Vulkan API provides a new programming model that could be explored for GPGPU ...

A regularized fusion based 3D reconstruction framework

Rajput, Muhammad Asif Ali (2018)

Recent developments in depth-sensing technologies have enabled mobile robots to perceive their surroundings with high accuracy. Robotic applications equipped with depth perception technology give self-driving cars the capability of autonomous navigation, assist in critical surgical procedures, or reconstruct the 3D model of a potentially hazardous environment. There exists a variety of 3D sensors r...

VComputeBench: A Vulkan Benchmark Suite for GPGPU on Mobile and Embedded GPUs

Mammeri, Nadjib ; Juurlink, Ben (2018)

GPUs have become immensely important computational units on embedded and mobile devices. However, GPGPU developers are often not able to exploit the compute power offered by GPUs on these devices mainly due to the lack of support of traditional programming models such as CUDA and OpenCL. The recent introduction of the Vulkan API provides a new programming model that could be explored for GPGPU ...

Application-Specific Cache and Prefetching for HEVC CABAC Decoding

Habermann, Philipp ; Chi, Chi Ching ; Álvarez-Mesa, Mauricio ; Juurlink, Ben (2017)

Context-based Adaptive Binary Arithmetic Coding (CABAC) is the entropy coding module in the HEVC/H.265 video coding standard. As in its predecessor, H.264/AVC, CABAC is a well-known throughput bottleneck due to its strong data dependencies. Besides other optimizations, the replacement of the context model memory by a smaller cache has been proposed for hardware decoders, resulting in an improve...

E²MC: Entropy Encoding Based Memory Compression for GPUs

Lal, Sohan ; Lucas, Jan ; Juurlink, Ben (2017)

Modern Graphics Processing Units (GPUs) provide much higher off-chip memory bandwidth than CPUs, but many GPU applications are still limited by memory bandwidth. Unfortunately, off-chip memory bandwidth is growing slower than the number of cores and has become a performance bottleneck. Thus, optimizations of effective memory bandwidth play a significant role for scaling the performance of GPUs....

Bridging the virtual world and the physical world with optically dynamic interfaces

Lindlbauer, David (2018)

In the virtual world, changing properties of objects such as their color, size or shape is one of the main means of communication. Objects are hidden or revealed when needed, or undergo changes in color or size to communicate importance. Users are in full control over how the virtual world looks and behaves. With augmented reality, virtual content is overlaid over the physical world to display ...

ALUPower: Data Dependent Power Consumption in GPUs - Research Data

Lucas, Jan ; Juurlink, Ben (2016)

Existing architectural power models for GPUs count activities such as executing floating point or integer instructions, but do not consider the data values processed. While data value dependent power consumption can often be neglected when performing architectural simulations of high performance Out-of-Order (OoO) CPUs, in our related paper we show that this approach is invalid for estimating t...

Real-Time Vision System for License Plate Detection and Recognition on FPGA

Rosli, Faird ; Elhossini, Ahmed ; Juurlink, Ben (2015)

Rapid development of the Field Programmable Gate Array (FPGA) offers an alternative way to provide acceleration for computationally intensive tasks such as digital signal and image processing. Its ability to perform parallel processing shows the potential in implementing a high speed vision system. Out of numerous applications of computer vision, this paper focuses on the hardware implementatio...

High performance CCSDS image data compression using GPGPUs for space applications

Ramanarayanan, Sunil Chokkanathapuram ; Manthey, Kristian ; Juurlink, Ben (2015)

The use of graphics processing units (GPUs) as computing architectures for inherently data-parallel signal processing applications is very popular. In principle, GPUs could achieve significant speed-up over central processing units (CPUs), especially for data-parallel applications that expect high throughput. The paper investigates ...

Proximity Scheme for Instruction Caches in Tiled CMP Architectures

Alawneh, Tareq ; Chi, Chi Ching ; Elhossini, Ahmed ; Juurlink, Ben (2015)

Recent research results show that there is a high degree of code sharing between cores in multi-core architectures. In this paper we propose a proximity scheme for the instruction caches, a scheme in which the shared code blocks among the neighbouring L2 caches in tiled multi-core architectures are exploited to reduce the average cache miss penalty and the on-chip network traffic. We evaluate t...

A Benchmark Suite for Evaluating Parallel Programming Models

Andersch, Michael ; Juurlink, Ben ; Chi, Chi Ching (2011)

The transition to multi-core processors forces software developers to explicitly exploit thread-level parallelism to increase performance. The associated programmability problem has led to the introduction of a plethora of parallel programming models that aim at simplifying software development by raising the abstraction level. Since industry has not settled on a single model, however, multi...

Spectral turning bands for efficient Gaussian random fields generation on GPUs and accelerators

Hunger, Lars ; Cosenza, Biagio ; Kimeswenger, Stefan ; Fahringer, Thomas (2015)

A random field (RF) is a set of correlated random variables associated with different spatial locations. RF generation algorithms are of crucial importance for many scientific areas, such as astrophysics, geostatistics, computer graphics, and many others. Current approaches commonly make use of 3D fast Fourier transform (FFT), which does not scale well for RF bigger than the available memory; t...

An evaluation of current SIMD programming models for C++

Pohl, Angela ; Cosenza, Biagio ; Álvarez-Mesa, Mauricio ; Chi, Chi Ching ; Juurlink, Ben (2016)

SIMD extensions were added to microprocessors in the mid '90s to speed up data-parallel code by vectorization. Unfortunately, the SIMD programming model has barely evolved and the most efficient utilization is still obtained with elaborate intrinsics coding. As a consequence, several approaches to write efficient and portable SIMD code have been proposed. In this work, we evaluate current progr...

A Quantitative Analysis of the Memory Architecture of FPGA-SoCs

Göbel, Matthias ; Elhossini, Ahmed ; Chi, Chi Ching ; Álvarez-Mesa, Mauricio ; Juurlink, Ben (2017)

In recent years, so-called FPGA-SoCs have been introduced by Intel (formerly Altera) and Xilinx. These devices combine multi-core processors with programmable logic. This paper analyzes the various memory and communication interconnects found in actual devices, particularly the Zynq-7020 and Zynq-7045 from Xilinx and the Cyclone V SE SoC from Intel. Issues such as different access patterns, cac...

The LPGPU2 Project: Low-Power Parallel Computing on GPUs

Juurlink, Ben ; Lucas, Jan ; Mammeri, Nadjib ; Bliss, Martyn ; Keramidas, Georgios ; Kokkala, Chrysa ; Richards, Andrew (2017)

The LPGPU2 project is a 30-month project (Innovation Action) funded by the European Union. Its overall goal is to develop an analysis and visualization framework that enables GPU application developers to improve the performance and power consumption of their applications. To achieve this overall goal, several key objectives need to be achieved. First, several applications (use cases) need to b...