Institute of Computer Engineering and Microelectronics (Institut für Technische Informatik und Mikroelektronik)

112 Items

Recent Submissions
TACO: A scheduling scheme for parallel applications on multicore architectures

Schönherr, Jan H. ; Juurlink, Ben ; Richling, Jan (2014)

While multicore architectures are used across the whole product range, from server systems to handheld computers, the deployed software is still undergoing the slow transition from sequential to parallel. This transition, however, is gaining momentum due to the increased availability of more sophisticated parallel programming environments. Combined with the ever increasing complexity of mu...

Reducing HEVC encoding complexity using two-stage motion estimation

Cebrián-Márquez, Gabriel ; Chi, Chi Ching ; Martínez, José Luis ; Cuenca, Pedro ; Sanz-Rodríguez, Sergio ; Álvarez-Mesa, Mauricio ; Juurlink, Ben (2015)

We propose a technique for optimizing the High Efficiency Video Coding (HEVC) encoder by reducing the number of operations performed in the motion estimation stage. The technique is based on the fact that a significant number of motion estimation operations are performed repeatedly for the same image samples, but for different block partition sizes. By decoupling the initial motion estimation...
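
The observation that the same motion estimation work is repeated for different partition sizes can be illustrated with a minimal Python sketch of generic sum-of-absolute-differences (SAD) reuse. It assumes frames stored as nested lists, a 2Nx2N block split into four NxN sub-blocks, and hypothetical helper names (`sad`, `partition_sads`). The abstract is truncated, so this is only the general idea, not the paper's two-stage scheme.

```python
def sad(cur, ref, x, y, dx, dy, w, h):
    """Sum of absolute differences between the w x h block of `cur` at (x, y)
    and the block of `ref` displaced by the candidate motion vector (dx, dy)."""
    return sum(
        abs(cur[y + j][x + i] - ref[y + j + dy][x + i + dx])
        for j in range(h)
        for i in range(w)
    )


def partition_sads(cur, ref, x, y, dx, dy, n=8):
    """Evaluate the four n x n sub-block SADs of a 2n x 2n block once for one
    motion vector, then derive every larger partition by addition."""
    q = [[sad(cur, ref, x + i * n, y + j * n, dx, dy, n, n) for i in range(2)]
         for j in range(2)]  # q[row][col]
    return {
        "2Nx2N": q[0][0] + q[0][1] + q[1][0] + q[1][1],
        "2NxN": (q[0][0] + q[0][1], q[1][0] + q[1][1]),  # top, bottom
        "Nx2N": (q[0][0] + q[1][0], q[0][1] + q[1][1]),  # left, right
        "NxN": (q[0][0], q[0][1], q[1][0], q[1][1]),
    }
```

The pixel differences are computed once per candidate vector; every partition shape after that costs only a few additions instead of a fresh pass over the samples.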

Protective redundancy overhead reduction using instruction vulnerability factor

Borodin, Demid ; Juurlink, Ben (2010)

Due to modern technology trends, fault tolerance (FT) is attracting ever-increasing research attention. To reduce the overhead introduced by the FT features, several techniques have been proposed. One of these techniques is Instruction-Level Fault Tolerance Configurability (ILCOFT). ILCOFT enables application developers to protect different instructions to different degrees, devoting more resou...

Parallel scalability and efficiency of HEVC parallelization approaches

Chi, Chi Ching ; Álvarez-Mesa, Mauricio ; Juurlink, Ben ; Clare, Gordon ; Henry, Félix ; Pateux, Stéphane ; Schierl, Thomas (2012)

Unlike H.264/advanced video coding, where parallelism was an afterthought, High Efficiency Video Coding currently contains several proposals aimed at making it more parallel-friendly. A performance comparison of the different proposals, however, has not yet been performed. In this paper, we fill this gap by presenting efficient implementations of the most promising parallelization proposal...

HEVC real-time decoding

Bross, Benjamin ; Álvarez-Mesa, Mauricio ; George, Valeri ; Chi, Chi Ching ; Mayer, Tobias ; Juurlink, Ben ; Schierl, Thomas (2013)

The new High Efficiency Video Coding (HEVC) standard was finalized in January 2013. Compared to its predecessor H.264/MPEG-4 AVC, the new international standard can reduce the bitrate by 50% for the same subjective video quality. This paper investigates decoder optimizations that are needed to achieve HEVC real-time software decoding on a mobile processor. It is shown that HEVC real-ti...

Two-level sliding-window VBR control algorithm for video on demand streaming

de-Frutos-López, Manuel ; González-de-Suso, José Luis ; Sanz-Rodríguez, Sergio ; Peláez-Moreno, Carmen ; Díaz-de-María, Fernando (2015)

A two-level variable bit rate (VBR) control algorithm for hierarchical video coding, specifically tailored for the new High Efficiency Video Coding (HEVC) standard, is presented here. A long-term level monitors the current bit count along a sliding window of a few seconds, comprising several intra-periods (IPs) and shifted on an IP basis. This long-term view allows the accommodation of the natu...
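
The long-term level described above (a bit count kept over a window of several intra-periods that slides one IP at a time) can be sketched as follows. This is a minimal illustration under assumptions: the class name, the fixed IP duration, and the rule that the next IP may spend whatever keeps the whole window on the target rate are hypothetical choices, not the paper's algorithm.

```python
from collections import deque


class SlidingWindowBitCounter:
    """Hypothetical long-term bit monitor: keeps the bit counts of the last
    `window_ips` intra-periods (IPs) and slides by one IP at a time."""

    def __init__(self, target_bps, ip_seconds, window_ips):
        self.target_bps = target_bps
        self.ip_seconds = ip_seconds
        self.window = deque(maxlen=window_ips)  # oldest IP drops out automatically

    def push_ip(self, bits):
        """Record the bits spent on the IP just encoded (shifts the window)."""
        self.window.append(bits)

    def window_bitrate(self):
        """Average bitrate (bits/s) over the IPs currently in the window."""
        if not self.window:
            return 0.0
        return sum(self.window) / (len(self.window) * self.ip_seconds)

    def next_ip_budget(self):
        """Bits the next IP may use so that the full window meets the target."""
        size = self.window.maxlen
        recent = list(self.window)[-(size - 1):] if size > 1 else []
        return self.target_bps * size * self.ip_seconds - sum(recent)
```

Because the window spans several IPs, a complex IP can overspend as long as neighbouring IPs compensate, which is the kind of natural rate variation a VBR scheme aims to accommodate.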

Parallel HEVC Decoding on Multi- and Many-core Architectures

Chi, Chi Ching ; Álvarez-Mesa, Mauricio ; Lucas, Jan ; Juurlink, Ben ; Schierl, Thomas (2013)

The Joint Collaborative Team on Video Coding is developing a new standard named High Efficiency Video Coding (HEVC) that aims at reducing the bitrate of H.264/AVC by another 50%. In order to fulfill the computational demands of the new standard, in particular for high resolutions and at low power budgets, exploiting parallelism is no longer an option but a requirement. Therefore, HEVC includ...

Extending the Cell SPE with Energy Efficient Branch Prediction

Briejer, Martijn ; Meenderinck, Cor ; Juurlink, Ben (2010)

Energy-efficient dynamic branch predictors are proposed for the Cell SPE, which normally depends on compiler-inserted hint instructions to predict branches. All designed schemes use a Branch Target Buffer (BTB) to store the branch target address and the prediction, which is computed using a bimodal counter. One prediction scheme pre-decodes instructions when they are fetched from the local stor...
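
For readers unfamiliar with the mechanism, a conventional BTB with bimodal (2-bit saturating counter) prediction can be sketched in Python. The table size, direct-mapped indexing, and allocate-on-taken policy are assumptions for illustration; the sketch does not model the paper's pre-decoding or energy-saving features.

```python
class BimodalBTB:
    """Direct-mapped BTB whose entries hold a tag, the branch target and a
    2-bit saturating (bimodal) counter: 0-1 predict not taken, 2-3 taken."""

    def __init__(self, entries=64):
        self.entries = entries
        self.table = [None] * entries  # each slot: [tag, target, counter]

    def _index_tag(self, pc):
        word = pc >> 2  # SPE instructions are 4 bytes wide
        return word % self.entries, word // self.entries

    def predict(self, pc):
        """Return the predicted target, or None to predict fall-through."""
        idx, tag = self._index_tag(pc)
        entry = self.table[idx]
        if entry is not None and entry[0] == tag and entry[2] >= 2:
            return entry[1]
        return None

    def update(self, pc, taken, target):
        """Train the entry with the resolved outcome of the branch at `pc`."""
        idx, tag = self._index_tag(pc)
        entry = self.table[idx]
        if entry is None or entry[0] != tag:
            if not taken:
                return  # assumption: allocate entries only for taken branches
            entry = self.table[idx] = [tag, target, 1]  # start weakly not taken
        entry[1] = target
        entry[2] = min(entry[2] + 1, 3) if taken else max(entry[2] - 1, 0)
```

The two-bit hysteresis means a loop branch that is taken many times and then falls through once is still predicted taken on the next execution of the loop.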

An Optimized Parallel IDCT on Graphics Processing Units

Wang, Biao ; Álvarez-Mesa, Mauricio ; Chi, Chi Ching ; Juurlink, Ben (2013)

In this paper we present an implementation of the H.264/AVC Inverse Discrete Cosine Transform (IDCT) optimized for Graphics Processing Units (GPUs) using OpenCL. By exploiting the fact that most of the IDCT input data for real videos are zero-valued coefficients, a new compacted data representation is created that allows for several optimizations. Experimental evaluations conducted on different GP...
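
Why zero coefficients matter can be shown with a small CPU-side Python sketch: an all-zero 4x4 block always yields an all-zero residual, so only blocks with nonzero coefficients need the transform. The transform below is the H.264/AVC 4x4 inverse integer transform; skipping whole zero blocks is an assumed, simplified form of compaction and is not necessarily the paper's GPU data representation.

```python
def idct4x4(blk):
    """H.264/AVC 4x4 inverse integer transform: rows, then columns, then (x + 32) >> 6."""
    def butterfly(d0, d1, d2, d3):
        e, f = d0 + d2, d0 - d2
        g, h = (d1 >> 1) - d3, d1 + (d3 >> 1)
        return [e + h, f + g, f - g, e - h]

    rows = [butterfly(*r) for r in blk]  # horizontal pass
    cols = [butterfly(*(rows[i][c] for i in range(4))) for c in range(4)]  # vertical pass
    # cols[c][i] is the value at row i, column c; transpose back and round.
    return [[(cols[c][i] + 32) >> 6 for c in range(4)] for i in range(4)]


def compact(blocks):
    """Keep only (index, coefficients) pairs for blocks with a nonzero coefficient."""
    return [(i, b) for i, b in enumerate(blocks) if any(v for row in b for v in row)]


def decode_residuals(blocks):
    """Transform only the compacted blocks; all other residuals are zero."""
    out = [[[0] * 4 for _ in range(4)] for _ in blocks]
    for i, b in compact(blocks):
        out[i] = idct4x4(b)
    return out
```

On a GPU the same idea also shrinks the amount of data transferred and lets work items be assigned only to blocks that actually need computation.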

An Instruction to Accelerate Software Caches

Azevedo, Arnaldo ; Juurlink, Ben (2011)

In this paper we propose an instruction to accelerate software caches. While DMA transfers are very efficient for predictable data sets that can be fetched before they are needed, they introduce a large latency overhead for computations with unpredictable access behavior. Software caches are advantageous when the data set is not predictable but exhibits locality. However, software caches also incur a la...
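
To make the per-access bookkeeping of a software cache concrete, here is a minimal Python sketch of a direct-mapped software cache in front of a larger main memory. The class name, line size, and the list slice standing in for a DMA transfer are hypothetical; the sketch does not reproduce the proposed instruction.

```python
class SoftwareCache:
    """Hypothetical direct-mapped software cache: a small local store holds
    `num_lines` lines of `line_size` elements copied from main memory."""

    def __init__(self, main_memory, num_lines=8, line_size=16):
        self.mem = main_memory
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # line address currently held per slot
        self.lines = [[0] * line_size for _ in range(num_lines)]
        self.hits = self.misses = 0

    def load(self, addr):
        # Every access pays for this address split and tag check in software.
        line_addr, offset = divmod(addr, self.line_size)
        index = line_addr % self.num_lines
        if self.tags[index] != line_addr:
            # Miss: the slice stands in for a DMA transfer into the local store.
            start = line_addr * self.line_size
            self.lines[index] = self.mem[start:start + self.line_size]
            self.tags[index] = line_addr
            self.misses += 1
        else:
            self.hits += 1
        return self.lines[index][offset]
```

Even on a hit, each load executes a division, a modulo, and a compare before the data is returned; this bookkeeping, repeated for every access, is the kind of software overhead that hardware support could reduce.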

Design and Implementation of a High-Throughput CABAC Hardware Accelerator for the HEVC Decoder

Habermann, Philipp (2014)

HEVC is the new video coding standard of the Joint Collaborative Team on Video Coding. As in its predecessor H.264/AVC, Context-based Adaptive Binary Arithmetic Coding (CABAC) is a throughput bottleneck. This paper presents a hardware acceleration approach for transform coefficient decoding, the most time-consuming part of CABAC in HEVC. In addition to a baseline design, a pipelined architectur...

A High-Performance Hardware Accelerator for HEVC Motion Compensation

Göbel, Matthias (2014)

This master’s thesis focuses on the design and implementation of a motion compensation hardware accelerator for use in HEVC hybrid decoders, i.e. decoders that contain both hardware and software parts. As motion compensation is the most time-consuming step in the decoding process, it is crucial to implement it in a fast and efficient way. This paper elaborates the theoretical ...

An efficient and flexible FPGA implementation of a face detection system

Fekih, Hichem Ben ; Elhossini, Ahmed ; Juurlink, Ben (2015)

This paper proposes a hardware architecture based on the object detection system of Viola and Jones using Haar-like features. The proposed design is able to detect faces in real time with high accuracy. Speed-up is achieved by exploiting the parallelism in the design, where multiple classifier cores can be added. To maintain a flexible design, classifier cores can be assigned to different ima...
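
The Haar-like features used by the Viola-Jones detector are evaluated on an integral image, which turns any rectangle sum into four table lookups. A minimal Python sketch of that standard building block follows; the function names are illustrative, and the sketch says nothing about how the proposed FPGA design organizes its classifier cores.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (extra zero row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y), in four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_horizontal(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half (w even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Since every feature costs a constant number of lookups regardless of its size, many features can be evaluated in parallel, which is what makes replicating classifier cores attractive.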

Traffic Prediction for NoCs using Fuzzy Logic

Thomas, Gervin ; Juurlink, Ben ; Tutsch, Dietmar (2011)

Networks-on-Chip (NoCs) provide faster communication and higher throughput for chip multiprocessor systems than conventional bus systems. Having multiple processing elements on one chip, however, leads to a large number of message transfers in the NoC. As a consequence, more blocking occurs, and time and power are wasted waiting until the congestion clears. With knowledge of future comm...

A parallel H.264/SVC encoder for high definition video conferencing

Sanz-Rodríguez, Sergio ; Álvarez-Mesa, Mauricio ; Mayer, Tobias ; Schierl, Thomas (2015)

In this paper we present a video encoder specially developed and configured for high definition (HD) video conferencing. This video encoder brings together the following three requirements: H.264/Scalable Video Coding (SVC), parallel encoding on multicore platforms, and parallel-friendly rate control. With the first requirement, a minimum quality of service to every end-user receiver over Inter...

Leveraging problem structure in interactive perception for robot manipulation of constrained mechanisms

Martín-Martín, Roberto (2018)

In this thesis we study robot perception to support a specific type of manipulation task in unstructured environments: the mechanical manipulation of kinematic degrees of freedom. In these tasks the goal of the robot is to create controlled motion, i.e. to change the configuration of the kinematic degrees of freedom (DoF) of the objects in the environment. Often, the environment contains articulate...

Enhancing the scalability of many-core systems towards utilizing fine-grain parallelism in task-based programming models

Dallou, Tamer (2017)

In the past few years, it has become foreseeable that Moore's law is coming to an end. This law, based on the observation that the number of transistors in an integrated circuit doubles every 18-24 months, served as a roadmap for the semiconductor industry. As the law nears its end due to the huge increase in the power density of integrated circuits, a new era in computing systems has begun. In this era, a co...
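
As a worked example of the doubling rule stated above, with $N_0$ the initial transistor count and $T$ the doubling period (the ten-year horizon is chosen only for illustration):

```latex
N(t) = N_0 \cdot 2^{t/T}, \qquad T \in [18, 24]\ \text{months}

t = 120\ \text{months}: \quad
2^{120/24} = 2^{5} = 32, \qquad
2^{120/18} = 2^{20/3} \approx 101.6
```

A decade of sustained scaling therefore multiplies the transistor budget by roughly 32 to 100, which is why its slowdown shifts the burden of performance growth toward parallelism.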

A reconfigurable architecture for real-time image compression on-board satellites

Manthey, Kristian (2017)

Data products of optical remote sensing systems are increasingly used in many areas of our everyday life. The spatial as well as the spectral resolution of satellite image data increases steadily with new missions resulting in a higher precision of known procedures and new application scenarios. While the memory capacity requirements can still be fulfilled, the transmission capacity becomes inc...

libWater: heterogeneous distributed computing made easy

Grasso, Ivan ; Pellegrini, Simone ; Cosenza, Biagio ; Fahringer, Thomas (2013)

Clusters of heterogeneous nodes composed of multi-core CPUs and GPUs are increasingly being used for High Performance Computing (HPC) due to the benefits in peak performance and energy efficiency. In order to fully harvest the computational capabilities of such architectures, application developers often employ a combination of different parallel programming paradigms (e.g. OpenCL, CUDA, MPI an...

Low-power high-efficiency video decoding using general purpose processors

Chi, Chi Ching ; Álvarez-Mesa, Mauricio ; Juurlink, Ben (2015)

In this article, we investigate how code optimization techniques and low-power states of general-purpose processors improve the power efficiency of HEVC decoding. The power and performance efficiency of SIMD instructions, multicore architectures, and low-power active and idle states is analyzed in detail for offline video decoding. In addition, the power efficiency of techniques suc...