FG Datenbanksysteme und Informationsmanagement (DIMA)

9 Items

Recent Submissions

Representations and optimizations for embedded parallel dataflow languages

Alexandrov, Alexander (2019)

Parallel dataflow engines such as Apache Hadoop, Apache Spark, and Apache Flink have emerged as an alternative to relational databases more suitable for the needs of modern data analysis applications. One of the main characteristics of these systems is their scalable programming model, based on distributed collections and parallel transformations. Notable examples are Flink’s DataSet and Spark’...

Benchmarking dataflow systems for scalable machine learning

Boden, Christoph (2018)

The popularity of the World Wide Web and its ubiquitous global online services has led to unprecedented amounts of available data. Novel distributed data processing systems have been developed in order to scale out computations and analysis to such massive data set sizes. These "Big Data Analytics" systems are also popular choices to scale out the execution of machine learning algorithms. Howe...

Data processing on heterogeneous hardware

Heimel, Max (2018)

The primary objective of data processing research on modern hardware is to understand how to utilize emerging technology to process data efficiently. Over the last decades, software engineers and computer scientists have made significant progress towards this goal, providing highly tuned algorithms, systems, and mechanisms for a wide variety of device types. However, while we mostly unde...

Visualization-driven data aggregation

Jugel, Uwe (2017)

Visual analysis of high-volume numerical data is traditionally required for understanding sensor data in manufacturing and engineering scenarios. Today, the visual analysis of any kind of big data has become ubiquitous and is a highly sought-after feature for data analysis tools. It is vital for commerce, finance, sales, and an ever-growing number of industries, whose data are traditionally stored in r...

Identifier Gold Standard for NTCIR 11 Math Wikipedia Dataset

Schubotz, Moritz (2016-07-18)

Mathematical formulae are essential in science, but face challenges of ambiguity, due to the use of a small number of identifiers to represent an immense number of concepts. Corresponding to word sense disambiguation in Natural Language Processing, we disambiguate mathematical identifiers. By regarding formulae and natural text as one monolithic information source, we are able to extract the se...

Augmenting mathematical formulae for more effective querying & efficient presentation

Schubotz, Moritz (2017)

Mathematical Information Retrieval (MIR) is a research area that focuses on the Information Need (IN) of the Science, Technology, Engineering and Mathematics (STEM) domain. Unlike traditional Information Retrieval (IR) research, which extracts information from textual data sources, MIR also takes mathematical formulae into account. This thesis makes three main contributions: 1. It analyses th...

Exploratory relation extraction in large multilingual data

Akbik, Alan (2016)

The task of Relation Extraction (RE) is concerned with creating extractors that automatically find structured, relational information in unstructured data such as natural language text. Motivated by an explosion of sources of readily available text data such as the Web, RE offers intriguing possibilities for querying, organizing, and analyzing information by drawing upon the clean semantics of ...

Specification and optimization of analytical data flows

Hüske, Fabian (2016)

In the past, the majority of data analysis use cases were addressed by aggregating relational data. For some years now, a trend called “Big Data” has been evolving, with several implications for the field of data analysis. Compared to previous applications, much larger data sets are analyzed using more elaborate and diverse analysis methods such as information extraction techniques, da...

Making math searchable in Wikipedia

Schubotz, Moritz (2012)

Wikipedia, the world's largest encyclopedia, contains a lot of knowledge that is expressed exclusively as formulae. Unfortunately, this knowledge is currently not fully accessible to intelligent information retrieval systems. This immense body of knowledge is hidden from value-added services, such as search. In this paper, we present our MathSearch implementation for Wikipedia that enables users t...