Facets of neural network complexity

Hertrich, Christoph

Facets of neural network complexity

dc.contributor.advisor	Skutella, Martin
dc.contributor.author	Hertrich, Christoph
dc.contributor.grantor	Technische Universität Berlin	en
dc.contributor.referee	Skutella, Martin
dc.contributor.referee	Pokutta, Sebastian
dc.contributor.referee	Basu, Amitabh
dc.date.accepted	2022-03-02
dc.date.accessioned	2022-03-29T13:13:06Z
dc.date.available	2022-03-29T13:13:06Z
dc.date.issued	2022
dc.description.abstract	Artificial neural networks are at the heart of some of the greatest advances in modern technology. They enable huge breakthroughs in applications ranging from computer vision via machine translation to speech recognition as well as autonomous driving and many more. However, we are still far away from a more rigorous theoretical explanation of these overwhelming success stories. Consequently, the development of a better mathematical understanding of neural networks is currently one of the hottest research topics in computer science. In this thesis we provide several contributions in that direction for the simple, but practically powerful and widely used model of feedforward neural networks with rectified linear unit (ReLU) activations. Our focus is on various notions of what we call the complexity of such neural networks: how much computing resources (time, hardware, network size, etc.) are required to achieve a certain goal? Of course, such questions can be asked in various contexts. We identify and study the following three facets of complexity for neural networks with ReLU activations. The first facet is neural networks' expressivity: What functions can be represented by certain neural network architectures? Even though this is such a fundamental question, very little is known so far. We make progress concerning the question whether the class of exactly representable functions strictly increases by adding more layers (with no restrictions on size). We also provide upper bounds on the number of neurons required to represent arbitrary piecewise linear functions with small-depth ReLU neural networks. The second facet is neural networks' computational power. Here, we view neural networks as a model of computation, just like Boolean, or even closer, arithmetic circuits. We then investigate which network (or circuit) size is required to solve various problems, with a focus on combinatorial optimization problems. Even though this model is quite restrictive compared to other models of computation, we are able to show that comparably small neural networks can provably solve problems like the efficiently solvable Maximum Flow Problem or the NP-hard Knapsack Problem. The third facet is neural networks' training complexity: How difficult is it to fit the weights of a neural network to training data? It is widely known that optimal solutions to the training problem are hard to obtain, which is why local optimization techniques like stochastic gradient descent are used in practice. We focus on the question whether the situation improves for fixed input dimension, leading to the paradigm of parameterized complexity analysis. We provide running time lower bounds in terms of W[1]-hardness results, proving that known brute-force strategies are essentially optimal. On the positive side, we extend a known polynomial-time algorithm for constant dimension and convex loss functions to a more general class of loss functions. The mathematical methods used in this thesis include polyhedral theory, discrete and tropical geometry, mixed-integer and combinatorial optimization, as well as tools from complexity theory.	en
dc.description.abstract	Künstliche neuronale Netze sind der Schlüssel zu einigen der größten modernen Technologiefortschritten. Sie ermöglichen Durchbrüche in Anwendungen wie Computer Vision, maschinellem Übersetzen, Spracherkennung, bis hin zu autonomem Fahren und vielem mehr. Allerdings sind wir nach wie vor weit entfernt von rigorosen theoretischen Erklärungen dieser überwältigenden Erfolge. Daher ist die Entwicklung eines besseren mathematischen Verständnisses neuronaler Netze aktuell eines der wichtigsten Forschungsthemen der Informatik. In dieser Arbeit präsentieren wir einige Fortschritte in diese Richtung für das einfache, aber aus praktischer Sicht mächtige und viel verwendete Modell der Feedforward-Netze mit ReLU-Aktivierungsfunktionen. Unser Fokus liegt auf verschiedenen Arten der Komplexität solcher neuronalen Netze: Wie viele Ressourcen (Zeit, Hardware, Netzwerkgröße etc.) werden benötigt, um ein bestimmtes Ziel zu erreichen? Selbstverständlich können solche Fragen in verschiedenen Kontexten gestellt werden. Wir identifizieren und untersuchen die folgenden drei Facetten der Komplexität für neuronale Netze mit ReLU-Aktivierungsfunktionen. Die erste Facette ist Expressivität: Welche Funktionen können von bestimmten Netzwerkarchitekturen dargestellt werden? Obwohl es sich dabei um eine so fundamentale Frage handelt, ist bisher wenig bekannt. Wir präsentieren Fortschritte bezüglich der Frage, ob die Menge der exakt darstellbaren Funktionen echt größer wird, wenn wir mehr Schichten hinzufügen (ohne dabei die Größe zu beschränken). Wir beweisen außerdem obere Schranken an die Anzahl benötigter Neuronen, um beliebige stückweise lineare Funktionen mit neuronalen Netzen mit geringer Tiefe zu berechnen. Die zweite Facette ist Berechnungskomplexität (Computational Power). Dafür betrachten wir neuronale Netze als ein Berechenbarkeitsmodell, ähnlich wie etwa Boolesche oder arithmetische Schaltkreise. Wir untersuchen, welche Netzwerkgröße nötig ist, um verschiedene Probleme, insbesondere aus der kombinatorischen Optimierung, zu lösen. Trotz der starken Eingeschränktheit dieses Modells zeigen wir, dass vergleichsweise kleine Netze beweisbar Lösungen für Probleme wie das Maximalflussproblem oder das NP-schwere Rucksackproblem finden können. Die dritte Facette ist Trainingskomplexität: Wie schwer ist es, die Gewichte eines neuronalen Netzes an Trainingsdaten anzupassen? Bekanntermaßen ist das Finden exakter Lösungen ein schweres Problem, weshalb in der Praxis lokale Optimierungsverfahren wie etwa stochastischer Gradientenabstieg verwendet werden. Wir beschäftigen uns damit, ob sich die Lage für fixe Dimension verbessert, was uns zum Paradigma der parametrisierten Komplexität führt. Wir beweisen untere Laufzeitschranken in Form von W[1]-Härteresultaten, woraus folgt, dass bekannte Brute-Force-Strategien im Wesentlichen optimal sind. Auf der positiven Seite erweitern wir einen bekannten Polynomialzeitalgorithmus für konstante Dimension und konvexe Lossfunktionen auf eine allgemeinere Klasse von Lossfunktionen. Die in dieser Arbeit verwendeten mathematischen Methoden umfassen Polyedertheorie, diskrete und tropische Geometrie, gemischt-ganzzahlige und kombinatorische Optimierung sowie Tools aus der Komplexitätstheorie.	de
dc.description.sponsorship	DFG, 385256563, GRK 2434: Facetten der Komplexität	en
dc.identifier.uri	https://depositonce.tu-berlin.de/handle/11303/16494
dc.identifier.uri	http://dx.doi.org/10.14279/depositonce-15271
dc.language.iso	en	en
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en
dc.subject.ddc	510 Mathematik	de
dc.subject.other	neural networks	en
dc.subject.other	expressivity	en
dc.subject.other	computational power	en
dc.subject.other	combinatorial optimization	en
dc.subject.other	training complexity	en
dc.subject.other	neuronale Netze	de
dc.subject.other	Expressivität	de
dc.subject.other	Berechnungskomplexität	de
dc.subject.other	Kombinatorische Optimierung	de
dc.subject.other	Trainingskomplexität	de
dc.title	Facets of neural network complexity	en
dc.title.translated	Facetten der Komplexität für neuronale Netze	de
dc.type	Doctoral Thesis	en
dc.type.version	acceptedVersion	en
tub.accessrights.dnb	free	en
tub.affiliation	Fak. 2 Mathematik und Naturwissenschaften::Inst. Mathematik::FG Kombinatorische Optimierung und Graphenalgorithmen	de
tub.affiliation.faculty	Fak. 2 Mathematik und Naturwissenschaften	de
tub.affiliation.group	FG Kombinatorische Optimierung und Graphenalgorithmen	de
tub.affiliation.institute	Inst. Mathematik	de
tub.publisher.universityorinstitution	Technische Universität Berlin	en

Files

Original bundle

Now showing 1 - 1 of 1

Name:: hertrich_christoph.pdf
Size:: 1.4 MB
Format:: Adobe Portable Document Format
Description:

Download

License bundle

Now showing 1 - 1 of 1

Name:: license.txt
Size:: 4.86 KB
Format:: Item-specific license agreed upon to submission
Description:

Download

Collections

Publications