Comparing Temporal Graphs Using Dynamic Time Warping

The connections within many real-world networks change over time. Thus, there has been a recent boom in studying temporal graphs. Recognizing patterns in temporal graphs requires a similarity measure to compare different temporal graphs. To this end, we initiate the study of dynamic time warping (an established concept for mining time series data) on temporal graphs. We propose the dynamic temporal graph warping distance (dtgw) to determine the (dis-)similarity of two temporal graphs. Our novel measure is flexible and can be applied in various application domains. We show that computing the dtgw-distance is a challenging (NP-hard) optimization problem and identify some polynomial-time solvable special cases. Moreover, we develop a quadratic programming formulation and an efficient heuristic. Preliminary experiments indicate that the heuristic performs very well and that our concept yields meaningful results on real-world instances.


Introduction
A fundamental concept for pattern recognition is the concept of (dis)similarity between objects. For objects that are represented by numerical feature vectors, there are many well-known (dis)similarity functions such as $\ell_p$-norms or positive semi-definite kernels.
In structural pattern recognition, objects are often more naturally represented by complex (discrete) data structures such as graphs, strings or time series. For these representations, one can often not simply use vector-based (dis)similarity measures. Instead, one needs to define suitable domain-specific (dis)similarity functions such as the edit distance on graphs and strings or the dynamic time warping distance on time series.
The majority of graph (dis)similarity functions, such as the graph edit distance [20], graph kernels [6], and geometric graph distances [12], focus on static graphs. However, many complex systems are not static, as the links between entities change dynamically over time. Such temporal networks can be represented by a series of temporal edges between a fixed set of vertices. Examples are face-to-face proximity networks, flight traffic networks, temporal attack networks in computer security, or protein-protein-interaction networks in biology [9,15,17]. Thus, there is a steadily growing research interest in analyzing temporal networks [26]. In order to perform data mining tasks such as classification or clustering on temporal networks, one needs to find suitable (dis)similarity functions.
We introduce a novel (dis)similarity measure on temporal graphs based on dynamic time warping, called dynamic temporal graph warping. Thus, by combining established methods from graph-based pattern recognition and time series data mining in a nontrivial way, we obtain a suitable tool to analyze temporal network data. Beyond that, we study its computational complexity, develop efficient algorithms and study their behavior on real-world data, the latter confirming the practical usefulness.
Related Work. There are numerous approaches to define graph (dis)similarity measures. A well-known example is the (NP-hard) graph edit distance [20]. Graph kernels (many of which are polynomial-time computable) are another well-studied class [6,8]. Measuring graph distance based on vertex mappings using local vertex signatures was introduced by Jouili and Tabbone [14]. The idea of using vertex mappings can also be found in optimal assignment kernels [2,5,16]. Regarding (dis)similarity measures on temporal graphs, seemingly little work has been done so far. Elhesha et al. [4] recently described an approach based on vertex mappings. Their method, however, does not allow for a flexible alignment between time layers. Dynamic Time Warping [21] is an established measure for mining time series data [19,22,24] which is specifically designed to cope with temporal distortion in the data via nonlinear alignment of observations. We lift this proven concept to the domain of temporal graphs.
Our Contributions. We define the dynamic temporal graph warping distance as a twofold discrete minimization problem involving computation of an optimal vertex mapping and an optimal warping path (see Section 3). We prove that it is NP-hard to solve in general (Theorem 4.1). In contrast, we point out several polynomial-time solvable special cases. Namely, the case when either a vertex mapping or a warping path is fixed (Observation 3.1), the case of deciding whether the dtgw-distance is zero (Theorem 5.1), and the case when the lifetimes of the two temporal graphs differ only by a constant and the warping path length is restricted (Proposition 5.2). Moreover, we give a quadratic programming formulation (Section 5.1) and propose an efficient heuristic approach (Section 5.2).
We evaluate the heuristic in some experiments on real-world data to show its efficiency and quality of solution (Section 6).
Organization. In Section 2 we introduce basic definitions. Section 3 contains our main definition of the dtgw-distance followed by some computational hardness results in Section 4 and algorithmic results in Section 5. Finally, Section 6 presents experimental results on some real-world data.

Preliminaries
For $T \in \mathbb{N}$, we define $[T] := \{1, 2, \ldots, T\}$. For a set $S$, we denote the set of all size-$k$ subsets of $S$ by $\binom{S}{k}$.
Temporal Graphs. A temporal graph $G = (V, E_1, E_2, \ldots, E_T)$ consists of a vertex set $V$ and a sequence of $T \ge 1$ edge sets $E_i \subseteq \binom{V}{2}$. By $G_i = (V, E_i)$, we denote the $i$-th layer of $G$ and we call $T$ the lifetime of $G$. The underlying graph of $G$ is the graph $(V, \bigcup_{i=1}^{T} E_i)$. We remark that all definitions and results in this work can easily be extended to labeled temporal graphs (with vertex and/or edge labels).
Vertex Mapping. A vertex mapping between two vertex sets V and W is a set M ⊆ V × W containing min(|V |, |W |) tuples such that for all x ∈ V ∪W it holds that |{m ∈ M : x ∈ m}| ≤ 1.
We denote the set of all vertex mappings between V and W by M(V, W ). Let V M ⊆ V be the subset of vertices in V that are contained in some tuple of M (W M ⊆ W is defined analogously).
Assignment Problem. The assignment problem is a fundamental problem in combinatorial optimization. Given two sets $A$ and $B$ of equal size and a cost function $c : A \times B \to \mathbb{Q}$, the goal is to find a bijection $\pi : A \to B$ such that $\sum_{a \in A} c(a, \pi(a))$ is minimized. It is well known that the assignment problem can be described as an integer linear program and is solvable in polynomial time.
Dynamic Time Warping. The dynamic time warping distance [21] is a distance between time series. It is based on the concept of a warping path. A warping path of order $n \times m$ is a set $p = \{p_1, \ldots, p_L\}$ of $L \ge 1$ pairs $p_\ell = (i_\ell, j_\ell)$ such that
• $p_1 = (1, 1)$ and $p_L = (n, m)$, and
• $p_{\ell+1} \in \{(i_\ell + 1, j_\ell + 1), (i_\ell, j_\ell + 1), (i_\ell + 1, j_\ell)\}$ for all $1 \le \ell < L$.
We denote the set of all warping paths of order $n \times m$ by $\mathcal{P}_{n,m}$. For two temporal graphs $G = (V, E_1, \ldots, E_T)$ and $H = (W, F_1, \ldots, F_U)$, every order-$(T \times U)$ warping path $p$ defines a warping between $G$ and $H$, that is, a pair $(i, j) \in p$ warps the layer $G_i$ to $H_j$.
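The two conditions defining a warping path can be checked mechanically. The following is a minimal Python sketch, not taken from the paper: the function names `is_warping_path` and `all_warping_paths` are hypothetical, and the enumeration is exponential, so it is only meant for very small orders.

```python
def is_warping_path(path, n, m):
    """Check whether `path` (a list of 1-based (i, j) pairs) is a warping
    path of order n x m: it starts at (1, 1), ends at (n, m), and every
    step increases i, j, or both by exactly one."""
    if not path or path[0] != (1, 1) or path[-1] != (n, m):
        return False
    allowed_steps = {(1, 1), (0, 1), (1, 0)}
    return all((i2 - i1, j2 - j1) in allowed_steps
               for (i1, j1), (i2, j2) in zip(path, path[1:]))


def all_warping_paths(n, m):
    """Enumerate all warping paths of order n x m (tiny n, m only)."""
    def extend(prefix):
        i, j = prefix[-1]
        if (i, j) == (n, m):
            yield list(prefix)
            return
        for di, dj in ((1, 1), (0, 1), (1, 0)):
            if i + di <= n and j + dj <= m:
                yield from extend(prefix + [(i + di, j + dj)])
    yield from extend([(1, 1)])


if __name__ == "__main__":
    paths = list(all_warping_paths(2, 3))
    print(len(paths), "warping paths of order 2 x 3")
    print(all(is_warping_path(p, 2, 3) for p in paths))
```

Every path produced this way has length between max(n, m) and n + m − 1, which is the range of lengths relevant for the length parameter discussed later in the paper.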
Parameterized Complexity. We assume the reader to be familiar with basic concepts of computational complexity theory such as NP-completeness. In parameterized complexity theory [3] one considers running times with respect to two dimensions. One dimension is the size of the input instance $x$ and the other dimension is a parameter $k$ (usually a numerical value). An instance of a parameterized problem is a pair $(x, k)$. The class FPT contains all fixed-parameter tractable parameterized problems, that is, problems that can be solved in time $f(k) \cdot \mathrm{poly}(|x|)$ for some computable function $f$ only depending on $k$. The class XP contains all parameterized problems that can be solved in polynomial time for every constant parameter value, that is, in time $|x|^{f(k)}$ (clearly, FPT ⊆ XP).

Dynamic Temporal Graph Warping (DTGW)
In this section we define a temporal graph distance based on dynamic time warping using a vertex-signature-based graph distance as local cost function. We choose this graph distance for the following reasons. First, it is computationally tractable (in comparison to the NP-hard graph edit distance). Second, it is based on a mapping between the two vertex sets (possibly of different size), which is reasonable in many temporal network applications since it allows one to enforce consistency over time. Third, vertex signatures offer high flexibility since they can be chosen arbitrarily (as can the metric) in order to incorporate essential information for the application at hand (e.g. they can be used for weighted or labeled temporal graphs).
Graph Distance Based on Vertex Signatures. The following approach is due to Jouili and Tabbone [14]. For a (static) graph G = (V, E), a vertex signature function f G : V → Q k encodes (local) information about a vertex (e.g. its degree). Let d : Q k × Q k → Q be a metric.
For two (static) graphs $G = (V, E)$ and $H = (W, F)$ with vertex signatures $f_G : V \to \mathbb{Q}^k$ and $f_H : W \to \mathbb{Q}^k$ and a given vertex mapping $M$ between $V$ and $W$, we define the cost of $M$ as
$$C(G, H, M) := \sum_{(u,v) \in M} d(f_G(u), f_H(v)) + \sum_{v \in V \setminus V_M} \Delta_G(v) + \sum_{w \in W \setminus W_M} \Delta_H(w),$$
where $\Delta_G(v) \in \mathbb{Q}$ is the (predefined) cost of "deleting" vertex $v$ from $G$ since it is not mapped by $M$ to any vertex in the other vertex set. The value $\Delta_G(v)$ might for example depend on the vertex signature of $v$.
The vertex-signature-based distance between $G$ and $H$ is then defined as
$$D(G, H) := \min_{M \in \mathcal{M}(V, W)} C(G, H, M).$$
Depending on the application, one might normalize the distance $D$ by some appropriate factor (typically depending on $|V|$ and $|W|$, e.g. Jouili and Tabbone [14] normalize by $\min(|V|, |W|)^{-1}$).
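To make the static distance concrete, the following Python sketch evaluates the cost of a vertex mapping and minimizes it by brute force over all vertex mappings. It is an illustration under stated assumptions, not the method of Jouili and Tabbone: signatures are vertex degrees, the metric is the absolute difference, the deletion cost of a vertex is assumed to be the absolute value of its signature, and all function names are hypothetical.

```python
from itertools import permutations


def degrees(vertices, edges):
    """Vertex signature: the degree of each vertex."""
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def mapping_cost(sig_g, sig_h, mapping):
    """Cost of a vertex mapping given as a list of (u, w) pairs.
    Assumption: deleting an unmapped vertex costs |signature|."""
    mapped_g = {u for u, _ in mapping}
    mapped_h = {w for _, w in mapping}
    cost = sum(abs(sig_g[u] - sig_h[w]) for u, w in mapping)
    cost += sum(abs(sig_g[v]) for v in sig_g if v not in mapped_g)
    cost += sum(abs(sig_h[w]) for w in sig_h if w not in mapped_h)
    return cost


def signature_distance(sig_g, sig_h):
    """Brute-force minimum over all vertex mappings (tiny graphs only).
    Returns (distance, one optimal mapping)."""
    flip = len(sig_g) > len(sig_h)
    small, large = (sig_h, sig_g) if flip else (sig_g, sig_h)
    best = None
    for image in permutations(large, len(small)):
        pairs = list(zip(small, image))
        mapping = [(w, u) for u, w in pairs] if flip else pairs
        cost = mapping_cost(sig_g, sig_h, mapping)
        if best is None or cost < best[0]:
            best = (cost, mapping)
    return best


if __name__ == "__main__":
    path = degrees("abc", [("a", "b"), ("b", "c")])
    triangle = degrees("xyz", [("x", "y"), ("y", "z"), ("x", "z")])
    print(signature_distance(path, triangle))
```

In practice the minimization is an assignment problem and would be solved with a polynomial-time algorithm rather than by enumeration; the brute force only keeps the sketch dependency-free.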
Throughout this work, we assume that vertex signature functions f G are computable in polynomial time in the size of G and we assume all metrics d to be polynomial-time computable. We neglect the running times for computing the values of f G and d (we can actually assume that all vertex signatures are precomputed once in polynomial time).
Dynamic Time Warping Distance for Temporal Graphs. We transfer the concept of dynamic time warping to temporal graphs in the following way. Let G = (V, E 1 , . . . , E T ) and H = (W, F 1 , . . . , F U ) be two temporal graphs and let f G 1 , . . . , f G T : V → Q k and f H 1 , . . . , f H U : W → Q k be corresponding vertex signature functions.
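We define the vertex-signature-based dynamic temporal graph warping distance (dtgw-distance) between $G$ and $H$ as
$$\mathrm{dtgw}(G, H) := \min_{M \in \mathcal{M}(V, W)} \; \min_{p \in \mathcal{P}_{T,U}} \; \sum_{(i,j) \in p} C(G_i, H_j, M),$$
where the cost $C(G_i, H_j, M)$ is computed with respect to the vertex signature functions $f_{G_i}$ and $f_{H_j}$.
[Figure 1: Two temporal graphs G and H, where the vertex signatures are their degrees and the metric is the absolute value of the difference. For example, the cost of warping $G_1$ to $H_1$ is 2, as the green and yellow vertex each have degree two in $G_1$ but only degree one in $H_1$. The resulting dtgw-distance is dtgw(G, H) = 2 + 0 + 0 + 2 + 2 + 2 + 4 = 12.]
Observation 3.1. Let $n$ denote the maximum number of vertices of $G$ and $H$.
i) Let $M \in \mathcal{M}(V, W)$ be a fixed vertex mapping. Then, we have
$$\mathrm{dtgw}(G, H) = \min_{p \in \mathcal{P}_{T,U}} \sum_{(i,j) \in p} C(G_i, H_j, M).$$
The right-hand side of the above equation can be computed by a well-known dynamic program for dynamic time warping in $O(T \cdot U \cdot n)$ time [21]. Here $O(n)$ is the time required to compute $C(G_i, H_j, M)$.
ii) Let $p \in \mathcal{P}_{T,U}$ be a fixed warping path. Assume without loss of generality that $|V| \le |W|$ and let $V'$ be obtained from $V$ by adding $|W| - |V|$ dummy vertices. For $u \in V'$ and $v \in W$, let $\sigma(u, v) := \sum_{(i,j) \in p} d(f_{G_i}(u), f_{H_j}(v))$ if $u \in V$, and $\sigma(u, v) := \sum_{(i,j) \in p} \Delta_{H_j}(v)$ if $u$ is a dummy vertex. Then, we have
$$\mathrm{dtgw}(G, H) = \min_{M \in \mathcal{M}(V', W)} \sum_{(u,v) \in M} \sigma(u, v).$$
Note that the vertex mapping $M$ defines a bijection between $V'$ and $W$. Hence, computing dtgw(G, H) is an assignment problem solvable in $O(n^3)$ time [1, Theorem 12.2]. Computing all values $\sigma(u, v)$ can be done in $O(n^2 \cdot |p|)$ time.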
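Both parts of Observation 3.1 can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the paper: signatures are scalars stored as `sig[layer][vertex]`, the metric is the absolute difference, both temporal graphs have the same number of vertices (so no dummy vertices are needed), and the assignment problem is solved by brute force instead of an $O(n^3)$ algorithm. The function names are hypothetical.

```python
from itertools import permutations


def optimal_warping(cost):
    """Part i): cost[i][j] is the cost of warping layer i+1 of G to
    layer j+1 of H for a fixed vertex mapping.
    Returns (minimum total cost, warping path as 1-based pairs)."""
    T, U = len(cost), len(cost[0])
    INF = float("inf")
    acc = [[INF] * U for _ in range(T)]
    acc[0][0] = cost[0][0]
    for i in range(T):
        for j in range(U):
            if i == 0 and j == 0:
                continue
            prev = min(
                acc[i - 1][j - 1] if i > 0 and j > 0 else INF,
                acc[i - 1][j] if i > 0 else INF,
                acc[i][j - 1] if j > 0 else INF,
            )
            acc[i][j] = cost[i][j] + prev
    # Backtrack from (T, U) to (1, 1) along cheapest predecessors.
    i, j = T - 1, U - 1
    path = [(T, U)]
    while (i, j) != (0, 0):
        preds = [(a, b) for a, b in ((i - 1, j - 1), (i - 1, j), (i, j - 1))
                 if a >= 0 and b >= 0]
        i, j = min(preds, key=lambda ab: acc[ab[0]][ab[1]])
        path.append((i + 1, j + 1))
    return acc[T - 1][U - 1], path[::-1]


def sigma_matrix(sig_g, sig_h, path):
    """Part ii): sigma[u][v] sums |f_{G_i}(u) - f_{H_j}(v)| over the
    pairs (i, j) of a fixed warping path (1-based pairs)."""
    n = len(sig_g[0])
    return [[sum(abs(sig_g[i - 1][u] - sig_h[j - 1][v]) for i, j in path)
             for v in range(n)] for u in range(n)]


def best_mapping(sigma):
    """Brute-force assignment: returns (cost, perm) where vertex u of G
    is mapped to vertex perm[u] of H."""
    n = len(sigma)
    return min((sum(sigma[u][perm[u]] for u in range(n)), perm)
               for perm in permutations(range(n)))


if __name__ == "__main__":
    print(optimal_warping([[0, 0, 9], [9, 9, 0]]))
    sigma = sigma_matrix([[1, 0], [2, 0]], [[0, 1], [0, 2]], [(1, 1), (2, 2)])
    print(sigma, best_mapping(sigma))
```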
Note that Observation 3.1 implies that if we already know the vertex mapping up to a constant number of vertices, then dtgw can be computed in polynomial time (since we can try out all polynomially many possible vertex mappings).
For given vertex signature function and metric, we refer to the decision problem of testing whether two temporal graphs have dynamic temporal graph warping distance at most some given c by DTGW.
By Observation 3.1, DTGW is polynomial-time solvable if one temporal graph has a constant lifetime or a constant number of vertices since there are only polynomially many possible warping paths or polynomially many vertex mappings.

Computational Hardness
Even though the dynamic time warping distance and the vertex-signature-based graph distance are both computable in polynomial time, their combined application to temporal graphs yields a distance measure that is generally NP-hard to compute:
Theorem 4.1. DTGW is NP-complete for every metric when the vertex signatures are vertex degrees.
Proof. DTGW is clearly contained in NP since for a given vertex mapping and warping path (both having polynomial size), one can check in polynomial time whether the dtgw-distance is at most c (also see Observation 3.1).
To show NP-hardness, we give a polynomial-time reduction from 3-SAT. Let $d : \mathbb{Q} \times \mathbb{Q} \to \mathbb{Q}$ be any metric and let $\varphi = C_1 \wedge \ldots \wedge C_m$ be an instance of 3-SAT over the variables $x_1, \ldots, x_n$. Each clause $C_j$ is then a disjunction of three literals, $C_j =: \ell_j^1 \vee \ell_j^2 \vee \ell_j^3$. We may assume $m > 8$. Our idea is to represent each literal by a vertex which can be mapped to either ⊤ (true) or ⊥ (false). We then build, for each clause, a clause box gadget consisting of three consecutive layers. The choice of warping path will then, for each clause, implicitly select one of its literals, and the costs caused by each clause box will attain their minimum value if and only if that particular literal is mapped to ⊤.
We now give the details. Let $D$ and $D'$ be two copies of the graph $\bigcup_{i=1}^{22m} K_2$ (consisting of 22m disjoint edges), where for each vertex $v \in V(D)$ we denote its copy in $V(D')$ by $v'$. We construct two temporal graphs $G$ and $H$. Their vertex sets each contain $2n + 47m + 8$ vertices.
Both temporal graphs have $2n + 26m$ layers, defined separately for each $i \in [n]$, for each clause block and, finally, for each $j \in [22m]$.
[Figure: the layers of the construction. Only relevant vertices are shown in each layer.]
"⇐": Given a satisfying assignment $\beta : \{x_1, \ldots, x_n\} \to \{\text{true}, \text{false}\}$ of $\varphi$, we define a vertex mapping $M$ according to $\beta$. To construct a warping path, we begin by defining, for each $j \in [m]$, three sub-paths $\pi_j^1, \pi_j^2, \pi_j^3$ (see also Fig. 3), and we choose $k_j \in \{1, 2, 3\}$ such that the literal $\ell_j^{k_j}$ is true. We then build the warping path $p$ as the union of all $\pi_j^{k_j}$, using the trivial warping path for all remaining layers. It is then not difficult to calculate that each clause block adds cost of exactly $42 \cdot d(0, 1)$ and there are no other costs. Thus $\mathrm{dtgw}(G, H) \le 42m \cdot d(0, 1) = c$.
[Figure 3: The three possible warpings between layers of a clause block. Each edge is labeled with the minimal cost it causes.]
"⇒": Let $M$ be a vertex mapping and $p$ a warping path of minimal cost for $(G, H)$, and assume that this cost is at most $c$. Note that any non-separation layer contains at most eight edges. So if $p$ warps any separation layer to any non-separation layer, then the resulting cost would be at least $(44m - 16) \cdot d(0, 1) > c$. Thus, we may assume that every separation layer $i$ of $G$ is only warped to layer $i$ of $H$ and vice versa.
Since the last 22m layers of each temporal graph are all identical and $(M, p)$ are chosen to have minimal cost, we can conclude that then the 22m layers $2n + 4m + 1, \ldots, 2n + 4m + 22m$ each would cause cost of at least $2 \cdot d(0, 1)$, thus exceeding $c$ in total. Therefore, M has to contain a bijection from
Now, consider the clause block corresponding to $C_j = \ell_j^1 \vee \ell_j^2 \vee \ell_j^3$. From the arguments above, it follows that $G_{2n+4j-3}$ and $G_{2n+4j-1}$ are warped to $H_{2n+4j-3}$ and $H_{2n+4j-1}$, respectively. This already costs $32 \cdot d(0, 1)$. We distinguish three cases (corresponding to $\pi_j^1$ through $\pi_j^3$ above):
(1) $G_{2n+4j-2}$ is warped to $H_{2n+4j-3}$. This causes costs of at least $2 \cdot d(0, 1)$. Then, $H_{2n+4j-2}$ must be warped to $G_{2n+4j-1}$ or $p$ would not have minimal cost. Thus, there are additional costs of at least $8 \cdot d(0, 1)$. This is the situation illustrated in Fig. 3a.
In summary, the costs contributed by each clause block are at least $42 \cdot d(0, 1)$. Therefore, to meet the bound of $c$, all layers outside of clause blocks must not cause any additional cost. Furthermore, for each $j \in [m]$, the clause block corresponding to $C_j$ must have costs of exactly $42 \cdot d(0, 1)$. If we are in Case (1) as above, then this is only possible if $M$ maps each degree-1 vertex of $G_{2n+4j-2}$ to some degree-1 vertex of $H_{2n+4j-3}$. Thus, $(\ell_j^2, \nu(j,2)) \in M$. Otherwise, if we are in Case (2) respectively Case (3), then analogous arguments yield that is a satisfying assignment for $\varphi$.
Let us take a closer look at the reduction in the proof of Theorem 4.1. Note that the corresponding optimal warping path is always close to the diagonal (that is, $|i - j| \le 1$ holds for every pair $(i, j)$). Hence, it lies within the so-called Sakoe-Chiba band [21] of width one. Moreover, the maximum degree in each layer is one. Finally, the number of vertices and the number of layers of both temporal graphs and the target cost $c$ are all upper-bounded linearly in the size of the 3-SAT formula, which allows us to conclude a running time lower bound based on the Exponential Time Hypothesis [10] (together with the Sparsification Lemma [11]). These observations are summarized in the following corollary.
Corollary 4.2. DTGW is NP-complete for every metric and vertex degrees as vertex signatures even when the maximum degree of each layer is one and the warping path is restricted to the Sakoe-Chiba band of width one.
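For a fixed vertex mapping, restricting the warping path to a Sakoe-Chiba band only changes which cells the dynamic program may visit. The sketch below is a hedged illustration, not code from the paper: `band_dtw` is a hypothetical name, and `cost[i][j]` is assumed to already contain the cost of warping layer i+1 to layer j+1 under that fixed mapping.

```python
def band_dtw(cost, width):
    """Minimum cost of a warping path whose pairs (i, j) all satisfy
    |i - j| <= width. Returns float('inf') if no such path exists."""
    T, U = len(cost), len(cost[0])
    INF = float("inf")
    acc = [[INF] * U for _ in range(T)]
    for i in range(T):
        for j in range(U):
            if abs(i - j) > width:
                continue  # cell lies outside the band
            if i == 0 and j == 0:
                acc[i][j] = cost[i][j]
                continue
            prev = min(
                acc[i - 1][j - 1] if i > 0 and j > 0 else INF,
                acc[i - 1][j] if i > 0 else INF,
                acc[i][j - 1] if j > 0 else INF,
            )
            acc[i][j] = cost[i][j] + prev
    return acc[T - 1][U - 1]


if __name__ == "__main__":
    cost = [[0, 9, 9],
            [0, 9, 9],
            [9, 0, 0]]
    print(band_dtw(cost, 0), band_dtw(cost, 1))
```

The band makes the inner problem cheaper, but by the corollary it does not remove the hardness that comes from choosing the vertex mapping at the same time.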
Due to the intrinsic hardness of DTGW, there is little hope to solve the general problem efficiently. In the following section, however, we point out two polynomial-time solvable special cases. Furthermore, we develop a mathematical programming formulation as well as a heuristic approach to compute the dtgw-distance in practice.

Algorithms
Our first algorithmic result is to show that determining whether two temporal graphs with the same number of vertices have dtgw-distance zero is possible in polynomial time (Theorem 5.1). In contrast, determining whether two (static) graphs have graph edit distance zero is not known to be polynomial-time solvable (as this is equivalent to the famous Graph Isomorphism problem).
Proof. We will show that for distance zero, an optimal warping path can easily be determined. Polynomial-time solvability then follows from Observation 3.1.
Let $G = (V, E_1, \ldots, E_T)$ and $H = (W, F_1, \ldots, F_U)$ be two temporal graphs with $V =: \{v_1, \ldots, v_n\}$ and $W =: \{w_1, \ldots, w_n\}$. For each $i \in [T]$, we define the $i$-th layer signature $f(G_i)$ of $G$; the layer signatures $f(H_j)$ of $H$ are defined analogously. Assuming dtgw(G, H) = 0, it follows that there exists a vertex mapping $M \subseteq V \times W$ and a warping path $p \in \mathcal{P}_{T,U}$ such that $C(G_i, H_j, M) = 0$ holds for every $(i, j) \in p$. Clearly, if $f(G_i) = f(G_{i'})$ and layer $i$ is warped to layer $j$ and layer $i'$ is warped to layer $j'$, then $f(H_j) = f(H_{j'})$ since otherwise the cost will not be zero. By the definition of a warping path, it follows that the layers $1, \ldots, i_1$ of $G$ can only be warped to layers $1, \ldots, j_1$ of $H$ and the layers $i_1 + 1, \ldots, i_2$ of $G$ can only be warped to layers $j_1 + 1, \ldots, j_2$ of $H$ and so on. Note that this is only possible if $q = r$. If this is the case, then we can assume that the warping path $p$ has the following form:
We remark that if the vertex signatures and the metric satisfy the property that every pair of different vertex signatures has distance at least $\delta$ for some constant $\delta > 0$, then DTGW parameterized by $c$ is in XP. For example, this is the case when the vertex signatures contain only integers and $d$ is any $\ell_p$-norm (for $p \ge 1$). Then, every pair of different signatures has distance at least $\delta = 1$. The idea of the algorithm is to "guess" the tuples of a warping path which cause non-zero cost (at most $c/\delta$ many) and to check whether it is possible to complete the warping path without further costs. The latter can be done in polynomial time using similar arguments as for the case $c = 0$ (Theorem 5.1).
In contrast, if the dtgw-distance is normalized (e.g. divided by the number of vertices), then the differences between vertex signatures can be arbitrarily small. In that case, DTGW is NP-complete even for a constant value of $c$ (by the same reduction as in the proof of Theorem 4.1).
To overcome this hardness, in the following, we consider parameters regarding the warping path length. We assume that the lifetimes of the inputs differ by at most a constant, that is, T = U + t for some t ≥ 0 (which might often be the case in practice). Note that, by definition, every warping path of order T × U has length at least T . We define the parameter λ to be the difference between the warping path length and the lower bound T , that is, we consider only order-(T × U ) warping paths of length at most T + λ (in practice, long warping paths are often considered unnatural). We prove that DTGW is in XP with respect to the combined parameter (λ, t).
Note that Proposition 5.2 implies polynomial-time solvability of DTGW if t and λ are constants. For unbounded t, however, we conjecture that DTGW is NP-hard even if the warping paths are restricted to have length max(T, U ), which is the minimum possible length (that is, λ = 0). The idea is to modify the reduction in the proof of Theorem 4.1 by adding some appropriate layers to one of the temporal graphs.

Quadratic Programming
We give a formalization of DTGW as a quadratic minimization problem with linear constraints (QP). This can be used to solve relatively small instances exactly with state-of-the-art QP solvers.
We define vertex mapping variables and warping variables, and we use coefficients that denote the cost of matching vertex $i$ in layer $s$ to vertex $j$ in layer $t$. Then, computing dtgw(G, H) is a quadratic minimization problem subject to the linear constraints 1a to 1f.
The constraints 1a and 1b ensure that the vertex mapping variables define a correct vertex mapping, that is, every vertex is mapped to exactly one other vertex (or is deleted). Constraints 1c to 1f ensure that the warping variables define a valid warping path. Here, the constraints 1d to 1f imply that if the warping path contains a pair $(s, t)$, then it also contains at least one of the pairs $(s + 1, t)$, $(s, t + 1)$, or $(s + 1, t + 1)$ (since the objective is minimized, any solution will actually select only one of these pairs).
The number of variables is in $O(|V| \cdot |W| + T \cdot U)$ and the number of constraints is in $O(|V| + |W| + T \cdot U)$.
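For orientation, one program that matches the verbal description of the constraints is sketched below. It is an assumption-laden reconstruction rather than the authors' exact formulation: it assumes $|V| = |W|$ (for example after padding with dummy vertices), binary variables $x_{i,j}$ for the vertex mapping and $y_{s,t}$ for the warping path, cost coefficients $c_{i,j}^{s,t} \ge 0$, and an assignment of the labels (1a) to (1f) that is only guessed from the text.

```latex
\begin{align}
\min \quad & \sum_{i \in V} \sum_{j \in W} \sum_{s \in [T]} \sum_{t \in [U]}
             c_{i,j}^{s,t} \, x_{i,j} \, y_{s,t} \notag \\
\text{s.t.} \quad
 & \sum_{j \in W} x_{i,j} = 1 && \forall i \in V \tag{1a} \\
 & \sum_{i \in V} x_{i,j} = 1 && \forall j \in W \tag{1b} \\
 & y_{1,1} = y_{T,U} = 1 \tag{1c} \\
 & y_{s,t} \le y_{s+1,t} + y_{s,t+1} + y_{s+1,t+1}
   && \forall s \in [T-1],\ t \in [U-1] \tag{1d} \\
 & y_{T,t} \le y_{T,t+1} && \forall t \in [U-1] \tag{1e} \\
 & y_{s,U} \le y_{s+1,U} && \forall s \in [T-1] \tag{1f} \\
 & x_{i,j},\ y_{s,t} \in \{0, 1\} \notag
\end{align}
```

The objective is quadratic because each term multiplies a mapping variable with a warping variable. With nonnegative costs, an optimal solution sets to one exactly the $y_{s,t}$ along a single warping path, which is consistent with the variable and constraint counts stated above.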

Heuristic Approaches
In this section, we present a heuristic to compute the dtgw-distance, which typically yields good (not necessarily optimal) solutions in practice.
The approach is to simply start with an arbitrary initial vertex mapping (or warping path) and to compute an optimal warping path (vertex mapping) based on Observation 3.1 in polynomial time. This process is then repeated by alternating between optimal warping path and optimal vertex mapping computation until the solution converges to a local minimum (or some other criterion is reached).
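The alternation can be sketched as follows. This is a simplified Python illustration and not the implementation used in the experiments: it assumes scalar signatures stored as `sig[layer][vertex]`, the absolute difference as metric, equally many vertices in both temporal graphs, and a brute-force assignment step that is only feasible for very small vertex sets. All names are hypothetical, and layer indices are 0-based.

```python
from itertools import permutations


def layer_cost(sig_g, sig_h, i, j, perm):
    """Cost of warping layer i of G to layer j of H under mapping perm."""
    return sum(abs(sig_g[i][u] - sig_h[j][perm[u]]) for u in range(len(perm)))


def best_path(sig_g, sig_h, perm):
    """Optimal warping path for a fixed vertex mapping (dynamic program)."""
    T, U = len(sig_g), len(sig_h)
    acc = [[float("inf")] * U for _ in range(T)]
    back = {}
    for i in range(T):
        for j in range(U):
            c = layer_cost(sig_g, sig_h, i, j, perm)
            if i == 0 and j == 0:
                acc[i][j] = c
                continue
            preds = [(acc[a][b], (a, b))
                     for a, b in ((i - 1, j - 1), (i - 1, j), (i, j - 1))
                     if a >= 0 and b >= 0]
            best, back[i, j] = min(preds)
            acc[i][j] = c + best
    path, cell = [], (T - 1, U - 1)
    while cell != (0, 0):
        path.append(cell)
        cell = back[cell]
    path.append((0, 0))
    return acc[T - 1][U - 1], path[::-1]


def best_perm(sig_g, sig_h, path):
    """Optimal vertex mapping for a fixed warping path (brute force)."""
    n = len(sig_g[0])
    sigma = [[sum(abs(sig_g[i][u] - sig_h[j][v]) for i, j in path)
              for v in range(n)] for u in range(n)]
    return min((sum(sigma[u][p[u]] for u in range(n)), p)
               for p in permutations(range(n)))


def alternating_minimization(sig_g, sig_h, max_rounds=100):
    """Alternate between both steps until the cost stops improving."""
    perm = tuple(range(len(sig_g[0])))  # arbitrary initial vertex mapping
    cost, path = best_path(sig_g, sig_h, perm)
    for _ in range(max_rounds):
        _, new_perm = best_perm(sig_g, sig_h, path)
        new_cost, new_path = best_path(sig_g, sig_h, new_perm)
        if new_cost >= cost:
            break  # local minimum reached
        cost, perm, path = new_cost, new_perm, new_path
    return cost, perm, path


if __name__ == "__main__":
    print(alternating_minimization([[1, 0], [2, 0]], [[0, 1], [0, 2]]))
```

Each step can only keep or lower the objective, so the loop stops after finitely many rounds; the result is a local minimum and need not equal the dtgw-distance.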
Note that it is convenient to be able to stop the process after any number of iterations to obtain some approximate solution (a so-called anytime algorithm). It is further possible to incorporate prior knowledge, for example, by fixing the mapping for some vertices. Note also that convergence is guaranteed since we decrease the objective in each alternation and the search space is finite. For the speed of convergence in practice, the choice of initialization can be important. We propose several options in the following.
Initial Warping Path. A first idea is to choose a shortest warping path (that is, of length $\max\{T, U\}$). Note that for $T \ne U$ several such paths exist. Without further knowledge about the instances, choosing a path within a Sakoe-Chiba band of small width is a reasonable default.
Another idea is to compute a warping path using $D(G_i, H_j)$ as a cost for warping layer $i$ to layer $j$. This is of course an optimistic estimate since it allows a different vertex mapping to be used for each pair of layers. Then, a vertex mapping can be computed by Observation 3.1.
Initial Vertex Mapping. The idea is to compute a vertex mapping by solving an assignment problem for approximate costs. Let $\sigma(u, v)$ be some approximate cost for mapping vertex $u \in V$ to $v \in W$. For example, one could use the following estimations:
$$\sigma^*(u, v) := \sum_{i=1}^{T} \sum_{j=1}^{U} d(f_{G_i}(u), f_{H_j}(v)), \qquad \sigma_{\mathrm{opt}}(u, v) := \sum_{i=1}^{T} \min_{j \in [U]} d(f_{G_i}(u), f_{H_j}(v)).$$
The first option $\sigma^*$ estimates the cost of mapping $u$ to $v$ over all possible warpings between any two layers (this is usually more than any warping path will incur). The definition of $\sigma_{\mathrm{opt}}$ only considers for each layer of the first temporal graph the minimal cost over all layers of the other temporal graph (this estimate might be too low).
Based on the estimated costs one computes a vertex mapping by solving an assignment problem and then computes an optimal warping path for this vertex mapping based on Observation 3.1.
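The two cost estimates can be computed directly from the per-layer signatures. The following Python sketch reads them off the verbal description above (sum over all layer pairs versus, for each layer of the first graph, the minimum over the layers of the second). It assumes scalar signatures stored as `sig[layer][vertex]` and the absolute difference as metric; the function names are hypothetical.

```python
def sigma_star(sig_g, sig_h, u, v):
    """Pessimistic estimate: sum over all pairs of layers."""
    return sum(abs(fg[u] - fh[v]) for fg in sig_g for fh in sig_h)


def sigma_opt(sig_g, sig_h, u, v):
    """Optimistic estimate: for each layer of G, the cheapest layer of H."""
    return sum(min(abs(fg[u] - fh[v]) for fh in sig_h) for fg in sig_g)


if __name__ == "__main__":
    sig_g = [[1, 0], [2, 0]]  # two layers, two vertices
    sig_h = [[0, 1], [0, 2]]
    for u in range(2):
        for v in range(2):
            print(u, v, sigma_star(sig_g, sig_h, u, v),
                  sigma_opt(sig_g, sig_h, u, v))
```

Either matrix of estimates can then be passed to any assignment solver to obtain the initial vertex mapping.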

Experiments
We conducted some preliminary experiments to test the viability of our dtgw-distance and to evaluate the performance of the alternating minimization heuristic (AM) we described in Section 5.2. For computations, we used a 4.0 GHz i7-6700K processor.

Data Sets
We used the following two data sets for our experiments.
Primary School Face-to-Face Contacts [7,23]. This data set contains a temporal network of face-to-face contacts between pupils and teachers in a French primary school, measured with a 20-second resolution. The network describes contacts among 232 children in 5 grades and 10 teachers. It has a lifetime of 3100, covering two days of school activity.
FAA Flights Spatio-Temporal Network [25]. This data set contains a spatio-temporal network of US domestic flights. The network describes actual take-off and landing times among 299 US airports. The temporal resolution of this network is 30 minutes and the lifetime is 480.

Benchmark on Small Instances.
Here we compared the solutions of the proposed AM heuristic under different initialization schemes against the optimal solutions. Due to long running times for computing optimal solutions, we are restricted to small temporal networks. We randomly selected 10 children from class 1A of the Primary School data set and extracted 15 temporal networks with 15 consecutive layers during a high contact period. We used vertex degrees as vertex signatures and the absolute value metric and computed all pairwise dtgw-distances between these 15 networks with the following algorithms:
• QP: exact QP-solver (Gurobi 8.0.1)
• AM σ*: AM with σ* initialization
• AM σopt: AM with σopt initialization
• AM swp: AM with shortest warping path initialization
• AM owp: AM with optimistic warping path initialization
We implemented the AM heuristic in Python, using an existing C++ implementation of the Jonker-Volgenant algorithm [13] to solve the assignment problem. Figure 4 shows for each initialization variant the estimated cumulative distribution function (ecdf) of the error percentage $\varepsilon = 100 \cdot (d_{AM} - d_{QP})/d_{QP}$, where $d_{AM}$ is the approximated dtgw-distance obtained by an AM heuristic and $d_{QP}$ is the exact dtgw-distance obtained by the QP-solver. A point $(\varepsilon, P_\varepsilon)$ on an ecdf-curve of an AM heuristic means that the error percentage of AM is at most $\varepsilon$ with estimated probability $P_\varepsilon$.
All AM variants found the correct solution for a majority of samples ($P_0 > 0.5$). The average error percentages are rather small and vary between 3.0 by AM owp and 5.5 by AM σopt. The AM owp heuristic performed best, having $P_0 \approx 0.71$ and maximum error percentage $\varepsilon_{\max} \approx 36.4$.
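The error percentage and its ecdf are simple to compute once both distances are available. The sketch below uses made-up numbers purely for illustration (they are not results from the paper), assumes that all exact distances are positive, and uses hypothetical function names.

```python
def error_percentages(d_am, d_qp):
    """Error percentage 100 * (d_AM - d_QP) / d_QP for each instance.
    Assumes every exact distance d_QP is strictly positive."""
    return [100.0 * (a - q) / q for a, q in zip(d_am, d_qp)]


def ecdf(values, x):
    """Fraction of values that are at most x."""
    return sum(v <= x for v in values) / len(values)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    d_am = [10, 12, 15, 8]
    d_qp = [10, 10, 12, 8]
    errors = error_percentages(d_am, d_qp)
    print(errors)
    print("P_0 =", ecdf(errors, 0), " P_20 =", ecdf(errors, 20))
```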
These findings indicate that for small instances the approximations of the four heuristics are relatively close to the optimal solution on average but may fail considerably in some cases with up to a maximum error percentage of 63.6. We remark that in our experience, the relative differences become smaller on larger instances.
Regarding running times, the AM heuristic took less than 0.01 seconds per instance, usually converging after at most three alternations. In comparison, the QP was slower by a factor of more than 10 000, requiring eight minutes on average (median two minutes) with some instances approaching two hours.

Sensitivity of DTGW to Noise
The goal of this experiment is to assess how sensitive the dtgw-distance is to noise, that is, how well can original data be reconstructed from noisy data.
We used the Primary School dataset from which we extracted five reference temporal networks representing the contacts between children of the same grade, each containing 45-50 vertices and 3100 layers.
For each of the five reference networks, we generated nine noisy copies. Rewiring an edge $e = \{u, v\} \in E_i$ of a temporal graph is defined as randomly picking a tuple $(e' = \{u', v'\}, t) \in \bigcup_{s=1}^{T} (E_s \times \{s\})$ and then replacing $e$ in $E_i$ by $\{u, v'\}$ and $e'$ in $E_t$ by $\{u', v\}$. Rewiring of underlying edges is done analogously (see Holme and Saramäki [9] for details).
We used the AM heuristic to approximate the pairwise dtgw-distances (using degrees as vertex signatures) between all reference and noisy temporal networks. In all of these instances, shortest-warping-path initialization (which is fastest) was used since preliminary tests showed that the other initializations produce very similar results. Figure 5 shows the dendrogram obtained by hierarchical clustering using complete linkage and the t-SNE embedding [18] of the approximated pairwise dtgw-distances. The plots show five well-separated clusters, each of which consists of a reference network and its nine noisy copies. Hence, the original reference networks can be successfully recovered from noise.
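A single rewiring step as defined above can be sketched in Python as follows. The representation (a list of sets of `frozenset` edges), the function name, and the decision to reject a swap that would create a self-loop or a duplicate edge are assumptions made for this illustration; the paper does not specify how such collisions are handled.

```python
import random


def rewire_edge(layers, i, e, rng):
    """Rewire edge e = {u, v} of layer i with a randomly chosen
    time-stamped edge (e' = {u', v'}, t): e becomes {u, v'} in layer i
    and e' becomes {u', v} in layer t. Returns True if the swap was done."""
    candidates = [(f, t) for t, layer in enumerate(layers)
                  for f in layer if (f, t) != (e, i)]
    if not candidates:
        return False
    f, t = rng.choice(candidates)
    u, v = tuple(e)
    u2, v2 = tuple(f)
    new_e, new_f = frozenset((u, v2)), frozenset((u2, v))
    # Assumption: skip swaps that would create self-loops or duplicates.
    if (len(new_e) < 2 or len(new_f) < 2
            or new_e in layers[i] or new_f in layers[t]):
        return False
    layers[i].remove(e)
    layers[i].add(new_e)
    layers[t].remove(f)
    layers[t].add(new_f)
    return True


if __name__ == "__main__":
    layers = [{frozenset((0, 1))}, {frozenset((2, 3))}]
    print(rewire_edge(layers, 0, frozenset((0, 1)), random.Random(0)))
    print(layers)
```

A successful step keeps the number of edges in every layer unchanged, so only the wiring is perturbed and not the activity level per time step.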
In all instances, the heuristic converged within at most six iterations, taking about 20 seconds of single-threaded computation.

Vertex identification
To study the ability of the dtgw-distance to identify vertices in real-world networks, we used the FAA Flights data set. We selected two disjoint time periods, each spanning three days (Monday to Wednesday) and 144 layers. We then used the AM heuristic to compute the dtgw-distance and the corresponding vertex mapping between these two temporal graphs. When using only the degrees as vertex signatures, the resulting mapping correctly re-identified about 81.6% of the 299 airports (that is, mapped them to their copies).
To demonstrate how additional information may be used to improve the result, we subsequently labeled each vertex with "northwest", "northeast", "southwest", or "southeast", depending on its geographical location. Using a vertex signature function that incorporates these labels, the AM heuristic re-identified about 96.3% of the airports (again independently of initialization).

Conclusion
We introduced a new similarity measure on temporal graphs by transferring dynamic time warping to temporal graphs. We showed that this constitutes a challenging computational problem and proposed several exact algorithms as well as a heuristic approach to solve it. While exact solutions can only be computed for very small instances, we empirically demonstrated that our heuristic runs fast in practice and yields good solutions. Our work opens several directions for future research. On the theoretical side, there are several open questions regarding the parameterized complexity of DTGW with respect to the output or the warping path length parameter $\lambda$. It might also be interesting to develop efficient constant-factor approximation algorithms for DTGW.
On an applied side, further experiments should be conducted in order to empirically evaluate the merit of the dtgw-distance in applications. Depending on the application domain, one could test different vertex signatures or even other graph distances as local cost functions. Moreover, data mining applications give rise to further challenging problems, e.g. for clustering tasks it might be necessary to compute a mean temporal graph with respect to the dtgw-distance. This might become an important task to solve in the future.