David B. Blumenthal

Free University of Bozen-Bolzano
Faculty of Computer Science


ORCID iD iconorcid.org/0000-0001-8651-750X
Office POS 2.12
Piazza Dominicani 3
39100 Bolzano, Italy

About Me

I am a postdoctoral researcher at the Faculty of Computer Science of the Free University of Bozen-Bolzano. In July 2019, I defended my Ph.D. thesis entitled “New Techniques for Graph Edit Distance Computation”. My main research interests are graph matching and graph edit distance. In February and November 2018, I was visiting fellow at the GREYC Research Lab in Digital Sciences. I hold a M.Sc. degree in Mathematics from the Technische Universität Berlin as well as a B.Sc. degree in Mathematics and B.A. and M.A. degrees in Philosophy from the Freie Universität Berlin. For more information, please download my CV or visit my profile on LinkedIn.

Publications

This is a complete list of my research publications. Also see my profiles on ResearchGate, Scopus, ORCiD, dblp, and Google Scholar. If you have problems accessing a paper you are interested in, feel free to contact me.

Abstract Due to their capacity to encode rich structural information, labeled graphs are often used for modeling various kinds of objects such as images, molecules, and chemical compounds. If pattern recognition problems such as clustering and classification are to be solved on these domains, a (dis-)similarity measure for labeled graphs has to be defined. A widely used measure is the graph edit distance (GED), which, intuitively, is defined as the minimum amount of distortion that has to be applied to a source graph in order to transform it into a target graph. The main advantage of GED is its flexibility and sensitivity to small differences between the input graphs. Its main drawback is that it is hard to compute.

In this thesis, new results and techniques for several aspects of computing GED are presented. Firstly, theoretical aspects are discussed: competing definitions of GED are harmonized, the problem of computing GED is characterized in terms of complexity, and several reductions from GED to the quadratic assignment problem (QAP) are presented. Secondly, solvers for the linear sum assignment problem with error-correction (LSAPE) are discussed. LSAPE is a generalization of the well-known linear sum assignment problem (LSAP), and has to be solved as a subproblem by many GED algorithms. In particular, a new solver is presented that efficiently reduces LSAPE to LSAP. Thirdly, exact algorithms for computing GED are presented in a systematic way, and improvements of existing algorithms as well as a new mixed integer programming (MIP) based approach are introduced. Fourthly, a detailed overview of heuristic algorithms that approximate GED via upper and lower bounds is provided, and eight new heuristics are described. Finally, a new easily extensible C++ library for exactly or approximately computing GED is presented.

arXiv | bibtex

Abstract Because of its flexibility, intuitiveness, and expressivity, the graph edit distance (GED) is one of the most widely used distance measures for labeled graphs. Since exactly computing GED is NP-hard, over the past years, various heuristics have been proposed. They use techniques such as transformations to the linear sum assignment problem with error-correction, local search, and linear programming to approximate GED via upper or lower bounds. In this paper, we provide a systematic overview of the most important heuristics. Moreover, we empirically evaluate all compared heuristics within an integrated implementation.

doi | poster | bibtex

Abstract The graph edit distance (GED) is a flexible graph dissimilar- ity measure widely used within the structural pattern recognition field. In this paper, we present GEDLIB, a C++ library for exactly or approx- imately computing GED. Many existing algorithms for GED are already implemented in GEDLIB. Moreover, GEDLIB is designed to be easily ex- tensible: for implementing new edit cost functions and GED algorithms, it suffices to implement abstract classes contained in the library. For implementing these extensions, the user has access to a wide range of utilities, such as deep neural networks, support vector machines, mixed integer linear programming solvers, a blackbox optimizer, and solvers for the linear sum assignment problem with and without error-correction.

doi | slides | poster | bibtex

Abstract Shortest path computation is a fundamental problem in road networks. However, in many real-world scenarios, determining solely the shortest path is not enough. In this paper, we study the problem of finding k-Dissimilar Paths with Minimum Collective Length (kDPwML), which aims at computing a set of paths from a source s to a target t such that all paths are pairwise dissimilar by at least \theta and the sum of the path lengths is minimal. We introduce an exact algorithm for the kDPwML problem, which iterates over all possible s-t paths while employing two pruning techniques to reduce the prohibitively expensive computational cost. To achieve scalability, we also define the much smaller set of the simple single-via paths, and we adapt two algorithms for kDPwML queries to iterate over this set. Our experimental analysis on real road networks shows that iterating over all paths is impractical, while iterating over the set of simple single-via paths can lead to scalable solutions with only a small trade-off in the quality of the results.

doi | bibtex | extended version on arXiv

Abstract The graph edit distance (GED) is a widely used distance measure for attributed graphs. It has recently been shown that the problem of computing GED, which is a NP-hard optimization problem, can be formulated as a quadratic assignment problem (QAP). This formulation is useful, since it allows to derive well performing approximative heuristics for GED from existing techniques for QAP. In this paper, we focus on the case where the edit costs that underlie GED are quasimetric. This is the case in many applications of GED. We show that, for quasimetric edit costs, it is possible to reduce the size of the corresponding QAP formulation. An empirical evaluation shows that this reduction significantly speeds up the QAP-based approximative heuristics for GED.

doi | poster | bibtex

Abstract The graph edit distance (GED) is a flexible graph dissimilarity measure widely used within the structural pattern recognition field. A widely used paradigm for approximating GED is to define local structures rooted at the nodes of the input graphs and use these structures to transform the problem of computing GED into a linear sum assignment problem with error correction (LSAPE). In the literature, different local structures such as incident edges, walks of fixed length, and induced subgraphs of fixed radius have been proposed. In this paper, we propose to use rings as local structure, which are defined as collections of nodes and edges at fixed distances from the root node. We empirically show that this allows us to quickly compute a tight approximation of GED.

doi | slides | bibtex

Abstract The graph edit distance is a widely used distance measure for labelled graph. However, A*-GED, the standard approach for its exact computation, suffers from huge runtime and memory requirements. Recently, three better performing algorithms have been proposed: The general algorithms DF-GED and BIP-GED, and the algorithm CSI-GED, which only works for uniform edit costs. All newly proposed algorithms outperform the standard approach A*-GED. However, cross-comparisons are lacking. This paper consolidates and extends these recent advances. To this purpose, we present all existing algorithms in a unified way and show that the slightly different definitions of the graph edit distance underlying A*-GED and DF-GED, on the one side, and CSI-GED, on the other side, can be harmonised. This harmonisation allows us to develop a generalisation of CSI-GED to non-uniform edit cost. Moreover, we present a speed-up of A*-GED and DF-GED for uniform edit costs, which build upon the fact that, in the uniform case, a continuously used subroutine can be implemented to run in linear rather than cubic time. We also suggest an algorithm MIP-GED which builds upon a very compact new mixed integer linear programming formulation. Finally, we carry out a thorough empirical evaluation, which, for the first time, compares all existing exact algorithms.

doi | bibtex

Abstract We propose an algorithm that efficiently solves the linear sum assignment problem with error-correction and no cost constraints. This problem is encountered for instance in the approximation of the graph edit distance. The fastest currently available solvers for the linear sum assignment problem require the pairwise costs to respect the triangle inequality. Our algorithm is as fast as these algorithms, but manages to drop the cost constraint. The main technical ingredient of our algorithm is a cost-dependent factorization of the node substitutions.

doi | bibtex

Abstract The problem of deriving lower and upper bounds for the edit distance between undirected, labelled graphs has recently received increasing attention. However, only one algorithm has been proposed that allegedly computes not only an upper but also a lower bound for non-uniform edit costs and incorporates information about both node and edge labels. In this paper, we demonstrate that this algorithm is incorrect. We present a corrected version Branch that runs in O(n2Δ3+n3) time, where Δ is the maximum of the maximum degrees of input graphs G and H. We also develop a speed-up BranchFast that runs in O(n2Δ2+n3) time and computes an only slightly less accurate lower bound. The lower bounds produced by Branch and BranchFast are shown to be pseudo-metrics on a collection of graphs. Finally, we suggest an anytime algorithm BranchTight that iteratively improves Branch's lower bound. BranchTight runs in O(n3Δ2+I(n2Δ3+n3)) time, where the number of iterations I is controlled by the user. A detailed experimental evaluation shows that all suggested algorithms are Pareto optimal, that they are very effective when used as filters for edit distance range queries, and that they perform excellently when used within classification frameworks.

doi | bibtex

Abstract The graph edit distance is a well-established and widely used distance measure for labelled, undirected graphs. However, since its exact computation is NP-hard, research has mainly focused on devising approximative heuristics and only few exact algorithms have been proposed. The standard approach A*-GED, a node-based best-first search that works for both uniform and non-uniform metric edit costs, suffers from huge runtime and memory requirements. Recently, two better performing algorithms have been proposed: DF-GED, a node-based depth-first search that works for uniform and non-uniform metric edit costs, and CSI_GED, an edge-based depth-first search that works only for uniform edit costs. Our paper contains two contributions: First, we propose a speed-up DF-GEDu of DF-GED for uniform edit costs. Second, we develop a generalisation CSI_GEDnu of CSI_GED that also covers non-uniform metric edit cost. We empirically evaluate the proposed algorithms. The experiments show, i.a., that our speed-up DF-GEDu clearly outperforms DF-GED and that our generalisation CSI_GEDnu is the most versatile algorithm.

doi | slides | bibtex

Abstract The problem of deriving lower and upper bounds for the edit distance between labelled undirected graphs has recently received increasing attention. However, only one algorithm has been proposed that allegedly computes not only an upper but also a lower bound for non-uniform metric edit costs and incorporates information about both node and edge labels. In this paper, we show that this algorithm is incorrect in the sense that, in general, it does not compute a lower bound. We present Branch, a corrected version of the algorithm that runs in O(n5) time. We also develop a speed-up BranchFast that runs in O(n4) time and computes a lower bound, which is only slightly less accurate than the one computed by Branch. An experimental evaluation shows that Branch and BranchFast yield excellent runtime/accuracy-tradeoffs, as they outperform all existing competitors in terms of runtime or in terms of accuracy.

doi | poster | bibtex

Abstract Der Aufsatz zielt darauf ab, ein angemessenes Verständnis der Semantik von Äußerungen über fiktive Kontexte (kurz: AFKs) zu entwickeln. Der systematische Ausgangspunkt der Arbeit besteht dabei in einem pragmatistischen Inferentialismus à la Robert Brandom. Für diese theoretische Weichenstellung wird nicht gesondert argumentiert, sondern vielmehr geltend gemacht, dass aus ihr zwei Forderungen an eine angemessene Theorie der Semantik von AFKs ableitbar sind. Erstens muss eine solche Theorie den inferentiellen Beziehungen Rechnung tragen, in denen von AFKs gemachte Aussagen de facto stehen. Zweitens darf sie nur auf solche Gegenstände zurückgreifen, die insofern ontologisch unschuldig sind, als sie vollständig und individuierbar sind. Aus diesen Forderungen ergibt sich, dass klassische Theorien der Semantik von AFKs unbefriedigend sind: Weder können AFKs mit Bertrand Russell als Äußerungen über das inferentielle Potenzial bestimmter Mengen fiktionaler Medien betrachtet, noch mit Searle, van Inwagen oder Parsons als Äußerungen über fiktive Gegenstände verstanden werden. Im Anschluss an diese kritische Auseinandersetzung wird ein eigener Vorschlag entwickelt, dessen Kerngedanken darin bestehen, AFKs als Äußerungen über bestimmte Werke zu betrachten, die wesentlich auf Interpretation beruhen. Dabei werden Werke als Äquivalenzklassen fiktionaler Medien verstanden, die logische Feinstruktur von AFKs gemachter Aussagen nach dem Modell von De-dicto-Zuschreibungen erläutert und deren Funktionsweise wiederum inferentialistisch gefasst.

pdf | bibtex

Teaching
Projects

GEDLIB is an easily extensible C++ library for (suboptimally) computing the graph edit distance (GED) between two labeled graphs. GED is defined as the minimum cost of a sequence of elementary edit operations that transforms one graph into another. There are six elementary edit operations: changing the label of an existing node, inserting an isolated node, deleting an isolated node, changing the label of an existing edge, inserting an an edge between two existing nodes, and deleting an edge. GEDLIB implements several state of the art methods for computing GED and comes with predefined edit costs for some benchmark datasets. Further methods and edit costs can easily be implemented by the user.

Awards & Funding
2015 - 2019
Ph.D. grant from the Faculty of Computer Science of the Free University of Bozen-Bolzano.
2017
Best paper award for “Exact computation of graph edit distance for uniform and non-uniform metric edit costs” at GbRPR 2017.
2015
M.Sc. award from the Institue of Mathematics of the Technische Universität Berlin.
2007 - 2015
Scholarship from the Studienstiftung des deutschen Volkes.
Services
PC Member
HCC 2019.
Reviewer
ICPR 2018, Pattern Recognit. Lett. (2018).
External Reviewer
VLDB 2017, EDBT 2017, ICDE 2016.