3rd KRDB school on
Trends in the Web of Data

Brixen-Bressanone (near Bozen-Bolzano), Italy
17-18 September 2010


Programme at a glance

The school starts at 9am on Friday the 17th of September, and it finishes at 5:30pm on Saturday the 18th of September.

17 September
8:00 Registration
9:00 Danny Ayers: Platforms and the Semantic Web (slides)
11:00 Break
11:30 Jonathan Ellis: NoSQL and Cloud Computing (part a)
12:30 Lunch
14:00 Jonathan Ellis: NoSQL and Cloud Computing (part b) (slides)
15:00 Tom Heath: Linked Data (part a)
16:00 Break
16:30 Tom Heath: Linked Data (part b) (slides)
17:30 Peter Mika: Semantic Search (part a)

18 September
8:30 Peter Mika: Semantic Search (part b) (slides)
9:30 Martin Hepp: The GoodRelations Ontology for E-Commerce (part a)
10:30 Break
11:00 Martin Hepp: The GoodRelations Ontology for E-Commerce (part b) (slides)
12:00 Lunch
13:30 Marko Rodriguez: Graph Databases (slides)
15:30 Break
16:00 Stefano Ceri: The need of semantics in search services (panel) (slides)
17:30 End of school

The lectures of the school are:

  • Danny Ayers: Platforms and the Semantic Web

    A platform in general is a structure to support something useful or interesting. Within computing, the term has traditionally been used to describe the hardware architecture and operating systems upon which software may be run. The platform presents an abstraction of the underlying system, intended to simplify development and deployment. The term has evolved to cover higher levels of abstraction such as database systems and software development tools. But the advent of the Web, with its hugely distributed nature and global adoption, has brought with it new challenges to the notion of a platform. In this session I will identify common characteristics of various platforms, from early mainframe computers to API-based "Web 2.0" systems such as the Facebook platform, up to Semantic Web platforms. I shall give a view of the Web as a platform with practical examples, highlight some of the opportunities and pitfalls, and identify strategies that are likely to be successful in the future.

    CV: Danny Ayers is an independent developer and consultant specializing in Web technologies, currently focussed on the data.gov.uk project. He has contributed to 10 books related to the Web, and writes an occasional column for IEEE Internet Computing, "Webscience". He was an early enthusiast and advocate for the Semantic Web, and has been involved in the development of various Web specifications.

  • Jonathan Ellis: NoSQL and Cloud Computing

    Cloud computing has solved scalability-on-demand for many applications, but the database has remained an Achilles heel: relational databases are stubbornly difficult to scale. NoSQL is the growing practice of using the right database tool for every project. The most exciting aspect of NoSQL is the set of products that offer truly incremental scalability. Cassandra is a distributed database combining the best of Google's Bigtable and Amazon's Dynamo, and it is in use at Facebook, Twitter, Rackspace, and other companies that have large, active data sets. The largest production cluster has over 100 TB of data on over 150 machines. This session will cover how to evaluate which NoSQL products can be useful for your projects, the strengths and weaknesses of using NoSQL in a cloud environment, and why Cassandra is the NoSQL database of choice for companies that need scalability without sacrificing durability.

    CV: Jonathan Ellis is project chair of Apache Cassandra and co-founder of Riptano, a provider of professional support and services for Cassandra. Cassandra is the distributed database in use at Facebook, Twitter, Rackspace, and other companies that have large, active data sets. Jonathan was an early employee at Mozy, where he wrote Triton, a fault-tolerant, multi-petabyte distributed storage system based on Reed-Solomon encoding.

  • Tom Heath: Linked Data

    Linked Data is about using the architecture and technologies of the Web to connect data that is related but stored in different locations. These locations may be as diverse as databases maintained by organisations in different geographical locations, or simply heterogeneous databases, applications and services within one organisation that have not previously shared data in a consistent fashion. The Linked Data principles define a set of norms for publishing and connecting data using Web standards. By adopting these principles, data publishers are making it easier for their data to be combined with other related data sets and reused in novel applications. In this session I will introduce the concept and principles of Linked Data, and provide an overview of the core technologies that underpin this approach to data sharing. I will highlight some of the key technical considerations and design decisions that influence the Linked Data publishing process, before describing potential architectures for applications seeking to exploit Linked Data. The session will conclude with a discussion of open research questions that arise when data sharing graduates from relatively controlled scenarios to the massive, noisy and uncertain environment of the Web.

    CV: Dr. Tom Heath is Lead Researcher at Talis Systems Ltd, a global leader in the research, development and commercial exploitation of Linked Data and Semantic Web technologies. At Talis, he is responsible for leading internal research exploring how Linked Data affects the sharing and reuse of data, the value and insights that can be derived from this data, and the implications of these changes for human-computer interaction. Tom has a PhD in Computer Science from The Open University. He has been the recipient of a number of awards in the Semantic Web field, including First Prize in the 2007 International Semantic Web Challenge, and STI International PhD of the Year 2008/9.

  • Peter Mika: Semantic Search

    While most current research in Web Retrieval aims at improving search over hypertext, the Semantic Web promises to break new boundaries in search by transforming the content itself into a form that is more easily processable by machines. In this session, we introduce the notion of semantic search, and in particular the different roles that semantics can play in various parts of the IR process from document processing to query analysis, ranking and result presentation. We show some of the existing prototypes for semantic search from both research and industry, and discuss our efforts toward evaluating semantic search systems. We close with a discussion of active research topics in the field.

    CV: Dr. Peter Mika is a researcher and data architect at Yahoo! Research in Barcelona, working on the applications of semantic technology to Web search. He received his PhD in computer science (cum laude) from Vrije Universiteit Amsterdam. His interdisciplinary work in social networks and the Semantic Web earned him a Best Paper Award at the 2005 International Semantic Web Conference and a First Prize at the 2004 Semantic Web Challenge. From 2006 to 2009, he was a co-chair of the Semantic Web Challenge. He is the author of the book 'Social Networks and the Semantic Web' (Springer, 2007). In 2008 he was selected as one of "AI's Ten to Watch" by the editorial board of the IEEE Intelligent Systems journal.

  • Martin Hepp: The GoodRelations Ontology for E-Commerce

    The GoodRelations ontology for business, product, and offer data has quickly become the second most popular Web ontology. In less than two years, it has come close to FOAF in terms of available data and adoption, but it has a more complex conceptual structure and formal account. For the first time in history, we can expect a massive amount of real-world data expressed using an ontology compliant with the state of the art in W3C Semantic Web technology. In this session, I will (1) give an overview of the GoodRelations ontology and point to the most useful resources for developers, (2) present lessons learned for engineering successful Web-scale ontologies, and (3) outline interesting research challenges in the fields of Controlled Natural Languages, Natural Language Processing, Ontology Mapping and Alignment, Collaborative Ontology Engineering, and Storage and Reasoning.

    CV: Martin Hepp is a professor of general management and e-business at Universität der Bundeswehr München in Germany, where he heads the e-business and Web Science Research Group. Hepp holds a PhD in business information systems from the University of Würzburg (Germany). His key research interest is in using structured, linked data on a Web scale for e-business, in particular matchmaking and product data reuse. As part of his research, he developed the GoodRelations and eClassOWL ontologies, now widely used for describing offers on the Web.

  • Marko Rodriguez: Graph Databases

    Relational databases are perhaps the most commonly used data management systems. In relational databases, data is modeled as a collection of disparate tables. In order to unify the data within these tables, a join operation is used. This operation becomes expensive as the amount of data grows. For information retrieval operations that do not make use of extensive joins, relational databases are an excellent tool. However, when an excessive number of joins is required, the relational database model breaks down. In contrast, graph databases maintain a single data structure: a graph. A graph contains a set of vertices (i.e. nodes, dots) and a set of edges (i.e. links, lines). These elements make direct reference to one another, and as such, there is no notion of a join operation. The direct references between graph elements make the joining of data explicit within the structure of the graph. The benefit of this model is that traversing (i.e. moving between the elements of a graph in an intelligent, direct manner) is very efficient and yields a style of problem-solving called the graph traversal pattern. This session will discuss graph databases, the graph traversal programming pattern, and their use in solving real-world problems.

    CV: Dr. Marko A. Rodriguez is currently an AT&Ti Graph Systems Architect; previously he was a Postdoctoral Director's Fellow at the Center for Nonlinear Studies at the Los Alamos National Laboratory. He is the primary designer of the Gremlin graph-based programming language, and he is working on the graph infrastructure of Neo4j, the Java-based NoSQL graph database. He received his PhD in Computer Science from the University of California at Santa Cruz in 2007.

  • Stefano Ceri: The need of semantics in search services (panel)

    CV: Stefano Ceri is professor of Database Systems at the Dipartimento di Elettronica e Informazione (DEI), Politecnico di Milano. He was visiting professor at the Computer Science Department of Stanford University (1983-1990). He was the chairman of the Computer Science Section of DEI (1992-2004), and the chairman of LaureaOnLIne, a fully online curriculum in Computer Engineering (2004-2008). Stefano Ceri is vice-chairman (representing Politecnico di Milano) of Alta Scuola Politecnica, a school of excellence for master-level students which is jointly managed by Politecnico di Milano and Politecnico di Torino. He was associate editor of ACM Transactions on Database Systems and IEEE Transactions on Software Engineering, and he is currently an associate editor of several international journals. Stefano Ceri is co-editor-in-chief (with Mike Carey) of the book series "Data Centric Systems and Applications" (Springer-Verlag).

Organised by: KRDB Research Centre at the Faculty of Computer Science of the Free University of Bozen-Bolzano.

Enrico Franconi