Free University of Bozen - Bolzano

Database and Information Systems

A Parallel Corpus of Italian/German Legal Texts

Johann Gamper

Abstract

This paper presents the creation of a parallel corpus of Italian and German legal documents that are translations of one another. The corpus, which contains approximately 5 million words, is primarily intended as a resource for (semi-)automatic terminology acquisition. The guidelines of the Corpus Encoding Standard have been applied to encode structural information, segmentation information, and sentence alignment. Since the parallel texts correspond one-to-one at the sentence level, building a perfect sentence alignment is rather straightforward. As a result, the corpus also constitutes a valuable testbed for evaluating alignment algorithms. The paper discusses the intended use of the corpus, the various phases of corpus compilation, and basic statistics.