S-PIC4CHU: Semantics-Enriched Techniques for Data Preparation in Data Science

Gianvincenzo Alfano, Ilaria Bartolini, Diego Calvanese, Paolo Ciaccia, Sergio Greco, Davide Lanti, Pasquale Leonardo Lazzaro, Emilia Lenzi, Davide Martinenghi, Cristian Molinaro, Marco Patella, Letizia Tanca, Riccardo Torlone, and Irina Trubitsyna

Proc. of the 4th Italian Conference on Big Data and Data Science (ItaData). Volume 4152 of CEUR Workshop Proceedings, https://ceur-ws.org/. 2025.

The S-PIC4CHU project deals with the crucial issue of data preparation for Data Science and Machine Learning, and aims to offer new models and techniques for fighting inaccuracy, noise, uncertainty, bias, and incompleteness of data. While the project embraces a semantics-based approach at its core, the proposed data preparation pipeline includes data cleaning (also from the ethical viewpoint), transformation, and reduction, as well as deduplication, error detection, missing value imputation, and space transformations for multimedia data. This paper illustrates the advancements achieved on all these fronts during the first months of work on the project, and sets out the forthcoming actionable objectives.


@inproceedings{ItaData-2025,
   title = "S-PIC4CHU: Semantics-Enriched Techniques for Data Preparation in Data Science",
   year = "2025",
   author = "Gianvincenzo Alfano and Ilaria Bartolini and Diego Calvanese and Paolo Ciaccia and Sergio Greco and Davide Lanti and Lazzaro, Pasquale Leonardo and Emilia Lenzi and Davide Martinenghi and Cristian Molinaro and Marco Patella and Letizia Tanca and Riccardo Torlone and Irina Trubitsyna",
   booktitle = "Proc. of the 4th Italian Conference on Big Data and Data Science (ItaData)",
   volume = "4152",
   publisher = "CEUR-WS.org",
   series = "CEUR Workshop Proceedings, https://ceur-ws.org/",
}