D2AI: Data-Driven Artificial Intelligence research area

The Data-Driven Artificial Intelligence (D2AI) research area is devoted to handling Big Data, using advanced data science and machine learning, to tackle a variety of grand challenges in artificial intelligence.

Specific research strands include:

Advanced machine learning: development of new methods and techniques, such as multi-task learning, transfer learning, and reinforcement learning, for building models from both big and small data;
Machine learning in embedded systems: development of new methodologies to miniaturize and accelerate machine learning algorithms so that they can run on mobile devices, in scenarios such as the intelligent Internet of Things and smart sensing;
Computer vision (-> Visual Computing and Learning Lab): development of deep learning models to address complex computer vision problems, such as activity recognition in video, anomaly detection in images and volumetric data, and the development of algorithms for the semi-automatic creation of artificial neural networks for the extraction of information from images and video;
Databases and analytics (-> Database Systems Group): development of new techniques for managing, processing, and analysing multi-dimensional and temporal data, considering complex scenarios such as smart systems, industrial IoT, and predictive maintenance.

In the context of complex/smart systems, ML models can be more robust and adaptive than physical models

Building AI models based on data, and using them for optimization, control, and decision-making
Using big data, data science, and machine learning to gain insights about the system
Using ML to enhance, enrich, and transform data, so as to more effectively train algorithms and models

Machine intelligence must start at the data source, and then integrate with cloud intelligence

Data science and ML have distinctive characteristics in the context of Intelligent IoT
- Learning is distributed rather than centralized
- Data are particularly noisy, unstructured, and uncurated
- Much of the data is missing, incorrect, or irrelevant
- ML is real-time and starts at the very edge:
  - Learning while sensing
  - Edge ML for data quality enhancement
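
The idea of "learning while sensing" can be illustrated with a small sketch. It assumes a single numeric sensor stream and uses Welford's running mean/variance with a z-score test; this is a generic textbook technique chosen for illustration, not necessarily the method used in this research area. The class name, threshold, and warm-up length are hypothetical.

```python
import math


class StreamingAnomalyDetector:
    """Keeps a running mean/variance (Welford) and flags large deviations."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold      # hypothetical z-score cut-off
        self.warmup = max(warmup, 2)    # samples to observe before flagging
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                   # sum of squared deviations from the mean

    def update(self, x):
        """Score x against the current statistics, then learn from it."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        if not is_anomaly:
            # Only readings judged normal update the model, in O(1) memory.
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return is_anomaly


readings = [20.1, 20.3, 19.9, 20.0, 20.2, 20.1, 35.0, 20.0]
detector = StreamingAnomalyDetector()
flags = [detector.update(r) for r in readings]
```

Because the state is three numbers per stream, a detector of this kind fits on very constrained devices, which is the point of filtering data at the edge before it reaches the cloud.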

HW systems (edge-ML)

The science of miniature ML
Embedded ML for real data quality enhancement
Edge ML for data series imputation, cleaning, processing, characterization, event/anomaly detection, etc.
Edge-Cloud ML
Acceleration of ANNs for cloud ML processes
Smart networks

Data (fusion)

Sensor and IoT multi-variate streams
Spatio-temporal data series
Sensor/Satellite data fusion (hyper-spectral, synthetic aperture radar)
Ice and underwater imaging
Subjective/objective data fusion (smart city)

Models (data-driven)

Urban mobility
Concurrent sensing and learning
Intelligent protocols for energy-efficiency
Object detection
Hydro-climatic
Ice classification
Oceanic eddy detection

Machine Learning techniques

Sparse least squares Support Vector Regression
Bayesian inference
Scalable Clustering
Graph Mining (frequent subgraphs, heaviest k-subgraph)
Feature Selection
Outlier Detection

Multi-Task Deep Learning

Data Science Tools

KNIME is a popular, free and open-source data science and ML platform
Contributions to:
- first KNIME release in 2006
- adoption for Data Science teaching and training since 2008
- KNIME Certification in Data Science in 2019
Applications to Prediction of Alzheimer’s Disease from Brain Imaging

Distributed Computing / Big data

Big Data Mining
- Parallel and Distributed Clustering
- Rule-based Classification for data streams
- Fast retrieval of weather analogues in the ECMWF multi-petabyte archive
Distributed Data Mining and Distributed Computing
- Epidemic Computing for fully-decentralised data mining
- Decentralised consensus
- Data Mining in Cloud/Edge Computing
- IOT- & Blockchain-Enabled Security Framework for New Generation Critical Cyber-Physical Systems In Finance Sector
Distributed multi-dimensional scaling
- Non-Euclidean network coordinates
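
As an illustration of the epidemic (gossip) style of decentralised computation and consensus listed above, the following minimal sketch simulates pairwise gossip averaging: each node repeatedly averages its local value with a randomly chosen peer, and all nodes converge to the global mean without any coordinator. The function name, the round count, and the in-process simulation of the network are assumptions made for the example; a real deployment would exchange messages between machines.

```python
import random


def gossip_average(values, rounds=50, seed=0):
    """Simulate pairwise gossip averaging; needs at least two nodes."""
    rng = random.Random(seed)
    state = list(values)
    n = len(state)
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # choose a peer other than node i
            # Both nodes replace their value with the pair average,
            # which keeps the global sum (and hence the mean) unchanged.
            avg = (state[i] + state[j]) / 2.0
            state[i] = state[j] = avg
    return state


local_values = [10.0, 20.0, 30.0, 40.0]
estimates = gossip_average(local_values)
```

Each exchange preserves the sum of all values, so the only fixed point is the state in which every node holds the global mean (25.0 for the values above).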

Computer Vision

Representation Learning for Video Understanding
- Action Recognition from Video and Sensor Streams
- Text-Video Retrieval and Video Question Answering
- Full-body Human Tracking and Object Detection
Anomaly Detection in Image and Volumetric Data
- Vision-based Quality Control of Produced Parts
- Neural Digital Twin of Produced Parts for Inspection
- Anomaly Classification and Segmentation in CT Scans
Neural Architecture Search (NAS)
- AutoML for Image and Video Understanding Tasks

Query processing of interval-timestamped historical data in RDBMSs

Efficient algorithms and indexing structures for join and aggregation
First RDBMS with support for all temporal operators
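
To illustrate what a join over interval-timestamped data computes, here is a minimal sketch of an overlap join on half-open intervals [start, end). It sorts one input and stops scanning early; it is a simple illustration rather than the indexing structures or algorithms developed by the group. The tuple layout (key, start, end) and the sample relations are hypothetical.

```python
def overlap_join(left, right):
    """Pair tuples (key, start, end) whose half-open intervals overlap."""
    result = []
    right_sorted = sorted(right, key=lambda r: r[1])  # order by start time
    for lkey, lstart, lend in left:
        for rkey, rstart, rend in right_sorted:
            if rstart >= lend:
                break  # every later right tuple starts after this one ends
            if rend > lstart:
                # Output the keys together with the shared validity interval.
                result.append((lkey, rkey, max(lstart, rstart), min(lend, rend)))
    return result


employees = [("ann", 1, 5), ("bob", 4, 9)]
projects = [("p1", 3, 6), ("p2", 8, 12)]
joined = overlap_join(employees, projects)
```

Each output row carries the intersection of the two input intervals, that is, the period during which both facts held at the same time.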

Algorithms for processing time series data

Imputation of missing values and anomaly detection
Motif discovery
Correlation analysis
Predictive analytics
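
As a small illustration of missing-value imputation in a time series, the sketch below fills gaps by linear interpolation between the nearest known neighbours and copies the nearest known value into leading and trailing gaps. This is a baseline chosen for clarity, not the group's method; the function name and the use of None to mark missing samples are assumptions.

```python
def impute_linear(series):
    """Return a copy of series with None gaps filled by linear interpolation."""
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        return filled  # nothing to interpolate from
    # Leading and trailing gaps: copy the nearest known value.
    for i in range(known[0]):
        filled[i] = filled[known[0]]
    for i in range(known[-1] + 1, len(filled)):
        filled[i] = filled[known[-1]]
    # Interior gaps: straight line between the surrounding known samples.
    for a, b in zip(known, known[1:]):
        step = (filled[b] - filled[a]) / (b - a)
        for i in range(a + 1, b):
            filled[i] = filled[a] + step * (i - a)
    return filled


raw = [None, 2.0, None, None, 8.0, 9.0, None]
clean = impute_linear(raw)
```

Linear interpolation assumes evenly spaced samples and smooth behaviour inside each gap; long gaps or strongly correlated multi-variate streams generally call for more elaborate techniques.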