Data Visualization and Exploration

Ozan Kahramanoğulları

Yes, I see.

state  energy   state  energy   state  energy   state  energy
AT      9.374   EL      7.087   IT     10.182   PT      7.254
BA      2.240   ES      8.191   LT      4.680   RO      4.430
BE      6.388   2020    7.672   LU     10.506   RS      2.420
BG      2.226   EU28    8.012   LV      4.379   SE      7.821
CY      6.968   FI      5.453   ME      3.396   SI      5.494
CZ      3.879   FR      8.059   MK      2.853   SK      4.686
DE      8.616   HR      5.355   MT      3.826   TR      6.291
DK     14.007   HU      4.461   NL      7.400   UK     10.620
EA19    8.230   IE     13.620   NO     11.534   XK      2.140
EE      2.806   IS      1.866   PL      4.232   NA

Energy productivity (gross domestic product (GDP) divided by the gross inland consumption of energy) in 2014.
Source: http://data.europa.eu/euodp/en/data/dataset/xWiT1fbpF5q1ZCvLQc2upg

Visualization is better at summarizing data

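Consider the following summaries of a two-dimensional dataset (x, y):

  • Mean: 9 (x), 7.5009091 (y)
  • Standard deviation: 3.3166248 (x), 2.0315681 (y)
  • Correlation coefficient: 0.8164205

Four datasets that look completely different when plotted share almost exactly these summaries. Together they are called Anscombe’s quartet.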
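The summaries above can be checked numerically. The following is a minimal sketch, not part of the original slides: it assumes the standard published values of Anscombe's quartet and uses only the Python standard library. The names `QUARTET`, `pearson`, `summarize` and `SUMMARIES` are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Anscombe's quartet: four (x, y) datasets of 11 points each.
# Datasets I, II and III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def summarize(xs, ys):
    """Means, sample standard deviations and correlation of one dataset."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "sd_x": stdev(xs),
        "sd_y": stdev(ys),
        "r": pearson(xs, ys),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, s in SUMMARIES.items():
    print(f"{name:>3}: mean=({s['mean_x']:.2f}, {s['mean_y']:.2f})  "
          f"sd=({s['sd_x']:.2f}, {s['sd_y']:.2f})  r={s['r']:.3f}")
```

The four rows agree to about two decimal places, so the summaries alone cannot tell the datasets apart. Only a plot reveals that one is roughly linear, one is curved, and two are dominated by a single outlier.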

Visualization is better at summarizing data

https://www.nytimes.com/interactive/2019/08/29/opinion/hurricane-dorian-forecast-map.html

In this course, we will study techniques to explore and visualize data in order to communicate with people.

The goals of this course

  • Give you the tools to carry out your own data analyses

  • Teach you how to explore data

  • Teach you how to preprocess and clean data

  • Teach you how to visualize data in appropriate ways

    • This includes studying human perception

  • Teach you how to program in order to develop reproducible analyses

A data exploration workflow

Source: https://r4ds.had.co.nz/explore-intro.html

Second edition: https://r4ds.hadley.nz/intro

Reproducibility

Exploration is useless if you don’t draw a map to repeat your steps

Reproducibility is the key! Typical situations where you will need it:

  • Someone questions your conclusions.

  • One year later, you want to re-run the analysis with new data.

  • One year later, you want to slightly modify the analysis.

  • You are collaborating with someone else.

The three flavours of reproducibility

  • Repeatability

  • Reproducibility

  • Replicability

Repeatability / Reproducibility / Replicability

Repeatability (Same team, same experimental setup): The measurement can be obtained with stated precision by the same team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same location on multiple trials. For computational experiments, this means that a researcher can reliably repeat her own computation.

Source: https://www.acm.org/publications/policies/artifact-review-and-badging-current

Definitions of the ACM (Association for Computing Machinery)

Reproducibility (Different team, same experimental setup): The measurement can be obtained with stated precision by a different team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same or a different location on multiple trials. For computational experiments, this means that an independent group can obtain the same result using the author’s own artifacts.

Replicability (Different team, different experimental setup): The measurement can be obtained with stated precision by a different team, a different measuring system, in a different location on multiple trials. For computational experiments, this means that an independent group can obtain the same result using artifacts which they develop completely independently.

  • Repeatability: same team, same setup

  • Reproducibility: different team, same setup

  • Replicability: different team, different setup

You want your results to be at least reproducible.

Attention: if you are not careful enough, you may end up with something that is not even repeatable!
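One common way to lose repeatability is unseeded randomness. The following is a minimal sketch under assumptions that are not in the slides: the analysis involves random resampling, the function `bootstrap_mean` and its parameters are hypothetical, and `DATA` reuses the first five values of the energy table purely for illustration.

```python
import random


def bootstrap_mean(values, n_resamples=200, seed=None):
    """Average of the means of resamples drawn with replacement.

    With seed=None the generator is seeded from system entropy, so two
    runs will generally differ. With a fixed seed the same team gets
    the same number on every run.
    """
    rng = random.Random(seed)  # private generator, no global state
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(values) for _ in values]
        means.append(sum(sample) / len(sample))
    return sum(means) / len(means)


# Illustrative input: first five energy values from the table (AT, BA, BE, BG, CY).
DATA = [9.374, 2.240, 6.388, 2.226, 6.968]

print("unseeded:", bootstrap_mean(DATA), bootstrap_mean(DATA))
print("seeded:  ", bootstrap_mean(DATA, seed=42), bootstrap_mean(DATA, seed=42))
```

Recording the seed together with the results is what turns the second line into a repeatable computation.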

It is important to aim for replicability from the very beginning. It cannot be an afterthought!

How to achieve the 3 Rs?

  • Take notes!

  • Save all the information about your inputs

  • Save all the processing you do on your data

  • Snapshot your programming environment

  • Keep everything under version control
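Some of the points above can be partly automated. The following is a minimal sketch, not a prescribed course tool: it records a checksum for each input file and a description of the Python environment in a JSON file. The function names `file_sha256`, `installed_packages`, `snapshot` and `write_snapshot`, and the layout of the JSON, are assumptions made for illustration.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata
from pathlib import Path


def file_sha256(path):
    """SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def installed_packages():
    """Sorted 'name==version' strings for the installed distributions."""
    packages = set()
    for dist in metadata.distributions():
        try:
            packages.add(f"{dist.metadata['Name']}=={dist.version}")
        except Exception:  # skip distributions with unreadable metadata
            continue
    return sorted(packages)


def snapshot(input_paths, notes=""):
    """Collect notes, input checksums and environment details in a dict."""
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "notes": notes,
        "inputs": {str(p): file_sha256(p) for p in input_paths},
        "python": sys.version,
        "platform": platform.platform(),
        "packages": installed_packages(),
    }


def write_snapshot(input_paths, out_path, notes=""):
    """Write the snapshot as JSON to out_path and return it."""
    snap = snapshot(input_paths, notes)
    Path(out_path).write_text(json.dumps(snap, indent=2))
    return snap
```

A call such as `write_snapshot(["data/energy.csv"], "snapshot.json", notes="first exploration")` would produce a file that can be committed to version control next to the analysis code; the paths here are hypothetical. The processing steps themselves are best saved by keeping them in scripts rather than performing them by hand.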

Some practical information

About me

  • Ozan Kahramanoğulları
  • email: okahramanogullari@unibz.it
  • office: B1.5.22 (NOI)

About the course

  • Lectures
    • Monday: 10:00 - 12:00
    • Thursday: 08:00 - 10:00
  • Lab
    • Thursday: 15:30 - 17:30
  • Office hours: schedule by email
    • Wednesday: 14:00 - 16:00

About the exam

You should complete at least 60% of your weekly assignments.

Your final grade is:

About the material

Some questions for you

  • Which programming languages do you know?

  • Which version control systems do you know?

  • Do you have prior experience with visualization?