Layered Grammar of Graphics

Data Visualization and Exploration

Ozan Kahramanoğulları

Why a grammar?

A grammar

  • expresses the fundamental principles or rules of an art or science.

  • provides structural insight into complicated graphics.

  • makes available more flexibility and expressiveness in creation of graphics.

  • provides a consistent framework and guidelines to think about graphics.

  • constitutes boundaries by principled rules rather than an API.

The components

  • data

  • aesthetic mapping

  • geometric objects

  • scales

  • statistical transformations

  • position adjustments

  • facet specification

  • coordinate system

Data

  • This is the most fundamental part: all other components depend on it.

  • In our discussion, we assume we are dealing with tidy data:

    • Variables

    • Observations

    • Values
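
As a small illustration of tidy data, the sketch below reshapes a made-up wide table so that each variable is a column, each observation is a row, and each value is a cell. The `wide_prices` table and its numbers are assumptions for illustration only; they are not part of the lecture data.

```r
library(tibble)
library(tidyr)

# Hypothetical wide table: one column per year, so the variable
# "year" is hidden in the column names.
wide_prices <- tribble(
  ~category,   ~`2022`, ~`2023`,
  "shoes",          90,     100,
  "computers",     950,    1000
)

# Tidy version: one row per (category, year) observation.
tidy_prices <- pivot_longer(
  wide_prices,
  cols = c(`2022`, `2023`),
  names_to = "year",
  values_to = "price"
)

tidy_prices
```

In the tidy version, `year` can be mapped to an aesthetic like any other variable, which is not possible while it lives in the column names.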

Aesthetic mappings

An aesthetic is a visual property of the objects in your plot.

Examples:

  • Position on the x, y plane

  • Colour

  • Shape

  • Size

Geometric objects

A geom is the geometrical object that a plot uses to represent data.

Examples:

  • Points

  • Lines

  • Bars

  • Polygons


The aesthetic mapping associates variables in the data with visual properties of geometric objects.
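
A minimal sketch of such a mapping, assuming ggplot2 and a small made-up table (the `items` name is hypothetical): three variables are mapped to the x position, the y position, and the colour of point geoms, while the point size is set to a constant rather than mapped.

```r
library(ggplot2)
library(tibble)

# Small illustrative table (values are made up).
items <- tribble(
  ~idx, ~category,   ~price,
  1,    "shoes",        100,
  2,    "shoes",         70,
  3,    "computers",   1000,
  4,    "trousers",      80
)

# Mapped aesthetics: idx -> x position, price -> y position,
# category -> colour. size = 3 is a constant setting, not a mapping.
p <- ggplot(items, aes(x = idx, y = price, colour = category)) +
  geom_point(size = 3)

# Names of the aesthetics that are mapped to variables.
names(p$mapping)

p
```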

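Scales

A scale controls the mapping from data values to aesthetic values.

idx  category   price
1    shoes        100
2    shoes         70
3    computers   1000
4    trousers      80

# Packages used by the examples in this section
library(tidyverse)   # tibble, dplyr, tidyr, ggplot2
library(colorspace)  # qualitative_hcl(), scale_*_discrete_qualitative()
library(gridExtra)   # grid.arrange()
library(ggrepel)     # geom_label_repel()

scale_example <- tribble(
  ~idx, ~category, ~price,
  1, "shoes", 100,
  2, "shoes", 70,
  3, "computers", 1000,
  4, "trousers", 80
)
# The aesthetic values (colours) the scale can map to
palette <- tibble(color = qualitative_hcl(3)) %>%
  mutate(x=rank(color), y=0)
p1 <- ggplot(palette, aes(fill=color, x=x, y=y)) +
  geom_tile(width=.9) +
  scale_fill_identity() +
  labs(title="scale") +
  theme_void()
grid.arrange(p1)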

Scales

A scale controls the mapping from data values to aesthetic values.

idx category price
1 shoes 100
2 shoes 70
3 computers 1000
4 trousers 80

scale_example <- tribble(
  ~idx, ~category, ~price,
  1, "shoes", 100,
  2, "shoes", 70,
  3, "computers", 1000,
  4, "trousers", 80
)
palette <- tibble(color = qualitative_hcl(3)) %>%
  mutate(x=rank(color), y=0)
p1 <- ggplot(palette, aes(fill=color, x=x, y=y)) +
  geom_tile(width=.9) +
  scale_fill_identity() +
  labs(title="scale") +
  theme_void()
p2 <- scale_example %>%
  mutate(yidx = 0) %>%
  ggplot(aes(x=idx, fill=category, y=yidx)) +
    geom_tile(color='black') 
grid.arrange(p1, p2)

Scales

A scale controls the mapping from data values to aesthetic values.

idx category price
1 shoes 100
2 shoes 70
3 computers 1000
4 trousers 80

scale_example <- tribble(
  ~idx, ~category, ~price,
  1, "shoes", 100,
  2, "shoes", 70,
  3, "computers", 1000,
  4, "trousers", 80
)
palette <- tibble(color = qualitative_hcl(3)) %>%
  mutate(x=rank(color), y=0)
p1 <- ggplot(palette, aes(fill=color, x=x, y=y)) +
  geom_tile(width=.9) +
  scale_fill_identity() +
  labs(title="scale") +
  theme_void()
p2 <- scale_example %>%
  mutate(yidx = 0) %>%
  ggplot(aes(x=idx, fill=category, y=yidx)) +
    geom_tile(color='black') +
    scale_fill_discrete_qualitative() 
grid.arrange(p1, p2)

Scales

A scale controls the mapping from data values to aesthetic values.

idx category price
1 shoes 100
2 shoes 70
3 computers 1000
4 trousers 80

scale_example <- tribble(
  ~idx, ~category, ~price,
  1, "shoes", 100,
  2, "shoes", 70,
  3, "computers", 1000,
  4, "trousers", 80
)
palette <- tibble(color = qualitative_hcl(3)) %>%
  mutate(x=rank(color), y=0)
p1 <- ggplot(palette, aes(fill=color, x=x, y=y)) +
  geom_tile(width=.9) +
  scale_fill_identity() +
  labs(title="scale") +
  theme_void()
p2 <- scale_example %>%
  mutate(yidx = 0) %>%
  ggplot(aes(x=idx, fill=category, y=yidx)) +
    geom_tile(color='black') +
    scale_fill_discrete_qualitative() +
    theme_void()
grid.arrange(p1, p2)

Scales

A scale controls the mapping from data values to aesthetic values.

idx category price
1 shoes 100
2 shoes 70
3 computers 1000
4 trousers 80

scale_example <- tribble(
  ~idx, ~category, ~price,
  1, "shoes", 100,
  2, "shoes", 70,
  3, "computers", 1000,
  4, "trousers", 80
)
palette <- tibble(color = qualitative_hcl(3)) %>%
  mutate(x=rank(color), y=0)
p1 <- ggplot(palette, aes(fill=color, x=x, y=y)) +
  geom_tile(width=.9) +
  scale_fill_identity() +
  labs(title="scale") +
  theme_void()
p2 <- scale_example %>%
  mutate(yidx = 0) %>%
  ggplot(aes(x=idx, fill=category, y=yidx)) +
    geom_tile(color='black') +
    scale_fill_discrete_qualitative() +
    theme_void() +
    theme(axis.text.x = element_text())
grid.arrange(p1, p2)

Scales

A scale controls the mapping from data values to aesthetic values.

idx category price
1 shoes 100
2 shoes 70
3 computers 1000
4 trousers 80


scale_example <- tribble(
  ~idx, ~category, ~price,
  1, "shoes", 100,
  2, "shoes", 70,
  3, "computers", 1000,
  4, "trousers", 80
)
scale_example %>%
  mutate(yidx = 0) %>%
  ggplot(aes(x=price, y=yidx)) +
    geom_point(color='black') 

Scales

A scale controls the mapping from data values to aesthetic values.

idx category price
1 shoes 100
2 shoes 70
3 computers 1000
4 trousers 80


scale_example <- tribble(
  ~idx, ~category, ~price,
  1, "shoes", 100,
  2, "shoes", 70,
  3, "computers", 1000,
  4, "trousers", 80
)
scale_example %>%
  mutate(yidx = 0) %>%
  ggplot(aes(x=price, y=yidx)) +
    geom_point(color='black') +
    geom_label_repel(aes(label=idx)) 

Scales

A scale controls the mapping from data values to aesthetic values.

idx category price
1 shoes 100
2 shoes 70
3 computers 1000
4 trousers 80


scale_example <- tribble(
  ~idx, ~category, ~price,
  1, "shoes", 100,
  2, "shoes", 70,
  3, "computers", 1000,
  4, "trousers", 80
)
scale_example %>%
  mutate(yidx = 0) %>%
  ggplot(aes(x=price, y=yidx)) +
    geom_point(color='black') +
    geom_label_repel(aes(label=idx)) +
    scale_y_continuous(limits = c(0, 0.01)) 

Scales

A scale controls the mapping from data values to aesthetic values.

idx category price
1 shoes 100
2 shoes 70
3 computers 1000
4 trousers 80


scale_example <- tribble(
  ~idx, ~category, ~price,
  1, "shoes", 100,
  2, "shoes", 70,
  3, "computers", 1000,
  4, "trousers", 80
)
scale_example %>%
  mutate(yidx = 0) %>%
  ggplot(aes(x=price, y=yidx)) +
    geom_point(color='black') +
    geom_label_repel(aes(label=idx)) +
    scale_y_continuous(limits = c(0, 0.01)) +
    scale_x_continuous(breaks = c(0,250,500,750,1000)) 

Scales

A scale controls the mapping from data values to aesthetic values.

idx category price
1 shoes 100
2 shoes 70
3 computers 1000
4 trousers 80


scale_example <- tribble(
  ~idx, ~category, ~price,
  1, "shoes", 100,
  2, "shoes", 70,
  3, "computers", 1000,
  4, "trousers", 80
)
scale_example %>%
  mutate(yidx = 0) %>%
  ggplot(aes(x=price, y=yidx)) +
    geom_point(color='black') +
    geom_label_repel(aes(label=idx)) +
    scale_y_continuous(limits = c(0, 0.01)) +
    scale_x_continuous(breaks = c(0,250,500,750,1000)) +
    scale_fill_discrete_qualitative() 

Scales

A scale controls the mapping from data values to aesthetic values.

idx category price
1 shoes 100
2 shoes 70
3 computers 1000
4 trousers 80


scale_example <- tribble(
  ~idx, ~category, ~price,
  1, "shoes", 100,
  2, "shoes", 70,
  3, "computers", 1000,
  4, "trousers", 80
)
scale_example %>%
  mutate(yidx = 0) %>%
  ggplot(aes(x=price, y=yidx)) +
    geom_point(color='black') +
    geom_label_repel(aes(label=idx)) +
    scale_y_continuous(limits = c(0, 0.01)) +
    scale_x_continuous(breaks = c(0,250,500,750,1000)) +
    scale_fill_discrete_qualitative() +
    theme_void() 

Scales

A scale controls the mapping from data values to aesthetic values.

idx category price
1 shoes 100
2 shoes 70
3 computers 1000
4 trousers 80


scale_example <- tribble(
  ~idx, ~category, ~price,
  1, "shoes", 100,
  2, "shoes", 70,
  3, "computers", 1000,
  4, "trousers", 80
)
scale_example %>%
  mutate(yidx = 0) %>%
  ggplot(aes(x=price, y=yidx)) +
    geom_point(color='black') +
    geom_label_repel(aes(label=idx)) +
    scale_y_continuous(limits = c(0, 0.01)) +
    scale_x_continuous(breaks = c(0,250,500,750,1000)) +
    scale_fill_discrete_qualitative() +
    theme_void() +
    theme(axis.text.x = element_text(),
          axis.line.x.bottom = element_line())

Scales

A scale controls the mapping from data values to aesthetic values.

idx category price
1 shoes 100
2 shoes 70
3 computers 1000
4 trousers 80


scale_example <- tribble(
  ~idx, ~category, ~price,
  1, "shoes", 100,
  2, "shoes", 70,
  3, "computers", 1000,
  4, "trousers", 80
)
scale_example %>%
  mutate(yidx = 0) %>%
  ggplot(aes(x=price, color=category, y=yidx)) +
    geom_point(color="black") +
    geom_label_repel(aes(label=idx)) +
    scale_y_continuous(limits = c(0, 0.01)) +
    scale_x_continuous(breaks = c(0,250,500,750,1000)) +
    scale_fill_discrete_qualitative() +
    theme_void() +
    theme(axis.text.x = element_text(),
          axis.line.x.bottom = element_line())

Scales

A scale controls the mapping from data values to aesthetic values.

idx category price
1 shoes 100
2 shoes 70
3 computers 1000
4 trousers 80


scale_example <- tribble(
  ~idx, ~category, ~price,
  1, "shoes", 100,
  2, "shoes", 70,
  3, "computers", 1000,
  4, "trousers", 80
)
scale_example %>%
  mutate(yidx = 0) %>%
  ggplot(aes(x=price, color=category, y=yidx)) +
    geom_point() +
    geom_label_repel(aes(label=idx)) +
    scale_y_continuous(limits = c(0, 0.01)) +
    scale_x_continuous(breaks = c(0,250,500,750,1000)) +
    scale_fill_discrete_qualitative() +
    theme_void() +
    theme(axis.text.x = element_text(),
          axis.line.x.bottom = element_line())

Scales

A scale controls the mapping from data values to aesthetic values.

idx category price
1 shoes 100
2 shoes 70
3 computers 1000
4 trousers 80


scale_example <- tribble(
  ~idx, ~category, ~price,
  1, "shoes", 100,
  2, "shoes", 70,
  3, "computers", 1000,
  4, "trousers", 80
)
scale_example %>%
  mutate(yidx = 0) %>%
  ggplot(aes(x=price, color=category, y=yidx)) +
    geom_point() +
    geom_label_repel(aes(label=idx)) +
    scale_y_continuous(limits = c(0, 0.01)) +
    scale_x_continuous(breaks = c(0,250,500,750,1000)) +
    scale_fill_discrete_qualitative() +
    theme_void() +
    theme(axis.text.x = element_text(),
          axis.line.x.bottom = element_line(),
          legend.position = 'top')

Scales

A scale controls the mapping from data values to aesthetic values.

idx category price
1 shoes 100
2 shoes 70
3 computers 1000
4 trousers 80


scale_example <- tribble(
  ~idx, ~category, ~price,
  1, "shoes", 100,
  2, "shoes", 70,
  3, "computers", 1000,
  4, "trousers", 80
)
scale_example %>%
  mutate(yidx = 0) %>%
  ggplot(aes(x=price, color=category, y=yidx)) +
    geom_point() +
    geom_label_repel(aes(label=idx)) +
    scale_y_continuous(limits = c(0, 0.01)) +
    scale_x_continuous(breaks = c(0,250,500,750,1000)) +
    scale_color_discrete_qualitative() +
    theme_void() +
    theme(axis.text.x = element_text(),
          axis.line.x.bottom = element_line(),
          legend.position = 'top')

Scales

A scale controls the mapping from data values to aesthetic values.

idx category price
1 shoes 100
2 shoes 70
3 computers 1000
4 trousers 80


scale_example <- tribble(
  ~idx, ~category, ~price,
  1, "shoes", 100,
  2, "shoes", 70,
  3, "computers", 1000,
  4, "trousers", 80
)
scale_example %>%
  mutate(yidx = 0) %>%
  ggplot(aes(x=price, color=category, y=yidx)) +
    geom_point() +
    geom_label_repel(aes(label=idx)) +
    scale_y_continuous(limits = c(0, 0.01)) +
    scale_x_log10(breaks = c(100,250,500,750,1000)) +
    scale_color_discrete_qualitative() +
    theme_void() +
    theme(axis.text.x = element_text(),
          axis.line.x.bottom = element_line(),
          legend.position = 'top')
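
Conceptually, a scale is a function from data values to aesthetic values. The following base R sketch applies two such functions by hand to the table above: a discrete colour scale and a log-transformed position scale. It illustrates the idea only and is not how ggplot2 implements scales; the hex colours are arbitrary placeholders.

```r
# Data values from the table above.
category <- c("shoes", "shoes", "computers", "trousers")
price    <- c(100, 70, 1000, 80)

# Discrete colour scale: each distinct category gets one colour.
# The hex codes are arbitrary placeholders.
palette <- c(computers = "#E16A86", shoes = "#50A315", trousers = "#009ADE")
colour  <- unname(palette[category])

# Continuous position scale: log10-transform, then rescale to [0, 1].
log_price <- log10(price)
x_pos <- (log_price - min(log_price)) / (max(log_price) - min(log_price))

data.frame(category, colour, price, x_pos)
```

On the log scale the three cheap items spread out, whereas on a linear scale they would be squeezed together next to zero.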

Statistical transformations

A statistical transformation (stat) transforms the data, typically by summarizing it.

Examples:

  • Identity
  • Binning
  • Smoothing
  • Quantile computation
  • Conditional statistics
  • Density estimation

Statistical transformations

Summarization

summary_example <- tibble(y = rnorm(100))
p1 <- summary_example %>%
  mutate(x='a') %>%
  ggplot(aes(x=x, y=y)) +
    geom_point(size=0.1) +
    scale_y_continuous(limits = c(-2,2)) +
    theme_void() +
    theme(axis.line.y.left = element_line(),
          axis.text.y = element_text())
p1

p2 <- summary_example %>%
  mutate(x='a') %>%
  ggplot(aes(x=x, y=y)) +
    # stat='summary' falls back to mean_se(): the mean ± one standard error
    geom_pointrange(stat='summary',color="red") +
    scale_y_continuous(limits = c(-2,2)) +
    theme_void() +
    theme(axis.line.y.left = element_line(),
          axis.text.y = element_text())
p2

Statistical transformations

Binning

summary_example <- tibble(y = rnorm(100))
summary_example %>%
  mutate(x='a') %>%
  ggplot(aes(x=y)) +
    geom_histogram(bins=30, 
                   color='gray', 
                   fill='lightgray') +
    geom_rug() +
    theme_void() +
    theme(
      axis.line.x.bottom = element_line(),
      axis.text.x = element_text())

Position adjustments

A position adjustment shifts graphical objects to avoid overplotting.

Examples:

  • random jittering
  • dodging
  • stacking
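
A minimal sketch of the three adjustments, assuming ggplot2 and a made-up table of counts (`counts` and its values are hypothetical). The stacked layer's data shows that the top of the highest bar in each group equals the group total.

```r
library(ggplot2)
library(tibble)

# Hypothetical counts: two kinds of item in each of two groups.
counts <- tribble(
  ~group, ~kind, ~n,
  "a",    "u",    2,
  "a",    "v",    3,
  "b",    "u",    1,
  "b",    "v",    4
)

base <- ggplot(counts, aes(x = group, y = n, fill = kind))

stacked <- base + geom_col(position = "stack")  # bars on top of each other
dodged  <- base + geom_col(position = "dodge")  # bars side by side

# After stacking, the highest bar top in each group is the group total.
stack_data <- layer_data(stacked, 1)
group_tops <- tapply(stack_data$ymax, as.numeric(stack_data$x), max)
group_tops

# After dodging, no bar is taller than its own value.
dodge_data <- layer_data(dodged, 1)
max(dodge_data$ymax)

# Jittering adds small random horizontal offsets to overlapping points.
jittered <- base +
  geom_point(position = position_jitter(width = 0.2, height = 0))
```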

Layers

A layer is the combination of

  • data
  • aesthetic mapping
  • geometric object
  • statistical transformation
  • position adjustment

You can stack several layers on top of each other.

Facets

Facet specification

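Create multiple plots with the same layers, each on a different subset of the data.

var1  var2  x  y
a     Z     2  1.0
a     Z     3  1.2
b     Z     2  3.0
b     Z     2  1.0
a     W     4  1.0
a     W     3  2.0
b     W     2  3.0
b     W     3  1.0

# The table above as a tibble
example_facet <- tribble(
  ~var1, ~var2, ~x, ~y,
  "a", "Z", 2, 1.0,
  "a", "Z", 3, 1.2,
  "b", "Z", 2, 3.0,
  "b", "Z", 2, 1.0,
  "a", "W", 4, 1.0,
  "a", "W", 3, 2.0,
  "b", "W", 2, 3.0,
  "b", "W", 3, 1.0
)
# One panel per combination of var1 (rows) and var2 (columns)
ggplot(example_facet, aes(x=x, y=y)) +
  geom_point() +
  facet_grid(var1 ~ var2) +
  theme_bw()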

Coordinates

Coordinate system

A coordinate system maps the position of objects onto the plane of the plot.

coord_data <- tibble(x = rnorm(10, mean=100), y = rnorm(10, mean=100))
ggplot(coord_data, aes(x=x, y=y)) +
  geom_point() +
  coord_polar() +
  theme_bw()

ggplot(coord_data, aes(x=x, y=y)) +
  geom_point() +
  coord_cartesian() +
  theme_bw()
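
The idea behind a polar coordinate system can be sketched by hand: x is turned into an angle and y into a radius, and the pair is then placed on the drawing plane. This is a simplified illustration in base R with made-up values and a hypothetical `rescale01` helper. It is not ggplot2's exact implementation, which also handles limits, expansion, and direction.

```r
# Made-up positions in data space.
x <- c(1, 2, 3, 4)
y <- c(10, 20, 30, 40)

# Hypothetical helper: rescale a vector linearly to [0, 1].
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

theta <- 2 * pi * rescale01(x)  # angle, measured clockwise from 12 o'clock
r     <- rescale01(y)           # distance from the centre

# Position on the drawing plane.
plane_x <- r * sin(theta)
plane_y <- r * cos(theta)

data.frame(x, y, theta, r, plane_x, plane_y)
```

The same data and the same geoms produce a different picture only because this last mapping step changes.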

ggplot2’s grammar

ggplot2 building blocks

  • aesthetic mappings: aes
  • geometric objects: geom_*
  • scales: scale_*
  • statistical transformations: stat_*
  • facet specification: facet_*
  • coordinate system: coord_*

Example data

In the following we will use the gapminder dataset.

library(gapminder)
gapminder
# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ℹ 1,694 more rows

Defining a layer

ggplot() +
  layer(
    data = gapminder,
    mapping = aes(x=gdpPercap, y=lifeExp),
    geom = 'point',
    stat = 'identity',
    position = 'identity'
  ) +
  scale_x_log10()

Why is the scale outside the layer definition?

Multiple layers

ggplot() +
  layer(
    data = filter(gapminder, year == 1952),
    mapping = aes(x=gdpPercap, y=lifeExp, 
                  color=factor(year)),
    geom = 'point',
    stat = 'identity',
    position = 'identity'
  ) +
  layer(
    data = filter(gapminder, year == 2007),
    mapping = aes(x=gdpPercap, y=lifeExp, 
                  color=factor(year)),
    geom = 'point',
    stat = 'identity',
    position = 'identity'
  ) +
  scale_x_log10()

Defining a layer

ggplot() +
  layer(
    data = gapminder,
    mapping = aes(x=gdpPercap, y=lifeExp),
    geom = 'point', stat = 'identity',
    position = 'identity'
  ) +
  layer(
    data = gapminder,
    mapping = aes(x=gdpPercap, y=lifeExp),
    geom = 'line', stat = 'smooth',
    position = 'identity',
    params = list(
      method = 'gam',
      color = 'blue',
      linewidth = 1
    )
  ) + scale_x_log10()

Using default values

Oftentimes data and aesthetic mapping are shared across all layers.

In such cases, we can provide the “default” data in the ggplot function.

Specialized functions such as geom_* and stat_* fill in default values for the remaining components of a layer.

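# Explicit layer: every component is spelled out
ggplot() +
  layer(
    data = gapminder,
    mapping = aes(x=gdpPercap, y=lifeExp),
    geom = 'point',
    stat = 'identity',
    position = 'identity'
  )

# Same plot: data and mapping are plot defaults,
# geom_point() supplies stat and position
ggplot(data = gapminder, 
       mapping = aes(x=gdpPercap, y=lifeExp)) +
  geom_point()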

Using default values

Each geom has a default stat, and each stat has a default geom.

# Explicit layer: smoothing stat drawn with a line geom
ggplot() +
  layer(
    data = gapminder,
    mapping = aes(x=gdpPercap, y=lifeExp),
    geom = 'line',
    stat = 'smooth',
    position = 'identity',
    params = list(
      method = 'gam',
      color = 'blue',
      linewidth = 1
    )
  )

# Similar plot: stat_smooth() supplies its default geom
ggplot(data = gapminder, 
       mapping = aes(x=gdpPercap, y=lifeExp)) +
  stat_smooth(se=FALSE)
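
One way to explore these defaults is to look at the layer objects that the geom_* and stat_* functions return. The sketch below assumes that a ggplot2 layer object exposes its `stat` and `geom` fields. It is meant for exploration only; it relies on ggplot2's layer objects rather than on a documented lookup function.

```r
library(ggplot2)

# Each layer object records the geom and stat it was built with.
point_stat  <- class(geom_point()$stat)[1]      # default stat of geom_point()
hist_stat   <- class(geom_histogram()$stat)[1]  # default stat of geom_histogram()
smooth_geom <- class(stat_smooth()$geom)[1]     # default geom of stat_smooth()

c(point_stat, hist_stat, smooth_geom)

# Defaults can be overridden: same stat, drawn with a different geom.
smooth_as_points <- stat_smooth(geom = "point")
class(smooth_as_points$geom)[1]
```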

A tour of geometric objects

The humble point

ggplot(gapminder, 
       aes(x=gdpPercap, y=lifeExp, 
           color=continent, size=pop)) +
  geom_point()

The line

gapminder %>% 
  drop_na(pop) %>% 
  group_by(continent, year) %>% 
  summarise(pop = sum(as.numeric(pop))) %>%
    
  ggplot(aes(x=year, y=pop, 
           color=continent,
           linetype=continent)) +
      geom_line()

The line

gapminder %>% 
  drop_na(pop) 
# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ℹ 1,694 more rows
gapminder 
# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ℹ 1,694 more rows

The line

gapminder %>% 
  drop_na(pop) %>% 
  group_by(continent, year)
# A tibble: 1,704 × 6
# Groups:   continent, year [60]
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ℹ 1,694 more rows

The line

gapminder %>% 
  drop_na(pop) %>% 
  group_by(continent, year) %>% 
  summarise(pop = sum(as.numeric(pop))) 
# A tibble: 60 × 3
# Groups:   continent [5]
   continent  year       pop
   <fct>     <int>     <dbl>
 1 Africa     1952 237640501
 2 Africa     1957 264837738
 3 Africa     1962 296516865
 4 Africa     1967 335289489
 5 Africa     1972 379879541
 6 Africa     1977 433061021
 7 Africa     1982 499348587
 8 Africa     1987 574834110
 9 Africa     1992 659081517
10 Africa     1997 743832984
# ℹ 50 more rows

The line

gapminder %>% 
  drop_na(pop) %>% 
  group_by(continent, year) %>% 
  summarise(pop = sum(as.numeric(pop))) %>%
    
  ggplot(aes(x=year, y=pop)) +
      geom_line()

The line

gapminder %>% 
  drop_na(pop) %>% 
  group_by(continent, year) %>% 
  summarise(pop = sum(as.numeric(pop))) %>%
    
  ggplot(aes(x=year, y=pop, 
           color=continent)) +
      geom_line()

The bar/column

gapminder %>% 
  drop_na(pop) %>% 
  group_by(continent, year) %>% 
  summarise(pop = sum(as.numeric(pop))) %>%
  filter(year == 2007) %>%
  
  ggplot(aes(x=continent, y=pop, 
           color=continent)) +
      geom_col()

The bar/column

gapminder %>% 
  drop_na(pop) %>% 
  group_by(continent, year) %>% 
  summarise(pop = sum(as.numeric(pop))) 
# A tibble: 60 × 3
# Groups:   continent [5]
   continent  year       pop
   <fct>     <int>     <dbl>
 1 Africa     1952 237640501
 2 Africa     1957 264837738
 3 Africa     1962 296516865
 4 Africa     1967 335289489
 5 Africa     1972 379879541
 6 Africa     1977 433061021
 7 Africa     1982 499348587
 8 Africa     1987 574834110
 9 Africa     1992 659081517
10 Africa     1997 743832984
# ℹ 50 more rows

The bar/column

gapminder %>% 
  drop_na(pop) %>% 
  group_by(continent, year) %>% 
  summarise(pop = sum(as.numeric(pop))) %>%
  filter(year == 2007) 
# A tibble: 5 × 3
# Groups:   continent [5]
  continent  year        pop
  <fct>     <int>      <dbl>
1 Africa     2007  929539692
2 Americas   2007  898871184
3 Asia       2007 3811953827
4 Europe     2007  586098529
5 Oceania    2007   24549947

The bar/column

gapminder %>% 
  drop_na(pop) %>% 
  group_by(continent, year) %>% 
  summarise(pop = sum(as.numeric(pop))) %>%
  filter(year == 2007) %>%
  
  ggplot(aes(x=continent, y=pop)) +
      geom_col()

The bar/column

gapminder %>% 
  drop_na(pop) %>% 
  group_by(continent, year) %>% 
  summarise(pop = sum(as.numeric(pop))) %>%
  filter(year == 2007) %>%
  
  ggplot(aes(x=continent, y=pop, 
           color=continent)) +
      geom_col()

The bar/column (!!)

gapminder %>% 
  drop_na(pop) %>% 
  group_by(continent, year) %>% 
  summarise(pop = sum(as.numeric(pop))) %>%
  filter(year == 2007) %>%
  
  ggplot(aes(x=continent, y=pop, 
           fill=continent)) +
      geom_col()

Ribbon

gapminder %>% 
  drop_na(gdpPercap) %>% 
  group_by(year, continent) %>%
  summarise(
    ymax = max(gdpPercap),
    ymin = min(gdpPercap)
  ) %>%
  filter(continent == "Europe") %>%
  
  ggplot(aes(x=year, 
           ymax=ymax, 
           ymin=ymin, 
           fill=continent)) +
    geom_ribbon()

Ribbon

gapminder %>% 
  drop_na(gdpPercap) 
# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ℹ 1,694 more rows
gapminder
# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ℹ 1,694 more rows

Ribbon

gapminder %>% 
  drop_na(gdpPercap) %>% 
  group_by(year, continent)
# A tibble: 1,704 × 6
# Groups:   year, continent [60]
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ℹ 1,694 more rows

Ribbon

gapminder %>% 
  drop_na(gdpPercap) %>% 
  group_by(year, continent) %>%
  summarise(
    ymax = max(gdpPercap),
    ymin = min(gdpPercap)
  ) 
# A tibble: 60 × 4
# Groups:   year [12]
    year continent    ymax   ymin
   <int> <fct>       <dbl>  <dbl>
 1  1952 Africa      4725.   299.
 2  1952 Americas   13990.  1398.
 3  1952 Asia      108382.   331 
 4  1952 Europe     14734.   974.
 5  1952 Oceania    10557. 10040.
 6  1957 Africa      5487.   336.
 7  1957 Americas   14847.  1544.
 8  1957 Asia      113523.   350 
 9  1957 Europe     17909.  1354.
10  1957 Oceania    12247. 10950.
# ℹ 50 more rows

Ribbon

gapminder %>% 
  drop_na(gdpPercap) %>% 
  group_by(year, continent) %>%
  summarise(
    ymax = max(gdpPercap),
    ymin = min(gdpPercap)
  ) %>%
  filter(continent == "Europe")
# A tibble: 12 × 4
# Groups:   year [12]
    year continent   ymax  ymin
   <int> <fct>      <dbl> <dbl>
 1  1952 Europe    14734.  974.
 2  1957 Europe    17909. 1354.
 3  1962 Europe    20431. 1710.
 4  1967 Europe    22966. 2172.
 5  1972 Europe    27195. 2860.
 6  1977 Europe    26982. 3528.
 7  1982 Europe    28398. 3631.
 8  1987 Europe    31541. 3739.
 9  1992 Europe    33966. 2497.
10  1997 Europe    41283. 3193.
11  2002 Europe    44684. 4604.
12  2007 Europe    49357. 5937.

Ribbon

gapminder %>% 
  drop_na(gdpPercap) %>% 
  group_by(year, continent) %>%
  summarise(
    ymax = max(gdpPercap),
    ymin = min(gdpPercap)
  ) %>%
  filter(continent == "Europe") %>%
  
  ggplot(aes(x=year, 
           ymax=ymax, 
           ymin=ymin, 
           fill=continent)) 

Ribbon

gapminder %>% 
  drop_na(gdpPercap) %>% 
  group_by(year, continent) %>%
  summarise(
    ymax = max(gdpPercap),
    ymin = min(gdpPercap),
    yavg = mean(gdpPercap)
  ) %>%
  filter(continent == "Europe") %>%
  
  ggplot(aes(x=year,
           ymax=ymax, 
           y=yavg, 
           ymin=ymin, 
           fill=continent)) +
    geom_ribbon() +
    geom_line(color='black')

Segments

gapminder %>% 
  drop_na(pop) %>% 
  group_by(continent, year) %>% 
  summarise(minGdp = min(gdpPercap),
            maxGdp = max(gdpPercap)) %>%
  filter(year == 2007) %>%
  
  ggplot(aes(x=continent, y=minGdp, 
           xend=continent, yend=maxGdp,
           color=continent)) +
    geom_segment()

Segments

gapminder %>% 
  drop_na(pop) 
# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ℹ 1,694 more rows
gapminder
# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ℹ 1,694 more rows

Segments

gapminder %>% 
  drop_na(pop) %>% 
  group_by(continent, year)
# A tibble: 1,704 × 6
# Groups:   continent, year [60]
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ℹ 1,694 more rows

Segments

gapminder %>% 
  drop_na(pop) %>% 
  group_by(continent, year) %>% 
  summarise(minGdp = min(gdpPercap),
            maxGdp = max(gdpPercap)) 
# A tibble: 60 × 4
# Groups:   continent [5]
   continent  year minGdp maxGdp
   <fct>     <int>  <dbl>  <dbl>
 1 Africa     1952   299.  4725.
 2 Africa     1957   336.  5487.
 3 Africa     1962   355.  6757.
 4 Africa     1967   413. 18773.
 5 Africa     1972   464. 21011.
 6 Africa     1977   502. 21951.
 7 Africa     1982   462. 17364.
 8 Africa     1987   390. 11864.
 9 Africa     1992   411. 13522.
10 Africa     1997   312. 14723.
# ℹ 50 more rows
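
The grouping and summarising step can be checked on a small table. The sketch below uses made-up values (the `toy` tibble and its continent names are not from gapminder) and assumes dplyr 1.0 or later for the `.groups` argument, which returns an ungrouped result.

```r
library(dplyr)

# Made-up values, for illustration only
toy <- tibble(
  continent = c("North", "North", "North", "South", "South"),
  year      = c(2002, 2007, 2007, 2007, 2007),
  gdpPercap = c(900, 1000, 4000, 250, 750)
)

gdp_range <- toy %>%
  group_by(continent, year) %>%
  summarise(minGdp = min(gdpPercap),
            maxGdp = max(gdpPercap),
            .groups = "drop")   # return an ungrouped tibble

gdp_range   # one row per (continent, year) combination
```

Each row of the result holds the two endpoints that a segment layer needs.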

Segments

gapminder %>% 
  drop_na(pop) %>% 
  group_by(continent, year) %>% 
  summarise(minGdp = min(gdpPercap),
            maxGdp = max(gdpPercap)) %>%
  filter(year == 2007) 
# A tibble: 5 × 4
# Groups:   continent [5]
  continent  year minGdp maxGdp
  <fct>     <int>  <dbl>  <dbl>
1 Africa     2007   278. 13206.
2 Americas   2007  1202. 42952.
3 Asia       2007   944  47307.
4 Europe     2007  5937. 49357.
5 Oceania    2007 25185. 34435.

Segments

gapminder %>% 
  drop_na(pop) %>% 
  group_by(continent, year) %>% 
  summarise(minGdp = min(gdpPercap),
            maxGdp = max(gdpPercap)) %>%
  filter(year == 2007) %>%
  
  ggplot(aes(x=continent, y=minGdp, 
           xend=continent, yend=maxGdp,
           color=continent)) +
    geom_segment()

Segments and points

gapminder %>% 
  drop_na(pop) %>% 
  group_by(continent, year) %>% 
  summarise(minGdp = min(gdpPercap),
           maxGdp = max(gdpPercap)) %>%
  filter(year == 2007) %>%
  
  ggplot(aes(x=continent, y=minGdp, 
           xend=continent, yend=maxGdp,
           color=continent)) +
    geom_segment() +
    geom_point(mapping = aes(y=minGdp),
             size=3) +
    geom_point(mapping = aes(y=maxGdp),
             size=3)

Statistical transformations

Sometimes it is easier to express a layer in terms of a statistical transformation.

Summaries

gapminder %>% filter(year==2007) %>%
ggplot(aes(x=continent, y=gdpPercap)) +
  stat_summary()

gapminder %>% filter(year==2007) %>%
ggplot(aes(x=continent, y=gdpPercap)) +
  geom_point()

Statistical transformations

A stat usually computes new variables that can be mapped to aesthetics.

To know which ones, look at the help page of the stat (for example, ?stat_summary).

For instance, stat_summary introduces:

  • ymin
  • ymax
  • y (replaces the original y values)
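
One way to see these variables is to inspect the data a layer actually draws. A minimal sketch, assuming a made-up `toy` data frame and explicit summary functions in place of the default:

```r
library(ggplot2)

# Made-up values, for illustration only
toy <- data.frame(
  g = c("a", "a", "a", "b", "b"),
  v = c(1, 2, 6, 10, 20)
)

p <- ggplot(toy, aes(x = g, y = v)) +
  stat_summary(fun = mean, fun.min = min, fun.max = max)

# layer_data() returns the data of a layer after the stat has run
computed <- layer_data(p, 1)
computed[, c("x", "y", "ymin", "ymax")]
```

Here y holds the group means, while ymin and ymax hold the group minima and maxima.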

Ribbon, revisited

gapminder %>% drop_na(gdpPercap) %>% 
  filter(continent == "Europe") %>%
  ggplot(aes(x=year, 
           y=gdpPercap, 
           fill=continent)) +
  stat_summary(geom='ribbon',
               fun.max = max,
               fun.min = min,
               alpha=0.2) 

Ribbon, revisited

gapminder %>% drop_na(gdpPercap) %>% 
  filter(continent == "Europe") %>%
  ggplot(aes(x=year, 
           y=gdpPercap, 
           fill=continent)) +
  stat_summary(geom='ribbon',
               fun.max = max,
               fun.min = min,
               alpha=0.2) +
  geom_line(stat='summary')

Ribbon, revisited

gapminder %>% drop_na(gdpPercap) %>% 
  filter(continent == "Europe") %>%
  ggplot(aes(x=year, 
           y=gdpPercap, 
           fill=continent)) +
  stat_summary(geom='ribbon',
               fun.max = max,
               fun.min = min,
               alpha=0.2) +
  stat_summary(geom='ribbon',
               alpha=0.7) +
  geom_line(stat='summary')

The group aesthetic

By default, the group is set to the interaction of all discrete variables in the plot.

For most applications you can simply specify the grouping with other aesthetics (colour, shape, fill, linetype) or with facets.
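
The effect of these rules can be counted directly. The sketch below uses a made-up `toy` data frame (its column names are assumptions for the example) and reads the group column that ggplot2 assigns to each row.

```r
library(ggplot2)

# Made-up values, for illustration only
toy <- data.frame(
  year    = c(2000, 2005, 2000, 2005, 2000, 2005),
  country = c("P", "P", "Q", "Q", "R", "R"),
  region  = c("east", "east", "east", "east", "west", "west"),
  value   = c(1, 2, 3, 4, 5, 6)
)

base <- ggplot(toy, aes(x = year, y = value))

# No discrete variable is mapped: all rows share one default group
g_none    <- layer_data(base + geom_line())$group
# A discrete variable mapped to colour: one group per region
g_colour  <- layer_data(base + geom_line(aes(colour = region)))$group
# Explicit group aesthetic: one group per country
g_country <- layer_data(base + geom_line(aes(group = country)))$group

length(unique(g_none))     # 1
length(unique(g_colour))   # 2
length(unique(g_country))  # 3
```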

The group aesthetic

ggplot(gapminder,
       aes(x=year, y=gdpPercap,
           group=continent)) +
  stat_summary()

The group aesthetic

ggplot(gapminder,
       aes(x=year, y=gdpPercap,
           color=continent)) +
  stat_summary()

The group aesthetic

Let’s first assign the ribbon plot we created before to a variable:

europe_gdp <- gapminder %>% 
  drop_na(gdpPercap) %>% 
  filter(continent == "Europe")
  
ribbon_plot <- ggplot(
  data=europe_gdp, 
  aes(x=year, 
      y=gdpPercap, 
      fill=continent)) +
  stat_summary(geom='ribbon',
               alpha=0.7) +
  stat_summary(geom='ribbon',
               fun.max = max,
               fun.min = min,
               alpha=0.2) +
  geom_line(stat='summary')

The group aesthetic

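Let’s add a line for each country. Without a group aesthetic, all countries fall into a single group, so this first attempt draws one zigzag line.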

ribbon_plot + 
  geom_line(
    linewidth=.3  # 'size' is deprecated for lines since ggplot2 3.4.0
  )

The group aesthetic

Let’s add a line for each country. Mapping group to country gives each country its own line.

ribbon_plot + 
  geom_line(
    mapping=aes(group=country),
    linewidth=.3
  )

The group aesthetic


ggplot(
  data=europe_gdp, 
  aes(x=year, 
      y=gdpPercap, 
      fill=continent)) +
  stat_summary(geom='ribbon',
               alpha=0.7) +
  stat_summary(geom='ribbon',
               fun.max = max,
               fun.min = min,
               alpha=0.2) + 
  geom_line(
    mapping=aes(group=country),
    linewidth=.3
  ) +
  geom_line(stat='summary')

The group aesthetic


ggplot(
  data=europe_gdp, 
  aes(x=year, 
      y=gdpPercap, 
      fill=continent)) +
  stat_summary(geom='ribbon',
               alpha=0.9) +
  stat_summary(geom='ribbon',
               fun.max = max,
               fun.min = min,
               alpha=0.5) +
  geom_line(
    mapping=aes(group=country),
    linewidth=.2
  ) +
  geom_line(stat='summary',
            linewidth=1,
            color='black')

Position adjustments

Position adjustments

Sometimes you need to adjust the position of the plot elements:

  • dodging
  • jittering
  • stacking
  • filling

Position adjustments: dodging

gapminder %>% 
  filter(year > 1990) %>% 
    ggplot(aes(x=year, 
           y=pop, 
           fill=continent)) +
      geom_col()

gapminder %>% 
  filter(year > 1990) %>% 
    ggplot(aes(x=year, 
           y=pop, 
           fill=continent)) +
      geom_col(position='dodge')

Position adjustments: stacking

gapminder %>% 
  filter(year > 1990) %>% 
    ggplot(aes(x=year, 
           y=pop, 
           fill=continent)) +
      geom_col()

gapminder %>% 
  filter(year > 1990) %>% 
    ggplot(aes(x=year, 
           y=pop, 
           fill=continent)) +
      geom_col(position='stack')

Position adjustments: filling

gapminder %>% 
  filter(year > 1990) %>% 
    ggplot(aes(x=year, 
           y=pop, 
           fill=continent)) +
      geom_col(position='fill')
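
The three bar adjustments can also be compared numerically. A minimal sketch on a made-up two-row data frame (`toy` and its values are assumptions for the example), reading the bar extents that each position adjustment produces:

```r
library(ggplot2)

# Made-up values, for illustration only: two bars at the same x position
toy <- data.frame(
  year = c("1997", "1997"),
  part = c("u", "v"),
  pop  = c(1, 3)
)
base <- ggplot(toy, aes(x = year, y = pop, fill = part))

stacked <- layer_data(base + geom_col(position = "stack"))
dodged  <- layer_data(base + geom_col(position = "dodge"))
filled  <- layer_data(base + geom_col(position = "fill"))

max(stacked$ymax)  # 4: bars are piled on top of each other
max(dodged$ymax)   # 3: bars keep their own height, side by side
max(filled$ymax)   # 1: the stack is rescaled to a total height of 1

# Dodged bars share the default width of 0.9 between them
as.numeric(dodged$xmax - dodged$xmin)
```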

Position jitter


gapminder %>% 
  filter(year == 2007) %>% 
    ggplot(aes(x=continent, 
           y=gdpPercap, 
           color=continent)) +
      geom_point()

Position jitter


gapminder %>% 
  filter(year == 2007) %>% 
    ggplot(aes(x=continent, 
           y=gdpPercap, 
           color=continent)) +
      geom_point(
        position='jitter'
      )

Position jitter


gapminder %>% 
  filter(year == 2007) %>% 
    ggplot(aes(x=continent, 
           y=gdpPercap, 
           color=continent)) +
    geom_point(
      position = position_jitter(width = .2)
    )

Scales

Scales

Scales are functions that map from data values to aesthetic values.

  • Scales are invertible functions.

Examples:

  • Map data values to pixel positions, and back
  • Map data values to colors, and back
  • Map data values to shapes, and back…
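
The idea of a scale as an invertible function can be sketched in base R. The helper `make_linear_scale` below is hypothetical and not part of ggplot2; the data interval and the pixel interval are made up for the example.

```r
# Hypothetical helper, not part of ggplot2:
# builds a linear scale together with its inverse
make_linear_scale <- function(domain, range) {
  slope <- diff(range) / diff(domain)
  list(
    map    = function(x) range[1] + (x - domain[1]) * slope,
    invert = function(p) domain[1] + (p - range[1]) / slope
  )
}

# Values between 20 and 90 mapped onto a 700-pixel-wide axis
sc <- make_linear_scale(domain = c(20, 90), range = c(0, 700))

sc$map(55)              # 350: data value to pixel position
sc$invert(sc$map(55))   # 55: and back
```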

Scales

Scales are usually linear, but not necessarily.

In some cases we can apply a non-linear transformation to improve readability.

 [1] "10"             "100"            "1 000"          "10 000"        
 [5] "100 000"        "1 000 000"      "10 000 000"     "100 000 000"   
 [9] "1 000 000 000"  "10 000 000 000"

Linear scale (default):

Logarithmic scale:

In a logarithmic scale, multiples are equally spaced.

We can use it to display data that spans a very wide range: equal distances on the axis then correspond to equal ratios rather than equal differences.
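
A quick check in base R, using powers of ten as example values:

```r
values <- 10^(1:5)   # 10, 100, 1000, 10000, 100000

diff(values)         # 90 900 9000 90000: gaps grow on a linear scale
diff(log10(values))  # 1 1 1 1: gaps are equal on a logarithmic scale
```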

Scales

ggplot(gapminder,
       aes(x=gdpPercap, 
           y=lifeExp, 
           color=continent)) +
  geom_point() +
  scale_x_log10() +
  scale_y_continuous()

Faceting

Faceting

ggplot(gapminder, aes(x=year, y=gdpPercap, fill=continent)) +
  stat_summary(geom='ribbon',
               alpha=0.6) +
  facet_wrap(vars(continent))

Faceting

gapminder %>% 
  mutate(decade = factor(floor(year / 10)*10)) %>% 
ggplot(aes(x=gdpPercap, y=lifeExp, color=continent)) +
  geom_point() +
  scale_x_log10(labels=scales::dollar) +
  facet_grid(rows = vars(continent),
             cols = vars(decade))
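
The decade variable used for the columns comes from rounding each year down to a multiple of ten. A small check on a few example years:

```r
years  <- c(1952, 1957, 1962, 1997, 2002, 2007)
decade <- factor(floor(years / 10) * 10)

levels(decade)  # "1950" "1960" "1990" "2000"
table(decade)   # how many of the example years fall in each decade
```

Turning the result into a factor gives one facet column per decade.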

Coordinates

Playing with coordinates

continent_population <- gapminder %>%
  filter(year == 2002) %>% 
  drop_na(pop) %>%
  mutate(pop = as.numeric(pop)) %>%  # doubles avoid integer overflow in sum()
  group_by(continent) %>% 
  summarise(pop = sum(pop))

ggplot(continent_population, 
       aes(x=continent, 
           y=pop, 
           fill=continent)) +
  geom_col() +
  coord_cartesian()
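
The conversion with as.numeric matters because pop is stored as an integer column, and integer sums in R cannot exceed .Machine$integer.max (about 2.1 billion). A sketch with made-up values:

```r
# Made-up populations, stored as integers
pop_int <- c(2000000000L, 2000000000L)

# The integer sum overflows: R returns NA and emits a warning
total_int <- suppressWarnings(sum(pop_int))
is.na(total_int)   # TRUE

# Converting to double first gives the expected total
total_dbl <- sum(as.numeric(pop_int))
total_dbl          # 4e+09
```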

Playing with coordinates

ggplot(continent_population, 
       aes(x=continent, 
           y=pop, 
           fill=continent)) +
  geom_col() +
  coord_flip()

Playing with coordinates

ggplot(continent_population, 
       aes(x=continent, 
           y=pop, 
           fill=continent)) +
  geom_col(width=1) +
  coord_polar()

Playing with coordinates

ggplot(continent_population, 
       aes(x="", 
           y=pop,
           fill=continent)) +
  geom_col() +
  coord_polar(theta="y")