Plotting basics

Data Visualization and Exploration

Ozan Kahramanoğulları

ggplot2

  • ggplot2 is a tidyverse library for plotting.

  • It builds on top of a “grammar of graphics”.

  • It makes building plots modular.
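To make "modular" concrete, here is a minimal sketch in which each component of a plot is stored as an ordinary R object and combined with `+`. It assumes the built-in `mtcars` dataset, which is not part of these slides, and the variable names are only illustrative.

```r
library(ggplot2)

# Each piece is an ordinary R object that can be stored and reused.
base   <- ggplot(mtcars, aes(x = wt, y = mpg))  # data + aesthetic mapping
points <- geom_point()                          # a geometry layer
x_axis <- scale_x_log10()                       # a scale
look   <- theme_bw()                            # a theme

# Components are combined with `+`; the same base can feed several plots.
scatter <- base + points
styled  <- base + points + x_axis + look
```

Printing `scatter` or `styled` draws the corresponding plot; nothing is rendered until then.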

The ggplot workflow

Building plots incrementally

renv::install("gapminder")
library(gapminder)
gapminder
# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ℹ 1,694 more rows

Building plots incrementally

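library(ggplot2)

# Data and aesthetic mappings only: there is no geom yet,
# so printing p shows empty axes.
p <- ggplot(data = gapminder,
            mapping = aes(
              x = gdpPercap,
              y = lifeExp))
p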

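library(dplyr)  # provides %>%, select() and summarise()

# The axis limits of the empty plot match the ranges of the mapped variables.
gapminder %>%
  select(gdpPercap, lifeExp) %>%
  summarise(
    minGdp = min(gdpPercap),
    maxGdp = max(gdpPercap),
    minLifeExp = min(lifeExp),
    maxLifeExp = max(lifeExp)
  )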
# A tibble: 1 × 4
  minGdp  maxGdp minLifeExp maxLifeExp
   <dbl>   <dbl>      <dbl>      <dbl>
1   241. 113523.       23.6       82.6

Building plots incrementally

p + geom_point()

Building plots incrementally

p + geom_smooth()

Stacking geoms

p + geom_point() + geom_smooth()

Stacking geoms

What happens if we swap the order of the two geoms?

p + geom_smooth() + geom_point()
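Layers are drawn in the order in which they are added, so a later geom is painted on top of an earlier one. The sketch below checks the layer order with `layer_data()`. It uses `method = "lm"` only to keep the example light, and the object names are illustrative.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp))

points_first <- p + geom_point() + geom_smooth(method = "lm")
smooth_first <- p + geom_smooth(method = "lm") + geom_point()

# layer_data() returns the computed data of the i-th layer.
# Only a point layer carries a `shape` column, which identifies it here.
"shape" %in% names(layer_data(points_first, i = 1))  # TRUE: points are layer 1
"shape" %in% names(layer_data(smooth_first, i = 1))  # FALSE: smooth is layer 1
```

In `points_first` the smoothing line sits on top of the points; in `smooth_first` the points partly cover the line and its confidence band.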

Beware of line breaks!

This works.

p + geom_smooth() + geom_point()

So does this.

p + geom_smooth() + 
  geom_point()

This doesn’t, because the first line is already a complete expression.

p + geom_point()
  + geom_smooth()
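The reason is how R reads input: a line that ends with `+` is visibly unfinished, so R keeps reading, whereas a line that starts with `+` is evaluated as a new expression. The following sketch shows two forms that work and captures the error raised by the lone `+`. Object names are illustrative, and `method = "lm"` is used only to keep the example light.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp))

# Trailing `+`: the expression is incomplete, so R continues on the next line.
ok_trailing <- p + geom_point() +
  geom_smooth(method = "lm")

# Inside parentheses R keeps reading until the closing `)`,
# so a leading `+` is accepted here.
ok_parens <- (p + geom_point()
  + geom_smooth(method = "lm"))

# A lone `+ geom_smooth()` is what R sees on the second line of the
# failing version; evaluating it signals an error.
lone_plus <- tryCatch(+geom_smooth(method = "lm"), error = function(e) e)
inherits(lone_plus, "error")
```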

Building plots incrementally

p + geom_point() + geom_smooth(method = "lm")

Playing with scales

p + geom_point() + 
    geom_smooth(method = "gam")

Playing with scales

p + geom_point() + 
    geom_smooth(method = "gam") + 
    scale_x_log10()

ggplot2 applies the scale transformation before fitting the model line, so the smoother is fit on the log-transformed GDP values.
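One way to check this, sketched here with `layer_data()`, is to look at the x values that the smoothing layer actually worked with: they should span the log10 of the GDP range, not the raw range. The example uses `method = "lm"` instead of `"gam"` to avoid the extra dependency; the object names are illustrative.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp))
plot_log <- p + geom_point() +
  geom_smooth(method = "lm") +
  scale_x_log10()

# Computed data of the second layer (the smoother).
smooth_data <- layer_data(plot_log, i = 2)

# The smoother's x grid lives on the transformed (log10) scale ...
range(smooth_data$x)
# ... which matches the log10 of the raw GDP range.
log10(range(gapminder$gdpPercap))
```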

Changing labels

p + geom_point() + 
    geom_smooth(method = "gam") + 
    scale_x_log10(labels = scales::dollar)
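The `labels` argument accepts a function that turns break values into strings. As a small illustration, `scales::dollar` can be called directly on a few values (the vector `axis_breaks` is just an example):

```r
library(scales)

# Example break values; any numeric vector works.
axis_breaks <- c(300, 3000, 30000)

# dollar() formats numbers as currency strings.
dollar(axis_breaks)
# [1] "$300"    "$3,000"  "$30,000"
```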

Setting labels

p <- ggplot(data = gapminder, 
            mapping = aes(x = gdpPercap, 
                          y = lifeExp))
p + geom_point() + 
    geom_smooth(method = "gam") + 
    scale_x_log10(labels = scales::dollar) +
    labs(
      x = "GDP per capita",
      y = "Life Expectancy in Years",
      title = "Economic growth and life expectancy",
      caption = "Source: Gapminder."
    )

Finishing up?

p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, 
                          y = lifeExp))
p + geom_point() + 
    geom_smooth(method = "gam") + 
    scale_x_log10(labels = scales::dollar) +
    labs(
      x = "GDP per capita",
      y = "Life Expectancy in Years",
      title = "Economic growth and life expectancy",
      caption = "Source: Gapminder."
    ) +
    theme_bw()

A finished plot?

Look again at this picture: can we do better?

Looking at the dataset, what information are we ignoring?

What about colors?

p <- ggplot(data = gapminder, 
            mapping = aes(x = gdpPercap, 
                          y = lifeExp, 
                          color=continent))
p + geom_point() + 
    geom_smooth(method = "gam") + 
    scale_x_log10(labels = scales::dollar)


That’s quite a mess!

Changing aesthetics for single geoms

p <- ggplot(data = gapminder, 
            mapping = aes(x = gdpPercap, 
                          y = lifeExp, 
                          color=continent))
p + geom_point(size = .01) + 
    geom_smooth(method = "gam") + 
    scale_x_log10(labels = scales::dollar)


Changing aesthetics for single geoms

p <- ggplot(data = gapminder, 
            mapping = aes(x = gdpPercap, 
                          y = lifeExp))
p + geom_point(size = .01, color="gray") + 
    geom_smooth(mapping = aes(color = continent),
                method = "gam") + 
    scale_x_log10(labels = scales::dollar)


Changing aesthetics for single geoms

p <- ggplot(data = gapminder, 
            mapping = aes(x = gdpPercap, 
                          y = lifeExp))
p + geom_point(mapping = aes(color = "gray"), size = .01) + 
    geom_smooth(mapping = aes(color = continent),
                method = "gam") + 
    scale_x_log10(labels = scales::dollar)

What is happening here?
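A possible way to investigate: inside `aes()`, the string `"gray"` is treated as data (a constant categorical value) and mapped through the color scale, whereas outside `aes()` it sets the color directly. The sketch below compares the colors that each variant actually assigns to the points; the object names are illustrative.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp))

# "gray" as a mapped value: it goes through the discrete color scale.
mapped <- p + geom_point(mapping = aes(color = "gray"))
# "gray" as a fixed setting: it is used as the color itself.
fixed <- p + geom_point(color = "gray")

unique(layer_data(mapped, 1)$colour)  # a color picked by the scale, not gray
unique(layer_data(fixed, 1)$colour)   # "gray"
```

The mapped variant also produces a legend entry labelled "gray", since it is handled like any other categorical variable.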

Changing aesthetics for different geoms

p <- ggplot(data = gapminder, 
            mapping = aes(x = gdpPercap, 
                          y = lifeExp))
p + geom_point(mapping = aes(color = continent), size = .01) +
    geom_smooth(method = "gam", color = "black") + 
    scale_x_log10(labels = scales::dollar)

Maybe in this case it’s better to have a global smoothing line.

Combining with dplyr

filtered_gapminder <- gapminder %>% filter(year == 2007)
p <- ggplot(data = filtered_gapminder, 
            mapping = aes(x = gdpPercap, 
                          y = lifeExp,
                          color = continent))
p + geom_point() + 
    scale_x_log10(labels = scales::dollar)


Combining with dplyr

filtered_gapminder <- gapminder %>% filter(year == 2007)
p <- ggplot(data = filtered_gapminder, 
            mapping = aes(x = gdpPercap, 
                          y = lifeExp,
                          color = continent))
p + geom_point() + 
    scale_x_log10(labels = scales::dollar,
                  breaks = c(300, 3000, 30000))

Faceting

filtered_gapminder <- gapminder %>% 
  filter(year == 2007)
p <- ggplot(data = filtered_gapminder, 
            mapping = aes(x = gdpPercap, 
                          y = lifeExp,
                          color = continent))
p + geom_point() + 
    scale_x_log10(labels = scales::dollar,
                  breaks = c(300,3000,30000)) +
    facet_wrap(vars(continent))


Wrap up

What happens if you map year to color?

p <- ggplot(gapminder, 
            aes(x=gdpPercap, 
                y=lifeExp,
                color=year))
p + geom_point() +
    scale_x_log10()

p <- ggplot(gapminder, 
            aes(x=gdpPercap, 
                y=lifeExp,
                color=factor(year)))
p + geom_point() +
    scale_x_log10()
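The difference between the two plots comes from the type of the mapped variable: a numeric variable gets a continuous color gradient, while a factor gets a discrete scale with one legend key per level. A quick check of the types involved (the name `year_f` is only illustrative):

```r
library(gapminder)

# year is stored as an integer, so ggplot2 treats it as continuous.
class(gapminder$year)

# Converting to a factor makes it categorical, with one level per survey year.
year_f <- factor(gapminder$year)
nlevels(year_f)
```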

Wrap up

p <- ggplot(data = gapminder, 
            mapping = aes(x = gdpPercap, 
                          y = lifeExp,
                          color = continent,
                          fill = continent))
p + geom_point(color="gray") + 
    geom_smooth(method = "gam") + 
    scale_x_log10(labels = scales::dollar) +
    labs(
      x = "GDP per capita",
      y = "Life Expectancy in Years",
      title = "Economic growth and life expectancy",
      caption = "Source: Gapminder."
    ) +
    theme_bw()

Wrap up

Look closely at the legend. How is it related to the geoms you use?

Saving your plots

  • You can save your plots within Quarto documents.

  • You can export them to an external file.

Including in Quarto documents

You can use the execution options to set some parameters.

See https://quarto.org/docs/computations/execution-options.html

You can set them globally, in the document's YAML header, or locally, in the options of individual R chunks.
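As a rough illustration of the local form, chunk options are written as `#|` comments at the top of the chunk body. The label, caption, and values below are made up for this sketch; the global equivalent would go under the `execute:` key of the YAML header (for example `echo: false`).

```r
#| label: fig-gdp-lifeexp
#| echo: false
#| fig-width: 6
#| fig-height: 4
#| fig-cap: "Life expectancy against GDP per capita."

library(ggplot2)
library(gapminder)

# The body of the chunk is ordinary R code; the options above only
# control how Quarto executes the chunk and renders its figure.
fig <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10()
fig
```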

Saving to file

Save the last plot you rendered to a PNG file.

ggsave(filename = "myplot.png")

Save a specific plot object to a PDF file.

ggsave(filename = "myplot.pdf", plot = p)
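`ggsave()` also lets you control the size of the output, which helps keep figures consistent across documents. A minimal sketch, writing to a temporary directory so that it does not clutter the project; the dimensions and the path `out_file` are arbitrary choices for illustration:

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10()

# Hypothetical output location inside the session's temporary directory.
out_file <- file.path(tempdir(), "myplot.pdf")

# The file type is inferred from the extension; size is given explicitly.
ggsave(filename = out_file, plot = p, width = 16, height = 9, units = "cm")

file.exists(out_file)
```

For raster formats such as PNG, the `dpi` argument additionally sets the resolution.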