add chapter15 updated note (#47)
Co-authored-by: BB1464 <[email protected]>
BB1464 and BB1464 authored Oct 13, 2023
1 parent a6c16e3 commit f7143a2
Showing 2 changed files with 33 additions and 43 deletions.
76 changes: 33 additions & 43 deletions 15-Scales-and-guides.Rmd
@@ -8,17 +8,19 @@ library(ggplot2)
**Learning objectives:**

* Illustrate that nothing prevents you from transforming scales other than continuous position scales
* Show how concepts for position scales apply elsewhere
* Discuss the theory underpinning scales and guides

* Show how concepts for position scales apply elsewhere

* Discuss the theory underpinning scales and guides

## Theory of scales and guides

Each scale is a function from a region in data space (the domain of the scale) to a region in aesthetic space (the range of the scale). The axis or legend is the inverse function, known as the guide: it allows you to convert visual properties back to data.
- Each scale is a function from a region in data space to a region in aesthetic space.

- The axis or legend is the inverse function, known as the **guide**: it allows you to convert visual properties back to data.


Surprisingly, axes and legends are the same type of thing: although they look very different, they share one purpose, which is to allow you to read observations from the plot and map them back to their original values.
- Surprisingly, axes and legends are the same type of thing: although they look very different, they share one purpose, which is to allow you to read observations from the plot and map them back to their original values.
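
As a rough sketch of this idea, a continuous scale can be imitated with a linear rescaling function and the guide with its inverse. This is an illustration only, not how ggplot2 implements scales internally; the names `to_size` and `to_data`, the domain of 0 to 100, and the size range of 1 to 6 are assumptions made for the example.

```r
# Hypothetical domain (data space) and range (aesthetic space) of a size scale
domain <- c(0, 100)
size_range <- c(1, 6)

# The "scale": maps data values to point sizes
to_size <- function(x) scales::rescale(x, to = size_range, from = domain)

# The "guide": the inverse mapping, from point sizes back to data values
to_data <- function(size) scales::rescale(size, to = domain, from = size_range)

sizes <- to_size(c(0, 50, 100))  # 1.0 3.5 6.0
recovered <- to_data(sizes)      # 0 50 100
```

Reading a legend amounts to applying `to_data()` by eye: the key shows a few sizes together with the data values they stand for.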

The commonalities between the two are illustrated below:

@@ -29,6 +31,8 @@ The commonalities between the two are illustrated below:
| `breaks` | Ticks & grid line|Key|
| `labels` |Tick label|Key label|

![](images/2023-10-13.png)

However, legends are more complicated than axes, and consequently there are a number of topics that are specific to legends:

**1.** A legend can display multiple aesthetics (e.g. colour and shape), from multiple layers (Section 15.7.1), and the symbol displayed in a legend varies based on the geom used in the layer (Section 15.8)
@@ -58,20 +62,14 @@ ggplot(mpg, aes(displ, hwy)) +
scale_colour_discrete()
```


The choice of default scale depends on the aesthetic and the variable type. In this example `hwy` is a continuous variable mapped to the y aesthetic so the default scale is `scale_y_continuous()`; similarly `class` is discrete so when mapped to the colour aesthetic the default scale becomes `scale_colour_discrete()`. Specifying these defaults would be tedious so ggplot2 does it for you. But if you want to override the defaults, you’ll need to add the scale yourself, like this:


```{r 15-04, echo=TRUE,eval=FALSE}
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  scale_x_continuous(name = "A really awesome x axis label") +
  scale_y_continuous(name = "An amazingly great y axis label")
```

In practice you would typically use `labs()` for this, discussed in Section 8.1, but it is conceptually helpful to understand that axis labels and legend titles are both examples of scale names: see Section 15.2.

The use of `+` to “add” scales to a plot is a little misleading because if you supply two scales for the same aesthetic, the last scale takes precedence. In other words, when you `+` a scale, you’re not actually adding it to the plot, but overriding the existing scale. This means that the following two specifications are equivalent:
The use of `+` to “add” scales to a plot is a little misleading because if you supply two scales for the same aesthetic, the last scale takes precedence:



@@ -88,10 +86,6 @@ ggplot(mpg, aes(displ, hwy)) +
scale_x_continuous(name = "Label 2")
```

Note the message when you add multiple scales for the same aesthetic, which makes it harder to accidentally overwrite an existing scale. If you see this in your own code, you should make sure that you’re only adding one scale to each aesthetic.

If you’re making small tweaks to the scales, you might continue to use the default scales, supplying a few extra arguments. If you want to make more radical changes you will override the default scales with alternatives:


```{r 15-06, echo=TRUE,eval=FALSE}
ggplot(mpg, aes(displ, hwy)) +
@@ -100,8 +94,6 @@ ggplot(mpg, aes(displ, hwy)) +
scale_colour_brewer()
```

Here `scale_x_sqrt()` changes the scale for the x axis, and `scale_colour_brewer()` does the same for the colour scale.

### Naming scheme

The scale functions intended for users all follow a common naming scheme. You’ve probably already figured out the scheme, but to be concrete, it’s made up of three pieces separated by "_":
@@ -112,9 +104,6 @@ The scale functions intended for users all follow a common naming scheme. You’

**3.** The name of the scale (e.g., `continuous`, `discrete`, `brewer`).

The naming structure is often helpful, but can sometimes be ambiguous. For example, while the name `scale_colour_continuous()` clearly refers to the colour scale associated with continuous variables, it is less obvious that `scale_colour_distiller()` is simply a different method for creating colour scales for continuous variables.
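
One way to explore the naming scheme is to list the exported ggplot2 functions that share a prefix. The sketch below assumes ggplot2 is installed; the variable names `colour_scales` and `pieces` are made up for the example.

```r
# All exported ggplot2 functions whose names start with scale_colour_
colour_scales <- grep(
  "^scale_colour_",
  getNamespaceExports("ggplot2"),
  value = TRUE
)
sort(colour_scales)

# Splitting a name on "_" recovers the pieces of the naming scheme:
# the word "scale", the aesthetic, and the name of the scale
pieces <- strsplit("scale_colour_distiller", "_", fixed = TRUE)[[1]]
pieces
```

Some names in the listing, such as `scale_colour_viridis_c()`, split into more than three pieces, which is another place where the scheme is less tidy than it first appears.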



### Fundamental scale types

@@ -132,51 +121,57 @@ Each fundamental type is handled by one of three scale constructor functions:

* `binned_scale()`.

Although you should never need to call these constructor functions, they provide the organising structure for scales and it is useful to know about them.
Although you should never need to call these constructor functions, they provide the organizing structure for scales and it is useful to know about them.
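
The organising structure can be seen by inspecting the objects that the user-facing scale functions return. The sketch below assumes the ggproto class names `ScaleContinuous`, `ScaleDiscrete` and `ScaleBinned` used by current ggplot2 releases; these are internal details and may change between versions.

```r
library(ggplot2)

# Each user-facing scale function returns an object that inherits from
# the class produced by one of the three constructor functions
is_continuous <- inherits(scale_x_continuous(), "ScaleContinuous")
is_discrete <- inherits(scale_colour_discrete(), "ScaleDiscrete")
is_binned <- inherits(scale_x_binned(), "ScaleBinned")

c(continuous = is_continuous, discrete = is_discrete, binned = is_binned)
```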

## Scale Breaks

## Scale Names
Discussion of what unifies the concept of breaks across continuous, discrete and binned scales: they are specific data values at which the guide needs to display something. Include additional detail about break functions.
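
As a small illustration of a break function, the sketch below uses `scales::breaks_width()`, which builds a function that takes the scale limits and returns evenly spaced break positions. The spacing of 10 and the names `every_ten` and `hwy_breaks` are choices made for this example.

```r
library(ggplot2)

# A break function receives the limits of the scale and returns the
# data values at which the guide should display something
every_ten <- scales::breaks_width(10)
hwy_breaks <- every_ten(range(mpg$hwy))
hwy_breaks

# The same function can be supplied as the breaks argument of a scale
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_y_continuous(breaks = every_ten)
```

Supplying a function rather than a fixed vector means the breaks adapt if the limits of the scale change.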

Extend discussion of `labs()` in Section 8.1.
## Scale Limits

## Scale Breaks
- Section 15.1 introduced the concept that a scale defines a mapping from the data space to the aesthetic space.

Discussion of what unifies the concept of breaks across continuous, discrete and binned scales: they are specific data values at which the guide needs to display something. Include additional detail about break functions.
- Scale limits are an extension of this idea: they dictate the **region** of the data space over which the mapping is defined.

- For continuous and binned scales, the data space is inherently continuous and one-dimensional, so the limits can be specified by two end points.

- For discrete scales, however, the data space is unstructured and consists only of a set of categories: as such the limits for a discrete scale can only be specified by enumerating the set of categories over which the mapping is defined.

## Scale Limits
- The toolbox chapters outline the common practical goals for specifying the limits: for position scales the limits are used to set the end points of the axis, for example.
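
The difference between the two kinds of limits can be sketched as follows. The particular end points and categories are arbitrary choices for the example, and `x_scale` and `colour_scale` are illustrative names.

```r
library(ggplot2)

# Continuous scale: the limits are two end points
x_scale <- scale_x_continuous(limits = c(1, 7))

# Discrete scale: the limits enumerate the categories to be mapped
colour_scale <- scale_colour_discrete(limits = c("compact", "midsize", "suv"))

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  x_scale +
  colour_scale
```

Vehicle classes that are not listed in the discrete limits fall outside the region over which the colour mapping is defined.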

Section 15.1 introduced the concept that a scale defines a mapping from the data space to the aesthetic space. Scale limits are an extension of this idea: they dictate the **region** of the data space over which the mapping is defined. At a theoretical level this region is defined differently depending on the fundamental scale type.
This leads naturally to the question of what ggplot2 should do if the data set contains “out of bounds” values that fall outside the limits.

For continuous and binned scales, the data space is inherently continuous and one-dimensional, so the limits can be specified by two end points. For discrete scales, however, the data space is unstructured and consists only of a set of categories: as such the limits for a discrete scale can only be specified by enumerating the set of categories over which the mapping is defined.
- The default behaviour in ggplot2 is to convert out of bounds values to NA.

The toolbox chapters outline the common practical goals for specifying the limits: for position scales the limits are used to set the end points of the axis, for example. This leads naturally to the question of what ggplot2 should do if the data set contains “out of bounds” values that fall outside the limits.
- We can override this default by setting the `oob` argument of the scale, a function that is applied to all observations outside the scale limits.

The default behaviour in ggplot2 is to convert out of bounds values to NA, the logic being that if a data value is not part of the mapped region, it should be treated as missing. This can occasionally lead to unexpected behaviour, as illustrated in Section 10.1.2. You can override this default by setting the `oob` argument of the scale, a function that is applied to all observations outside the scale limits. The default is `scales::oob_censor()`, which replaces any value outside the limits with `NA`. Another option is `scales::oob_squish()`, which squishes all values into the range. An example using a fill scale is shown below:
- The default is `scales::oob_censor()` which replaces any value outside the limits with `NA`.

- Another option is `scales::oob_squish()` which squishes all values into the range. An example using a fill scale is shown below:

```{r 15-07}
df <- data.frame(x = 1:6, y = 8:13)
base <- ggplot(df, aes(x, y)) +
  geom_col(aes(fill = x)) + # bar chart
  geom_vline(xintercept = 3.5, colour = "red") # for visual clarity only
base
base + scale_fill_gradient(limits = c(1, 3))
base + scale_fill_gradient(limits = c(1, 3), oob = scales::oob_squish)
```

On the left the default fill colours are shown, ranging from dark blue to light blue.

In the middle panel the scale limits for the fill aesthetic are reduced so that the values for the three rightmost bars are replaced with `NA` and are mapped to a grey shade.

In some cases this is desired behaviour but often it is not: the right panel addresses this by modifying the `oob` function appropriately.
In the first plot the default fill colours are shown, ranging from dark blue to light blue.

In the second plot the scale limits for the fill aesthetic are reduced so that the values for the three rightmost bars are replaced with `NA` and are mapped to a grey shade.

In some cases this is desired behaviour but often it is not: the third plot addresses this by modifying the `oob` function appropriately.
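
The two `oob` functions can also be called directly on a vector, which makes their effect easy to see outside a plot. The input values and limits below are arbitrary choices for illustration.

```r
values <- c(0, 2, 5)
limits <- c(1, 3)

# Censoring replaces out of bounds values with NA
censored <- scales::oob_censor(values, range = limits)  # NA 2 NA

# Squishing moves out of bounds values to the nearest limit
squished <- scales::oob_squish(values, range = limits)  # 1 2 3
```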

## Scale guides

Scale guides are more complex than scale names: where the `name` argument (and `labs()`) takes text as input, the `guide` argument (and `guides()`) requires a guide object created by a **guide function** such as `guide_colourbar()` or `guide_legend()`. The arguments to these functions offer additional fine control over the guide.
Scale **guides** are more complex than **scale names**: where the `name` argument (and `labs()`) takes text as input, the `guide` argument (and `guides()`) requires a guide object created by a **guide function** such as `guide_colourbar()` or `guide_legend()`. The arguments to these functions offer additional fine control over the guide.
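
The contrast between the two kinds of input can be sketched as below. The plot, the legend title, and the choice of `reverse = TRUE` are assumptions made for the example.

```r
library(ggplot2)

base <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = cty))

# A scale name is plain text
p_name <- base + scale_colour_continuous(name = "City mpg")

# A guide is an object created by a guide function, supplied either
# through the guide argument of the scale ...
p_scale <- base +
  scale_colour_continuous(guide = guide_colourbar(reverse = TRUE))

# ... or through guides(), keyed by the aesthetic
p_guides <- base + guides(colour = guide_colourbar(reverse = TRUE))
```

The last two specifications should give the same reversed colour bar; `guides()` is to the `guide` argument what `labs()` is to `name`.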

The table below summarises the default guide functions associated with different scale types:

@@ -205,8 +200,6 @@ Each of these guide types has appeared earlier in the toolbox:
In addition to the functionality discussed in those sections, the guide functions have many arguments that are equivalent to theme settings like text colour, size, font etc, but only apply to a single guide. For information about those settings, see Chapter 18.




## Scale transformation

The most common use for scale transformations is to adjust a continuous position scale, as discussed in Section 10.1.7. However, they can sometimes be helpful when applied to other aesthetics. Often this is purely a matter of visual emphasis.
@@ -235,9 +228,6 @@ base + scale_size(trans = "reverse")

In the plot on the left, the `z` value is naturally interpreted as a “weight”: if each dot corresponds to a group, the `z` value might be the size of the group. In the plot on the right, the size scale is reversed, and `z` is more naturally interpreted as a “distance” measure: distant entities are scaled to appear smaller in the plot.




## Legend merging and splitting

There is always a one-to-one correspondence between position scales and axes. But the connection between non-position scales and legends is more complex: one legend may need to draw symbols from multiple layers (“merging”), or one aesthetic may need multiple legends (“splitting”).
@@ -292,11 +282,11 @@ base + labs(shape = "Split legend")
base + labs(shape = "Merged legend", colour = "Merged legend")
```


### Splitting legends

Splitting a legend is a much less common data visualization task. In general it is not advisable to map one aesthetic (e.g. colour) to multiple variables, and so by default ggplot2 does not allow you to “split” the colour aesthetic into multiple scales with separate legends.

Splitting a legend is a much less common data visualisation task. In general it is not advisable to map one aesthetic (e.g. colour) to multiple variables, and so by default ggplot2 does not allow you to “split” the colour aesthetic into multiple scales with separate legends. Nevertheless, there are exceptions to this general rule, and it is possible to override this behaviour using the ggnewscale package. The `ggnewscale::new_scale_colour()` command acts as an instruction to ggplot2 to initialise a new colour scale: scale and guide commands that appear above the `new_scale_colour()` command will be applied to the first colour scale, and commands that appear below are applied to the second colour scale.
Nevertheless, there are exceptions to this general rule, and it is possible to override this behaviour using the ggnewscale package. The `ggnewscale::new_scale_colour()` command acts as an instruction to ggplot2 to initialize a new colour scale: scale and guide commands that appear above the `new_scale_colour()` command will be applied to the first colour scale, and commands that appear below are applied to the second colour scale.

To illustrate this the plot on the left uses `geom_point()` to display a large marker for each vehicle make in the mpg data, with a single colour scale that maps to the year. On the right, a second `geom_point()` layer is overlaid on the plot using small markers: this layer is associated with a different colour scale, used to indicate whether the vehicle has a 4-cylinder engine.

Binary file added images/2023-10-13.png