-
Notifications
You must be signed in to change notification settings - Fork 36
/
ggplot2.Rmd
429 lines (302 loc) · 29 KB
/
ggplot2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
# The `gglot2` Library {#ggplot2}
Being able to create **visualizations** (graphical representations) of data is a key step in being able to _communicate_ information and findings to others. In this chapter you will learn to use the **`ggplot2`** library to declaratively make beautiful plots or charts of your data.
Although R does provide built-in plotting functions, the `ggplot2` library implements the **Grammar of Graphics** (similar to how `dplyr` implements a _Grammar of Data Manipulation_; indeed, both packages were developed by the same person). This makes the library particularly effective for describing how visualizations should represent data, and has turned it into the preeminent plotting library in R.
Learning this library will allow you to easily make nearly any kind of (static) data visualization, customized to your exact specifications.
<p class="alert">Examples in this chapter adapted from [_R for Data Science_](http://r4ds.had.co.nz/) by Garrett Grolemund and Hadley Wickham.</p>
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE)
knitr::opts_chunk$set(fig.path='img/ggplot2/', fig.width=6, fig.asp=0.618)
library("ggplot2")
library("dplyr")
```
## A Grammar of Graphics
Just as the grammar of language helps us construct meaningful sentences out of words, the ___Grammar of Graphics___ helps us to construct graphical figures out of different visual elements. This grammar gives us a way to talk about parts of a plot: all the circles, lines, arrows, and words that are combined into a diagram for visualizing data. Originally developed by Leland Wilkinson, the Grammar of Graphics was [adapted by Hadley Wickham](http://vita.had.co.nz/papers/layered-grammar.pdf) to describe the _components_ of a plot, including
- the **data** being plotted
- the **geometric objects** (circles, lines, etc.) that appear on the plot
- the **aesthetics** (appearance) of the geometric objects, and the _mappings_ from variables in the data to those aesthetics
- a **statistical transformation** used to calculate the data values used in the plot
- a **position adjustment** for locating each geometric object on the plot
- a **scale** (e.g., range of values) for each aesthetic mapping used
- a **coordinate system** used to organize the geometric objects
- the **facets** or groups of data shown in different plots
<!--http://r4ds.had.co.nz/data-visualisation.html#the-layered-grammar-of-graphics << good diagrams (for slides?)-->
Wickham further organizes these components into **layers**, where each layer has a single _geometric object_, _statistical transformation_, and _position adjustment_. Following this grammar, you can think of each plot as a set of layers of images, where each image's appearance is based on some aspect of the data set.
All together, this grammar enables you to discuss what plots look like using a standard set of vocabulary. And like with `dplyr` and the _Grammar of Data Manipulation_, `ggplot2` uses this grammar directly to declare plots, allowing you to more easily create specific visual images.
## Basic Plotting with `ggplot2`
### _ggplot2_ library
The [**`ggplot2`**](http://ggplot2.tidyverse.org/) library provides a set of _declarative functions_ that mirror the above grammar, enabling you to easily specify what you want a plot to look like (e.g., what data, geometric objects, aesthetics, scales, etc. you want it to have).
`ggplot2` is yet another external package (like `dplyr` and `httr` and `jsonlite`), so you will need to install and load it in order to use it:
```r
install.packages("ggplot2") # once per machine
library("ggplot2")
```
This will make all of the plotting functions you'll need available.
<p class="alert">_ggplot2_ is called _ggplot**2**_ because once upon a time there was just a library _ggplot_. However, the developer noticed that it used an inefficient set of functions. In order for not to break the API, the authors introduced a successor package _ggplot2_. However, the central function in this package is still called `ggplot()`, not `ggplot2()`!
</p>
### _mpg_ data
_ggplot2_ library comes with a number of built-in data sets. One of the most popular of these is `mpg`, a data frame about fuel economy for different cars. It is a sufficiently small but versatile dataset to demonstrate various aspects of plotting. _mpg_ has `r nrow(mpg)` rows and `r ncol(mpg)` columns. Below is a sample of it:
```{r mpgDemo}
mpg[sample(nrow(mpg), 3),]
```
The most important variables for our purpose are following:
* **class**, car class, such as SUV, compact, minivan
* **displ**, engine size (liters)
* **cyl**, number of cylinders
* **hwy**, mileage on highway, miles per gallon
* **manufacturer**, producer of the car, e.g. Volkswagen, Toyota
### Our first ggplot
In order to create a plot, you call the `ggplot()` function, specifying the **data** that you wish to plot. You then add new _layers_ that are **geometric objects** which will show up on the plot:
```{r basic_mpg}
# plot the `mpg` data set, with highway mileage on the x axis and
# engine displacement (power) on the y axis:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
```
To walk through the above code:
- The `ggplot()` function is passed the data frame to plot as the `data` argument.
- You specify a geometric object (`geom`) by calling one of the many`geom` [functions](http://ggplot2.tidyverse.org/reference/index.html#section-layer-geoms), which are all named `geom_` followed by the name of the kind of geometry you wish to create. For example, `geom_point()` will create a layer with "point" (dot) elements as the geometry. There are a large number of these functions; see below for more details.
- For each `geom` you must specify the **aesthetic mappings**, which is how data from the data frame will be mapped to the visual aspects of the geometry. These mappings are defined using the `aes()` function. The `aes()` function takes a set of arguments (like a list), where the argument name is the visual property to map _to_, and the argument value is the data property to map _from_.
- Finally, you add `geom` layers to the plot by using the addition (**`+`**) operator.
Thus, basic simple plots can be created simply by specifying a data set, a `geom`, and a set of aesthetic mappings.
- Note that `ggplot2` library does include a `qplot()` function for creating "quick plots", which acts as a convenient shortcut for making simple, "default"-like plots. While this is a nice starting place, the strength of `ggplot2` is in it's _customizability_, so read on!
### Aesthetic Mappings
The **aesthetic mapping** is a central concept of every data visualization. This means setting up the correspondence between **aesthetics**, the visual properties (visual channels) of the plot, such as _position_, _color_, _size_, or _shape_, and certain properties of the data, typically numeric values of certain variables.
Aesthetics are the representations that you want to _drive with your data properties_, rather than fix in code for all markers. Each visual channel can therefore encode an aspect of the data and be used to express underlying patterns.
The aesthetics mapping is specified in the [`aes()`](http://ggplot2.tidyverse.org/reference/index.html#section-aesthetics) function call in the `geom` layer. Above we used mapping `aes(x=displ, y=hwy)`. This means to map variable `displ` in the `mpg` data (engine size) to the horizontal position (_x_-coordinate) on the plot, and variable `hwy` (highway mileage) to the vertical position (_y_ coordinate). We did not specify any other visual properties, such as color, point size or point shape, so by default the `geom_point` layer produced a set of equal size black dots, positioned according to the date. Let's now color the points according to the class of the car. This amounts to taking an additional aesthetic, _color_, and mapping it to the variable `class` in data as `color=class`. As we want this to happen in the same layer, we must add this to the `aes()` function as an additional named argument:
```{r aes_color}
# color the data by car type
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class))
```
(`ggplot2` will even create a legend for you!)
Note that using the `aes()` function will cause the visual channel to be based on the data specified in the argument. For example, using `aes(color = "blue")` won't cause the geometry's color to be "blue", but will instead cause the visual channel to be mapped from the _vector_ `c("blue")`—as if you only had a single type of engine that happened to be called "blue":
```{r wrong_blue}
ggplot(data = mpg) + # note where parentheses are closed
geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
```
This looks confusing (note the weird legend!) and is most likely not what you want.
If you wish to specify a given aesthetic, you should ___set___ that property as an argument to the `geom` method, outside of the `aes()` call:
```{r color_blue}
ggplot(data = mpg) + # note where parentheses are closed
geom_point(mapping = aes(x = displ, y = hwy), color = "blue") # blue points!
```
## Complex Plots
Building on these basics, `ggplot2` can be used to build almost any kind of plot you may want. These plots are declared using functions that follow from the _Grammar of Graphics_.
### Specifying Geometry
The most obvious distinction between plots is what **geometric objects** (`geoms`) they include. `ggplot2` supports a number of different types of [`geoms`](http://ggplot2.tidyverse.org/reference/index.html#section-layer-geoms), including:
- **`geom_point`** for drawing individual points (e.g., a scatter plot)
- **`geom_line`** for drawing lines (e.g., for a line charts)
- **`geom_smooth`** for drawing smoothed lines (e.g., for simple trends or approximations)
- **`geom_bar`** for drawing bars (e.g., for bar charts)
- **`geom_polygon`** for drawing arbitrary shapes (e.g., for drawing an area in a coordinate plane)
- **`geom_map`** for drawing polygons in the shape of a map! (You can access the _data_ to use for these maps by using the [`map_data()`](http://ggplot2.tidyverse.org/reference/map_data.html) function).
Each of these geometries will need to include a set of **aesthetic mappings** (using the `aes()` function and assigned to the `mapping` argument), though the specific _visual properties_ that the data will map to will vary. For example, you can map data to the `shape` of a `geom_point` (e.g., if they should be circles or squares), or you can map data to the `linetype` of a `geom_line` (e.g., if it is solid or dotted), but not vice versa.
- Almost all `geoms` **require** an `x` and `y` mapping at the bare minimum.
```{r geom_examples, fig.show='hold', out.width=c('380pt', '380pt')}
# line chart of mileage by engine power
ggplot(data = mpg) +
geom_line(mapping = aes(x = displ, y = hwy))
# bar chart of car type
ggplot(data = mpg) +
geom_bar(mapping = aes(x = class)) # no y mapping needed!
```
What makes this really powerful is that you can add **multiple geometries** to a plot, thus allowing you to create complex graphics showing multiple aspects of your data
```{r multi_geom}
# plot with both points and smoothed line
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
geom_smooth(mapping = aes(x = displ, y = hwy))
```
Of course the aesthetics for each `geom` can be different, so you could show multiple lines on the same plot (or with different colors, styles, etc). It's also possible to give each `geom` a different `data` argument, so that you can show multiple data sets in the same plot.
- If you want multiple `geoms` to utilize the same data or aesthetics, you can pass those values as arguments to the `ggplot()` function itself; any `geoms` added to that plot will use the values declared for the whole plot _unless overridden by individual specifications_.
#### Statistical Transformations
If you look at the above `bar` chart, you'll notice that the the `y` axis was defined for you as the `count` of elements that have the particular type. This `count` isn't part of the data set (it's not a column in `mpg`), but is instead a **statistical transformation** that the `geom_bar` automatically applies to the data. In particular, it applies the `stat_count` transformation, simply summing the number of rows each `class` appeared in the dataset.
`ggplot2` supports many different statistical transformations. For example, the "identity" transformation will leave the data "as is". You can specify which statistical transformation a `geom` uses by passing it as the **`stat`** argument:
```{r, bar_chart}
# bar chart of make and model vs. mileage
# quickly (lazily) filter the dataset to a sample of the cars: one of each make/model
new_cars <- mpg %>%
mutate(car = paste(manufacturer, model)) %>% # combine make + model
distinct(car, .keep_all = TRUE) %>% # select one of each cars -- lazy filtering!
slice(1:20) # only keep 20 cars
# create the plot (you need the `y` mapping since it is not implied by the stat transform of geom_bar)
ggplot(new_cars) +
geom_bar(mapping = aes(x = car, y = hwy), stat = "identity") +
coord_flip() # horizontal bar chart
```
Additionally, `ggplot2` contains **`stat_`** functions (e.g., `stat_identity` for the "identity" transformation) that can be used to specify a layer in the same way a `geom` does:
```{r stat_summary}
# generate a "binned" (grouped) display of highway mileage
ggplot(data = mpg) +
stat_bin(aes(x = hwy, color = hwy), binwidth = 4) # binned into groups of 4 units
```
Notice the above chart is actually a [histogram](https://en.wikipedia.org/wiki/Histogram)! Indeed, almost every `stat` transformation corresponds to a particular `geom` (and vice versa) by default. Thus they can often be used interchangeably, depending on how you want to emphasize your layer creation when writing the code.
```r
# these two charts are identical
ggplot(data = mpg) +
geom_bar(mapping = aes(x = class))
ggplot(data = mpg) +
stat_count(mapping = aes(x = class))
```
#### Position Adjustments
In addition to a default statistical transformation, each `geom` also has a default **position adjustment** which specifies a set of "rules" as to how different components should be positioned relative to each other. This position is noticeable in a `geom_bar` if you map a different variable to the color visual channel:
```{r, stacked_bar}
# bar chart of mileage, colored by engine type
ggplot(data = mpg) +
geom_bar(mapping = aes(x = hwy, fill = class)) # fill color, not outline color
```
The `geom_bar` by default uses a position adjustment of `"stack"`, which makes each "bar" a height appropriate to its value and _stacks_ them on top of each other. You can use the **`position`** argument to specify what position adjustment rules to follow:
```{r position_examples, fig.show='hold', out.width=c('380pt', '380pt')}
# a filled bar chart (fill the vertical height)
ggplot(data = mpg) +
geom_bar(mapping = aes(x = hwy, fill = drv), position = "fill")
# a dodged (group) bar chart -- values next to each other
# (not great dodging demos in this data set)
ggplot(data = mpg) +
geom_bar(mapping = aes(x = hwy, fill = drv), position = "dodge")
```
Check the documentation for each particular `geom` to learn more about its possible position adjustments.
### Styling with Scales
Whenever you specify an **aesthetic mapping**, `ggplot` uses a particular **scale** to determine the _range of values_ that the data should map to. Thus, when you specify
```r
# color the data by engine type
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class))
```
`ggplot` automatically adds a **scale** for each mapping to the plot:
```r
# same as above, with explicit scales
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class)) +
scale_x_continuous() +
scale_y_continuous() +
scale_colour_discrete()
```
Each scale can be represented by a function with the following name: `scale_`, followed by the name of the aesthetic property, followed by an `_` and the name of the scale. A `continuous` scale will handle things like numeric data (where there is a _continuous set_ of numbers), whereas a `discrete` scale will handle things like colors (since there is a small list of _distinct_ colors).
While the default scales will work fine, it is possible to explicitly add different scales to replace the defaults. For example, you can use a scale to change the direction of an axis:
```r
# mileage relationship, ordered in reverse
ggplot(data = mpg) +
geom_point(mapping = aes(x = cty, y = hwy)) +
scale_x_reverse()
```
Similarly, you can use `scale_x_log10()` to plot on a [logarithmic scale](https://en.wikipedia.org/wiki/Logarithmic_scale).
You can also use scales to specify the _range_ of values on a axis by passing in a `limits` argument. This is useful for making sure that multiple graphs share scales or formats.
```{r scale_limit, fig.show='hold', out.width=c('380pt', '380pt')}
# subset data by class
suv <- mpg %>% filter(class == "suv") # suvs
compact <- mpg %>% filter(class == "compact") # compact cars
# scales
x_scale <- scale_x_continuous(limits = range(mpg$displ))
y_scale <- scale_y_continuous(limits = range(mpg$hwy))
col_scale <- scale_colour_discrete(limits = unique(mpg$drv))
ggplot(data = suv) +
geom_point(mapping = aes(x = displ, y = hwy, color = drv)) +
x_scale + y_scale + col_scale
ggplot(data = compact) +
geom_point(mapping = aes(x = displ, y = hwy, color = drv)) +
x_scale + y_scale + col_scale
```
Notice how it is easy to compare the two data sets to each other because the axes and colors match!
These scales can also be used to specify the "tick" marks and labels; see the resources at the end of the chapter for details. And for further ways specifying where the data appears on the graph, see the [Coordinate Systems] section below.
<!-- Add an example here -->
#### Color Scales
A more common scale to change is which set of colors to use in a plot. While you can use scale functions to specify a list of colors to use, a more common option is to use a pre-defined palette from [**colorbrewer.org**](http://colorbrewer2.org/). These color sets have been carefully designed to look good and to be viewable to people with certain forms of color blindness. This color scale is specified with the `scale_color_brewer()` function, passing the `palette` as an argument.
```{r brewer_point}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class), size = 4) +
scale_color_brewer(palette = "Set3")
```
You can get the palette name from the _colorbrewer_ website by looking at the `scheme` query parameter in the URL. Or see the diagram [here](https://bl.ocks.org/mbostock/5577023) and hover the mouse over each palette for its name.
You can also specify _continuous_ color values by using a [gradient](http://ggplot2.tidyverse.org/reference/scale_gradient.html) scale, or [manually](http://ggplot2.tidyverse.org/reference/scale_manual.html) specify the colors you want to use as a _named vector_.
### Coordinate Systems
The next term from the _Grammar of Graphics_ that can be specified is the **coordinate system**. As with **scales**, coordinate systems are specified with functions (that all start with **`coord_`**) and are added to a `ggplot`. There are a number of different possible [coordinate systems](http://ggplot2.tidyverse.org/reference/index.html#section-coordinate-systems) to use, including:
- **`coord_cartesian`** the default [cartesian coordinate](https://en.wikipedia.org/wiki/Cartesian_coordinate_system) system, where you specify `x` and `y` values.
- **`coord_flip`** a cartesian system with the `x` and `y` flipped
- **`coord_fixed`** a cartesian system with a "fixed" aspect ratio (e.g., 1.78 for a "widescreen" plot)
- **`coord_polar`** a plot using [polar coordinates](https://en.wikipedia.org/wiki/Polar_coordinate_system)
- **`coord_quickmap`** a coordinate system that approximates a good aspect ratio for maps. See the documentation for more details.
Most of these system support the `xlim` and `ylim` arguments, which specify the _limits_ for the coordinate system.
### Facets
**Facets** are ways of _grouping_ a data plot into multiple different pieces (_subplots_). This allows you to view a separate plot for each value in a [categorical variable](https://en.wikipedia.org/wiki/Categorical_variable). Conceptually, breaking a plot up into facets is similar to using the `group_by()` verb in `dplyr`, with each facet acting like a _level_ in an R _factor_.
You can construct a plot with multiple facets by using the **`facet_wrap()`** function. This will produce a "row" of subplots, one for each categorical variable (the number of rows can be specified with an additional argument):
```{r facets}
# a plot with facets based on vehicle type.
# similar to what we did with `suv` and `compact`!
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~class)
```
Note that the argument to `facet_wrap()` function is written with a tilde (**`~`**) in front of it. This specifies that the column name should be treated as a **formula**. A formula is a bit like an "equation" in mathematics; it's like a string representing what set of operations you want to perform (putting the column name in a string also works in this simple case). Formulas are in fact the same structure used with _standard evaluation_ in `dplyr`; putting a `~` in front of an expression (such as `~ desc(colname)`) allows SE to work.
- In short: put a `~` in front of the column name you want to "group" by.
### Labels & Annotations
Textual labels and annotations (on the plot, axes, geometry, and legend) are an important part of making a plot understandable and communicating information. Although not an explicit part of the _Grammar of Graphics_ (they would be considered a form of geometry), `ggplot` makes it easy to add such annotations.
You can add titles and axis labels to a chart using the **`labs()`** function (_not_ `labels`, which is a different R function!):
```{r labels}
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class)) +
labs(
title = "Fuel Efficiency by Engine Power, 1999-2008", # plot title
x = "Engine power (litres displacement)", # x-axis label (with units!)
y = "Fuel Efficiency (miles per gallon)", # y-axis label (with units!)
color = "Car Type"
) # legend label for the "color" property
```
It is possible to add labels into the plot itself (e.g., to label each point or line) by adding a new `geom_text` or `geom_label` to the plot; effectively, you're plotting an extra set of data which happen to be the variable names:
```{r annotations}
# a data table of each car that has best efficiency of its type
best_in_class <- mpg %>%
group_by(class) %>%
filter(row_number(desc(hwy)) == 1)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + # same mapping for all geoms
geom_point(mapping = aes(color = class)) +
geom_label(data = best_in_class, mapping = aes(label = model), alpha = 0.5)
```
_R for Data Science_ (linked in the resources below) recommends using the [`ggrepel`](https://github.com/slowkow/ggrepel) package to help position labels.
## Plotting in Scripts
From the first encounter with _ggplot_ one typically gets the impression that it is the _ggplot_ function that creates the plot. This **is not true!**. A call to _ggplot_ just creates a _ggplot_-object, a data structure that contains data and all other necessary details for creating the plot. But the image itself is not created. It is created instead by the _print_-method. If you type an expression on R console, R evaluates and prints this expression. This is why we can use it as a manual calculator for simple math, such as `2 + 2`. The same is true for _ggplot_: it returns a _ggplot_-object, and given you don't store it into a variable, it is printed, and the print method of _ggplot_-object actually makes the image. This is why we can immediately see the images when we work with _ggplot_ on console.
Things may be different, however, if we do this in a script. When you execute a script, the returned non-stored expressions are not printed. For instance, the script
```{r ggplotNoPlot, eval=FALSE}
diamonds %>%
sample_n(1000) %>%
ggplot() +
geom_point(aes(carat, price)) # note: not printed
```
will not produce any output when sourced as a single script (and not run line-by-line). You have to print the returned objects explicitly, for instance as
```{r ggplotPrintPlot, eval=FALSE}
data <- diamonds %>%
sample_n(1000)
p <- ggplot(data) +
geom_point(aes(carat, price)) # store here
print(p) # print here
```
In scripts we often want the code not to produce image on screen, but store it in a file instead. This can be achieved in a variety of ways, for instance through redirecting graphical output to a pdf device with the command `pdf()`:
```{r ggplotPdf, eval=FALSE}
data <- diamonds %>%
sample_n(1000)
p <- ggplot(data) +
geom_point(aes(carat, price)) # store here
pdf(file="diamonds.pdf", width=10, height=8)
# redirect to a pdf file
print(p) # print here
dev.off() # remember to close the file
```
After redirecting the output, all plots will be written to the pdf file (as separate pages if you create more than one plot). Note you have to close the file with `def.off()`, otherwise it will be broken. There are other output options besides pdf, you may want to check `jpeg` and `png` image outputs. Finally, _ggplot_ also has a dedicated way to save individual plots to file using `ggsave`.
## Other Visualization Libraries
`ggplot2` is easily the most popular library for producing data visualizations in R. That said, `ggplot2` is used to produce **static** visualizations: unchanging "pictures" of plots. Static plots are great for for **explanatory visualizations**: visualizations that are used to communicate some information—or more commonly, an _argument_ about that information. All of the above visualizations have been ways to explain and demonstrate an argument about the data (e.g., the relationship between car engines and fuel efficiency).
Data visualizations can also be highly effective for **exploratory analysis**, in which the visualization is used as a way to _ask and answer questions_ about the data (rather than to convey an answer or argument). While it is perfectly feasible to do such exploration on a static visualization, many explorations can be better served with **interactive visualizations** in which the user can select and change the _view_ and presentation of that data in order to understand it.
While `ggplot2` does not directly support interactive visualizations, there are a number of additional R libraries that provide this functionality, including:
- [**`ggvis`**](http://ggvis.rstudio.com/) is a library that uses the _Grammar of Graphics_ (similar to `ggplot`), but for interactive visualizations. The interactivity is provided through the [`shiny`](http://www.rstudio.com/shiny/) library, which is introduced in a later chapter.
- [**Bokeh**](http://hafen.github.io/rbokeh/index.html) is an open-source library for developing interactive visualizations. It automatically provides a number of "standard" interactions (pop-up labels, drag to pan, select to zoom, etc) automatically. It is similar to `ggplot2`, in that you create a figure and then and then add _layers_ representing different geometries (points, lines etc). It has detailed and readable documentation, and is also available to other programming languages (such as Python).
- [**Plotly**](https://plot.ly/r/) is another library similar to _Bokeh_, in that it automatically provided standard interactions. It is also possible to take a `ggplot2` plot and [wrap](https://plot.ly/ggplot2/) it in Plotly in order to make it interactive. Plotly has many examples to learn from, though a less effective set of documentation than other libraries.
- [**`rCharts`**](http://rdatascience.io/rCharts/) provides a way to utilize a number of _JavaScript_ interactive visualization libraries. JavaScript is the programming language used to create interactive websites (HTML files), and so is highly specialized for creating interactive experiences.
There are many other libraries as well; searching around for a specific feature you need may lead you to a useful tool!
## Resources {-}
- [gglot2 Documentation](http://ggplot2.tidyverse.org/) (particularly the [function reference](http://ggplot2.tidyverse.org/reference/index.html))
- [ggplot2 Cheat Sheet](https://www.rstudio.com/wp-content/uploads/2016/11/ggplot2-cheatsheet-2.1.pdf) (see also [here](http://zevross.com/blog/2014/08/04/beautiful-plotting-in-r-a-ggplot2-cheatsheet-3/))
- [Data Visualization (R4DS)](http://r4ds.had.co.nz/data-visualisation.html) - tutorial using `ggplot2`
- [Graphics for Communication (R4DS)](http://r4ds.had.co.nz/graphics-for-communication.html) - "part 2" of tutorial using `ggplot`
- [Graphics with ggplot2](http://www.statmethods.net/advgraphs/ggplot2.html) - explanation of `qplot()`
- [Telling stories with the grammar of graphics](https://codewords.recurse.com/issues/six/telling-stories-with-data-using-the-grammar-of-graphics)
- [A Layered Grammar of Graphics (Wickham)](http://vita.had.co.nz/papers/layered-grammar.pdf)
<!-- ??[Making Maps with R](http://eriqande.github.io/rep-res-web/lectures/making-maps-with-R.html) -->