---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# magutils
<!-- badges: start -->
[![R-CMD-check](https://github.com/f-hafner/magutils/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/f-hafner/magutils/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
The goal of magutils is to make it easier to load and extract data from a database of Microsoft Academic Graph (MAG) and ProQuest Dissertations records, and to make these functions available to co-authors and research assistants. In the future, we may publish a "back-end" package that generates the database.
## Installation
You can install the development version of magutils from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("f-hafner/magutils", build_vignettes = TRUE)
```
## Example
If you do not have access to the full database, use the example database like this:
```{r example}
library(magutils)
db_file <- db_example("AcademicGraph.sqlite")
conn <- connect_to_db(db_file)
```
Then query the graduate links:
```{r}
links <- get_links(conn, from = "graduates", lazy = TRUE)
```
Or query info on graduates:
```{r}
graduates <- get_proquest(conn, from = "graduates", lazy = FALSE, limit = 3)
```
You can join the two together:
```{r}
library(magrittr)
links <- get_links(conn, from = "graduates", lazy = TRUE)
d_full <- get_proquest(conn, from = "graduates", limit = 5) %>%
  dplyr::left_join(links, by = "goid") %>%
  dplyr::collect()
```
At the end, do not forget to disconnect from the database:
```{r}
DBI::dbDisconnect(conn)
```
## Main functions
Extracting key tables
- `get_proquest`: Source data on dissertations in the United States from ProQuest.
- `get_links`: Load links between ProQuest and MAG. These can be links from PhD graduates to MAG authors, or from PhD advisors to MAG authors.
- `define_field`: Define the field of study for records in a table.
- `define_gender`: Define the gender for a table of persons with first names.
- `augment_tbl`: Augment a table with additional information: output, affiliations, co-authors. Because `output` and `affiliations` are at the unit-year level, the result will be a table at the unit-year level. I am not sure whether this is the best approach (or the best naming relative to the previous functions); we will have to see how it works in practice.
## Suggested usage
Load the links and/or ProQuest data, augment them as necessary, and then `collect` into memory.
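The load-then-`collect` pattern can be illustrated without magutils. The sketch below is not the package's implementation: it assumes that lazy tables behave like ordinary dbplyr lazy tables, and it uses a hypothetical in-memory SQLite database in place of the real one. The table names, the `year` and `AuthorId` columns, and all values are made up for illustration; only the `goid` key is taken from the example above. It needs the DBI, RSQLite, dplyr and dbplyr packages installed.

```r
library(DBI)
library(dplyr)

# Hypothetical stand-in for the real database: two small in-memory tables.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "graduates",
             data.frame(goid = 1:3, year = c(2001L, 2005L, 2010L)))
dbWriteTable(con, "links",
             data.frame(goid = c(1L, 3L), AuthorId = c(11L, 33L)))

# tbl() returns a lazy reference; no rows are pulled into R yet.
graduates <- tbl(con, "graduates")
links <- tbl(con, "links")

# The join and filter are translated to SQL and remain lazy.
query <- graduates %>%
  left_join(links, by = "goid") %>%
  filter(year >= 2005L)

# collect() runs the query in the database and returns a tibble in memory.
result <- collect(query)

dbDisconnect(con)
```

Keeping the tables lazy until the final `collect()` lets the database do the joining and filtering, so only the rows that are actually needed are loaded into memory.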
For more details, run `browseVignettes("magutils")`.