In this tutorial we learn how to use a relational database to merge and analyse CSV data from different sources. You can follow the instructions for this tutorial by opening the file `qmss-2016/relational_databases/README.html` (from your local clone of the QMSS repository) in your browser.
The tutorial requires:
- csvkit (optional)
- a SQLite database manager, either the `sqlite3` command line shell included in SQLite or alternatively SQLite Manager for Firefox
We will use the following data:
- Glottolog languages and dialects downloaded from http://glottolog.org/static/download/2.7/languages-and-dialects-geo.csv
- PHOIBLE phoneme data downloaded from https://raw.githubusercontent.com/phoible/dev/master/data/phoible-by-phoneme.tsv
- Ecological data from D-PLACE
The tutorial is organized in a way that resembles a typical use of relational databases in a research setting. Starting with a question,
Does our data allow any insight regarding the debate about effects of aridity/humidity on tonality of languages?
we inspect the available data, then proceed to load this data into a relational database for further processing, and finally export a dataset in CSV format from the database, which may serve as the basis for further analysis.
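To make the load, merge, export workflow concrete, here is a minimal sketch in Python using the standard library `sqlite3` and `csv` modules. It is not the procedure followed in this tutorial, which works with the `sqlite3` shell and csvkit. The table names, column names, language identifiers and data rows below are all hypothetical placeholders and do not reflect the actual layout of the Glottolog or PHOIBLE files.

```python
import csv
import io
import sqlite3

# Hypothetical miniature inputs. The real downloads have different
# columns and far more rows; identifiers here are made up.
LANGUAGES_CSV = """glottocode,name
aaaa1234,Language A
bbbb1234,Language B
"""

PHONEMES_TSV = """glottocode\tphoneme\tis_tone
aaaa1234\ta\t0
aaaa1234\tT1\t1
bbbb1234\tk\t0
"""


def load(conn, table, text, delimiter):
    """Create a table from delimited text, using the header row as column names."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', data)


def build_dataset():
    """Load both sources, join them on the shared key, and return the result as CSV text."""
    conn = sqlite3.connect(":memory:")
    load(conn, "languages", LANGUAGES_CSV, ",")
    load(conn, "phonemes", PHONEMES_TSV, "\t")
    cur = conn.execute(
        """
        SELECT l.glottocode, l.name,
               SUM(CASE WHEN p.is_tone = '1' THEN 1 ELSE 0 END) AS n_tones
        FROM languages AS l
        JOIN phonemes AS p ON p.glottocode = l.glottocode
        GROUP BY l.glottocode, l.name
        ORDER BY l.glottocode
        """
    )
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([d[0] for d in cur.description])  # header from the query
    writer.writerows(cur)
    conn.close()
    return out.getvalue()


if __name__ == "__main__":
    print(build_dataset(), end="")
```

The join on a shared identifier is what the relational database contributes: each source keeps its own table, and the merged, aggregated dataset is produced by a single query and written out as CSV.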