cldfviz.map

A common way to visualize data from a CLDF StructureDataset is as "dots on a map", i.e. as WALS-like geographic maps.

This can be done using the cldfviz.map command. The help message of the cldfbench cldfviz.map command is somewhat lengthy, so for better readability we'll explain some options here in more detail.

Note that some options are only valid for some output formats.

For example usage of cldfviz.map, see the Examples section below.

Output

With cldfviz.map you can create

  • interactive HTML maps, using the Leaflet library
  • printable maps in one of the image formats PNG, JPG or PDF, using the cartopy library.
  • printable and editable maps in SVG format, using the cartopy library.

Since installation of cartopy is somewhat complex, it isn't installed with cldfviz by default, but has to be requested explicitly as an extra by running

pip install "cldfviz[cartopy]"

Choosing between output formats is done with the --format option, which accepts one of the strings html, png, jpg, pdf or svg.

cldfviz tries to create similar-looking maps for both output types (HTML and image) so that you can explore your dataset using HTML maps, and then create corresponding maps for a publication by just swapping the --format option. In practice, though, you'll have to do some fiddling with the --markersize, --width, --height, --dpi and --padding-* options to create satisfactory results for printable maps.

You can specify a filename for the map created by cldfviz.map via the --output option. By default, the resulting map will be written to map.<format>. For all formats, the resulting map is contained in a single file. HTML maps, however, must be rendered in a browser with internet access, because JavaScript libraries and map tiles are loaded from the web.
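
The explore-then-publish workflow described above might be scripted as follows. This is only a sketch: the dataset path wals-2020.3/ and the helper name map_command are assumptions for illustration, and the sizing values are simply the ones used in the Examples section. The helper prints each command instead of running it, so the shared options are easy to compare.

```shell
# Options shared by the exploratory HTML map and the printable SVG map.
# Assumption: wals-2020.3/ is a local copy of the WALS CLDF data.
common="wals-2020.3/ --parameters 129A --colormaps tol --pacific-centered"

# Print the cldfviz.map invocation for a given format plus any extra options.
# $common is left unquoted on purpose so that it splits into separate words.
map_command() {
  fmt=$1
  shift
  echo cldfbench cldfviz.map $common --format "$fmt" "$@"
}

# Interactive map for exploration.
map_command html

# Printable map: same data options, plus size and resolution tuning.
map_command svg --width 20 --height 10 --dpi 300 --markersize 20
```

To actually render the maps, remove the echo inside map_command.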

Geo data from Glottolog

Since CLDF datasets can reference languoids in the Glottolog catalog transparently, it is possible to supplement a dataset with geo data from Glottolog to locate its languages on a map.

To do so,

  • the dataset must have a column specified as glottocode in its LanguageTable (or Glottocodes as values of the Language_ID column for metadata-free datasets)
  • the Glottolog data must be specified
    • either as dataset locator locating glottolog-cldf for the --glottolog-cldf option
    • or as path to a clone or download of the glottolog/glottolog repository for the --glottolog option. If the repository has been cloned, a particular version of Glottolog can be specified using the --glottolog-version option; see the cldfbench docs for details on reference catalog maintenance.
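
The two ways of pointing cldfviz.map at Glottolog could be captured in a small helper like the one below. It is a sketch under assumptions: the helper name glottolog_opts, the paths glottolog-cldf-4.7/ and glottolog/, and the version label v4.7 are all made up for illustration; only the option names come from the text above.

```shell
# Print the Glottolog-related options for one of the two alternatives.
#   glottolog_opts cldf <dataset locator>
#   glottolog_opts repo <path to clone> <version>
glottolog_opts() {
  case $1 in
    cldf) echo "--glottolog-cldf $2" ;;
    repo) echo "--glottolog $2 --glottolog-version $3" ;;
    *)    echo "usage: glottolog_opts cldf|repo ..." >&2; return 1 ;;
  esac
}

# Alternative 1: a download of the glottolog-cldf dataset (assumed path).
echo "cldfbench cldfviz.map values.csv $(glottolog_opts cldf glottolog-cldf-4.7/)"

# Alternative 2: a clone of glottolog/glottolog at a given version (assumed values).
echo "cldfbench cldfviz.map values.csv $(glottolog_opts repo glottolog/ v4.7)"
```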

What to map

By default - i.e. without specifying anything - cldfviz.map will plot all languages in a dataset (for which geo coordinates can be determined) as dots on a map.

But you can also plot values of these languages for a selection of parameters in the dataset. To do so, specify a comma-separated list of parameter IDs for the --parameters option.

In addition, you can map other language properties given in the dataset's LanguageTable by specifying a comma-separated list of column names from the LanguageTable for the --language-properties option.
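
Both options expect a single comma-separated string without spaces. When the IDs come from a script, a small join helper avoids typos; the following is a sketch in which the helper name join_by_comma and the dataset path are assumptions, while the parameter IDs and the Latitude column are the ones used in the Examples section.

```shell
# Join all arguments with commas. The subshell keeps the IFS change local.
join_by_comma() {
  (IFS=,; echo "$*")
}

params=$(join_by_comma 129A 130A 130B)
props=$(join_by_comma Latitude)

# Print the resulting invocation; remove the echo to run it.
echo cldfbench cldfviz.map wals-2020.3/ \
  --parameters "$params" \
  --language-properties "$props"
```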

Configuring marker appearance

  • --*-colormaps The default visual style for cldfviz.map maps is "dots", i.e. colored circle markers plotted at the language's location on the map. Thus, the primary mechanism to influence the appearance is by specifying colormaps to control the colors used for corresponding parameter values.

    You don't have to specify any colormaps, but if you do, the number of colormaps specified for --colormaps (and --language-properties-colormaps respectively) must match the number of parameters (and language properties respectively) to be plotted.

    For details about how to specify colormaps, see colormaps.md.

  • --markersize The size of the map markers is controlled via the --markersize option. You might need to experiment a bit to figure out a perfect size, since "size in pixels" may translate to quite different optics depending on screen size, --dpi settings, projections, etc.

Other general options

There's a handful of options to control the overall appearance of maps:

  • --title: Specify a title for the map plot.
  • --pacific-centered: Flag to center maps of the whole world on the Pacific, thus not cutting large language families in half.
  • --language-labels: Flag to display language names on the map. Note: This quickly gets crowded.
  • --missing-value: Specify a color used to indicate missing values. If not specified, missing values will be omitted. Note that this setting only covers rows in the ValueTable that have null as Value. It does not add synthetic null values for all languages in the dataset.
  • --no-legend: Flag to not add a legend to the map. This is mainly of interest for printable maps, e.g. when a legend is provided elsewhere in a paper.

Options for HTML maps

The following options are only relevant for HTML maps:

  • --base-layer: Specify a tile layer to use for the Leaflet maps. See cldfviz.map.leaflet for available layers.
  • --with-layers: Add a Leaflet layer control to toggle between displaying and hiding markers for individual values of a parameter.
  • --with-layers-for-combinations: Add a Leaflet layer control to toggle between displaying and hiding markers for individual combinations of values for the plotted parameters. Note: While this option allows more fine-grained control over the displayed markers (in comparison with --with-layers), it may lead to unwieldy legends in case several parameters with multiple values are chosen.

Options for printable maps

The following options are only relevant for image (aka printable) maps:

  • --padding-left|right|top|bottom: Specify the padding to be added to maps (around the bounding box of the displayed markers) in degrees.
  • --extent: Specify the explicit geographic extent of the map as a comma-separated list of degrees for the left, right, top and bottom edges of the map.
  • --width: Width of the figure in inches.
  • --height: Height of the figure in inches.
  • --dpi: Pixel density of the figure. The default of 100 makes for rather small file size and is mostly suitable for experimentation. For printable quality you should set it to 300.
  • --projection: Map projection. For available projections, see https://scitools.org.uk/cartopy/docs/latest/crs/projections.html
  • --with-stock-img: Add a map underlay (using cartopy's stock_img method).
  • --zorder: Specify an explicit drawing order (i.e. specify what's plotted on top) by giving a JSON dictionary mapping parameter values to integers (the higher, the more on top).
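
Passing a JSON dictionary on the command line requires some care with shell quoting. The sketch below assembles a printable-map invocation for the boolean romance parameter from the minimal example. It assumes that the dictionary keys are the literal parameter values ("true" and "false"); the sizes, padding values and the Glottolog path are illustrative.

```shell
# JSON dictionary for --zorder: higher numbers are drawn on top.
# Single quotes keep the shell from touching the double quotes in the JSON.
# Assumption: keys are the literal values of the plotted parameter.
zorder='{"true": 2, "false": 1}'

# Collect the full command line in the positional parameters.
set -- cldfbench cldfviz.map values.csv --parameters romance \
  --glottolog-cldf glottolog-cldf-4.7/ \
  --format png --width 10 --height 6 --dpi 300 \
  --padding-left 5 --padding-right 5 --padding-top 3 --padding-bottom 3 \
  --zorder "$zorder"

# Show one argument per line to verify that the JSON stays a single argument.
printf '%s\n' "$@"
```

Running "$@" instead of printing it would execute the command with exactly these arguments.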

Examples

We'll explain the usage of the command by using it with the WALS CLDF data. See the README for instructions on how to download this data.

A minimal example

If you have data about languages linked to Glottolog via Glottocodes and can format this data in a file called values.csv looking as follows:

ID,Language_ID,Parameter_ID,Value
1,stan1295,romance,false
2,stan1290,romance,true
3,ital1282,romance,true

you can "put it on a map" using the geo-data from Glottolog by running

$ cldfbench cldfviz.map values.csv --parameters romance --colormaps tol \
--glottolog-cldf glottolog-cldf-4.7/ --format svg
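
For a quick try, the values.csv shown above can be written from the shell with a here-document. This is just a convenience sketch; it creates (or overwrites) values.csv in the current directory.

```shell
# Write the three example rows to values.csv in the current directory.
# The quoted 'EOF' prevents any shell expansion inside the here-document.
cat > values.csv <<'EOF'
ID,Language_ID,Parameter_ID,Value
1,stan1295,romance,false
2,stan1290,romance,true
3,ital1282,romance,true
EOF

# Sanity check: the file should have one header line plus three data rows.
wc -l < values.csv
```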

Datatypes

Multi-valued variables

While many typological datasets look like the one above (or like WALS), with one value per language and parameter, this may not always be the case. APiCS, for example, has quite a few multi-valued features, e.g. Order of subject, object, and verb. cldfviz.map supports this (much like the APiCS web app does) by plotting small pie-charts as markers in case of multi-valued languages:

$ cldfbench cldfviz.map cldf-datasets-apics-4ed59b5/cldf --parameters 1 --format svg --projection Mollweide --width 10

Note the difference in sector sizes between this map and the one on the APiCS site. The size of the sectors on the APiCS site is weighted by a frequency. Fortunately, this frequency is available in the CLDF data as well and can be used by cldfviz.map, too:

$ cldfbench cldfviz.map cldf-datasets-apics-4ed59b5/cldf --parameters 1 --weight-col Frequency \
--format svg --projection Mollweide --width 10

Continuous variables

cldfviz.map can detect and display continuous variables, too. There are no continuous features in APiCS or WALS, but since cldfviz.map also works with metadata-free CLDF datasets, let's quickly create one. Using the UNIX shell tools sed and awk and the tools of the csvkit toolbox, we can run

csvgrep -c Latitude,Glottocode -r".+" wals-2020.3/cldf/languages.csv | \
csvcut -c ID,Glottocode,Latitude | \
awk '{if(NR==1){print $0",Parameter_ID"}else{print $0",latitude"}}' | \
sed 's/ID,Glottocode,Latitude,Parameter_ID/ID,Language_ID,Value,Parameter_ID/g' > values.csv

Let's break this down: The first line selects all WALS languages for which both latitude and Glottocode are given. The next line narrows the resulting CSV to just three columns - the future ID, Language_ID and Value columns of our metadata-free StructureDataset. The awk command adds a constant column Parameter_ID, and the sed command renames the columns appropriately.
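
The awk and sed steps can be tried in isolation, without csvkit or the WALS data, on a small inline sample. In the sketch below, the function name add_parameter_column is an assumption; the two commands inside it are the ones from the pipeline, and the sample rows mimic the output of the csvcut step.

```shell
# Append the constant Parameter_ID column (awk), then rename the header (sed).
add_parameter_column() {
  awk '{if(NR==1){print $0",Parameter_ID"}else{print $0",latitude"}}' |
    sed 's/ID,Glottocode,Latitude,Parameter_ID/ID,Language_ID,Value,Parameter_ID/g'
}

# Three-column sample, as it would leave the csvcut step.
sample='ID,Glottocode,Latitude
aar,aari1239,6
aba,abau1245,-4'

printf '%s\n' "$sample" | add_parameter_column
```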

The resulting CSV looks as follows:

$ head -n 4 values.csv 
ID,Language_ID,Value,Parameter_ID
aar,aari1239,6,latitude
aba,abau1245,-4,latitude
abb,chad1249,13.8333333333,latitude

Mapping metadata-free CLDF data always relies on Glottolog data for the geo-coordinates. Thus, we must point to it when running

$ cldfbench cldfviz.map values.csv --parameters latitude \
--glottolog-cldf https://raw.githubusercontent.com/glottolog/glottolog-cldf/v4.7/cldf/cldf-metadata.json

[Figure: WALS latitudes]

Note that since we looked up coordinates in Glottolog, languages may be displayed at slightly different locations than on maps based on the WALS coordinates (when the coordinates in WALS differ). It may also be the case that languages are mapped to invalid Glottocodes (e.g. in this case Jugli).

Now we could have done this in a simpler way, too, because cldfviz.map has a special option to display language properties encoded as columns in the LanguageTable as if they were parameters of the dataset. We can use this option to visualize a claim from WALS' chapter 129 that there is a

strong correlation between values [for feature 129] and latitudinal location

cldfbench cldfviz.map wals-2020.3/ --parameters 129A --colormaps tol \
--markersize 20 --language-properties Latitude --pacific-centered

[Figure: WALS 129A and latitude]

As seen above, cldfviz.map can visualize multiple parameters at once. E.g. we can explore the related WALS features 129A, 130A and 130B, selecting suitable colormaps for the two boolean parameters:

cldfbench cldfviz.map wals-2020.3/ --parameters 129A,130A,130B \
--colormaps base,base,tol --pacific-centered --markersize 30 

[Figure: WALS 129A, 130A and 130B]

HTML maps

With the Leaflet library, we can create interactive maps which can be explored in a browser.

Running

cldfbench cldfviz.map wals-2020.3/ --base-layer USGS.USTopo --pacific-centered --colormaps tol

will create an HTML page map.html and open it in the browser, thus rendering an interactive map of the languages in the dataset.

[Figure: WALS languages]

For smaller language samples, it may be suitable to display the language names on the map, too. Here's WALS' feature 10B:

cldfbench cldfviz.map wals-2020.3/ --parameters 10B --colormaps tol --markersize 20 --language-labels

[Figure: WALS 10B]

Leveraging the GeoJSON support in Leaflet, HTML maps allow inclusion of an additional GeoJSON overlay (and an associated GeoJSON options object), via --overlay-geojson and --overlay-options. One such overlay - the Terrestrial Ecoregions of the World - is provided with cldfviz.

cldfbench cldfviz.map wals-2020.3/ --parameters 10B --overlay-geojson ecoregions

Printable maps via cartopy

If cldfviz is installed with cartopy, maps similar to the ones shown above can also be created in various image formats:

$ cldfbench cldfviz.map wals-2020.3/ --parameters 129A --colormaps tol --language-properties Latitude \
--pacific-centered --format svg --width 20 --height 10 --dpi 300 --markersize 20 --with-ocean \
--projection Mollweide

[Figure: WALS 129A and latitude]

While these maps lack the interactivity of the HTML maps, they may be better suited for inclusion in print formats than screenshots of maps in the browser. They also provide some additional options like a choice between various map projections.

Advanced dataset pre-processing

Going one step further, we might visualize data that has been synthesized on the fly. E.g. we can visualize the AES endangerment information given in the Glottolog CLDF data for the WALS languages.

Since we will alter the WALS CLDF data, we make a copy of it first:

cp -r wals-2020.3 wals-copy

And since we want to extract data from glottolog-cldf, we download this, too, as explained in the README.

Now we extract the AES data from Glottolog ...

csvgrep -c Parameter_ID -m"aes" glottolog-cldf-4.7/cldf/values.csv |\
csvgrep -c Value -m"NA" -i |\
csvcut -c Language_ID,Parameter_ID,Code_ID  > aes1.csv

... and massage it into a form that can be appended to the WALS ValueTable:

csvjoin -y 0 -c Glottocode,Language_ID wals-2020.3/cldf/languages.csv aes1.csv |\
csvcut -c Parameter_ID,Code_ID,ID |\
awk '{if(NR==1){print $0",ID"}else{print $0",aes-"NR}}' |\
sed 's/Parameter_ID,Code_ID,ID,ID/Parameter_ID,Value,Language_ID,ID/g' |\
csvcut -c ID,Language_ID,Parameter_ID,Value |\
awk '{if(NR==1){print $0",Code_ID,Comment,Source,Example_ID"}else{print $0",,,,"}}' > aes2.csv

Notes:

  • The first awk call adds a unique value ID. We cannot re-use the value ID from Glottolog, because the mapping between WALS and Glottolog languages is many-to-one.
  • Using awk to manipulate CSV data is somewhat fragile, since it will break if the data contains multi-line cell content. To guard against that, you may compare the row count reported by csvstat with the line count from wc -l before using awk.
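
Where csvkit is not at hand, a rough substitute for this guard is a quote-parity check. Note that this is a different heuristic from the csvstat comparison described above: in standard CSV quoting, a physical line with an odd number of double quotes means that a quoted cell continues on the next line. The function name has_multiline_cells and the demo files are assumptions for illustration.

```shell
# Succeeds (exit status 0) if some physical line contains an odd number of
# double quotes, i.e. a quoted cell spans more than one line.
has_multiline_cells() {
  awk '{ if (gsub(/"/, "&") % 2 == 1) odd = 1 } END { exit !odd }' "$1"
}

# Demo on two throwaway files.
tmp=$(mktemp -d)
printf 'ID,Comment\n1,"one line"\n' > "$tmp/clean.csv"
printf 'ID,Comment\n1,"two\nlines"\n' > "$tmp/multiline.csv"

for f in "$tmp/clean.csv" "$tmp/multiline.csv"; do
  if has_multiline_cells "$f"; then
    echo "$(basename "$f"): multi-line cells found, avoid awk"
  else
    echo "$(basename "$f"): safe for line-based tools"
  fi
done

rm -r "$tmp"
```

The check only looks at quote counts, so it will not catch files that are malformed in other ways.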

Now we append the values and a row for the ParameterTable ...

csvstack aes2.csv wals-copy/cldf/values.csv > values.csv
cp values.csv wals-copy/cldf
echo "ID,Name,Description,Chapter_ID" > aes_param.csv
echo "aes,AES,," >> aes_param.csv
csvstack aes_param.csv wals-copy/cldf/parameters.csv > parameters.csv
cp parameters.csv wals-copy/cldf

... and make sure the resulting dataset is valid:

cldf validate wals-copy/

Finally, we can plot the map:

cldfbench cldfviz.map wals-copy/ --pacific-centered --colormaps seq --parameters aes

[Figure: WALS AES]