A common way to visualize data from a CLDF StructureDataset is as "dots on a map", i.e. as WALS-like geographic maps.
This can be done using the cldfviz.map command.
Consulting the help for the cldfbench cldfviz.map command displays a somewhat lengthy message. So for better
readability, we'll explain some options here in more detail.
Note that some options are only valid for some output formats.
For example usage of cldfviz.map, see the Examples section below.
With cldfviz.map you can create
- interactive HTML maps, using the Leaflet library
- printable maps in one of the image formats PNG, JPG or PDF, using the cartopy library.
- printable and editable maps in SVG format, using the cartopy library.
Since installation of cartopy is somewhat complex, it isn't installed with cldfviz by default, but has to
be explicitly requested as an extra, by running
pip install cldfviz[cartopy]
Choosing between output formats is done with the --format option, which accepts the strings html, png, jpg, pdf and svg.
cldfviz tries to create similar-looking maps for both output types (HTML and image) so that you can explore your dataset using
HTML maps, and then create corresponding maps for a publication by just swapping the --format option.
In reality, though, you'll have to do some fiddling with the --markersize, --width, --height, --dpi and
--padding-* options to create satisfactory results for printable maps.
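The format swap described above can be scripted. The following is a minimal sketch, assuming the WALS data is available in wals-2020.3/ as in the examples of this document; the variables run and common are hypothetical helpers, and the loop only prints the commands unless run is set to an empty string.

```shell
# Dry run by default: the commands are printed, not executed.
# Set run="" to actually invoke cldfbench (svg requires cldfviz[cartopy]).
run="echo"

# Options shared by the exploratory HTML map and the printable SVG map.
common="wals-2020.3/ --parameters 129A --colormaps tol --pacific-centered"

for fmt in html svg; do
    # $common is left unquoted on purpose so that it splits into
    # separate arguments (POSIX sh / bash behaviour).
    $run cldfbench cldfviz.map $common --format "$fmt"
done
```

In practice the SVG call will usually need additional --markersize, --width, --height or --dpi settings.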
You can specify a filename for the map created by cldfviz.map via the --output option. By default, the
resulting map will be written to map.<format>. For all formats the resulting map will be contained in a single
file. In the case of HTML maps, the map will need to be rendered in a browser with access to the internet, to
load JavaScript libraries and map tiles.
Since CLDF datasets can reference languoids in the Glottolog catalog transparently, it is possible to supplement a dataset with geo data from Glottolog to locate its languages on a map.
To do so,
- the dataset must have a column specified as glottocode in its LanguageTable (or Glottocodes as values of the Language_ID column for metadata-free datasets)
- the Glottolog data must be specified
  - either as dataset locator locating glottolog-cldf for the --glottolog-cldf option
  - or as path to a clone or download of the glottolog/glottolog repository for the --glottolog option (if the repository has been cloned, a particular version of Glottolog can be specified using the --glottolog-version option; see the cldfbench docs for details on reference catalog maintenance).
By default, i.e. without specifying anything, cldfviz.map will plot all languages in a dataset (for which
geo coordinates can be determined) as dots on a map.
But you can also plot values of these languages for a selection of parameters in the dataset. To do so, specify
a comma-separated list of parameter IDs for the --parameters option.
In addition, you can map other language properties given in the dataset's LanguageTable by specifying a comma-separated
list of column names from the LanguageTable for the --language-properties option.
- --*-colormaps: The default visual style for cldfviz.map maps is "dots", i.e. colored circle markers plotted at the language's location on the map. Thus, the primary mechanism to influence the appearance is specifying colormaps to control the colors used for the corresponding parameter values. You don't have to specify any colormaps, but if you do, the number of colormaps specified for --colormaps (and --language-properties-colormaps respectively) must match the number of parameters (and language properties respectively) to be plotted. For details about how to specify colormaps, see colormaps.md.
- --markersize: The size of the map markers is controlled via the --markersize option. You might need to experiment a bit to find a suitable size, since "size in pixels" may translate to quite different visual results depending on screen size, --dpi settings, projections, etc.
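The rule that the number of colormaps must match the number of parameters can be checked before calling the command. The following is a minimal sketch, assuming colormaps are given by name (a colormap specification that itself contains commas would defeat this naive count); count_items is a hypothetical helper and not part of cldfviz.

```shell
# Values as they would be passed to --parameters and --colormaps.
parameters="129A,130A,130B"
colormaps="base,base,tol"

# count_items LIST: number of comma-separated items in LIST.
count_items() {
    printf '%s\n' "$1" | tr ',' '\n' | wc -l | tr -d ' '
}

n_params=$(count_items "$parameters")
n_maps=$(count_items "$colormaps")

if [ "$n_params" -eq "$n_maps" ]; then
    echo "ok: $n_params parameters, $n_maps colormaps"
else
    echo "mismatch: $n_params parameters, $n_maps colormaps" >&2
fi
```

The same check applies to --language-properties and --language-properties-colormaps.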
There's a handful of options to control the overall appearance of maps:
- --title: Specify a title for the map plot.
- --pacific-centered: Flag to center maps of the whole world at the Pacific, thus not cutting large language families in half.
- --language-labels: Flag to display language names on the map. Note: This quickly gets crowded.
- --missing-value: Specify a color used to indicate missing values. If not specified, missing values will be omitted. Note that this setting will only include rows from ValueTable having null as Value. It will not include synthetic null values for all languages in the dataset.
- --no-legend: Flag to not add a legend to the map. This is mainly of interest for printable maps, e.g. when a legend is provided elsewhere in a paper.
The following options are only relevant for HTML maps:
- --base-layer: Specify a tile layer to use for the Leaflet maps. See cldfviz.map.leaflet for available layers.
- --with-layers: Add a Leaflet layer control to toggle between displaying and hiding markers for individual values of a parameter.
- --with-layers-for-combinations: Add a Leaflet layer control to toggle between displaying and hiding markers for individual combinations of values for the plotted parameters. Note: While this option allows more fine-grained control over the displayed markers (in comparison with --with-layers), it may lead to unwieldy legends in case several parameters with multiple values are chosen.
The following options are only relevant for image (aka printable) maps:
- --padding-left|right|top|bottom: Specify the padding to be added to maps (around the bounding box of the displayed markers) in degrees.
- --extent: Specify the explicit geographic extent of the map as comma-separated list of degrees for (left, right, top, bottom) edge of the map.
- --width: Width of the figure in inches.
- --height: Height of the figure in inches.
- --dpi: Pixel density of the figure. The default of 100 makes for rather small file size and is mostly suitable for experimentation. For printable quality you should set it to 300.
- --projection: Map projection. For available projections, see https://scitools.org.uk/cartopy/docs/latest/crs/projections.html
- --with-stock-img: Add a map underlay (using cartopy's stock_img method).
- --zorder: Specify explicit drawing order (i.e. specify what's plotted on top) by giving a JSON dictionary mapping parameter values to integers (the higher, the more on top).
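A JSON dictionary for --zorder is easiest to pass in single quotes so the shell leaves it untouched. The following is a sketch only, assuming the values.csv example with the boolean parameter romance from the Examples section, and assuming the dictionary keys are the literal value strings true and false; run is a hypothetical dry-run switch.

```shell
# Higher numbers are drawn later, so "true" markers end up on top of "false".
zorder='{"false": 1, "true": 2}'

# Dry run by default: print the command. Set run="" to execute it.
run="echo"
$run cldfbench cldfviz.map values.csv --parameters romance --colormaps tol \
    --glottolog-cldf glottolog-cldf-4.7/ --format svg --zorder "$zorder"
```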
We'll explain the usage of the command by using it with the WALS CLDF data. See the README for instructions on how to download this data.
If you have data about languages linked to Glottolog via Glottocode and can format this data in a file
called values.csv looking as follows:
ID,Language_ID,Parameter_ID,Value
1,stan1295,romance,false
2,stan1290,romance,true
3,ital1282,romance,true
you can "put it on a map" using the geo-data from Glottolog by running
$ cldfbench cldfviz.map values.csv --parameters romance --colormaps tol \
--glottolog-cldf glottolog-cldf-4.7/ --format svg
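Because a metadata-free dataset is located on the map solely via the Glottocodes in its Language_ID column, it can be worth checking their shape before mapping. The following is a minimal sketch, relying on the fact that Glottocodes consist of four alphanumeric characters followed by four digits; the temporary directory and the naive comma splitting with cut are simplifications for illustration, and a well-formed code may still be unknown to Glottolog.

```shell
# Write the example dataset shown above to a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/values.csv" <<'EOF'
ID,Language_ID,Parameter_ID,Value
1,stan1295,romance,false
2,stan1290,romance,true
3,ital1282,romance,true
EOF

# Collect all Language_ID values (2nd column, header skipped) that do NOT
# look like a Glottocode. "|| true" keeps grep's "no match" status harmless.
bad=$(tail -n +2 "$workdir/values.csv" | cut -d, -f2 \
    | grep -Ev '^[a-z0-9]{4}[0-9]{4}$' || true)

if [ -z "$bad" ]; then
    echo "all Language_ID values look like Glottocodes"
else
    echo "suspicious Language_ID values: $bad" >&2
fi
```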
While many typological datasets look like the one above (or like WALS), with one value per language and parameter, this may not
always be the case. APiCS, for example, has quite a few multi-valued features, e.g. Order of subject, object, and verb.
cldfviz.map supports this (much like the APiCS web app does) by plotting small pie-charts as markers
in case of multi-valued languages:
$ cldfbench cldfviz.map cldf-datasets-apics-4ed59b5/cldf --parameters 1 --format svg --projection Mollweide --width 10
Note the difference in sector sizes between this map and the one on the APiCS site. The size of the sectors
on the APiCS site is weighted by a frequency. Fortunately, this frequency is available in the CLDF
data as well and can be used by cldfviz.map, too:
$ cldfbench cldfviz.map cldf-datasets-apics-4ed59b5/cldf --parameters 1 --weight-col Frequency \
--format svg --projection Mollweide --width 10
cldfviz.map can detect and display continuous variables, too. There are no continuous features in APiCS or WALS, but since
cldfviz.map also works with metadata-free CLDF datasets, let's
quickly create one. Using the UNIX shell tools sed and awk and the tools of the csvkit toolbox, we
can run
csvgrep -c Latitude,Glottocode -r".+" wals-2020.3/cldf/languages.csv | \
csvcut -c ID,Glottocode,Latitude | \
awk '{if(NR==1){print $0",Parameter_ID"}else{print $0",latitude"}}' | \
sed 's/ID,Glottocode,Latitude,Parameter_ID/ID,Language_ID,Value,Parameter_ID/g' > values.csv
Let's break this down: The first line selects all WALS languages for which both latitude and Glottocode are given.
The next line narrows the resulting CSV to just three columns, the future ID, Language_ID and Value
columns of our metadata-free StructureDataset. The awk command adds a constant column Parameter_ID,
and the sed command renames the columns appropriately.
The resulting CSV looks as follows:
$ head -n 4 values.csv
ID,Language_ID,Value,Parameter_ID
aar,aari1239,6,latitude
aba,abau1245,-4,latitude
abb,chad1249,13.8333333333,latitude
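The awk and sed steps can be tried in isolation on a few rows, without csvkit. The following sketch feeds them two of the rows shown above in the shape produced by csvcut; sample and result are illustrative variable names.

```shell
# Input as it would come out of the csvcut step: ID, Glottocode, Latitude.
sample=$(printf '%s\n' \
    'ID,Glottocode,Latitude' \
    'aar,aari1239,6' \
    'aba,abau1245,-4')

# awk appends the constant Parameter_ID column (header on line 1, the value
# "latitude" on all other lines); sed then renames the header columns.
result=$(printf '%s\n' "$sample" \
    | awk '{if(NR==1){print $0",Parameter_ID"}else{print $0",latitude"}}' \
    | sed 's/ID,Glottocode,Latitude,Parameter_ID/ID,Language_ID,Value,Parameter_ID/g')

printf '%s\n' "$result"
```

This prints the header ID,Language_ID,Value,Parameter_ID followed by the two data rows with latitude appended, matching the first lines of values.csv shown above.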
Mapping metadata-free CLDF data always relies on Glottolog data for the geo-coordinates. Thus, we must point to it when running
$ cldfbench cldfviz.map values.csv --parameters latitude \
--glottolog-cldf https://raw.githubusercontent.com/glottolog/glottolog-cldf/v4.7/cldf/cldf-metadata.json
Note that since we looked up coordinates in Glottolog, languages may be displayed at slightly different locations than above (when the coordinates in WALS differ). It may also be the case that languages are mapped to invalid Glottocodes (e.g. in this case Jugli).
Now we could have done this in a simpler way, too, because cldfviz.map has a special option to display language
properties encoded as columns in the LanguageTable as if they were parameters of the dataset. We can use this
option to visualize a claim from WALS' chapter 129 that there is a
"strong correlation between values [for feature 129] and latitudinal location":
cldfbench cldfviz.map wals-2020.3/ --parameters 129A --colormaps tol \
--markersize 20 --language-properties Latitude --pacific-centered
As seen above, cldfviz.map can visualize multiple parameters at once. E.g. we can explore the related WALS
features 129A, 130A and 130B, selecting suitable colormaps for the two boolean parameters:
cldfbench cldfviz.map wals-2020.3/ --parameters 129A,130A,130B \
--colormaps base,base,tol --pacific-centered --markersize 30
With the Leaflet library, we can create interactive maps which can be explored in a browser.
Running
cldfbench cldfviz.map wals-2020.3/ --base-layer USGS.USTopo --pacific-centered --colormaps tol
will create an HTML page map.html and open it in the browser, thus rendering an interactive
map of the languages in the dataset.
For smaller language samples, it may be suitable to display the language names on the map, too. Here's WALS' feature 10B:
cldfbench cldfviz.map wals-2020.3/ --parameters 10B --colormaps tol --markersize 20 --language-labels
Leveraging the GeoJSON support in Leaflet, HTML maps allow
inclusion of an additional GeoJSON overlay (and an associated GeoJSON options object),
via --overlay-geojson and --overlay-options. One such overlay, the Terrestrial Ecoregions of the World,
is provided with cldfviz.
cldfbench cldfviz.map wals-2020.3/ --parameters 10B --overlay-geojson ecoregions
If cldfviz is installed with cartopy, maps similar to the ones shown above can also be created
in various image formats:
$ cldfbench cldfviz.map wals-2020.3/ --parameters 129A --colormaps tol --language-properties Latitude \
--pacific-centered --format svg --width 20 --height 10 --dpi 300 --markersize 20 --with-ocean \
--projection Mollweide
While these maps lack the interactivity of the HTML maps, they may be better suited for inclusion in print formats than screenshots of maps in the browser. They also provide some additional options like a choice between various map projections.
Going one step further, we might visualize data that has been synthesized on the fly. For example, we can visualize the AES endangerment information given in the Glottolog CLDF data for the WALS languages.
Since we will alter the WALS CLDF data, we make a copy of it first:
cp -r wals-2020.3 wals-copy
And since we want to extract data from glottolog-cldf, we download this, too, as explained
in the README.
Now we extract the AES data from Glottolog ...
csvgrep -c Parameter_ID -m"aes" glottolog-cldf-4.7/cldf/values.csv |\
csvgrep -c Value -m"NA" -i |\
csvcut -c Language_ID,Parameter_ID,Code_ID > aes1.csv
... and massage it into a form that can be appended to the WALS ValueTable:
csvjoin -y 0 -c Glottocode,Language_ID wals-2020.3/cldf/languages.csv aes1.csv |\
csvcut -c Parameter_ID,Code_ID,ID |\
awk '{if(NR==1){print $0",ID"}else{print $0",aes-"NR}}' |\
sed 's/Parameter_ID,Code_ID,ID,ID/Parameter_ID,Value,Language_ID,ID/g' |\
csvcut -c ID,Language_ID,Parameter_ID,Value |\
awk '{if(NR==1){print $0",Code_ID,Comment,Source,Example_ID"}else{print $0",,,,"}}' > aes2.csv
Notes:
- The first awk call adds a unique value ID. We cannot re-use the value ID from Glottolog, because the mapping between WALS and Glottolog languages is many-to-one.
- Using awk to manipulate CSV data is somewhat fragile, since it will break if the data contains multi-line cell content. To guard against that, you may compare the row count reported by csvstat with the line count from wc -l before using awk.
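As an alternative to the csvstat comparison, multi-line cells can also be detected with plain awk. The following is a heuristic sketch, not the method used above: it assumes standard CSV quoting, where a quoted cell that continues on the next line leaves an odd number of double quotes on a physical line. The function name has_multiline_cells and the two scratch files are made up for illustration.

```shell
# has_multiline_cells FILE: succeeds (exit 0) if some physical line contains
# an odd number of double quotes, i.e. a quoted cell spans several lines.
has_multiline_cells() {
    awk '{ n = gsub(/"/, "&"); if (n % 2 == 1) found = 1 }
         END { exit(found ? 0 : 1) }' "$1"
}

# Two tiny test files: one line-based, one with a cell spanning two lines.
tmp=$(mktemp -d)
printf '%s\n' 'ID,Comment' '1,plain' > "$tmp/ok.csv"
printf '%s\n' 'ID,Comment' '1,"spans' 'two lines"' > "$tmp/multiline.csv"

for f in "$tmp/ok.csv" "$tmp/multiline.csv"; do
    if has_multiline_cells "$f"; then
        echo "$(basename "$f"): multi-line cells found, avoid line-based awk"
    else
        echo "$(basename "$f"): safe for line-based tools"
    fi
done
```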
Now we append the values and a row for the ParameterTable ...
csvstack aes2.csv wals-copy/cldf/values.csv > values.csv
cp values.csv wals-copy/cldf
echo "ID,Name,Description,Chapter_ID" > aes_param.csv
echo "aes,AES,," >> aes_param.csv
csvstack aes_param.csv wals-copy/cldf/parameters.csv > parameters.csv
cp parameters.csv wals-copy/cldf
... and make sure the resulting dataset is valid:
cldf validate wals-copy/
Finally, we can plot the map:
cldfbench cldfviz.map wals-copy/ --pacific-centered --colormaps seq --parameters aes