Parameters

Cross-linguistic data often compares languages using comparative concepts, e.g. typological features as in WALS, or Swadesh terms as in many wordlists.

While it may sometimes be enough to refer to such a concept by ID, e.g. using a value like 116A as parameterReference to refer to WALS feature 116A in a Structure Dataset, often additional metadata must be provided. This should be done in CLDF datasets by including a ParameterTable, i.e. a table with "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#ParameterTable", and pointing to rows in this table using the parameterReference property in the ValueTable.
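As a minimal sketch of how a consumer might resolve such references, the snippet below joins rows of a ValueTable to rows of a ParameterTable using only Python's standard library. The column names `Parameter_ID`, `Language_ID` and `Value`, the language identifiers and the sample rows are assumptions made for illustration; a real dataset declares its actual column names in its metadata file.

```python
import csv
import io

# Hypothetical table contents; a real dataset would read these from files.
PARAMETERS_CSV = """ID,Name,Description
116A,Polar Questions,How polar questions are marked
"""
VALUES_CSV = """ID,Language_ID,Parameter_ID,Value
1,lang1,116A,Question particle
2,lang2,116A,Interrogative intonation only
"""

# Index parameter rows by ID so that each reference is a dictionary lookup.
parameters = {row["ID"]: row for row in csv.DictReader(io.StringIO(PARAMETERS_CSV))}

resolved = []
for value in csv.DictReader(io.StringIO(VALUES_CSV)):
    # A KeyError here would signal a reference to a missing parameter row.
    parameter = parameters[value["Parameter_ID"]]
    resolved.append((value["Language_ID"], parameter["Name"], value["Value"]))

for language, name, value in resolved:
    print(f"{language}: {name} = {value}")
```

Indexing the ParameterTable once keeps the join linear in the number of values, which matters for wordlists with many forms per concept.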

Typed parameter values

Often, values for parameters are just text, e.g. word forms in the case of a CLDF Wordlist. In this case, the text string representing the value in the CSV table can simply be interpreted "as is" by CLDF consumers.

Categorical or ordinal parameters

If a parameter represents a categorical (or ordinal) variable, it is recommended to provide the list of possible values in a CodeTable (possibly extended with a column indicating the ordering of these values in the case of ordinal variables). The ValueTable should then include a codeReference column, but also list the string value in the value column. While this introduces some redundancy, it ensures compatibility with simple data access methods, such as those used for data visualization.
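The following sketch illustrates this layout under stated assumptions: the parameter `size`, its codes, the language identifiers and the extra `Ordinal` column are all hypothetical, and the column names `Code_ID`, `Parameter_ID` and `Value` are assumed rather than taken from a particular dataset. It checks that the redundant string value agrees with the referenced code and then uses the ordering column to sort the values.

```python
import csv
import io

# Hypothetical CodeTable; "Ordinal" is an illustrative extra column.
CODES_CSV = """ID,Parameter_ID,Name,Ordinal
size-small,size,small,1
size-medium,size,medium,2
size-large,size,large,3
"""
# Hypothetical ValueTable carrying both the code reference and the string value.
VALUES_CSV = """ID,Language_ID,Parameter_ID,Value,Code_ID
1,lang1,size,large,size-large
2,lang2,size,small,size-small
3,lang3,size,medium,size-medium
"""

codes = {row["ID"]: row for row in csv.DictReader(io.StringIO(CODES_CSV))}
values = list(csv.DictReader(io.StringIO(VALUES_CSV)))

# Check that the redundant string value agrees with the referenced code.
mismatches = [v["ID"] for v in values if codes[v["Code_ID"]]["Name"] != v["Value"]]

# Sort the values by the ordinal rank of their code.
ranked = sorted(values, key=lambda v: int(codes[v["Code_ID"]]["Ordinal"]))
order = [v["Language_ID"] for v in ranked]

print(mismatches, order)
```

A simple consumer, such as a plotting script, can ignore the CodeTable entirely and read only the `Value` column, which is the compatibility benefit the redundancy buys.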

Numeric parameter values (or values of other types)

Sometimes typological surveys use data binning to transform values of varying data types (often numeric) into categorical data. Ideally, though, this step should be left to data analysis, unless the "bins" have some theoretical foundation. To make it possible to store string representations of typed data in CSV while still specifying how this data should be interpreted, a columnSpec column can be added to the ParameterTable. CLDF consumers SHOULD then consult the value of this column when reading values associated with the parameter.

As an example, we use the Python package csvw to obtain a reader for typed data as specified by a columnSpec value:

>>> import json
>>> from csvw import Column
>>> # Read the datatype description from a string value of the columnSpec column:
>>> reader = Column.fromvalue(json.loads('{"datatype": {"base": "decimal", "minimum": "1", "maximum": "11"}}'))
>>> # Use this reader to interpret string values from the value column as appropriate Python objects:
>>> reader.read('3.4')
Decimal('3.4')
>>> reader.read('30')
Traceback (most recent call last):
    ...
ValueError: value must be <= 11
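How a consumer might wire this up per parameter can be sketched with the standard library alone. The code below is a simplified stand-in for csvw, not its API: it is an assumption-laden illustration that handles only the `decimal`, `integer` and `string` base types plus `minimum` and `maximum`, and the parameter rows `p1` and `p2` are hypothetical.

```python
import csv
import io
import json
from decimal import Decimal

# Hypothetical ParameterTable: one numeric parameter, one plain-text parameter.
PARAMETERS_CSV = '''ID,Name,ColumnSpec
p1,Rating,"{""datatype"": {""base"": ""decimal"", ""minimum"": ""1"", ""maximum"": ""11""}}"
p2,Comment,
'''

CONVERTERS = {"decimal": Decimal, "integer": int, "string": str}


def make_reader(column_spec):
    """Build a str -> object function from a columnSpec JSON string (simplified)."""
    spec = json.loads(column_spec) if column_spec else {}
    datatype = spec.get("datatype", "string")
    if isinstance(datatype, str):  # shorthand form such as "integer"
        datatype = {"base": datatype}
    convert = CONVERTERS[datatype.get("base", "string")]

    def read(text):
        value = convert(text)
        if "minimum" in datatype and value < convert(datatype["minimum"]):
            raise ValueError(f"value must be >= {datatype['minimum']}")
        if "maximum" in datatype and value > convert(datatype["maximum"]):
            raise ValueError(f"value must be <= {datatype['maximum']}")
        return value

    return read


# One reader per parameter, chosen by consulting the ColumnSpec column.
readers = {
    row["ID"]: make_reader(row["ColumnSpec"])
    for row in csv.DictReader(io.StringIO(PARAMETERS_CSV))
}

print(readers["p1"]("3.4"))  # parsed as a Decimal
print(readers["p2"]("3.4"))  # no ColumnSpec given, so left as text
```

Parameters without a column specification fall back to plain strings here, which matches the "as is" interpretation described for untyped values.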

Tip

This mechanism even allows list-valued parameter values. If, for example, a parameter's value for columnSpec is the string {"datatype": "integer", "separator": " "}, values for the parameter can be read as follows:

>>> reader = Column.fromvalue(json.loads('{"datatype": "integer", "separator": " "}'))
>>> reader.read('1 2 3')
[1, 2, 3]

See also the related discussion at #109

Example

The ParameterTable of a Wordlist from the Intercontinental Dictionary Series is described here: https://github.com/intercontinental-dictionary-series/lindseyende/blob/v2.0/cldf/cldf-metadata.json#L269-L300

Since the parameters in this Wordlist are the lexical concepts listed in the IDS concept list, the corresponding Concepticon concept sets are specified using the concepticonReference property.

ParameterTable: parameters.csv

| Name/Property | Datatype | Cardinality | Description |
| --- | --- | --- | --- |
| ID | string | singlevalued | A unique identifier for a row in a table. To allow usage of identifiers as path components of URLs, IDs must only contain alphanumeric characters, underscore and hyphen. |
| Name | string | unspecified | A title, name or label for an entity. |
| Description | string | unspecified | A description for an entity. |
| ColumnSpec | json | singlevalued | A column specification given as JSON representation of a CSVW column description. This column specification may be used by CLDF consumers to read a parameter's value as typed data. Note that a CSVW datatype description is not sufficient, because parsing a string value must also be informed by the column properties null and separator. |
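To illustrate why the column properties null and separator matter, the sketch below parses the same cell text under different column specifications. It is a deliberately simplified illustration and not the full CSVW parsing algorithm: the function `read_cell` is hypothetical, it supports only the `integer` and `string` datatypes, and the `NA` null marker is an assumed example.

```python
import json


def read_cell(text, column_spec):
    """Parse one cell using only the null and separator properties (simplified)."""
    spec = json.loads(column_spec)
    nulls = spec.get("null", [""])  # an empty string is treated as the default null marker
    if isinstance(nulls, str):
        nulls = [nulls]
    separator = spec.get("separator")
    convert = int if spec.get("datatype") == "integer" else str

    def read_item(item):
        return None if item in nulls else convert(item)

    if separator is None:
        return read_item(text)
    if text == "":
        return []
    return [read_item(item) for item in text.split(separator)]


as_list = read_cell("1 2 3", '{"datatype": "integer", "separator": " "}')
as_text = read_cell("1 2 3", '{"datatype": "string"}')
missing = read_cell("NA", '{"datatype": "integer", "null": "NA"}')

print(as_list, as_text, missing)
```

With the datatype alone, a consumer could not tell that `1 2 3` is a list of three integers, nor that `NA` marks a missing value instead of a malformed integer.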