Merge pull request #796 from SemanticComputing/data
FI: add documentation
matyaskopp authored Oct 8, 2023
2 parents 7127330 + 0e40572 commit d38275c
Showing 1 changed file with 21 additions and 1 deletion.
22 changes: 21 additions & 1 deletion Samples/ParlaMint-FI/README.md
@@ -6,12 +6,32 @@

### Characteristics of the national parliament

The Parliament of Finland is the unicameral and supreme legislature of Finland. It consists of 200 members, 199 of whom are elected every four years from 13 multi-member districts, each electing 7 to 36 members, using the proportional D'Hondt method. In addition, there is one member from Åland. Most MPs work in parliamentary groups, which correspond to the political parties.

The ParlaMint-FI corpus contains the minutes of the Finnish Parliament's plenary sessions from parliamentary session 2015 to parliamentary session 2021 (28.4.2015-28.1.2022).

### Data source and acquisition

The minutes of the Finnish Parliament's plenary sessions from parliamentary session 2015 onwards are freely available on the Open Data service of the Parliament of Finland (https://avoindata.eduskunta.fi) via an API in XML format (wrapped in JSON). The minutes were fetched from the API using a Python script.
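
As a rough illustration of what "XML wrapped in JSON" means in practice, the sketch below unwraps an XML document from a JSON payload using only the Python standard library. The payload layout, the field name `XmlData`, and the element names in the sample are assumptions made for this example; they are not taken from the Open Data service's actual response format or from the script used for the corpus.

```python
import json
import xml.etree.ElementTree as ET


def unwrap_xml(payload_text: str, xml_key: str = "XmlData") -> ET.Element:
    """Parse a JSON payload and return the root of the XML document it carries.

    `xml_key` is a hypothetical field name, not taken from the real API.
    """
    payload = json.loads(payload_text)
    return ET.fromstring(payload[xml_key])


# Hand-written sample payload standing in for an API response (assumption).
sample = json.dumps(
    {"XmlData": "<minutes session='2015'><speech>Arvoisa puhemies!</speech></minutes>"}
)
root = unwrap_xml(sample)
speeches = [s.text for s in root.iter("speech")]
```

In a real fetch, `payload_text` would be the body of an HTTP response, and the key and element names would need to be adapted to the actual response.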

Biographical information (birth and death dates, sex) of speakers who are not MPs has been fetched from other sources; for the chancellors of justice and parliamentary ombudsmen, it was fetched from Wikidata via a SPARQL query.
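
The query itself is not reproduced here. A minimal sketch of what such a query might look like follows, built as a string in Python. It relies on the standard Wikidata properties `P39` (position held), `P569` (date of birth), `P570` (date of death) and `P21` (sex or gender); the function name and the item ID passed to it are hypothetical placeholders, and the query actually used for the corpus may differ.

```python
def build_officeholder_query(position_qid: str) -> str:
    """Return a SPARQL query for birth date, death date and sex of people
    who have held the Wikidata position `position_qid` (e.g. "Q12345")."""
    if not (position_qid.startswith("Q") and position_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {position_qid!r}")
    return f"""
SELECT ?person ?personLabel ?birth ?death ?sexLabel WHERE {{
  ?person wdt:P39 wd:{position_qid} .          # position held
  OPTIONAL {{ ?person wdt:P569 ?birth . }}     # date of birth
  OPTIONAL {{ ?person wdt:P570 ?death . }}     # date of death
  OPTIONAL {{ ?person wdt:P21 ?sex . }}        # sex or gender
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "fi,en" . }}
}}
""".strip()


# "Q12345" is a placeholder, not the real item for any Finnish office.
query = build_officeholder_query("Q12345")
```

The resulting string could be sent to a SPARQL endpoint such as the Wikidata Query Service; the dates and the sex are wrapped in `OPTIONAL` so that people with incomplete records are still returned.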

### Data encoding process

The original XML data was transformed into TEI-XML using a series of Python and shell scripts (https://github.com/SemanticComputing/semparl-data-transformation).
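
To give a feel for the target encoding, the sketch below builds a single utterance in the ParlaMint style, a `<u>` element with a `who` reference and one `<seg>` per paragraph. It is a simplified illustration and not the code from the repository above: the function name, the ID scheme and the sample text are assumptions, and a real ParlaMint utterance sits in the TEI namespace and carries further attributes.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"


def speech_to_u(speech_id: str, speaker_id: str, paragraphs: list[str]) -> ET.Element:
    """Build a <u> element with one <seg> child per paragraph.

    The ID scheme (`<speech_id>.seg<n>`) is a hypothetical convention.
    """
    u = ET.Element("u", {f"{{{XML_NS}}}id": speech_id, "who": f"#{speaker_id}"})
    for n, text in enumerate(paragraphs, start=1):
        seg = ET.SubElement(u, "seg", {f"{{{XML_NS}}}id": f"{speech_id}.seg{n}"})
        seg.text = text
    return u


# Made-up speech and speaker identifiers, for illustration only.
u = speech_to_u("s1", "SpeakerA", ["First paragraph.", "Second paragraph."])
tei_fragment = ET.tostring(u, encoding="unicode")
```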

### Corpus-specific metadata

There is no metadata available going beyond what’s common for all corpora.

### Structure

There are no additional TEI elements beyond what’s described in the ParlaMint schema.

### Linguistic annotation

The linguistic annotation was generated by a Python script from an existing linguistically annotated version of the minutes of the Finnish Parliament's plenary sessions in RDF format, which was produced in the Finnish Semantic Parliament project (https://seco.cs.aalto.fi/projects/semparl/en/).

The linguistically annotated data has a known issue with speeches that contain transcriber comments and/or interruptions: neither the comments and interruptions themselves nor the parts of the speech that come after a transcriber comment or interruption are included in the linguistically annotated version.

There is no specific linguistic annotation going beyond what’s common for all corpora.
