From 2704b29bc010c97eae88910d94e6018021542df1 Mon Sep 17 00:00:00 2001
From: Jouni Tuominen
Date: Thu, 28 Sep 2023 15:09:21 +0300
Subject: [PATCH 1/2] FI: add documentation

---
 Samples/ParlaMint-FI/README.md | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/Samples/ParlaMint-FI/README.md b/Samples/ParlaMint-FI/README.md
index b31affdcc..54459a2e2 100644
--- a/Samples/ParlaMint-FI/README.md
+++ b/Samples/ParlaMint-FI/README.md
@@ -6,12 +6,30 @@
 
 ### Characteristics of the national parliament
 
+The Parliament of Finland is the unicameral and supreme legislature of Finland. The Parliament consists of 200 members, 199 of whom are elected every four years from 13 multi-member districts electing 7 to 36 members using the proportional D'Hondt method. In addition, there is one member from Åland. Most MPs work in parliamentary groups which correspond with the political parties.
+
+The ParlaMint-FI corpus contains the minutes of the Finnish Parliament's plenary sessions from parliamentary session 2015 to parliamentary session 2021 (28.4.2015-28.1.2022).
+
 ### Data source and acquisition
 
+The minutes of the Finnish Parliament's plenary sessions from parliamentary session 2015 onwards are freely available on the Open Data service of the Parliament of Finland (https://avoindata.eduskunta.fi) via an API in XML format (wrapped in JSON). The minutes were fetched from the API using a Python script.
+
 ### Data encoding process
 
+The original XML data was transformed into TEI-XML using a series of Python and shell scripts (https://github.com/SemanticComputing/semparl-data-transformation).
+
 ### Corpus-specific metadata
 
+There is no metadata available going beyond what’s common for all corpora.
+
 ### Structure
 
-### Linguistic annotation
\ No newline at end of file
+There are no additional TEI elements beyond what’s described in the ParlaMint schema.
+
+### Linguistic annotation
+
+The linguistic annotation was generated using a Python script utilizing a previously generated linguistically annotated version of the minutes of the Finnish Parliament's plenary sessions in RDF format, which was produced in the Finnish Semantic Parliament project (https://seco.cs.aalto.fi/projects/semparl/en/).
+
+There is an issue in the linguistically annotated data regarding speeches that contain transcriber comments and/or interruptions. Transcriber comments and interruptions, as well as the parts of a speech that follow a transcriber comment or interruption, aren't included in the linguistically annotated version.
+
+There is no specific linguistic annotation going beyond what’s common for all corpora.
\ No newline at end of file

From 0e405726abfa147fb1a52d51905fbd0a6f3839ca Mon Sep 17 00:00:00 2001
From: Jouni Tuominen
Date: Fri, 29 Sep 2023 10:08:46 +0300
Subject: [PATCH 2/2] FI: documentation: add note on fetching biographical information on non-MPs

---
 Samples/ParlaMint-FI/README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Samples/ParlaMint-FI/README.md b/Samples/ParlaMint-FI/README.md
index 54459a2e2..57eb80d8d 100644
--- a/Samples/ParlaMint-FI/README.md
+++ b/Samples/ParlaMint-FI/README.md
@@ -14,6 +14,8 @@ The ParlaMint-FI corpus contains the minutes of the Finnish Parliament's plenary
 
 The minutes of the Finnish Parliament's plenary sessions from parliamentary session 2015 onwards are freely available on the Open Data service of the Parliament of Finland (https://avoindata.eduskunta.fi) via an API in XML format (wrapped in JSON). The minutes were fetched from the API using a Python script.
 
+Biographical information (birth and death dates, sex) of speakers who are not MPs has been fetched from other sources; namely, the information on the chancellors of justice and parliamentary ombudsmen was fetched from Wikidata via a SPARQL query.
+
 ### Data encoding process
 
 The original XML data was transformed into TEI-XML using a series of Python and shell scripts (https://github.com/SemanticComputing/semparl-data-transformation).
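
Review note: the second patch mentions fetching birth and death dates and sex from Wikidata via a SPARQL query, but the query itself is not part of the series. The following is only a minimal sketch of what such a lookup might look like, not the script actually used. The helper names (`build_query`, `parse_bindings`, `fetch_people`), the User-Agent string, and the idea of selecting people by "position held" (P39) are assumptions made for illustration; the position QID must be supplied by the caller and is deliberately left as a parameter.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata SPARQL endpoint.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(position_qid: str) -> str:
    """Build a query for people who have held a given position (P39).

    P569 = date of birth, P570 = date of death, P21 = sex or gender.
    Selecting by position is an assumption of this sketch.
    """
    return f"""
SELECT ?person ?personLabel ?birth ?death ?sexLabel WHERE {{
  ?person wdt:P39 wd:{position_qid} .
  OPTIONAL {{ ?person wdt:P569 ?birth . }}
  OPTIONAL {{ ?person wdt:P570 ?death . }}
  OPTIONAL {{ ?person wdt:P21 ?sex . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "fi,en". }}
}}
"""


def parse_bindings(result: dict) -> list:
    """Flatten SPARQL JSON results into plain dicts; missing values become None."""
    def value(binding: dict, key: str):
        return binding[key]["value"] if key in binding else None

    def date_only(timestamp):
        # Wikidata returns timestamps such as 1950-01-01T00:00:00Z.
        return timestamp.split("T")[0] if timestamp else None

    rows = []
    for binding in result["results"]["bindings"]:
        rows.append({
            "uri": value(binding, "person"),
            "name": value(binding, "personLabel"),
            "birth": date_only(value(binding, "birth")),
            "death": date_only(value(binding, "death")),
            "sex": value(binding, "sexLabel"),
        })
    return rows


def fetch_people(position_qid: str) -> list:
    """Run the query against the public endpoint (requires network access)."""
    url = WIKIDATA_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": build_query(position_qid)}
    )
    request = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        # Hypothetical identifier; replace with a real contact string.
        "User-Agent": "parlamint-fi-example/0.1",
    })
    with urllib.request.urlopen(request) as response:
        return parse_bindings(json.load(response))
```

Calling `fetch_people` with the QID of the relevant office would return one dict per office holder. Because all three biographical properties are wrapped in `OPTIONAL`, a person still appears in the result when a death date or sex is not recorded.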