Data ES-PV #510

Open
wants to merge 29 commits into
base: data
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
- 303caf4 · sample corpus (updated version) · miruskieta, May 24, 2023
- 3a63211 · fix main title · miruskieta, May 24, 2023
- 9df6849 · Update ParlaMint-ES-PV_2015-02-05.xml add a reference to term event · miruskieta, May 24, 2023
- da94263 · Update ParlaMint-ES-PV_2015-02-12.xml add a reference to term event · miruskieta, May 24, 2023
- 14c96d3 · Update ParlaMint-ES-PV_2015-02-13.xml · miruskieta, May 24, 2023
- 89107eb · Update ParlaMint-ES-PV_2015-02-19.xml · miruskieta, May 24, 2023
- 6a95506 · Update ParlaMint-ES-PV_2015-02-27.xml · miruskieta, May 24, 2023
- d5ee8ad · Merge branch 'main' into main · miruskieta, May 24, 2023
- a91dd07 · Merge branch 'data' into main · matyaskopp, May 24, 2023
- 68d842e · Update ParlaMint-ES-PV_2015-02-12.xml · miruskieta, May 24, 2023
- c01cded · Update ParlaMint-ES-PV_2015-02-05.xml · miruskieta, May 24, 2023
- 4b1a001 · Update ParlaMint-ES-PV_2015-02-12.xml · miruskieta, May 24, 2023
- 86f5628 · Update ParlaMint-ES-PV_2015-02-13.xml · miruskieta, May 24, 2023
- 16283b0 · Update ParlaMint-ES-PV_2015-02-19.xml · miruskieta, May 24, 2023
- 4e41742 · Update ParlaMint-ES-PV_2015-02-27.xml · miruskieta, May 24, 2023
- c26d024 · Delete ParlaMint-ES-PV_2015-02-05.ana.xml · miruskieta, May 24, 2023
- bd89d48 · Delete ParlaMint-ES-PV_2015-02-12.ana.xml · miruskieta, May 24, 2023
- 4fb4456 · Delete ParlaMint-ES-PV_2015-02-13.ana.xml · miruskieta, May 24, 2023
- 600b593 · Delete ParlaMint-ES-PV_2015-02-19.ana.xml · miruskieta, May 24, 2023
- 9acefa2 · Delete ParlaMint-ES-PV_2015-02-27.ana.xml · miruskieta, May 24, 2023
- fb8443f · ana files updated · miruskieta, May 24, 2023
- 80f53e9 · Update README.md · miruskieta, May 24, 2023
- 34ebac1 · Delete ParlaMint-ES-PV_2015-02-05.ana.xml · miruskieta, Jul 4, 2023
- 238ab0e · corrected errors · miruskieta, Sep 10, 2023
- 7d13b8a · updated title · miruskieta, Sep 10, 2023
- a614703 · updated a member · miruskieta, Sep 10, 2023
- 2cf89dc · tapia error · miruskieta, Sep 10, 2023
- c03b3ff · Update ParlaMint-ES-PV.ana.xml · miruskieta, Sep 28, 2023
- 6b4efe9 · Update README.md · miruskieta, Oct 8, 2023
Large diffs are not rendered by default (applies to each of the following files):

- 4,364 changes: 4,364 additions & 0 deletions Data/ParlaMint-ES-PV/ParlaMint-ES-PV.ana.xml
- 4,331 changes: 4,331 additions & 0 deletions Data/ParlaMint-ES-PV/ParlaMint-ES-PV.xml
- 1,461 changes: 1,461 additions & 0 deletions Data/ParlaMint-ES-PV/ParlaMint-ES-PV_2015-02-05.ana.xml
- 1,460 changes: 1,460 additions & 0 deletions Data/ParlaMint-ES-PV/ParlaMint-ES-PV_2015-02-05.xml
- 1,119 changes: 1,119 additions & 0 deletions Data/ParlaMint-ES-PV/ParlaMint-ES-PV_2015-02-12.ana.xml
- 1,118 changes: 1,118 additions & 0 deletions Data/ParlaMint-ES-PV/ParlaMint-ES-PV_2015-02-12.xml
- 1,472 changes: 1,472 additions & 0 deletions Data/ParlaMint-ES-PV/ParlaMint-ES-PV_2015-02-13.ana.xml
- 1,471 changes: 1,471 additions & 0 deletions Data/ParlaMint-ES-PV/ParlaMint-ES-PV_2015-02-13.xml
- 1,463 changes: 1,463 additions & 0 deletions Data/ParlaMint-ES-PV/ParlaMint-ES-PV_2015-02-19.ana.xml
- 1,462 changes: 1,462 additions & 0 deletions Data/ParlaMint-ES-PV/ParlaMint-ES-PV_2015-02-19.xml
- 1,613 changes: 1,613 additions & 0 deletions Data/ParlaMint-ES-PV/ParlaMint-ES-PV_2015-02-27.ana.xml
- 1,612 changes: 1,612 additions & 0 deletions Data/ParlaMint-ES-PV/ParlaMint-ES-PV_2015-02-27.xml

40 changes: 40 additions & 0 deletions Data/ParlaMint-ES-PV/ParlaMint-taxonomy-NER.ana.xml
@@ -0,0 +1,40 @@
<?xml version="1.0" encoding="UTF-8"?>
<taxonomy xmlns="http://www.tei-c.org/ns/1.0"
xml:id="ParlaMint-taxonomy-NER.ana"
xml:lang="mul">
<desc xml:lang="en">
<term>Named entities</term>
</desc>
<category xml:id="PER">
<catDesc xml:lang="cs">
<term>osoba</term>
</catDesc>
<catDesc xml:lang="en">
<term>person</term>
</catDesc>
</category>
<category xml:id="LOC">
<catDesc xml:lang="cs">
<term>místo</term>
</catDesc>
<catDesc xml:lang="en">
<term>location</term>
</catDesc>
</category>
<category xml:id="ORG">
<catDesc xml:lang="cs">
<term>organizace</term>
</catDesc>
<catDesc xml:lang="en">
<term>organization</term>
</catDesc>
</category>
<category xml:id="MISC">
<catDesc xml:lang="cs">
<term>různé</term>
</catDesc>
<catDesc xml:lang="en">
<term>miscellaneous</term>
</catDesc>
</category>
</taxonomy>
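
Each `<category>` above carries its label in `xml:id` and one `<catDesc>` per language, all in the TEI namespace. A minimal sketch of reading such a taxonomy with Python's standard `xml.etree.ElementTree`; the function `read_taxonomy` and the trimmed `SAMPLE` are illustrative assumptions, not part of the ParlaMint tooling:

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def read_taxonomy(xml_text: str) -> dict[str, dict[str, str]]:
    """Map each category xml:id to {xml:lang: term text} (hypothetical helper)."""
    # Encode to bytes so the UTF-8 declaration in the header is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    labels: dict[str, dict[str, str]] = {}
    for category in root.iter(f"{TEI}category"):
        terms: dict[str, str] = {}
        for desc in category.findall(f"{TEI}catDesc"):
            term = desc.find(f"{TEI}term")
            if term is not None and term.text:
                terms[desc.get(f"{XML_NS}lang")] = term.text
        labels[category.get(f"{XML_NS}id")] = terms
    return labels


# Trimmed excerpt of ParlaMint-taxonomy-NER.ana.xml (the <desc> is skipped).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<taxonomy xmlns="http://www.tei-c.org/ns/1.0" xml:id="ParlaMint-taxonomy-NER.ana" xml:lang="mul">
  <desc xml:lang="en"><term>Named entities</term></desc>
  <category xml:id="PER">
    <catDesc xml:lang="cs"><term>osoba</term></catDesc>
    <catDesc xml:lang="en"><term>person</term></catDesc>
  </category>
</taxonomy>"""

print(read_taxonomy(SAMPLE))  # {'PER': {'cs': 'osoba', 'en': 'person'}}
```

The helper keys terms by whatever `xml:lang` values the file declares, so it works unchanged for any language pair in the `<catDesc>` elements.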
200 changes: 200 additions & 0 deletions Data/ParlaMint-ES-PV/ParlaMint-taxonomy-UD-SYN.ana.xml
@@ -0,0 +1,200 @@
<?xml version="1.0" encoding="UTF-8"?>
<taxonomy xmlns="http://www.tei-c.org/ns/1.0"
xml:id="ParlaMint-taxonomy-UD-SYN.ana"
xml:lang="mul">
<desc xml:lang="en">
<term>UD syntactic relations</term>
</desc>
<category xml:id="acl">
<catDesc xml:lang="en">
<term>acl</term>: Clausal modifier of noun (adjectival clause)</catDesc>
</category>
<category xml:id="acl_relcl">
<catDesc xml:lang="en">
<term>acl:relcl</term>: Relative clause modifier</catDesc>
</category>
<category xml:id="advcl">
<catDesc xml:lang="en">
<term>advcl</term>: Adverbial clause modifier</catDesc>
</category>
<category xml:id="advmod">
<catDesc xml:lang="en">
<term>advmod</term>: Adverbial modifier</catDesc>
</category>
<category xml:id="advmod_emph">
<catDesc xml:lang="en">
<term>advmod:emph</term>: Emphasizing word, intensifier</catDesc>
</category>
<category xml:id="amod">
<catDesc xml:lang="en">
<term>amod</term>: Adjectival modifier</catDesc>
</category>
<category xml:id="appos">
<catDesc xml:lang="en">
<term>appos</term>: Appositional modifier</catDesc>
</category>
<category xml:id="aux">
<catDesc xml:lang="en">
<term>aux</term>: Auxiliary</catDesc>
</category>
<category xml:id="aux_pass">
<catDesc xml:lang="en">
<term>aux:pass</term>: Passive auxiliary</catDesc>
</category>
<category xml:id="case">
<catDesc xml:lang="en">
<term>case</term>: Case marking</catDesc>
</category>
<category xml:id="cc">
<catDesc xml:lang="en">
<term>cc</term>: Coordinating conjunction</catDesc>
</category>
<category xml:id="ccomp">
<catDesc xml:lang="en">
<term>ccomp</term>: Clausal complement</catDesc>
</category>
<category xml:id="cc_preconj">
<catDesc xml:lang="en">
<term>cc:preconj</term>: Preconjunct</catDesc>
</category>
<category xml:id="compound">
<catDesc xml:lang="en">
<term>compound</term>: Compound</catDesc>
</category>
<category xml:id="conj">
<catDesc xml:lang="en">
<term>conj</term>: Conjunct</catDesc>
</category>
<category xml:id="cop">
<catDesc xml:lang="en">
<term>cop</term>: Copula</catDesc>
</category>
<category xml:id="csubj">
<catDesc xml:lang="en">
<term>csubj</term>: Clausal subject</catDesc>
</category>
<category xml:id="csubj_pass">
<catDesc xml:lang="en">
<term>csubj:pass</term>: Clausal passive subject</catDesc>
</category>
<category xml:id="dep">
<catDesc xml:lang="en">
<term>dep</term>: Unspecified dependency</catDesc>
</category>
<category xml:id="det">
<catDesc xml:lang="en">
<term>det</term>: Determiner</catDesc>
</category>
<category xml:id="det_numgov">
<catDesc xml:lang="en">
<term>det:numgov</term>: Pronominal quantifier governing the case of the noun</catDesc>
</category>
<category xml:id="det_nummod">
<catDesc xml:lang="en">
<term>det:nummod</term>: Pronominal quantifier agreeing in case with the noun</catDesc>
</category>
<category xml:id="discourse">
<catDesc xml:lang="en">
<term>discourse</term>: Discourse element</catDesc>
</category>
<category xml:id="expl">
<catDesc xml:lang="en">
<term>expl</term>: Expletive</catDesc>
</category>
<category xml:id="expl_pass">
<catDesc xml:lang="en">
<term>expl:pass</term>: Reflexive pronoun used in reflexive passive</catDesc>
</category>
<category xml:id="expl_pv">
<catDesc xml:lang="en">
<term>expl:pv</term>: Reflexive clitic with an inherently reflexive verb</catDesc>
</category>
<category xml:id="expl_impers">
<catDesc xml:lang="en">
<term>expl:impers</term>: Impersonal expletive</catDesc>
</category>
<category xml:id="fixed">
<catDesc xml:lang="en">
<term>fixed</term>: Fixed multiword expression</catDesc>
</category>
<category xml:id="flat">
<catDesc xml:lang="en">
<term>flat</term>: Flat multiword expression</catDesc>
</category>
<category xml:id="flat_foreign">
<catDesc xml:lang="en">
<term>flat:foreign</term>: Flat multiword expression: foreign</catDesc>
</category>
<category xml:id="flat_name">
<catDesc xml:lang="en">
<term>flat:name</term>: Flat name</catDesc>
</category>
<category xml:id="iobj">
<catDesc xml:lang="en">
<term>iobj</term>: Indirect object</catDesc>
</category>
<category xml:id="mark">
<catDesc xml:lang="en">
<term>mark</term>: Marker</catDesc>
</category>
<category xml:id="nmod">
<catDesc xml:lang="en">
<term>nmod</term>: Nominal modifier</catDesc>
</category>
<category xml:id="nsubj">
<catDesc xml:lang="en">
<term>nsubj</term>: Nominal subject</catDesc>
</category>
<category xml:id="nsubj_pass">
<catDesc xml:lang="en">
<term>nsubj:pass</term>: Passive nominal subject</catDesc>
</category>
<category xml:id="nummod">
<catDesc xml:lang="en">
<term>nummod</term>: Numeric modifier</catDesc>
</category>
<category xml:id="nummod_gov">
<catDesc xml:lang="en">
<term>nummod:gov</term>: Numeric modifier governing the case of the noun</catDesc>
</category>
<category xml:id="obj">
<catDesc xml:lang="en">
<term>obj</term>: Object</catDesc>
</category>
<category xml:id="obl">
<catDesc xml:lang="en">
<term>obl</term>: Oblique nominal</catDesc>
</category>
<category xml:id="obl_arg">
<catDesc xml:lang="en">
<term>obl:arg</term>: Oblique argument</catDesc>
</category>
<category xml:id="orphan">
<catDesc xml:lang="en">
<term>orphan</term>: Orphan</catDesc>
</category>
<category xml:id="parataxis">
<catDesc xml:lang="en">
<term>parataxis</term>: Parataxis</catDesc>
</category>
<category xml:id="punct">
<catDesc xml:lang="en">
<term>punct</term>: Punctuation</catDesc>
</category>
<category xml:id="reparandum">
<catDesc xml:lang="en">
<term>reparandum</term>: Overridden disfluency (here used for program mistakes!)</catDesc>
</category>
<category xml:id="root">
<catDesc xml:lang="en">
<term>root</term>: Root</catDesc>
</category>
<category xml:id="vocative">
<catDesc xml:lang="en">
<term>vocative</term>: Vocative</catDesc>
</category>
<category xml:id="xcomp">
<catDesc xml:lang="en">
<term>xcomp</term>: Open clausal complement</catDesc>
</category>
</taxonomy>
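
In the taxonomy above, UD relation subtypes written with a colon (e.g. `acl:relcl`) use an underscore in their `xml:id` (`acl_relcl`), because an `xml:id` value must be an XML NCName, which cannot contain a colon. A minimal sketch of that mapping, for example to check that every relation label produced by a parser has a category; the helper names are hypothetical and `TAXONOMY_IDS` is only an excerpt of the file:

```python
# Excerpt of category ids from ParlaMint-taxonomy-UD-SYN.ana.xml (not the full list).
TAXONOMY_IDS = {"acl", "acl_relcl", "advmod", "nsubj", "nsubj_pass", "obj", "punct", "root"}


def deprel_to_id(deprel: str) -> str:
    """Convert a UD relation label such as 'acl:relcl' to its taxonomy xml:id."""
    return deprel.replace(":", "_")


def missing_relations(deprels, taxonomy_ids=TAXONOMY_IDS) -> list[str]:
    """Return the distinct labels, sorted, whose xml:id is not in the taxonomy."""
    return sorted({d for d in deprels if deprel_to_id(d) not in taxonomy_ids})


print(deprel_to_id("nsubj:pass"))                        # nsubj_pass
print(missing_relations(["nsubj", "obl:tmod", "punct"]))  # ['obl:tmod']
```

With the full list of ids, `obl:tmod` would still be reported, since the taxonomy defines `obl` and `obl_arg` but no `obl_tmod`.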
28 changes: 25 additions & 3 deletions Data/ParlaMint-ES-PV/README.md
@@ -4,15 +4,37 @@


## Documentation
ParlaMint is a project that aims to (1) create a multilingual set of comparable corpora of parliamentary proceedings uniformly encoded according to the Parla-CLARIN recommendations and covering the COVID-19 pandemic from November 2019 as well as the earlier period from 2015 to serve as a reference corpus; (2) process the corpora linguistically to add Universal Dependencies syntactic structures and Named Entity annotation; (3) make the corpora available through concordancers and Parlameter; and (4) build use cases in Political Sciences and Digital Humanities based on the corpus data.

### Characteristics of the national parliament
The Basque Parliament (Basque: Eusko Legebiltzarra, Spanish: Parlamento Vasco) is the legislative body of the Basque Autonomous Community of Spain and the elected assembly to which the Basque Government is responsible.

The Parliament meets in the Basque capital, Vitoria-Gasteiz, although the first session of the modern assembly, as constituted by the Statute of Autonomy of the Basque Country, was held in Guernica – the symbolic centre of Basque freedoms – on 31 March 1980.

It is composed of seventy-five deputies representing citizens from the three provinces of the Basque Autonomous Community. Each province (Álava, Gipuzkoa and Biscay) elects the same number of deputies, 25 each.

URL: https://www.legebiltzarra.eus


### Data source and acquisition
On March 9, 2021, the Basque Parliament Office adopted a decision (No. 2021/1887) to share the transcripts of the Basque Parliament, and their translations where available, to contribute to the creation of the Basque corpus in the ParlaMint project.

Minutes of the Basque Parliament, Terms X, XI and XII (2015-2022).

### Corpus-specific metadata

### Data encoding process
The conversion consists of several steps that transform and enrich the HTML source:

- First, the DOC files were converted to XML.
- Second, language detection was added with a Python script.

The main challenge was detecting all of the comments: some of them, mostly short comments, still remain in the text.
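
The README does not describe how the Python language-detection script works. Purely as an illustration, a stop-word heuristic that labels a segment as Basque (`eu`) or Spanish (`es`) could look like the following; the word lists and the name `detect_language` are assumptions, and a production pipeline would more likely rely on a trained language identifier.

```python
import re

# Toy stop-word lists (assumption: illustrative only, not the project's script).
BASQUE = {"eta", "da", "ez", "bat", "dira", "ere", "baina", "dugu", "du", "hori", "beste", "gara"}
SPANISH = {"el", "la", "de", "que", "y", "en", "los", "las", "es", "por", "con", "una", "para"}


def detect_language(text: str) -> str:
    """Return 'eu' or 'es' by comparing stop-word counts, or 'und' on a tie."""
    tokens = re.findall(r"\w+", text.lower())  # \w matches accented letters such as ñ
    eu_hits = sum(token in BASQUE for token in tokens)
    es_hits = sum(token in SPANISH for token in tokens)
    if eu_hits > es_hits:
        return "eu"
    if es_hits > eu_hits:
        return "es"
    return "und"  # BCP 47 code for "undetermined"


print(detect_language("Hori da gure proposamena eta ez dugu beste aukerarik."))  # eu
print(detect_language("Esta es la propuesta de nuestro grupo."))                 # es
```

The returned code is the kind of value that would end up in an `xml:lang` attribute on the segment; short or mixed segments are where such a heuristic is least reliable.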

### Linguistic annotation
Part-of-speech (PoS) tagging of Basque and Spanish sentences was carried out with the `udpipe2.pl` script and the UDPipe service at http://lindat.mff.cuni.cz/services/udpipe.
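
The README names the `udpipe2.pl` script but not its options. As an illustrative sketch only (not that script), the same LINDAT service can be called through its REST endpoint `api/process`, which returns CoNLL-U in the `result` field of a JSON response; the model names such as `spanish` or `basque` are assumptions and should be checked against the service's model list. The parsing step runs offline on a hand-written sample, not on real UDPipe output.

```python
import json
import urllib.parse
import urllib.request

# LINDAT UDPipe 2 REST endpoint (the README links the service's web page).
API = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def build_request(text: str, model: str) -> urllib.request.Request:
    """Prepare a form-encoded POST asking for tokenization, tagging and parsing."""
    # An empty value enables a step (tokenizer, tagger, parser) with default options.
    params = {"model": model, "tokenizer": "", "tagger": "", "parser": "", "data": text}
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(API, data=body, method="POST")


def annotate(text: str, model: str) -> str:
    """Call the service (needs network access) and return the CoNLL-U result."""
    with urllib.request.urlopen(build_request(text, model)) as response:
        return json.load(response)["result"]


def parse_conllu(conllu: str) -> list[tuple[str, str, str]]:
    """Extract (FORM, UPOS, DEPREL) for each syntactic word of a CoNLL-U string."""
    rows = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges (e.g. "del") and empty nodes
        rows.append((cols[1], cols[3], cols[7]))
    return rows


# Hand-written CoNLL-U sample with a contracted multiword token "del" = "de" + "el".
SAMPLE = "\n".join([
    "# text = Habla del proyecto.",
    "1\tHabla\thablar\tVERB\t_\t_\t0\troot\t_\t_",
    "2-3\tdel\t_\t_\t_\t_\t_\t_\t_\t_",
    "2\tde\tde\tADP\t_\t_\t4\tcase\t_\t_",
    "3\tel\tel\tDET\t_\t_\t4\tdet\t_\t_",
    "4\tproyecto\tproyecto\tNOUN\t_\t_\t1\tobl\t_\tSpaceAfter=No",
    "5\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_",
])

print(parse_conllu(SAMPLE))
```

Because the parser step is enabled, the DEPREL column carries UD relation labels of the kind listed in `ParlaMint-taxonomy-UD-SYN.ana.xml`, in addition to the PoS tags.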

For Named Entity Recognition and Classification (NERC), the multilingual language model xlm-roberta-large was fine-tuned for Basque and Spanish using the following entity tags: PER (Person), LOC (Location), ORG (Organization), and MISC (Miscellaneous).
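
The README does not state the tagging scheme of the fine-tuned model's output. Assuming the usual token-level BIO tags (`B-PER`, `I-PER`, `O`, ...), a minimal sketch of grouping them into entity spans restricted to the four types listed above; the function name `bio_to_spans` is hypothetical, not part of the ParlaMint tooling.

```python
ENTITY_TYPES = {"PER", "LOC", "ORG", "MISC"}


def bio_to_spans(tags: list[str]) -> list[tuple[str, int, int]]:
    """Group BIO tags into (type, start, end) token spans; `end` is exclusive."""
    spans = []
    current = None  # [entity type, start index] of the open entity, if any
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        prefix, _, etype = tag.partition("-")
        continues = prefix == "I" and current is not None and current[0] == etype
        if current is not None and not continues:
            spans.append((current[0], current[1], i))
            current = None
        # A stray I- tag (no matching open entity) is treated leniently as a start.
        if prefix == "B" or (prefix == "I" and not continues):
            if etype not in ENTITY_TYPES:
                raise ValueError(f"unexpected entity type: {etype!r}")
            current = [etype, i]
    return spans


# Hand-written example, not model output.
tokens = ["Iñigo", "Urkullu", "lehendakaria", "Gasteizen", "dago"]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
for etype, start, end in bio_to_spans(tags):
    print(etype, " ".join(tokens[start:end]))
# PER Iñigo Urkullu
# LOC Gasteizen
```

Rejecting unknown types keeps the spans consistent with the PER, LOC, ORG and MISC categories defined in `ParlaMint-taxonomy-NER.ana.xml`.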