ES Feedback #696

Open
5 of 6 tasks
matyaskopp opened this issue Jun 20, 2023 · 13 comments · Fixed by #692

@matyaskopp
Collaborator

matyaskopp commented Jun 20, 2023

@charlicruz, @calzada

Improve note annotations

  • notes

e.g.:

<note>Aplausos</note>

should be

<kinesic type="applause">
 <desc>Aplausos</desc>
</kinesic>

most common notes with frequencies:

  21339 <note>Aplausos</note>
   4356 <note>Rumores</note>
   3777 <note>Pausa</note>
   1629 <note>Pausa.-Una trabajadora del servicio de limpieza procede a desinfectar la tribuna de oradores</note>
    698 <note>aplausos</note>
    629 <note>EAJ-PNV</note>
    568 <note>Risas</note>
    448 <note>Protestas</note>
    326 <note>rumores</note>
    305 <note>Aplausos.-Rumores</note>
    261 <note>Rumores.-Aplausos</note>
    245 <note>Aplausos.</note>
    215 <note>La señora presidenta ocupa la Presidencia</note>
    173 <note>Continúan los rumores</note>
    161 <note>El señor vicepresidente, Prendes Prendes, ocupa la Presidencia</note>
    146 <note>Asentimiento</note>
    144 <note>Risas.-Aplausos</note>
    143 <note>Protestas.-Aplausos</note>
    136 <note>Convergència i Unió</note>
    127 <note>Risas y aplausos</note>
    123 <note>Aplausos de las señoras y los señores diputados del Grupo Parlamentario VOX, puestos en pie</note>
    119 <note>Pausa.-Una trabajadora del servicio de limpieza procede a desinfectar la tribuna de oradores.</note>
    111 <note>Prolongados aplausos</note>
    101 <note>Muestra un documento</note>
     96 <note>Aplausos.-Protestas</note>
     92 <note>La señora vicepresidenta, Navarro Garzón, ocupa la Presidencia</note>
     90 <note>La señora vicepresidenta, Villalobos Talero, ocupa la Presidencia</note>
     88 <note>El señor presidente ocupa la Presidencia</note>
     82 <note>Varios señores diputados: ¡Muy bien!-Aplausos</note>
     80 <note>Rumores y protestas</note>
     78 <note>risas</note>
     78 <note>protestas</note>
     63 <note>nueva</note>
     63 <note>Aplausos de las señoras y los señores diputados del Grupo Parlamentario Confederal de Unidos Podemos-En Comú Podem-En Marea, puestos en pie</note>
     62 <note>Una trabajadora del servicio de limpieza procede a desinfectar la tribuna de oradores</note>
     60 <note>Rumores.-Protestas</note>
     59 <note>El señor vicepresidente, Rodríguez Gómez de Celis, ocupa la Presidencia</note>
     57 <note>Aplausos.-Varios señores diputados: ¡Muy bien!</note>
     52 <note>Risas.-Rumores</note>
     51 <note>Muestra un gráfico</note>
     47 <note>PNV</note>
     46 <note>Pausa.</note>
     46 <note>muestra un documento</note>
     46 <note>Aplausos de las señoras y los señores diputados del Grupo Parlamentario Ciudadanos, puestos en pie</note>
     41 <note>Muestra una fotografía</note>
     40 <note>Varias señoras y señores diputados: ¡Muy bien!-Aplausos</note>
     40 <note>Rumores.-Risas</note>
     38 <note>Aplausos de las señoras y los señores diputados del Grupo Parlamentario Socialista, puestos en pie</note>
     37 <note>Un señor diputado: ¡Muy bien!-Aplausos</note>
     37 <note>nuevo</note>
     37 <note>Aplausos.-Un señor diputado: ¡Muy bien!</note>
     35 <note>Aplausos de las señoras y los señores diputados del Grupo Parlamentario Popular en el Congreso, puestos en pie</note>
     34 <note>Democràcia i Llibertat</note>
     32 <note>Continúan las protestas</note>
     29 <note>El señor vicepresidente, Barrero López, ocupa la Presidencia</note>
     29 <note>CONVERGÈNCIA I UNIÓ</note>
     28 <note>La señora vicepresidenta, Montserrat Montserrat, ocupa la Presidencia</note>
     27 <note>La señora vicepresidenta, Elizo Serrano, ocupa la Presidencia</note>
     26 <note>Pausa. Una trabajadora del servicio de limpieza procede a desinfectar la tribuna de oradores</note>
     26 <note>Denegación</note>
     25 <note>Un señor diputado pronuncia palabras que no se perciben</note>
     25 <note>La señora vicepresidenta, Romero Sánchez, ocupa la Presidencia</note>
     23 <note>Pronuncia palabras en catalán</note>
     23 <note>muestra un gráfico</note>
     23 <note>Aplausos.-Risas</note>
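
The note-to-element mapping requested above could be sketched roughly as follows. This is only an illustration, not the ParlaMint-ES conversion script: the keyword table is hypothetical, and only the applause row comes from the example above (the other element/type pairs are assumptions). Compound notes such as "Aplausos.-Rumores" are split on ".-" first, and anything unrecognised stays a plain note.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical keyword table: (pattern on lower-cased text, element, type).
# Only the applause row is taken from the issue; the rest are assumptions.
RULES = [
    (re.compile(r"aplausos"), "kinesic", "applause"),
    (re.compile(r"risas"), "vocal", "laughter"),
    (re.compile(r"rumores|protestas"), "vocal", "noise"),
    (re.compile(r"pausa"), "incident", "pause"),
]


def convert_note(text):
    """Return a list of TEI-like strings for one source note."""
    converted = []
    # Compound notes use ".-" as a separator, e.g. "Aplausos.-Rumores".
    for part in re.split(r"\.-", text):
        part = part.strip().rstrip(".")
        if not part:
            continue
        for pattern, element, type_ in RULES:
            if pattern.search(part.lower()):
                converted.append(
                    f'<{element} type="{type_}">'
                    f"<desc>{escape(part)}</desc></{element}>"
                )
                break
        else:
            # No rule matched: keep the text as an ordinary note.
            converted.append(f"<note>{escape(part)}</note>")
    return converted
```

Lower-casing before matching also covers the "aplausos" / "Aplausos" and "Aplausos." variants in the frequency list.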

Missing who for chair utterances

  • chairman speeches

Missing who attribute
https://github.com/matyaskopp/PARLAMINT-ES-MC/blob/4dc6c5f53597e2bdc3b3925a4424cb38764a4931/ParlaMint.sample/ParlaMint-ES_2015-01-20-CD150120.xml#L100-L103

<u xml:id="ParlaMint-ES_2015-01-20-CD150120.u1" ana="#chair">
  <seg xml:id="ParlaMint-ES_2015-01-20-CD150120.u1.1">Se abre la sesión.</seg>
  <seg xml:id="ParlaMint-ES_2015-01-20-CD150120.u1.2">Convalidación o derogación del Real Decreto-ley 15/2014, de 19 de diciembre, de modificación del Régimen Económico y Fiscal de Canarias. Para presentar el real decreto-ley, tiene la palabra en nombre del Gobierno el ministro de Hacienda y Administraciones Públicas.</seg>
</u>

source:
https://github.com/matyaskopp/PARLAMINT-ES-MC/blob/4dc6c5f53597e2bdc3b3925a4424cb38764a4931/CD.sample/CD150120.xml#L57-L76

<speaker>
<name>UNKNOWN</name>
<birth_date>UNKNOWN</birth_date>
<birth_place country="ES">UNKNOWN</birth_place>
<status>NA</status>
<gender>UNKNOWN</gender>
<institution>
<ni country="ES">CD</ni>
</institution>
<constituency country="ES" region="UNKNOWN"/>
<affiliation>
<national_party>UNKNOWN</national_party>
<cd group="UNKNOWN"/>
</affiliation>
<post>PRESIDENTE</post>
</speaker>
<speech id="spXY" language="ES">
Se abre la sesión. 
Convalidación o derogación del Real Decreto-ley 15/2014, de 19 de diciembre, de modificación del Régimen Económico y Fiscal de Canarias. Para presentar el real decreto-ley, tiene la palabra en nombre del Gobierno el ministro de Hacienda y Administraciones Públicas. 
</speech>

chairman name is present in source file:
https://github.com/matyaskopp/PARLAMINT-ES-MC/blob/4dc6c5f53597e2bdc3b3925a4424cb38764a4931/CD.sample/CD150120.xml#L52

<body>
  <chair who="JESÚS POSADA MORENO">
    <!-- all speeches -->
  </chair>
</body>

list of chairs with frequencies:

cat CD/*.xml|grep '<chair'|sed 's/^ *//;s/\r//'|sort|uniq -c|sort -nr
    208 <chair who="MERITXELL BATET LAMAÑA">
    161 <chair who="ANA MARÍA PASTOR JULIÁN">
     56 <chair who="JESÚS POSADA MORENO">
      8 <chair who="PATXI LÓPEZ ÁLVAREZ">
      5 <chair who="NA">
      5 <chair who="ALFONSO RODRÍGUEZ GÓMEZ DE CELIS">
      3 <chair who="PATXI LÓPEZ ÁLVAREZ ">
      2 <chair who="JOSÉ IGNACIO PRENDES PRENDES">
      1 <chair who="MERITXELL BATET LAMAÑA ">
      1 <chair who="CELIA VILLALOBOS TALERO VICEPRESIDENTA PRIMERA">
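
The listing shows the same person under several spellings (trailing spaces, a role appended to the name). A minimal sketch of how these values could be normalised before they are turned into who references is given below; the function names and the list of role words are assumptions for illustration.

```python
import re
from collections import Counter

# Assumed role words that may trail a name in chair/@who,
# e.g. "... VICEPRESIDENTA PRIMERA".
ROLE_SUFFIX = re.compile(r"\s+(VICEPRESIDENT[AE]|PRESIDENT[AE])(\s+\w+)*$")


def normalise_chair(who):
    """Collapse whitespace and drop a trailing role description."""
    name = " ".join(who.split())
    return ROLE_SUFFIX.sub("", name)


def merge_counts(counts):
    """Merge frequencies of spelling variants of the same chair name."""
    merged = Counter()
    for who, n in counts.items():
        merged[normalise_chair(who)] += n
    return merged
```

For example, the two "PATXI LÓPEZ ÁLVAREZ" rows above (8 and 3) would be merged into a single entry. The "NA" value is left untouched and still needs manual handling.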

No guest speakers ???

  • guest speaker role

This is a bit strange: in the ES corpus, no speaker is labelled with the guest category (ana="#guest").

Missing parliamentaryGroups

  • parliamentaryGroup

It seems that the source data contain parliamentary groups. These are now required in ParlaMint (https://clarin-eric.github.io/ParlaMint/#sec-parties); parties can be converted into groups or, better, both parties and groups can be encoded.

ParlaMint requires that a corpus must use parliamentary groups, while the use of political parties is optional. Note that if political parties are used, it is also expected to encode which political parties constitute a parliamentary group; this is encoded via the <relation> element, as further explained in the Section on Relations between organisations.

list of parliamentary groups with number of affiliated persons

cat CD/*.xml|tr '\r\n' '  ' |sed 's/<speaker>/\n<speaker>/g;s/<\/speaker>/\n/g'|grep speaker |sed 's/^.*<name>//;s@</name.*group="@\t@;s@".*$@@;'|grep -v '<'|sort|uniq|cut -f 2|sort|uniq -c
     18 GC-CiU
      1 GC-DL
     48 GCs
     47 GCUP-EC-EM
     44 GCUP-EC-GC
      5 GEH Bildu
      5 GER
     13 GIP
     37 GMx
      1 GMX
    259 GP
     13 GPlu
     15 GR
    264 GS
      7 GUPyD
     10 GV (EAJ-PNV)
      1 GVox
     54 GVOX
     96 NA
     12 UNKNOWN

Parliamentary group - party pairs:

cat CD/*.xml|tr '\r\n' '  ' |sed 's/<speaker>/\n<speaker>/g;s/<\/speaker>/\n/g'|grep speaker |sed 's/^.*<national_party>//;s@</national_party.*group="@\t@;s@".*$@@;'|grep -v '<'|sort|uniq
AMAIUR	GMx
BNG	GMx
BNG	GPlu
CCa-PNC	GMx
CCa-PNC-NC	GMx
CC-NC-PNC	GMx
CDC	GMx
CiU	GC-CiU
COMPROMÍS-Q	GMx
C-P-EUPV	GCUP-EC-EM
C-P-EUPV	GMx
Cs	GCs
CUP-PR	GMx
DL	GC-DL
EAJ-PNV	GV (EAJ-PNV)
ECP	GCUP-EC-EM
ECP	GCUP-EC-GC
ECP-GUAYEM EL CANVI	GCUP-EC-GC
EC-UP	GCUP-EC-GC
EH Bildu	GEH Bildu
EH Bildu	GMx
EM-P-A-EU	GCUP-EC-EM
ERC-CATSÍ	GER
ERC-RI.cat	GMx
ERC-S	GR
EUiA	GIP
EUPV	GIP
GB	GMx
 GP	GP
ICV	GIP
IC-V	GMX
IZQ-PLU	GIP
JxCat-JUNTS	GPlu
JxCat-JUNTS(Junts)	GPlu
MÁS PAÍS-EQUO	GPlu
MÉS COMPROMÍS	GPlu
NA+	GMx
NA	NA
NC-CCa-PNC	GMx
PP-EU	GP
PP-FORO	GMx
PP-FORO	GP
PP	GP
PP-PAR	GP
PRC	GMx
PSC(PSC-PSOE)	GS
PSC-PSOE	GS
PsdeG-PSOE	GS
PSdeG-PSOE	GS
PSdG-PSOE	GS
PSE-EE-PSOE	GS
PSOEdeAndalucía	GS
PSOE	GS
PSOE	NA
PSOE-NCa	GS
¡Teruel Existe!	GMx
UNKNOWN	UNKNOWN
UP	GCUP-EC-EM
UP	GCUP-EC-GC
UPM	GCUP-EC-EM
UPN	GMx
UPN-PP	GMx
UPyD	GUPyD
Vox	GVox
Vox	GVOX
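
Both listings contain labels that differ only in letter case or surrounding whitespace (GMx/GMX, GVox/GVOX, " GP"/GP, PsdeG-PSOE/PSdeG-PSOE). A small sketch for spotting such variants automatically is shown below; the function name is hypothetical, and spelling differences such as PSdG-PSOE are not caught by it.

```python
from collections import defaultdict


def find_variants(labels):
    """Group labels that are equal after whitespace and case normalisation.

    Returns only the groups with more than one distinct spelling.
    """
    groups = defaultdict(set)
    for label in labels:
        key = " ".join(label.split()).casefold()
        groups[key].add(label)
    return {key: sorted(v) for key, v in groups.items() if len(v) > 1}


if __name__ == "__main__":
    sample = ["GMx", "GMX", "GVox", "GVOX", "GP", " GP", "GS"]
    for key, variants in find_variants(sample).items():
        print(key, variants)
```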

Missing translation

  • translation

https://github.com/matyaskopp/ParlaMint/blob/e48f74e3c66adb5a32b8d1051be3d2ebb58c097c/Data/ParlaMint-ES/ParlaMint-taxonomy-parla.legislature.xml#L200-L207

                  <category xml:id="parla.meeting.ceremonial">
                     <catDesc xml:lang="es">
                        <term>--</term>
                     </catDesc>
                     <catDesc xml:lang="en">
                        <term>Ceremonial meeting</term>
                     </catDesc>
                  </category>

parliamentaryGroup affiliation overlaps

  • overlapping parliamentaryGroup (party) affiliations

I have discovered this accidentally because it produces a different error:

Error: ERROR: multiple party statuses for MartínezMaría on 2021-01-28: Coalition Opposition

   <person xml:id="MartínezMaría">
      <persName>
         <forename>María</forename>
         <forename>Luz</forename>
         <surname>Martínez</surname>
         <surname>Seijo</surname>
      </persName>
      <sex value="F"/>
      <birth when="1968-11-10"/>
      <affiliation ref="#CD" role="member" from="2016-04-19" to="2023-02-14"/>
      <affiliation role="member" ref="#party.Cs" from="2020-02-11" to="2023-02-14"/>
      <affiliation role="member" ref="#party.PP" from="2018-06-19" to="2022-12-21"/>
      <affiliation role="member"
                   ref="#party.PSOE"
                   from="2016-04-19"
                   to="2021-12-15"/>
      <affiliation role="member" ref="#party.UP" from="2016-12-13" to="2019-02-13"/>
   </person>

There can be several reasons for this error:

  • MartínezMaría changes parties a lot (or joins the same party several times), which the script is not able to handle
  • there is a namesake
  • there is a bug in the source data
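
Overlaps like the one above could be detected directly from the affiliation dates instead of surfacing as a party-status error. The following is a minimal sketch assuming closed date intervals with both from and to present; the function name and the tuple layout are illustrative only.

```python
from datetime import date
from itertools import combinations


def overlapping(affiliations):
    """Return pairs of refs whose [from, to] intervals overlap.

    affiliations: list of (ref, from_iso, to_iso) tuples.
    """
    parsed = [
        (ref, date.fromisoformat(start), date.fromisoformat(end))
        for ref, start, end in affiliations
    ]
    return [
        (a[0], b[0])
        for a, b in combinations(parsed, 2)
        if a[1] <= b[2] and b[1] <= a[2]
    ]


if __name__ == "__main__":
    # Party affiliations of MartínezMaría as listed above.
    person = [
        ("#party.Cs", "2020-02-11", "2023-02-14"),
        ("#party.PP", "2018-06-19", "2022-12-21"),
        ("#party.PSOE", "2016-04-19", "2021-12-15"),
        ("#party.UP", "2016-12-13", "2019-02-13"),
    ]
    for pair in overlapping(person):
        print(pair)
```

With the data above, every pair overlaps except #party.Cs and #party.UP, which is consistent with several parties being active on 2021-01-28.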
@matyaskopp matyaskopp linked a pull request Jun 20, 2023 that will close this issue
@calzada

calzada commented Jun 21, 2023

To identify members of Parliament, see <post>.+?</post>.
Here is a list of all members of Parliament as they appear in the files:
'MINISTRA DE ASUNTOS EXTERIORES, UNIÓN EUROPEA Y COOPERACIÓN' 13 occurrences
'MINISTRA DE ASUNTOS SOCIALES Y AGENDA 2030' 1 occurrences
'MINISTRA DE CIENCIA E INNOVACIÓN' 13 occurrences
' MINISTRA DE DEFENSA' 64 occurrences
'MINISTRA DE DERECHOS SOCIALES Y AGENDA 2030' 31 occurrences
'MINISTRA DE EDUCACIÓN Y FORMACIÓN PROFESIONAL' 50 occurrences
'MINISTRA DE EXTERIORES, UNIÓN EUROPEA Y COOPERACIÓN' 3 occurrences
' MINISTRA DE HACIENDA' 74 occurrences
'MINISTRA DE HACIENDA Y FUNCIÓN PÚBLICA' 180 occurrences
' MINISTRA DE HACIENDA Y PORTAVOZ DEL GOBIERNO' 5 occurrences
' MINISTRA DE IGUALDAD' 66 occurrences
'MINISTRA DE INDUSTRIA, COMERCIO Y TURISMO' 53 occurrences
'MINISTRA DE INDUSTRIA, COMERCIO Y TURISMO ' 1 occurrences
' MINISTRA DE JUSTICIA' 64 occurrences
'MINISTRA DE POLÍTICA TERRITORIAL' 16 occurrences
'MINISTRA DE POLÍTICA TERRITORIAL Y FUNCIÓN PÚBLICA' 1 occurrences
'MINISTRA DE POLÍTICA TERRITORIAL Y PORTAVOZ DEL GOBIERNO' 1 occurrences
' MINISTRA DE SANIDAD' 81 occurrences
'MINISTRA DE TRABAJO Y ECONOMÍA SOCIAL' 6 occurrences
'MINISTRA DE TRANSPORTES, MOVILIDAD Y AGENDA URBANA' 94 occurrences
'MINISTRA DE TRANSPORTES, MOVILILIDAD Y AGENDA URBANA' 1 occurrences
'MINISTRA HACIENDA Y FUNCIÓN PÚBLICA' 1 occurrences
'MINISTRO DE AGRICULTURA, PESCA Y ALIMENTACIÓN' 42 occurrences
'MINISTRO DE ASUNTOS EXTERIORES, UNIÓN EUROPEA Y COOPERACIÓN' 46 occurrences
'MINISTRO DE CIENCIA E INNOVACIÓN' 2 occurrences
' MINISTRO DE CONSUMO' 11 occurrences
' MINISTRO DE CULTURA Y DEPORTE' 16 occurrences
'MINISTRO DE INCLUSIÓN, SEGURIDAD SOCIAL Y MIGRACIONES' 70 occurrences
' MINISTRO DE JUSTICIA' 38 occurrences
'MINISTRO DE LA PRESIDENCIA, RELACIONES CON LAS CORTES Y MEMORIA DEMOCRÁTICA' 139 occurrences
'MINISTRO DE LA PRESIDENCIA, RELACIONES CON LAS CORTES Y MEMORIA HISTÓRICA' 1 occurrences
' MINISTRO DEL INTERIOR' 184 occurrences
'MINISTRO DEL INTERIOR' 2 occurrences
' MINISTRO DE POLÍTICA TERRITORIAL Y FUNCIÓN PÚBLICA' 1 occurrences
'MINISTRO DE POLÍTICA TERRITORIAL Y FUNCIÓN PÚBLICA' 21 occurrences
'MINISTRO DE TRABAJO Y ECONOMÍA SOCIAL' 1 occurrences
' MINISTRO DE TRANSPORTES, MOVILIDAD Y AGENDA URBANA' 1 occurrences
'MINISTRO DE TRANSPORTES, MOVILIDAD Y AGENDA URBANA' 26 occurrences
' MINISTRO DE UNIVERSIDADES' 10 occurrences
' PRESIDENTE DE GOBIERNO' 1 occurrences
' PRESIDENTE DEL GOBIERNO' 343 occurrences
'PRESIDENTE DEL GOBIERNO' 7 occurrences
'VICEPRESIDENTA CUARTA DEL GOBIERNO Y MINISTRA PARA LA TRANSICIÓN ECOLÓGICA Y EL RETO DEMOGRÁFICO' 27 occurrences
'VICEPRESIDENTA CUARTA Y MINISTRA PARA LA TRANSICIÓN ECOLÓGICA Y EL RETO DEMOGRÁFICO' 6 occurrences
'VICEPRESIDENTA PRIMERA, MINISTRA DE LA PRESIDENCIA, RELACIONES CON LAS CORTES Y MEMORIA DEMOCRÁTICA' 1 occurrences
'VICEPRESIDENTA PRIMERA DEL GOBIERNO Y MINISTRA DE ASUNTOS ECONÓMICOS Y TRANSFORMACIÓN DIGITAL' 31 occurrences
'VICEPRESIDENTA PRIMERA DEL GOBIERNO Y MINISTRA DE LA PRESIDENCIA, RELACIONES CON LAS CORTES E IGUALDAD' 2 occurrences
'VICEPRESIDENTA PRIMERA DEL GOBIERNO Y MINISTRA DE LA PRESIDENCIA, RELACIONES CON LAS CORTES Y MEMORIA DEMOCRÁTICA' 54 occurrences
'VICEPRESIDENTA PRIMERA Y MINISTRA ASUNTOS ECONÓMICOS Y TRANSFORMACIÓN DIGITAL' 3 occurrences
'VICEPRESIDENTA PRIMERA Y MINISTRA DE ASUNTOS ECONÓMICOS Y TRANSFORMACIÓN DIGITAL' 191 occurrences
'VICEPRESIDENTA PRIMERA Y MINISTRA DE ASUNTOS ECONÓMICOS Y TRANSFORMACIÓN DIGITAL ' 1 occurrences
'VICEPRESIDENTA PRIMERA Y MINISTRA DE LA PRESIDENCIA, RELACIONES CON LAS CORTES Y MEMORIA DEMOCRÁTICA' 8 occurrences
' VICEPRESIDENTA SEGUNDA' 1 occurrences
'VICEPRESIDENTA SEGUNDA DEL GOBIERNO Y MINISTRA DE ASUNTOS ECONÓMICOS Y TRANSFORMACIÓN DIGITAL' 12 occurrences
'VICEPRESIDENTA SEGUNDA DEL GOBIERNO Y MINISTRA DE TRABAJO Y ECONOMÍA SOCIAL' 14 occurrences
'VICEPRESIDENTA SEGUNDA Y MINISTRA DE ASUNTOS ECONÓMICOS Y TRANSFORMACIÓN DIGITAL' 2 occurrences
'VICEPRESIDENTA SEGUNDA Y MINISTRA DE TRABAJO Y ECONOMÍA SOCIAL' 96 occurrences
'VICEPRESIDENTA TERCERA DEL GOBIERNO Y MINISTRA DE ASUNTOS ECONÓMICOS Y TRANSFORMACIÓN DIGITAL' 12 occurrences
'VICEPRESIDENTA TERCERA DEL GOBIERNO Y MINISTRA DE TRABAJO Y ECONOMÍA SOCIAL' 24 occurrences
'VICEPRESIDENTA TERCERA DEL GOBIERNO Y MINISTRA PARA LA TRANSICIÓN ECOLÓGICA Y EL RETO DEMOGRÁFICO' 13 occurrences
'VICEPRESIDENTA TERCERA Y MINISTRA DE ASUNTOS ECONÓMICOS Y TRANSFORMACIÓN DIGITAL' 1 occurrences
'VICEPRESIDENTA TERCERA Y MINISTRA DE LA TRANSICIÓN ECOLÓGICA Y EL RETO DEMOGRÁFICO' 1 occurrences
'VICEPRESIDENTA TERCERA Y MINISTRA DE TRABAJO Y ECONOMÍA SOCIAL' 7 occurrences
'VICEPRESIDENTA TERCERA Y MINISTRA PARA LA TRANSICIÓN ECOLÓGICA Y EL RETO DEMODRÁGICO' 1 occurrences
'VICEPRESIDENTA TERCERA Y MINISTRA PARA LA TRANSICIÓN ECOLÓGICA Y EL RETO DEMOGRÁFICO' 120 occurrences
'VICEPRESIDENTA TERCERA Y MINISTRA PARA LA TRANSICIÓN ECOLÓGICA Y EL RETO DEMOGRÁFICO,' 3 occurrences
'VICEPRESIDENTA TERCERA Y MINISTRA PARA LA TRANSICIÓN ECOLÓGICA Y RETO DEMOGRÁFICO' 2 occurrences
'VICEPRESIDENTA Y MINISTRA PARA LA TRANSICIÓN ECOLÓGICA Y EL RETO DEMOGRÁFICO' 1 occurrences
'VICEPRESIDENTE DEL GOBIERNO Y MINISTRO DE DERECHOS SOCIALES Y AGENDA 2030' 6 occurrences
' VICEPRESIDENTE PRIMERO' 1 occurrences
'VICEPRESIDENTE SEGUNDO DEL GOBIERNO Y MINISTRO DE DERECHO SOCIALES Y AGENDA 2030' 3 occurrences
'VICEPRESIDENTE SEGUNDO DEL GOBIERNO Y MINISTRO DE DERECHOS SOCIALES Y AGENDA 2030' 33 occurrences
'VICEPRESIDENTE SEGUNDO Y MINISTRO DE DERECHOS SOCIALES Y AGENDA 2030' 7 occurrences
'VICEPRESIENTA PRIMERA DEL GOBIERNO Y MINISTRA DE ASUNTOS ECONÓMICOS Y TRANSFORMACIÓN DIGITAL ' 1 occurrences

@matyaskopp
Collaborator Author

thanks @calzada

To identify members of Parliament, see <post>.+?</post>.

Now I can see it, but there is no affiliation timespan. Are there changes in government during government periods?
https://github.com/matyaskopp/ParlaMint/blob/e48f74e3c66adb5a32b8d1051be3d2ebb58c097c/Data/ParlaMint-ES/ParlaMint-ES-listOrg.xml#L47-L56

      <listEvent>
         <event from="2011-12-21" to="2018-06-01" xml:id="GOV.6">
            <label xml:lang="es">Séptimo Gobierno de España (21.12.2011 - 02.06-2018)</label>
            <label xml:lang="en">7th Government of Spain (21.12.2011 - 02.06-2018)</label>
         </event>
         <event from="2018-06-02" xml:id="GOV.7">
            <label xml:lang="es">Octavo Gobierno de España (02.06.2018-)</label>
            <label xml:lang="en">8th Government of Spain (02.06.2018-)</label>
         </event>
      </listEvent>

Or can the minister be affiliated for the whole period?

Is the list of ministers complete? ( = Did every minister have a speech in parliament?)

@calzada

calzada commented Jun 21, 2023 via email

@matyaskopp
Collaborator Author

I have taken a more detailed look at the content of the <post> element.

Simple post

  • one post at a time.
<speaker>
<name>Pastor Julián, Ana María</name>
<birth_date>19571111</birth_date>
<birth_place country="ES">Cubillos</birth_place>
<status>NA</status>
<gender>female</gender>
<institution>
<ni country="ES">CD</ni>
</institution>
<constituency country="ES" region="Madrid"/>
<affiliation>
<national_party>PP</national_party>
<cd group="GP"/>
</affiliation>
<post> VICEPRESIDENTA</post>
</speaker>

affiliations can be represented this way:

<affiliation ref="#CD" role="member" from="2015-01-21" to="2023-02-22"/> <!-- first and last seen in parliament -->
<affiliation ref="#CD" role="deputyHead"/> <!-- dates first and last seen in parliament in this role should be added; or do we have a better source for this? -->
<!-- and also parliamentaryGroup and optionally party should be added: -->
<affiliation role="member" ref="#group.GP"/>
<affiliation role="member" ref="#party.PP"/>

Post cumulations:

<post>VICEPRESIDENTA PRIMERA DEL GOBIERNO, MINISTRA DE LA PRESIDENCIA, RELACIONES CON LAS CORTES Y MEMORIA DEMOCRÁTICA</post>

should become (again with the issue of unknown dates):

<affiliation ref="#GOV" role="member"/>
<affiliation ref="#GOV" role="deputyHead">
  <roleName>VICEPRESIDENTA PRIMERA DEL GOBIERNO</roleName>
</affiliation>
<affiliation ref="#GOV" role="minister">
  <roleName>MINISTRA DE LA PRESIDENCIA, RELACIONES CON LAS CORTES Y MEMORIA DEMOCRÁTICA</roleName>
</affiliation>
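
Splitting a cumulated government post into separate roles, as in the example above, could be sketched like this. It is a rough illustration that assumes the input is already known to be a government post (a bare VICEPRESIDENTA of the chamber, as in the simple case, must be handled separately); the role name "head" and the function name are assumptions, while "deputyHead" and "minister" come from the example.

```python
import re

# Split before "MINISTRA"/"MINISTRO" when it follows a comma or " Y ".
# Commas and "Y" inside a ministry name are left alone.
SPLIT = re.compile(r"(?:,\s*|\s+Y\s+)(?=MINISTR[AO]\b)")


def split_post(post):
    """Return (role, roleName) pairs for one government <post> value."""
    post = " ".join(post.split()).rstrip(",")
    roles = []
    for part in SPLIT.split(post):
        if part.startswith(("VICEPRESIDENTA", "VICEPRESIDENTE")):
            role = "deputyHead"
        elif part.startswith(("MINISTRA", "MINISTRO")):
            role = "minister"
        elif part.startswith("PRESIDENT"):
            role = "head"  # assumed role name for the head of government
        else:
            role = "member"
        roles.append((role, part))
    return roles
```

Normalising whitespace and trailing commas first also merges variants from the list of posts such as "... RETO DEMOGRÁFICO," and the entries with a leading space; misspellings like "VICEPRESIENTA" would still fall through to "member".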

@calzada

calzada commented Jun 22, 2023

See this:

Second government of Pedro Sánchez - Wikipedia

Is there anything I have to do?
Best
mc

@matyaskopp
Collaborator Author

Is there anything I have to do?

Gathering minister information from Wikipedia can be done with a script (I hope). @charlicruz or @matyaskopp can do it.


Another issue is deciding how to handle parliamentary groups and their possible relation to political parties. Is this information available?

  1. The complex solution needs:
  • full names of parliamentary groups - the transcriptions contain only abbreviations
  • the timespan of each party-group representation relation
  • finally, coalition/opposition relations should hold among parliamentary groups
  2. Or we can do it the easy way (with a small lie - most ParlaMinters do it):
  • change the politicalParty role to the parliamentaryGroup role

@calzada, are you ok with the 2nd option?

  • I personally prefer the 2nd option because I am not sure whether @charlicruz is with us, and changing the politicalParty role to the parliamentaryGroup role requires no additional information. I can do this myself.

@calzada

calzada commented Jun 26, 2023 via email

@charlicruz
Collaborator

I can modify the politicalParty role to parliamentaryGroup for all xml files. I have uploaded the CD150120.xml example. Again, I have problems committing and pushing with GitHub Desktop because I have no permission, so I uploaded it directly through the web page. If it is correct, we can do the same for the rest; what do you think?

Another issue is that there are so many <national_party>UNKNOWN</national_party> entries,
and I don't know how to fix them step by step.
I expected to have a small xml sample working, but I have some problems after running make.

Matyas, will you be available in July?

@matyaskopp
Copy link
Collaborator Author

I can modify the politicalParty role to parliamentaryGroup for all xml files.

This is already done:

I have uploaded the CD150120.xml example. Again, I have problems committing and pushing with GitHub Desktop because I have no permission, so I uploaded it directly through the web page. If it is correct, we can do the same for the rest; what do you think?

I don't know what I should think: you are modifying the source CD format in charlicruz/PARLAMINT-ES-MC@09457fd, which becomes invalid according to https://github.com/charlicruz/PARLAMINT-ES-MC/blob/master/CD/cd.dtd
You have to discuss these changes with @calzada first.
I believe the best solution is to leave the CD format as it is and just modify the conversion script, but you need to be up to date with my fork, because I have made a lot of changes in https://github.com/matyaskopp/PARLAMINT-ES-MC/blob/master/bin/cd2parmamint.xsl

Another issue is that there are so many <national_party>UNKNOWN</national_party> entries,
and I don't know how to fix them step by step.
I expected to have a small xml sample working, but I have some problems after running make.

The UNKNOWN party can be preserved; as you can see, the conversion does not propagate it into the TEI file:
https://github.com/matyaskopp/ParlaMint/blob/a10afc44515fc57d0d46196157c0d4f8d3939afb/Data/ParlaMint-ES/ParlaMint-ES-listPerson.xml

Matyas, will you be available in July?

more or less yes


Now I am implementing a script for gathering government members from Wikipedia and then integrating the affiliations into <listPerson> (I hope it will not take much time - it should be ready tomorrow).

@matyaskopp
Collaborator Author

@TomazErjavec
I am close to finishing all necessary scripts for producing the ParlaMint-ES corpus.
Can you please take a look at the sample #692 and tell me if there is anything serious before I start processing the whole corpus?

@TomazErjavec
Collaborator

Very nice indeed! I didn't do a formal validation, as you have probably done that but I noticed a few minor things:

  • for handles you could use http://hdl.handle.net/11356/1859 (TEI) and http://hdl.handle.net/11356/1860 (ana) (but finalize script inserts that anyway)
  • utterances often have transcriber comments at the end, and, strictly speaking, they should go outside, i.e. just after the utterance; but in practice it doesn't much matter
  • more of an aesthetic issue: you have IDs like "ParlaMint-ES_2023-02-23-CD230223.u1.1.s1.w1", it would be more consistent to have "ParlaMint-ES_2023-02-23-CD230223.u1.seg1.s1.w1"

@matyaskopp
Collaborator Author

I am aware of that. I will keep my wrong handle http://hdl.handle.net/11356/XXXX; it is safer to have a totally wrong handle than one pointing to some existing but wrong handle.


  • utterances often have transcriber comments at the end, and, strictly speaking, they should go outside, i.e. just after the utterance; but in practice it doesn't much matter

But then there would be utterances without segments or notes, and we do not allow that.
I have discovered several utterances of this type:
source https://www.congreso.es/public_oficiales/L14/CONG/DS/PL/DSCD-14-PL-75.PDF

CD (https://github.com/calzada/PARLAMINT-ES-MC/blob/28684ab93851880c18fda17a526f839f2ec909a1/CD/CD210202.xml#L2004-L2024)

<intervention id='in78'>
<speaker>
<name>Bassa Coll, Montserrat</name>
<birth_date>19650420</birth_date>
<birth_place country="ES">UNKNOWN</birth_place>
<status>NA</status>
<gender>female</gender>
<institution>
<ni country="ES">CD</ni>
</institution>
<constituency country="ES" region="Girona"/>
<affiliation>
<national_party>ERC-S</national_party>
<cd group="GR"/>
</affiliation>
<post>NA</post>
</speaker>
<speech id='sp78'  language="ES">
<omit type="comment">Termina su intervención en catalán.-Aplausos</omit>.
</speech>
</intervention>

result:

            <u xml:id="ParlaMint-ES_2021-02-02-CD210202.u78"
               who="#MontserratBassaColl"
               ana="#regular">
               <vocal type="clarification">
                  <desc>Termina su intervención en catalán.-Aplausos</desc>
               </vocal>
            </u>
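
Utterances of this kind, which contain no segment at all, could be listed with a short check such as the sketch below. It uses only the Python standard library, relies on the `{*}` namespace wildcard available in Python 3.8 and later, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def utterances_without_seg(xml_text):
    """Return xml:id values of <u> elements that have no <seg> child.

    "{*}" matches elements in any namespace (or none), so the same check
    works for TEI-namespaced files and for bare snippets.
    """
    root = ET.fromstring(xml_text)
    return [
        u.get(XML_ID)
        for u in root.findall(".//{*}u")
        if u.find("{*}seg") is None
    ]
```

Note that findall(".//...") only searches below the root element, so a snippet whose root is itself a u element would need to be wrapped first.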

  • more of an aesthetic issue: you have IDs like "ParlaMint-ES_2023-02-23-CD230223.u1.1.s1.w1", it would be more consistent to have "ParlaMint-ES_2023-02-23-CD230223.u1.seg1.s1.w1"

Good point, I will implement it, but I will use p prefix instead of seg (to be consistent with UA and CZ :-))
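
The ID change discussed here could be sketched as a small rewrite of the segment component. This is an illustration only, not the actual conversion code; it assumes the segment number is the bare numeric component directly after the utterance part (.uN), and the function name is made up.

```python
import re

# The utterance component (".u<N>") followed by a bare segment number.
SEGMENT = re.compile(r"(\.u\d+)\.(\d+)(?=\.|$)")


def add_segment_prefix(xml_id, prefix="p"):
    """Turn '...u1.1.s1.w1' into '...u1.p1.s1.w1'."""
    return SEGMENT.sub(rf"\1.{prefix}\2", xml_id, count=1)
```

IDs that already carry a prefix, or that stop at the utterance level, are returned unchanged.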

@TomazErjavec
Collaborator

OK, good arguments for ignoring first two suggestions, and, yes, p prefix is then indeed better for the third. Good luck!

@TomazErjavec TomazErjavec added this to the Future milestone Mar 2, 2024