chairman speeches do not contain who attribute #24

Open
2 tasks
matyaskopp opened this issue Jul 13, 2023 · 13 comments

Comments

@matyaskopp
Collaborator

The chairman's utterances have no who attribute identifying the person speaking:

<u xml:id="ParlaMint-ES_2015-01-20-CD150120.u1" ana="#chair">
<seg xml:id="ParlaMint-ES_2015-01-20-CD150120.u1.1">Se abre la sesión.</seg>
<seg xml:id="ParlaMint-ES_2015-01-20-CD150120.u1.2">Convalidación o derogación del Real Decreto-ley 15/2014, de 19 de diciembre, de modificación del Régimen Económico y Fiscal de Canarias. Para presentar el real decreto-ley, tiene la palabra en nombre del Gobierno el ministro de Hacienda y Administraciones Públicas.</seg>
</u>

The information is present in the source:

<chair who="JESÚS POSADA MORENO">

cd2parmamint.xsl needs to be improved:

  • add the chairman to the who attribute and also to a listPerson in the temporary component files
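
A minimal sketch of the intended effect, in Python rather than the XSLT the real scripts use. It assumes the chair's person reference has already been derived from the chair element; the function name add_chair_who, the reference "#PosadaMorenoJesús", and the second utterance in the sample are hypothetical, and real TEI files would also carry the TEI namespace, which is left out here.

```python
import xml.etree.ElementTree as ET

# Cut-down sample: the first <u> is taken from this issue, the second is made up.
SAMPLE = """<div>
  <u xml:id="ParlaMint-ES_2015-01-20-CD150120.u1" ana="#chair">
    <seg xml:id="ParlaMint-ES_2015-01-20-CD150120.u1.1">Se abre la sesión.</seg>
  </u>
  <u xml:id="ParlaMint-ES_2015-01-20-CD150120.u2" ana="#regular" who="#SomeDeputy">
    <seg xml:id="ParlaMint-ES_2015-01-20-CD150120.u2.1">...</seg>
  </u>
</div>"""


def add_chair_who(root, chair_ref):
    """Set who=chair_ref on every chair utterance that lacks a who attribute.

    Returns the number of utterances that were changed.
    """
    changed = 0
    for u in root.iter("u"):
        roles = u.get("ana", "").split()
        if "#chair" in roles and u.get("who") is None:
            u.set("who", chair_ref)
            changed += 1
    return changed


root = ET.fromstring(SAMPLE)
# "#PosadaMorenoJesús" is a hypothetical id, not the project's real one.
print(add_chair_who(root, "#PosadaMorenoJesús"))
```

Utterances that already have a who attribute are left untouched, so the step can be re-run safely on generated files.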

parlamint2root.xsl needs changes too:

  • the chair has no party affiliation in the source, but in other meetings he/she may appear as a regular deputy (in which case the affiliation is present)

reported here: clarin-eric/ParlaMint#696 (comment)

@rdelibanoc
Collaborator

@matyaskopp we need confirmation of this. How do we proceed to fix this error: do we change it in the source, or shall we change it in the TEI XML? I think we should change it in the source. Does this mean we run the markup and annotation process from scratch?

@matyaskopp
Collaborator Author

  • We don't want to change the source CD files: their format is defined in the cd.dtd file and no changes are allowed (@calzada)
  • We don't want to change it manually in the TEI files: they are generated automatically, so manual changes would be lost with the next update

We want to change the scripts, as I described in the issue.

  • if there is not enough information in CD files, then another additional source can be added

You can start with data that is present in CD format:

  • parse name
  • generate a person id (in the same way other ids are generated)
  • create a <person> element that contains only person/@xml:id and person/persName(+ child nodes)
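
The three steps above can be sketched as follows, in Python rather than the project's XSLT. Two things here are assumptions, not the project's actual rules: the name split (the last two tokens are treated as surnames, the rest as forenames) and the id scheme (surnames followed by forenames, concatenated). All function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"


def parse_chair_name(who):
    """Split e.g. 'JESÚS POSADA MORENO' into (forenames, surnames).

    Assumption: the last two tokens are surnames; shorter names get
    one forename and whatever remains as surname.
    """
    tokens = who.split()
    if len(tokens) < 3:
        return tokens[:1], tokens[1:]
    return tokens[:-2], tokens[-2:]


def make_person_id(forenames, surnames):
    """Hypothetical id scheme: surnames then forenames, capitalised and joined."""
    return "".join(t.capitalize() for t in surnames + forenames)


def build_person(who):
    """Create a <person> holding only @xml:id and persName with child nodes."""
    forenames, surnames = parse_chair_name(who)
    person = ET.Element("person")
    person.set("{%s}id" % XML_NS, make_person_id(forenames, surnames))
    pers_name = ET.SubElement(person, "persName")
    for surname in surnames:
        ET.SubElement(pers_name, "surname").text = surname.capitalize()
    for forename in forenames:
        ET.SubElement(pers_name, "forename").text = forename.capitalize()
    return person


print(ET.tostring(build_person("JESÚS POSADA MORENO"), encoding="unicode"))
```

The two-surname heuristic will mis-split compound surnames, so any real implementation would need a list of exceptions or an additional source.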

Side note: the general approach to adding new sources, which I used when adding government members, is:

  • download the data, e.g.:

    PARLAMINT-ES-MC/Makefile

    Lines 88 to 95 in be3e2be

    download: data-wiki
    data-wiki:
    	mkdir data-wiki || :
    	wget https://en.wikipedia.org/wiki/Second_government_of_Pedro_S%C3%A1nchez -O data-wiki/gov-2020-01-13.htm
    	wget https://en.wikipedia.org/wiki/First_government_of_Pedro_S%C3%A1nchez -O data-wiki/gov-2018-06-07.htm
    	wget https://en.wikipedia.org/wiki/Second_government_of_Mariano_Rajoy -O data-wiki/gov-2016-11-04.htm
    	wget https://en.wikipedia.org/wiki/First_government_of_Mariano_Rajoy -O data-wiki/gov-2011-12-21.htm
  • add them to the git repository (in case someone changes the source)
  • process/use the data:

    PARLAMINT-ES-MC/Makefile

    Lines 97 to 98 in be3e2be

    data-gov-wiki2tei:
    	perl bin/gov-wiki2tei.pl data-wiki/gov-listPerson.xml data-wiki/gov-????-??-??.htm

@rdelibanoc
Collaborator

As you can see from the snapshot, in the original files for the "chair" we have the POST (e.g. Presidente) but we do not have the name.

[Screenshot: CleanShot 2023-07-18 at 14 12 14@2x]

Can we make do without this information (pliz)?

@matyaskopp
Collaborator Author

The name is in the <chair> element and, I guess, is the same for the whole meeting:

<chair who="JESÚS POSADA MORENO">

@calzada
Owner

calzada commented Jul 18, 2023 via email

@matyaskopp
Collaborator Author

so I have no idea what

<chair who="JESÚS POSADA MORENO"> 

means in the CD files. According to Wikipedia, Jesús Posada was a chairman...

@calzada
Owner

calzada commented Jul 18, 2023 via email

@matyaskopp
Collaborator Author

The body element allows multiple chair elements:

<!ELEMENT body (chair | omit)+>

so I expected that whenever the chair changed, there would be a new chair element.

I have checked your source CD files, and each contains only one chair element. So the files claim that there is no chairman change. If we use the CD files as a source, it is probably better to propagate this error (if it is an error at all; maybe there really are no chairman changes).

If there is no chairman change in the source PDF(?) files, then you should assume that there was no chairman change in the chamber of deputies.
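
A small sketch of that check, in Python. It only relies on what is quoted in this thread: body contains chair elements and each chair carries a who attribute. The sample documents are made-up minimal stand-ins for real CD files, the second name in TWO_CHAIRS is hypothetical, and the function names are not part of the project.

```python
import xml.etree.ElementTree as ET


def chair_names(cd_xml):
    """Return the who attribute of every <chair> element, in document order."""
    root = ET.fromstring(cd_xml)
    return [chair.get("who") for chair in root.iter("chair")]


def has_chair_change(cd_xml):
    """True if the file records more than one distinct chair."""
    return len(set(chair_names(cd_xml))) > 1


# Made-up minimal stand-ins for CD files (real files contain the full meeting).
ONE_CHAIR = '<body><chair who="JESÚS POSADA MORENO"></chair></body>'
TWO_CHAIRS = (
    '<body><chair who="JESÚS POSADA MORENO"></chair>'
    '<chair who="SOMEONE ELSE"></chair></body>'
)

print(chair_names(ONE_CHAIR), has_chair_change(ONE_CHAIR))
print(chair_names(TWO_CHAIRS), has_chair_change(TWO_CHAIRS))
```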

@TomazErjavec
Collaborator

Hi! In 2.1 there was one chair per session; this is the condition:

<xsl:when test="matches(speaker/post, '^\s*(VICE)?PRESIDENT[AE]\s*$', 'i')">chair</xsl:when>
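
To make the condition concrete, here is an approximate Python equivalent of that regular expression (the XPath 'i' flag is mapped to re.IGNORECASE). The function name role_for_post is hypothetical, and returning None for a non-matching post is a placeholder, since the other branches of the stylesheet are not shown here.

```python
import re

# Same pattern as in the xsl:when test above, case-insensitive.
CHAIR_POST = re.compile(r"^\s*(VICE)?PRESIDENT[AE]\s*$", re.IGNORECASE)


def role_for_post(post):
    """Return 'chair' if the speaker's post is (vice)president, else None."""
    return "chair" if CHAIR_POST.match(post) else None


for post in ["PRESIDENTA", " Vicepresidente ", "PRESIDENTE DEL GOBIERNO"]:
    print(repr(post), role_for_post(post))
```

Because the pattern is anchored at both ends, a post with any trailing words does not match.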

@calzada
Owner

calzada commented Jul 19, 2023 via email

@matyaskopp
Collaborator Author

matyaskopp commented Jul 19, 2023

Can you give me a sample of chairman changes? I have found only a few, where PRESIDENTA changes to PRESIDENTE, but it looks more like a typo in the source (wrong gender).
For example, https://github.com/calzada/PARLAMINT-ES-MC/blob/be3e2be3cf70619c4cd8513b90b19df4d54db87d/CD/CD230214.xml
contains these chairs (count, post, name):

53  PRESIDENTA,UNKNOWN
1   PRESIDENTE,UNKNOWN
16  VICEPRESIDENTA,Elizo Serrano, María Gloria
13  VICEPRESIDENTA,Pastor Julián, Ana María
12  VICEPRESIDENTE,Rodríguez Gómez de Celis, Alfonso

There is only one occurrence of PRESIDENTE, so we can probably say it is a typo and set MERITXELL BATET LAMAÑA

<chair who="MERITXELL BATET LAMAÑA">

as chair.
But I may be wrong - the idea is to record what is in the transcriptions, and we can treat that as the truth...
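
The "single occurrence looks like a typo" reasoning can be sketched like this in Python. The counts are the ones listed above; the function name, the grouping by post with its last letter dropped, and the 5% threshold are all assumptions made for illustration, not an agreed rule.

```python
from collections import defaultdict

# (post, name) -> number of utterances, copied from the list above.
CHAIR_COUNTS = {
    ("PRESIDENTA", "UNKNOWN"): 53,
    ("PRESIDENTE", "UNKNOWN"): 1,
    ("VICEPRESIDENTA", "Elizo Serrano, María Gloria"): 16,
    ("VICEPRESIDENTA", "Pastor Julián, Ana María"): 13,
    ("VICEPRESIDENTE", "Rodríguez Gómez de Celis, Alfonso"): 12,
}


def suspect_gender_typos(counts, max_share=0.05):
    """Flag post variants that differ only in the final letter and are rare.

    Posts are grouped by (post without last letter, name); within a group
    with several variants, a variant whose share is at most max_share is
    reported as a suspected typo.
    """
    groups = defaultdict(dict)
    for (post, name), n in counts.items():
        groups[(post[:-1], name)][post] = n
    suspects = []
    for (_stem, name), variants in groups.items():
        if len(variants) < 2:
            continue
        total = sum(variants.values())
        for post, n in variants.items():
            if n / total <= max_share:
                suspects.append((post, name, n))
    return suspects


print(suspect_gender_typos(CHAIR_COUNTS))
```

With these counts only the lone PRESIDENTE entry is flagged; the vice-presidents are left alone because each name occurs with a single post variant.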

@calzada
Owner

calzada commented Jul 19, 2023 via email

@matyaskopp mentioned this issue Aug 1, 2023
@calzada
Owner

calzada commented Aug 3, 2023

@matyaskopp
You could equally use "UNKNOWN". This is what we did at ECPC (our research group).

4 participants