add choice + sic + corr #190

Open
ttasovac opened this issue Apr 22, 2023 · 13 comments

Comments

@ttasovac
Contributor

This is a placeholder from the Lexical Resources Summit 2023.

We agreed to allow corrections, but this must be well documented.

@ttasovac
Contributor Author

I've enabled choice, sic and corr in this branch.

For quick testing against the schema in this branch, you can use this url: https://raw.githubusercontent.com/DARIAH-ERIC/lexicalresources/feature/choice%2Bsic%2Bcorr/Schemas/TEILex0/out/TEILex0.rng

@xlhrld if you or anybody else can post some examples of corrections here (previously validated against the above link) and suggest some narrative for the guidelines, I will be eternally grateful.

@anacastrosalgado
Collaborator

This example is taken from the Morais front matter (tested with the suggested schema) and illustrates the use of choice, sic and corr.

<item n="0.1.3">
   <abbr type="title" rend="italic">Acções Epiſc.</abbr>
   <!-- Lucas de Andrade not Andrada -->
   <expan>Acções Epiſcopaes de Lucas de <choice><sic>Andrada</sic><corr resp="#Salgado">Andrade</corr></choice></expan>
   <pc>.</pc>
</item>

Is this helpful, @ttasovac ?

Editorial Changes

To signal typos detected in the printed editions, we recommend using the choice element and related elements:

sic contains text reproduced, although apparently incorrect or inaccurate
Attributes include:
corr gives a correction for the apparent error in the copy text
resp signifies the editor, transcriber or encoder responsible for suggesting the correction held
cert signifies the degree of certainty ascribed to the correction held

corr contains the correct form of a passage erroneous in the copy text
Attributes include:
sic gives the original form of the apparent error in the copy text
resp signifies the editor or transcriber responsible for suggesting the correction
cert signifies the degree of certainty ascribed to the correction

If both readings are given, the choice between sic and corr is largely a question of individual preference; since both record the same information, either may be mechanically transformed into the other.
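To illustrate the point about equivalent encodings, here is a minimal sketch of the three patterns TEI P5 offers, reusing the Andrada/Andrade reading from the example above. Whether the schema in this branch accepts sic or corr outside of choice is an assumption that has not been checked here.

```xml
<!-- Sketch only: standalone <sic> and <corr> are untested against the branch schema -->

<!-- 1. Record only the erroneous reading -->
<expan>Acções Epiſcopaes de Lucas de <sic>Andrada</sic></expan>

<!-- 2. Record only the corrected reading -->
<expan>Acções Epiſcopaes de Lucas de <corr resp="#Salgado">Andrade</corr></expan>

<!-- 3. Record both readings side by side -->
<expan>Acções Epiſcopaes de Lucas de <choice><sic>Andrada</sic><corr resp="#Salgado">Andrade</corr></choice></expan>
```

Only the third pattern keeps both the printed form and the editorial correction, which is probably what a retrodigitized dictionary needs.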

@ttasovac
Contributor Author

ttasovac commented May 7, 2023

Thanks, @anacastrosalgado. I'm not sure why you have two lists of slightly different definitions. Also, corr and sic are not attributes of choice; they are elements nested within choice.

The example is good and we can use it. I would prefer if we could in addition get an example of a correction from the dictionary proper — i.e. when there are mistakes within entries. @xlhrld I think you said you were interested in this — do you have any examples?

@anacastrosalgado
Collaborator

anacastrosalgado commented May 8, 2023

@ttasovac These examples are taken from the Morais.

SANDALO

<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="MORAIS.1.DLP.SANDALO" type="mainEntry"
   xml:lang="pt">
   <form type="lemma">
      <orth>SANDALO</orth>
   </form>
   <metamark function="lemmaDelimiter">,</metamark>
   <gramGrp>
      <gram type="pos" norm="NOUN">ſ.</gram>
      <gram type="gen">m.</gram>
   </gramGrp>
   <sense xml:id="MORAIS.1.DLP.SANDALO.s.1">
      <def>arvore , e a madeira della aromatica , que he de 3 cores , branca , roixa , ou
         <choice>
            <sic>vermelhɐ</sic>
            <corr resp="#Salgado">vermelha</corr>
         </choice> , e cetrina , ou pallida , uſa-ſe na Farmacia , e na Aſia para perfumes</def>
   </sense>
   <metamark function="senseDelimiter">.</metamark>
</entry>

LIBERDADE

<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="MORAIS.1.DLP.LIBERDADE" type="mainEntry"
   xml:lang="pt">
   <form type="lemma">
      <orth>LIBERDADE</orth>
   </form>
   <metamark function="lemmaDelimiter">,</metamark>
   <gramGrp>
      <gram type="pos" norm="NOUN">ſ.</gram>
      <gram type="gen">f.</gram>
   </gramGrp>
   <sense xml:id="MORAIS.1.DLP.LIBERDADE.s.1">
      <def>a faculdade , que a alma tem de fazer , ou deixar de fazer alguma coiſa , como mais quer</def>
   </sense>
   <metamark function="senseDelimiter">.</metamark>
   <metamark function="senseDelimiter">§</metamark>
   <sense xml:id="MORAIS.1.DLP.LIBERDADE.s.2">
      <def>A faculdade de poder fazer impunemente , e ſem ſer reſponſavel , tudo 0 que não he prohibido pelas leis , ſem haver quem arbitrariamente
         tome conhecimenro diſſo</def>
   </sense>
   <metamark function="senseDelimiter">.</metamark>
   <metamark function="senseDelimiter">§</metamark>
   <sense xml:id="MORAIS.1.DLP.LIBERDADE.s.3">
      <def>A faculdade de
         O eſtado da nação , que não reconhece 
         <choice>
            <sic>ſuperioridadade</sic>
            <corr resp="#Salgado">ſuperioridade</corr>
         </choice> a outra</def>
   </sense>
   <!-- [...] -->
</entry>

Editorial Changes

To signal typos detected in the printed editions, we recommend using the choice element and related elements:

sic contains text reproduced, although apparently incorrect or inaccurate
corr gives a correction for the apparent error in the copy text
resp signifies the editor, transcriber or encoder responsible for suggesting the correction held
cert signifies the degree of certainty ascribed to the correction held
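A minimal sketch of how resp and cert could sit together on corr, based on the SANDALO entry above. The value cert="high" is one of the certainty values TEI P5 defines (high, medium, low, unknown); picking it here is purely illustrative, and the sketch assumes that #Salgado points to an editor declared elsewhere in the file.

```xml
<choice>
   <sic>vermelhɐ</sic>
   <!-- resp points to the responsible editor; cert="high" is an illustrative value -->
   <corr resp="#Salgado" cert="high">vermelha</corr>
</choice>
```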

@anacastrosalgado
Collaborator

anacastrosalgado commented May 9, 2023

In this example, I want to add the infinitive that is missing (CAUSTICAR). May I use choice?

CAUSTICADO

<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="MORAIS.1.DLP.CAUSTICADO" type="mainEntry" xml:lang="pt">
   <form type="lemma">
      <orth>CAUSTICADO</orth>
   </form>
   <form type="inflected">
      <orth>causticado</orth>
   </form>
   <gramGrp>
      <!-- or  gram type="pos" norm="ADJECTIVE"? -->
      <gram type="tense">part. paſſ.</gram>
      <lbl>de</lbl>
      <choice>
         <sic></sic>
         <corr resp="#Salgado"> <ref target="#MORAIS.1.DLP.CAUSTICAR" type="mainEntry">
            causticar</ref></corr>
      </choice>
   </gramGrp>
</entry>

@daliboris
Contributor

For this case the <supplied> is the right element: https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-supplied.html.

<lbl>de</lbl>
<supplied resp="#Salgado"> 
   <ref target="#MORAIS.1.DLP.CAUSTICAR" type="mainEntry">causticar</ref>
</supplied>

@anacastrosalgado
Collaborator

anacastrosalgado commented May 9, 2023

@daliboris You're my new hero! :-) Thanks.

@xlhrld
Collaborator

xlhrld commented May 10, 2023

From the Wörterbuch der deutschen Gegenwartssprache (WDG), I would suggest the following. Editorial practice in the WDG was to abbreviate the headword when it occurs unaltered in examples, as in the second example below. In the first example, however, the abbreviation would have to expand to x-ten rather than x-te, so it should have been printed as a full form:

wdg-4410

<entry xml:id="x-fach" xml:lang="de">
                <form type="lemma">
                    <orth>x-te</orth>
                </form>
                <gramGrp>
                    <pc>/</pc><gram type="pos">Ord.zahl</gram><pc>;</pc>
                </gramGrp>
                <sense xml:id="x-fach-s-1">
                    <def>bezeichnet eine große, unbestimmte Anzahl</def><pc>/</pc>
                    <cit type="example">
                        <quote>ich habe es dir zum
                            <choice><sic>x.</sic><corr>x-ten</corr></choice>
                            <!-- does not expand to x-te but to x-ten -->
                            Male gesagt</quote><pc>;</pc>
                    </cit>
                    <cit type="example">
                        <usg type="domain">Math.</usg>
                        <quote>eine Zahl in die x. Potenz erheben</quote>
                        <!-- expands to x-te -->
                    </cit>
                </sense>
            </entry>

@xlhrld
Collaborator

xlhrld commented May 10, 2023

There can be several types of errors we would like to keep:

  • genuine orthographic errors
  • violations of editorial practices
  • typesetting errors (such as missing or turned letters)
  • (possibly more)

Especially the first two types can be interesting research data in their own right for digitized old dictionaries.

edit: s/orhographic/orthographic (a bit ironic …)

@anacastrosalgado
Collaborator

To discuss: correction in a domain label (?)

ppt

<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="MOR1.DLP.APOTEMA" type="mainEntry" xml:lang="pt">
   <form type="lemma">
      <orth>APOTEMA</orth>
   </form>
   <metamark function="lemmaDelimiter">,</metamark>
   <gramGrp>
      <gram type="pos" norm="NOUN">ſ.</gram>
      <gram type="gen">m.</gram>
   </gramGrp>
   <sense xml:id="MOR1.DLP.APOTEMA.s.1">
      <usg type="domain" corresp="#domain.mathematics http://vocabs.rossio.fcsh.unl.pt/morais_domains/0024">
         <choice>
            <sic>Matemat.</sic>
            <corr resp="#Salgado">Mathem.</corr>
         </choice>
      </usg>
      <def>raio recto</def>
      <lbl expand="verbi gratia" xml:lang="la"><hi>v. g.</hi></lbl>
      <metamark function="exampleDelimiter">,,</metamark>
      <cit type="example" xml:lang="pt">
         <quote>a apotema de hum poligono he a recta perpendicularmente tirada do centro ao
            lado do poligono</quote>
      </cit>
   </sense>
   <pc>.</pc>
</entry>

@daliboris
Contributor

daliboris commented Sep 6, 2023

On the last example of the usage label:

Is it possible that Matemat. is a correct abbreviation in Portuguese, i.e. a possible variant of Mathem.?

If so, I suggest using the <reg> and <orig> elements instead.

See TEI P5:

<reg> (regularization) contains a reading which has been regularized or normalized in some sense
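As a rough sketch of what this suggestion could look like for the APOTEMA usage label: orig holds the printed form and reg the regularized one. The sketch assumes that orig and reg are allowed inside choice in the TEI Lex-0 schema, which has not been verified against the branch.

```xml
<usg type="domain" corresp="#domain.mathematics http://vocabs.rossio.fcsh.unl.pt/morais_domains/0024">
   <choice>
      <!-- orig keeps the form as printed; reg gives the regularized form -->
      <orig>Matemat.</orig>
      <reg resp="#Salgado">Mathem.</reg>
   </choice>
</usg>
```

Unlike sic/corr, this pairing does not claim that the printed form is an error, only that it differs from the form chosen as the norm.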

@anacastrosalgado
Collaborator

anacastrosalgado commented Sep 6, 2023

@daliboris Yes, it could be considered a variant. I was not "accepting" it, though, because it is not listed in the front matter. I didn't really like the sic solution; I think we can go with yours. There was constant fluctuation in the first Morais editions, as you can see on slides 13-14: https://mordigital.fcsh.unl.pt/wp-content/uploads/DSNA_24_versionFINAL-1.pdf

@daliboris
Contributor

To your example @anacastrosalgado:

<abbr type="title" rend="italic">Acções Epiſc.</abbr>

In the TEI Lex-0 guidelines, the suggested value title for the @type attribute is defined as: (title) the abbreviation is for a title of address (Dr, Ms, Mr, …).

But there is also a note: As the sample values make clear, abbreviations may be classified by the method used to construct them, the method of writing them, or the referent of the term abbreviated; the typology used is up to the encoder and should be carefully planned to meet the needs of the expected use.

In your case, the value title is meant as an abbreviation for the title of a work/source. Perhaps a different value could be used to avoid confusion with titles of address.
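Purely as an illustration of the idea, the abbreviation could carry a project-specific value. The value workTitle below is hypothetical: it comes neither from TEI P5 nor from TEI Lex-0, and it would only validate if the schema leaves @type on abbr open.

```xml
<!-- type="workTitle" is a hypothetical value, not defined in TEI P5 or TEI Lex-0 -->
<abbr type="workTitle" rend="italic">Acções Epiſc.</abbr>
```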
