Merge pull request #25 from DH-RSE/feature/jtei-daniel

Merge feature/jtei-daniel into main
DH-RSE · Feb 3, 2024 · c3299f3
2 parents 0106e6f + 17e457e
commit c3299f3

Showing 5 changed files with 121 additions and 56 deletions.
diff --git a/data/JTEI/10_2016-19/jtei-10-haaf-source.xml b/data/JTEI/10_2016-19/jtei-10-haaf-source.xml
@@ -211,9 +211,10 @@
                 target="http://www.deutschestextarchiv.de/doku/software#cab"/></bibl>.</note> as
           well as <ref target="http://www.deutschestextarchiv.de/dtaq/about">collaborative text
             correction and annotation</ref><note rend="inside.parenthesis">See <bibl><title
-                level="a">DTAQ: Kollaborative Qualitätssicherung im Deutschen Textarchiv</title>
-              (Collaborative Quality Assurance within the DTA), accessed January 28, 2017, <ptr
-                target="http://www.deutschestextarchiv.de/dtaq/about"/></bibl>. On the process of
+                level="a"><ptr type="software" xml:id="R3"
+                  target="#dtaq"/><rs type="soft.name" ref="R3">DTAQ: Kollaborative Qualitätssicherung im Deutschen Textarchiv</rs></title>
+              (Collaborative Quality Assurance within the DTA), accessed January 28, 2017, <rs type="soft.url" ref="#R3"><ptr
+                target="http://www.deutschestextarchiv.de/dtaq/about"/></rs></bibl>. On the process of
             quality assurance in the DTA, see, for example, <ref target="#haaf13" type="bibl">Haaf,
               Wiegand, and Geyken 2013</ref>.</note>) is a matter of supporting scholarly projects
           in their usage of the DTA infrastructure, which is part of the DTA’s mission. Second,
@@ -273,7 +274,8 @@
           Since June 2014, nine complete volumes with a total of more than 3,500 manuscript pages
           have been manually transcribed, annotated in TEI XML, and published via the DTA
           infrastructure. Most of these manuscripts were keyed manually by a vendor and published at
-          an early stage in the web-based quality assurance platform DTAQ. There, the transcription
+          an early stage in the web-based quality assurance platform <ptr type="software" xml:id="R2" 
+            target="#dtaq"/><rs type="soft.name" ref="#R2">DTAQ</rs>. There, the transcription
           as well as the annotation of each document was checked and corrected, if necessary; DTAQ
           also provided the means to add additional markup, such as the tagging of person names
             (<gi>persName</gi>), directly at page level. After the process of quality control has
@@ -1210,7 +1212,7 @@
           corpora. Our primary goal is to be as inclusive as possible, allowing for other projects
           to benefit from our resources (i.e., our comprehensive guidelines and documentation as
           well as the technical infrastructure that includes Schemas, ODDs, and <ptr type="software"
-            xml:id="XSLT" target="#XSLT"/><rs type="soft.name" ref="#XSLT">XSLT</rs> scripts) and
+            xml:id="R1" target="#xslt"/><rs type="soft.name" ref="#R1">XSLT</rs> scripts) and
           contribute to our corpora. We also want to ensure interoperability of all data within the
           DTA corpora. The underlying TEI format has to be continuously maintained and adapted to
           new necessities with these two premises in mind.</p>

diff --git a/data/JTEI/10_2016-19/jtei-10-romary-source.xml b/data/JTEI/10_2016-19/jtei-10-romary-source.xml
@@ -645,8 +645,8 @@
               available at <ptr target="https://github.com/TEIC/TEI/issues/1512"/>. In our proposal,
               the <gi>etym</gi> element has to be made recursive in order to allow the fine-grained
               representations we propose here. The corresponding ODD customization, together with
-              reference examples, is available on <ptr type="software" xml:id="GitHub"
-                target="#GitHub"/><rs type="soft.name" ref="#GitHub">GitHub</rs>.</note> and the
+              reference examples, is available on <ptr type="software" xml:id="R1"
+                target="#github"/><rs type="soft.name" ref="#R1">GitHub</rs>.</note> and the
             fact that a change occurred within the contemporary lexicon (as opposed to its parent
             language) is indicated by means of <att>xml:lang</att> on the source form.<note>There
               may also be cases in which it is unknown whether a given etymological process occurred
@@ -768,8 +768,8 @@
               text.<note>The interested reader may ponder here the possibility to also encode
               scripts by means of the <att>notation</att> attribute instead of using a cluttering of
               language subtags on <att>xml:lang</att>. For more on this issue, see the proposal in
-              the TEI <ptr type="software" xml:id="GitHub" target="#GitHub"/><rs type="soft.name"
-                ref="#GitHub">GitHub</rs> (<ptr target="https://github.com/TEIC/TEI/issues/1510"
+                the TEI <ptr type="software" xml:id="R2" target="#github"/><rs type="soft.name"
+                ref="#R2">GitHub</rs> (<ptr target="https://github.com/TEIC/TEI/issues/1510"
               />).</note> This is why we have extended the <att>notation</att> attribute to
               <gi>orth</gi> in order to allow for better representation of both language
             identification and the orthographic content. With this double mechanism, we intend to
@@ -987,7 +987,7 @@
           <p>The <gi>date</gi><note>The element <gi>date</gi> as a child of <gi>cit</gi> is another
               example which does not adhere to the current TEI standards. We have allowed this
               within our ODD document. A feature request proposal will be made on the <ptr
-                type="software" xml:id="GitHub" target="#GitHub"/><rs type="soft.name" ref="#GitHub"
+                type="software" xml:id="R3" target="#github"/><rs type="soft.name" ref="#R3"
                 >GitHub</rs> page and this feature may or may not appear in future versions of the
               TEI Guidelines.</note> element is listed within each etymon block; the values of
             attributes <att>notBefore</att> and <att>notAfter</att> specify the range of time
@@ -1486,8 +1486,10 @@
           extent of knowledge that is truly necessary to create an accurate model of metaphorical
           processes. In order to do this, it is necessary to make use of one or more ontologies,
           which could be locally defined within a project, and of external linked open data sources
-          such as <ref target="http://wiki.dbpedia.org/">DBpedia</ref> and <ref
-            target="https://www.wikidata.org/">Wikidata</ref>, or some combination thereof. Within
+          such as <ptr type="software" xml:id="R4"
+            target="#dbpedia"/><rs type="soft.name soft.url" ref="#R4"><ref target="http://wiki.dbpedia.org/">DBpedia</ref></rs> and <ptr type="software" xml:id="R5"
+              target="#wikidata"/><rs type="soft.name soft.url" ref="#R5"><ref
+                  target="https://www.wikidata.org/">Wikidata</ref></rs>, or some combination thereof. Within
           TEI dictionary markup, URIs for existing ontological entries can be referenced in the
             <gi>sense</gi>, <gi>usg</gi>, and <gi>ref</gi> elements as the value of the attribute
             <att>corresp</att>.</p>
@@ -1496,7 +1498,8 @@
           reference to the source entry’s unique identifier (if such an entry exists within the
           dataset). In such cases, the etymon pointing to the source entry can be assumed to inherit
           the source’s domain and sense information, and this information can be automatically
-          extracted with a fairly simple XSLT program; thus the encoders may choose to leave some or
+          extracted with a fairly simple <ptr type="software" xml:id="R6"
+            target="#xslt"/><rs type="soft.name" ref="#R6">XSLT</rs> program; thus the encoders may choose to leave some or
           all of this information out of the etymon section. However, in the case that the dataset
           does not actually have entries for the source terms, or the encoder wants to be explicit
           in all aspects of the etymology, as mentioned above, the source domain and the
@@ -1556,7 +1559,8 @@
             type="metonymy"</tag>) and the etymon (<tag>cit type="etymon"</tag>) the source term’s
           URI is referenced in <gi>oRef</gi> and <gi>pRef</gi> as the value of <att>corresp</att>
             (<code>@corresp="#animal"</code>).</p>
-        <p>In <gi>sense</gi>, the URI corresponding to the DBpedia entry for <q>horse</q> is the
+        <p>In <gi>sense</gi>, the URI corresponding to the <ptr type="software" xml:id="R7"
+          target="#dbpedia"/><rs type="soft.name" ref="#R7">DBpedia</rs> entry for <q>horse</q> is the
           value for the attribute <att>corresp</att>. Additionally, the <tag>date
             notBefore="…"</tag> element–attribute pairing is used to specify that the term has only
           been used for the <q>horse</q> since 1517 at maximum (corresponding to the first Spanish
@@ -2485,8 +2489,8 @@
         <head>Problematic and Unresolved Issues</head>
         <p>For the issues regarded as the most fundamentally important to creating a dynamic and
           sustainable model for both etymology and general lexicographic markup in TEI, we have
-          submitted formal requests for changes to the TEI <ptr type="software" xml:id="GitHub"
-            target="#GitHub"/><rs type="soft.name" ref="#GitHub">GitHub</rs>, and will continue to
+          submitted formal requests for changes to the TEI <ptr type="software" xml:id="R8"
+            target="#github"/><rs type="soft.name" ref="#R8">GitHub</rs>, and will continue to
           submit change requests as needed. While this work represents a large step in the right
           direction for those looking for means of representing etymological information, there are
           still a number of unresolved issues that will need to be addressed. These remaining issues

diff --git a/data/JTEI/11_2019-20/jtei-cc-ra-bermudez-sabel-137-source.xml b/data/JTEI/11_2019-20/jtei-cc-ra-bermudez-sabel-137-source.xml
@@ -110,10 +110,11 @@
           ways in which the variant taxonomy may be linked to the body of the edition.</p>
         <p>Although this paper is TEI-centered, other XML technologies will be mentioned. <ptr
             type="crossref" target="#validation"/> includes a brief commentary on using <ptr
-            type="software" xml:id="XSLT" target="#XSLT"/><rs type="soft.name" ref="#XSLT">XSLT</rs>
+            type="software" xml:id="R1" target="#xslt"/><rs type="soft.name" ref="#R1">XSLT</rs>
           to transform a TEI-conformant definition of constraints into schema rules. However, the
           greatest attention to an additional technology is in <ptr type="crossref"
-            target="#analyses"/>, which discusses the use of XQuery to retrieve particular
+            target="#analyses"/>, which discusses the use of <ptr type="software" xml:id="R2"
+              target="#xquery"/><rs type="soft.name" ref="#R2">XQuery</rs> to retrieve particular
             <foreign>loci critici</foreign> and to deploy quantitative analyses.</p>
       </div>
       <div xml:id="rationale">
@@ -211,13 +212,14 @@
             neutralized.<note>This statement is especially significant when dealing with corpora
             that have been compiled over a long period of time. As is clearly explained in the
             introduction to the Helsinki Corpus that Irma Taavitsainen and Päivi Pahta prepared for
-            the <ref
-              target="http://www.helsinki.fi/varieng/CoRD/corpora/HelsinkiCorpus/meintro.html"
-              >Corpus Resource Database</ref> (CoRD) (<bibl xml:id="quoteref1"><title level="a"
-                >Placing the Helsinki Corpus Middle English Section Introduction into
-                Context</title>, <ptr
-                target="http://www.helsinki.fi/varieng/CoRD/corpora/HelsinkiCorpus/meintro.html"
-              /></bibl>): <quote source="#quoteref1">The idea of basing corpus texts directly on
+            the <ptr type="software" xml:id="R3"
+              target="#cord"/><rs type="soft.name url" ref="#R3"><ref
+                  target="http://www.helsinki.fi/varieng/CoRD/corpora/HelsinkiCorpus/meintro.html"
+                  >Corpus Resource Database</ref> (CoRD)</rs> (<rs type="soft.bib" ref="#R3"><bibl xml:id="quoteref1"><title level="a"
+                    >Placing the Helsinki Corpus Middle English Section Introduction into
+                    Context</title>, <ptr
+                      target="http://www.helsinki.fi/varieng/CoRD/corpora/HelsinkiCorpus/meintro.html"
+                    /></bibl></rs>): <quote source="#quoteref1">The idea of basing corpus texts directly on
               manuscript sources has been presented more recently<gap/> The principles of preparing
               manuscript texts for print have undergone changes during the history of
               editing<gap/></quote>.</note></p>
@@ -445,11 +447,12 @@
           definition, its typed-feature modeling facilitates the creation of schema constraints. For
           instance, I process my declaration to further constrict my schema so the feature structure
           declaration and its actual application are always synchronized and up to date.<note>I use
-              <ptr type="software" xml:id="XSLT" target="#XSLT"/><rs type="soft.name" ref="#XSLT"
+              <ptr type="software" xml:id="R4" target="#xslt"/><rs type="soft.name" ref="#R4"
               >XSLT</rs> to process the feature structure declaration in order to create all
             required Schematron rules that will constrict the feature library accordingly. I am
             currently working on creating a more generic validator (see my <ref
-              target="https://github.com/HelenaSabel/FS-Validator">Github repository</ref>, <ptr
+              target="https://github.com/HelenaSabel/FS-Validator"><ptr type="software" xml:id="R5"
+                target="#github"/><rs type="soft.name" ref="#R5">Github</rs> repository</ref>, <ptr
               target="https://github.com/HelenaSabel/FS-Validator"/>).</note>
           <figure xml:id="example4">
             <egXML xmlns="http://www.tei-c.org/ns/Examples">
@@ -541,16 +544,16 @@
             >parallel segmentation</ref> method (<ref type="bibl" target="#TEI16">TEI Consortium
             2016, 12.2.3</ref>) seems to be a popular encoding technique for multi-witness editions,
           in terms of both the specific tools that have been created for this method and the number
-          of projects that apply it.<note>Tools include <ref target="http://v-machine.org/"><ptr
-                type="software" xml:id="Versioning Machine" target="#Versioning Machine"/><rs
-                type="soft.name" ref="#Versioning Machine">Versioning Machine</rs></ref>, <ref
-              target="https://collatex.net/"><ptr type="software" xml:id="CollateX"
-                target="#CollateX"/><rs type="soft.name" ref="#CollateX">CollateX</rs></ref> (both
-            the <ptr type="software" xml:id="Java" target="#Java"/><rs type="soft.name" ref="#Java"
-              >Java</rs> and <ptr type="software" xml:id="Python" target="#Python"/><rs
-              type="soft.name" ref="#Python">Python</rs> versions), and <ref
-              target="http://www.juxtasoftware.org/"><ptr type="software" xml:id="Juxta"
-                target="#Juxta"/><rs type="soft.name" ref="#Juxta">Juxta</rs></ref>. For
+          of projects that apply it.<note>Tools include <ptr
+                type="software" xml:id="R6" target="#versioningmachine"/><rs
+                  type="soft.name soft.url" ref="#R6"><ref target="http://v-machine.org/">Versioning Machine</ref></rs>, <ptr type="software" xml:id="R7"
+                    target="#collatex"/><rs type="soft.name soft.url" ref="#R7"><ref
+              target="https://collatex.net/">CollateX</ref></rs> (both
+            the <ptr type="software" xml:id="R8" target="#java"/><rs type="soft.name" ref="#R8"
+              >Java</rs> and <ptr type="software" xml:id="R9" target="#python"/><rs
+              type="soft.name" ref="#R9">Python</rs> versions), and <ptr type="software" xml:id="R10"
+                target="#juxta"/><rs type="soft.name soft.url" ref="#R10"><ref
+              target="http://www.juxtasoftware.org/">Juxta</ref></rs>. For
             representative projects using the parallel segmentation method see <ref
               target="http://scholarlyediting.org/2015/editions/lowelledition_wit-Courier.html"
               >Satire in Circulation: James editions Russell Lowell’s Letter from a volunteer in