Commit

hennyu committed Feb 6, 2024
2 parents de5c086 + c31a115 commit 5d608d0
Showing 26 changed files with 2,494 additions and 1,939 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/schema.yml
@@ -29,6 +29,8 @@ jobs:
git config --local user.name "GitHub Action"
git add schema/tei_software_annotation.xml
git add schema/tei_software_annotation.rng
git add schema/tei_jtei_annotated.odd
git add schema/tei_jtei_annotated.rng
git commit -m "Add updated odd and generated rng"
- name: Push changes
uses: ad-m/github-push-action@master
12 changes: 7 additions & 5 deletions data/JTEI/10_2016-19/jtei-10-haaf-source.xml
@@ -211,9 +211,10 @@
target="http://www.deutschestextarchiv.de/doku/software#cab"/></bibl>.</note> as
well as <ref target="http://www.deutschestextarchiv.de/dtaq/about">collaborative text
correction and annotation</ref><note rend="inside.parenthesis">See <bibl><title
level="a">DTAQ: Kollaborative Qualitätssicherung im Deutschen Textarchiv</title>
(Collaborative Quality Assurance within the DTA), accessed January 28, 2017, <ptr
target="http://www.deutschestextarchiv.de/dtaq/about"/></bibl>. On the process of
level="a"><ptr type="software" xml:id="R3"
target="#dtaq"/><rs type="soft.name" ref="#R3">DTAQ: Kollaborative Qualitätssicherung im Deutschen Textarchiv</rs></title>
(Collaborative Quality Assurance within the DTA), accessed January 28, 2017, <rs type="soft.url" ref="#R3"><ptr
target="http://www.deutschestextarchiv.de/dtaq/about"/></rs></bibl>. On the process of
quality assurance in the DTA, see, for example, <ref target="#haaf13" type="bibl">Haaf,
Wiegand, and Geyken 2013</ref>.</note>) is a matter of supporting scholarly projects
in their usage of the DTA infrastructure, which is part of the DTA’s mission. Second,
@@ -273,7 +274,8 @@
Since June 2014, nine complete volumes with a total of more than 3,500 manuscript pages
have been manually transcribed, annotated in TEI XML, and published via the DTA
infrastructure. Most of these manuscripts were keyed manually by a vendor and published at
an early stage in the web-based quality assurance platform DTAQ. There, the transcription
an early stage in the web-based quality assurance platform <ptr type="software" xml:id="R2"
target="#dtaq"/><rs type="soft.name" ref="#R2">DTAQ</rs>. There, the transcription
as well as the annotation of each document was checked and corrected, if necessary; DTAQ
also provided the means to add additional markup, such as the tagging of person names
(<gi>persName</gi>), directly at page level. After the process of quality control has
@@ -1210,7 +1212,7 @@
corpora. Our primary goal is to be as inclusive as possible, allowing for other projects
to benefit from our resources (i.e., our comprehensive guidelines and documentation as
well as the technical infrastructure that includes Schemas, ODDs, and <ptr type="software"
xml:id="XSLT" target="#XSLT"/><rs type="soft.name" ref="#XSLT">XSLT</rs> scripts) and
xml:id="R1" target="#xslt"/><rs type="soft.name" ref="#R1">XSLT</rs> scripts) and
contribute to our corpora. We also want to ensure interoperability of all data within the
DTA corpora. The underlying TEI format has to be continuously maintained and adapted to
new necessities with these two premises in mind.</p>
26 changes: 15 additions & 11 deletions data/JTEI/10_2016-19/jtei-10-romary-source.xml
@@ -645,8 +645,8 @@
available at <ptr target="https://github.com/TEIC/TEI/issues/1512"/>. In our proposal,
the <gi>etym</gi> element has to be made recursive in order to allow the fine-grained
representations we propose here. The corresponding ODD customization, together with
reference examples, is available on <ptr type="software" xml:id="GitHub"
target="#GitHub"/><rs type="soft.name" ref="#GitHub">GitHub</rs>.</note> and the
reference examples, is available on <ptr type="software" xml:id="R1"
target="#github"/><rs type="soft.name" ref="#R1">GitHub</rs>.</note> and the
fact that a change occurred within the contemporary lexicon (as opposed to its parent
language) is indicated by means of <att>xml:lang</att> on the source form.<note>There
may also be cases in which it is unknown whether a given etymological process occurred
@@ -768,8 +768,8 @@
text.<note>The interested reader may ponder here the possibility to also encode
scripts by means of the <att>notation</att> attribute instead of using a cluttering of
language subtags on <att>xml:lang</att>. For more on this issue, see the proposal in
the TEI <ptr type="software" xml:id="GitHub" target="#GitHub"/><rs type="soft.name"
ref="#GitHub">GitHub</rs> (<ptr target="https://github.com/TEIC/TEI/issues/1510"
the TEI <ptr type="software" xml:id="R2" target="#github"/><rs type="soft.name"
ref="#R2">GitHub</rs> (<ptr target="https://github.com/TEIC/TEI/issues/1510"
/>).</note> This is why we have extended the <att>notation</att> attribute to
<gi>orth</gi> in order to allow for better representation of both language
identification and the orthographic content. With this double mechanism, we intend to
@@ -987,7 +987,7 @@
<p>The <gi>date</gi><note>The element <gi>date</gi> as a child of <gi>cit</gi> is another
example which does not adhere to the current TEI standards. We have allowed this
within our ODD document. A feature request proposal will be made on the <ptr
type="software" xml:id="GitHub" target="#GitHub"/><rs type="soft.name" ref="#GitHub"
type="software" xml:id="R3" target="#github"/><rs type="soft.name" ref="#R3"
>GitHub</rs> page and this feature may or may not appear in future versions of the
TEI Guidelines.</note> element is listed within each etymon block; the values of
attributes <att>notBefore</att> and <att>notAfter</att> specify the range of time
@@ -1486,8 +1486,10 @@
extent of knowledge that is truly necessary to create an accurate model of metaphorical
processes. In order to do this, it is necessary to make use of one or more ontologies,
which could be locally defined within a project, and of external linked open data sources
such as <ref target="http://wiki.dbpedia.org/">DBpedia</ref> and <ref
target="https://www.wikidata.org/">Wikidata</ref>, or some combination thereof. Within
such as <ptr type="software" xml:id="R4"
target="#dbpedia"/><rs type="soft.name soft.url" ref="#R4"><ref target="http://wiki.dbpedia.org/">DBpedia</ref></rs> and <ptr type="software" xml:id="R5"
target="#wikidata"/><rs type="soft.name soft.url" ref="#R5"><ref
target="https://www.wikidata.org/">Wikidata</ref></rs>, or some combination thereof. Within
TEI dictionary markup, URIs for existing ontological entries can be referenced in the
<gi>sense</gi>, <gi>usg</gi>, and <gi>ref</gi> elements as the value of the attribute
<att>corresp</att>.</p>
@@ -1496,7 +1498,8 @@
reference to the source entry’s unique identifier (if such an entry exists within the
dataset). In such cases, the etymon pointing to the source entry can be assumed to inherit
the source’s domain and sense information, and this information can be automatically
extracted with a fairly simple XSLT program; thus the encoders may choose to leave some or
extracted with a fairly simple <ptr type="software" xml:id="R6"
target="#xslt"/><rs type="soft.name" ref="#R6">XSLT</rs> program; thus the encoders may choose to leave some or
all of this information out of the etymon section. However, in the case that the dataset
does not actually have entries for the source terms, or the encoder wants to be explicit
in all aspects of the etymology, as mentioned above, the source domain and the
@@ -1556,7 +1559,8 @@
type="metonymy"</tag>) and the etymon (<tag>cit type="etymon"</tag>) the source term’s
URI is referenced in <gi>oRef</gi> and <gi>pRef</gi> as the value of <att>corresp</att>
(<code>@corresp="#animal"</code>).</p>
<p>In <gi>sense</gi>, the URI corresponding to the DBpedia entry for <q>horse</q> is the
<p>In <gi>sense</gi>, the URI corresponding to the <ptr type="software" xml:id="R7"
target="#dbpedia"/><rs type="soft.name" ref="#R7">DBpedia</rs> entry for <q>horse</q> is the
value for the attribute <att>corresp</att>. Additionally, the <tag>date
notBefore="…"</tag> element–attribute pairing is used to specify that the term has only
been used for the <q>horse</q> since 1517 at maximum (corresponding to the first Spanish
@@ -2485,8 +2489,8 @@
<head>Problematic and Unresolved Issues</head>
<p>For the issues regarded as the most fundamentally important to creating a dynamic and
sustainable model for both etymology and general lexicographic markup in TEI, we have
submitted formal requests for changes to the TEI <ptr type="software" xml:id="GitHub"
target="#GitHub"/><rs type="soft.name" ref="#GitHub">GitHub</rs>, and will continue to
submitted formal requests for changes to the TEI <ptr type="software" xml:id="R8"
target="#github"/><rs type="soft.name" ref="#R8">GitHub</rs>, and will continue to
submit change requests as needed. While this work represents a large step in the right
direction for those looking for means of representing etymological information, there are
still a number of unresolved issues that will need to be addressed. These remaining issues
45 changes: 24 additions & 21 deletions data/JTEI/11_2019-20/jtei-cc-ra-bermudez-sabel-137-source.xml
@@ -110,10 +110,11 @@
ways in which the variant taxonomy may be linked to the body of the edition.</p>
<p>Although this paper is TEI-centered, other XML technologies will be mentioned. <ptr
type="crossref" target="#validation"/> includes a brief commentary on using <ptr
type="software" xml:id="XSLT" target="#XSLT"/><rs type="soft.name" ref="#XSLT">XSLT</rs>
type="software" xml:id="R1" target="#xslt"/><rs type="soft.name" ref="#R1">XSLT</rs>
to transform a TEI-conformant definition of constraints into schema rules. However, the
greatest attention to an additional technology is in <ptr type="crossref"
target="#analyses"/>, which discusses the use of XQuery to retrieve particular
target="#analyses"/>, which discusses the use of <ptr type="software" xml:id="R2"
target="#xquery"/><rs type="soft.name" ref="#R2">XQuery</rs> to retrieve particular
<foreign>loci critici</foreign> and to deploy quantitative analyses.</p>
</div>
<div xml:id="rationale">
@@ -211,13 +212,14 @@
neutralized.<note>This statement is especially significant when dealing with corpora
that have been compiled over a long period of time. As is clearly explained in the
introduction to the Helsinki Corpus that Irma Taavitsainen and Päivi Pahta prepared for
the <ref
target="http://www.helsinki.fi/varieng/CoRD/corpora/HelsinkiCorpus/meintro.html"
>Corpus Resource Database</ref> (CoRD) (<bibl xml:id="quoteref1"><title level="a"
>Placing the Helsinki Corpus Middle English Section Introduction into
Context</title>, <ptr
target="http://www.helsinki.fi/varieng/CoRD/corpora/HelsinkiCorpus/meintro.html"
/></bibl>): <quote source="#quoteref1">The idea of basing corpus texts directly on
the <ptr type="software" xml:id="R3"
target="#cord"/><rs type="soft.name url" ref="#R3"><ref
target="http://www.helsinki.fi/varieng/CoRD/corpora/HelsinkiCorpus/meintro.html"
>Corpus Resource Database</ref> (CoRD)</rs> (<rs type="soft.bib" ref="#R3"><bibl xml:id="quoteref1"><title level="a"
>Placing the Helsinki Corpus Middle English Section Introduction into
Context</title>, <ptr
target="http://www.helsinki.fi/varieng/CoRD/corpora/HelsinkiCorpus/meintro.html"
/></bibl></rs>): <quote source="#quoteref1">The idea of basing corpus texts directly on
manuscript sources has been presented more recently<gap/> The principles of preparing
manuscript texts for print have undergone changes during the history of
editing<gap/></quote>.</note></p>
@@ -445,11 +447,12 @@
definition, its typed-feature modeling facilitates the creation of schema constraints. For
instance, I process my declaration to further constrict my schema so the feature structure
declaration and its actual application are always synchronized and up to date.<note>I use
<ptr type="software" xml:id="XSLT" target="#XSLT"/><rs type="soft.name" ref="#XSLT"
<ptr type="software" xml:id="R4" target="#xslt"/><rs type="soft.name" ref="#R4"
>XSLT</rs> to process the feature structure declaration in order to create all
required Schematron rules that will constrict the feature library accordingly. I am
currently working on creating a more generic validator (see my <ref
target="https://github.com/HelenaSabel/FS-Validator">Github repository</ref>, <ptr
target="https://github.com/HelenaSabel/FS-Validator"><ptr type="software" xml:id="R5"
target="#github"/><rs type="soft.name" ref="#R5">Github</rs> repository</ref>, <ptr
target="https://github.com/HelenaSabel/FS-Validator"/>).</note>
<figure xml:id="example4">
<egXML xmlns="http://www.tei-c.org/ns/Examples">
@@ -541,16 +544,16 @@
>parallel segmentation</ref> method (<ref type="bibl" target="#TEI16">TEI Consortium
2016, 12.2.3</ref>) seems to be a popular encoding technique for multi-witness editions,
in terms of both the specific tools that have been created for this method and the number
of projects that apply it.<note>Tools include <ref target="http://v-machine.org/"><ptr
type="software" xml:id="Versioning Machine" target="#Versioning Machine"/><rs
type="soft.name" ref="#Versioning Machine">Versioning Machine</rs></ref>, <ref
target="https://collatex.net/"><ptr type="software" xml:id="CollateX"
target="#CollateX"/><rs type="soft.name" ref="#CollateX">CollateX</rs></ref> (both
the <ptr type="software" xml:id="Java" target="#Java"/><rs type="soft.name" ref="#Java"
>Java</rs> and <ptr type="software" xml:id="Python" target="#Python"/><rs
type="soft.name" ref="#Python">Python</rs> versions), and <ref
target="http://www.juxtasoftware.org/"><ptr type="software" xml:id="Juxta"
target="#Juxta"/><rs type="soft.name" ref="#Juxta">Juxta</rs></ref>. For
of projects that apply it.<note>Tools include <ptr
type="software" xml:id="R6" target="#versioningmachine"/><rs
type="soft.name soft.url" ref="#R6"><ref target="http://v-machine.org/">Versioning Machine</ref></rs>, <ptr type="software" xml:id="R7"
target="#collatex"/><rs type="soft.name soft.url" ref="#R7"><ref
target="https://collatex.net/">CollateX</ref></rs> (both
the <ptr type="software" xml:id="R8" target="#java"/><rs type="soft.name" ref="#R8"
>Java</rs> and <ptr type="software" xml:id="R9" target="#python"/><rs
type="soft.name" ref="#R9">Python</rs> versions), and <ptr type="software" xml:id="R10"
target="#juxta"/><rs type="soft.name soft.url" ref="#R10"><ref
target="http://www.juxtasoftware.org/">Juxta</ref></rs>. For
representative projects using the parallel segmentation method see <ref
target="http://scholarlyediting.org/2015/editions/lowelledition_wit-Courier.html"
>Satire in Circulation: James editions Russell Lowell’s Letter from a volunteer in