merge

DH-RSE · Feb 14, 2024 · ba9b707
2 parents 2b04666 + ffccb38
commit ba9b707

Showing 10 changed files with 582 additions and 384 deletions.
diff --git a/data/JTEI/14_2021-23/jtei-bleeker-et-al-199-source.xml b/data/JTEI/14_2021-23/jtei-bleeker-et-al-199-source.xml
@@ -377,29 +377,30 @@
                      nonlinear objects, modeled as a GODDAG data structure (<ref target="#huit2003"
                         type="bibl">Huitfeldt and Sperberg-McQueen 2003</ref>). In GODDAG, all
                      children of the markup nodes are typically ordered, but TexMECS provides a
-                     notation to mark certain markup nodes as unordered. The GODDAG processor
-                     ignores the default linear order of these elements’ children, and therefore
-                     TexMECS supports the representation of nonlinear structures. No known working
-                     implementation of TexMECS, however, is currently available. At first glance,
-                     EARMARK (Extremely Annotated RDF Markup) also seems to support the option to
-                     represent nonlinearity: with EARMARK, users can express different linear
-                     structures using RDF statements about text fragments, and in this way it is
-                     possible to describe multiple text orders (<ref type="bibl" target="#per2009"
-                        >Peroni and Vitali 2009</ref>, 4.1; <ref type="bibl" target="#iorio2009">Di
-                        Iorio 2009</ref>). However, multi-orderedness is not the same as partial
-                     orderedness: if a text is partially ordered, it means that (part of the) text
-                     has no order. Multi-orderedness always implies a certain order. The EARMARK
-                     specification as described in <ref type="bibl" target="#per2009">Peroni and
-                        Vitali 2009</ref> does not natively support partially ordered text, in the
-                     sense that EARMARK users cannot mark the branching of the text stream. It is
-                     also important to note that EARMARK is a metamarkup language, which means that
-                     users encode their texts not in EARMARK but in an RDF
-                        serialization.<note>Recognizing the challenge of expressing literary texts
-                        as RDF statements, <ref target="#bara2012" type="bibl" xml:id="quoteref8"
-                           >Barabucci et al.</ref> developed the FRETTA approach, which is designed
-                           <quote source="#quoteref8">to express EARMARK annotations in an embedded
-                           syntax such as XML</quote>. It is unclear, however, whether this approach
-                        has been further developed or implemented.</note></p>
+                     notation to mark certain markup nodes as unordered. The <ptr type="software"
+                        xml:id="R5" target="#goddag"/><rs type="soft.name" ref="#R5">GODDAG
+                        processor</rs> ignores the default linear order of these elements’ children,
+                     and therefore TexMECS supports the representation of nonlinear structures. No
+                     known working implementation of TexMECS, however, is currently available. At
+                     first glance, EARMARK (Extremely Annotated RDF Markup) also seems to support
+                     the option to represent nonlinearity: with EARMARK, users can express different
+                     linear structures using RDF statements about text fragments, and in this way it
+                     is possible to describe multiple text orders (<ref type="bibl"
+                        target="#per2009">Peroni and Vitali 2009</ref>, 4.1; <ref type="bibl"
+                        target="#iorio2009">Di Iorio 2009</ref>). However, multi-orderedness is not
+                     the same as partial orderedness: if a text is partially ordered, it means that
+                     (part of the) text has no order. Multi-orderedness always implies a certain
+                     order. The EARMARK specification as described in <ref type="bibl"
+                        target="#per2009">Peroni and Vitali 2009</ref> does not natively support
+                     partially ordered text, in the sense that EARMARK users cannot mark the
+                     branching of the text stream. It is also important to note that EARMARK is a
+                     metamarkup language, which means that users encode their texts not in EARMARK
+                     but in an RDF serialization.<note>Recognizing the challenge of expressing
+                        literary texts as RDF statements, <ref target="#bara2012" type="bibl"
+                           xml:id="quoteref8">Barabucci et al.</ref> developed the FRETTA approach,
+                        which is designed <quote source="#quoteref8">to express EARMARK annotations
+                           in an embedded syntax such as XML</quote>. It is unclear, however,
+                        whether this approach has been further developed or implemented.</note></p>
                </div>
                <div xml:id="discontinuity2">
                   <head>Discontinuity</head>
@@ -543,9 +544,11 @@
                   <p>TAGML may resemble existing markup languages like XML, TexMECS, or LMNL, but
                      TAGML is more expressive. For instance, in XML all annotation values are of
                      type string, but TAGML offers data-typing of annotations. These data types are
-                     expressed in UTF-8 and interpreted by the TAGML parser as different data types.
-                     Encoders can distinguish between integer, string, or Boolean values (<ptr
-                        target="#tagml1" type="crossref"/>). <figure xml:id="tagml1">
+                     expressed in UTF-8 and interpreted by the <ptr type="software" xml:id="R6"
+                        target="#tagmlparser"/><rs type="soft.name" ref="#R6">TAGML parser</rs> as
+                     different data types. Encoders can distinguish between integer, string, or
+                     Boolean values (<ptr target="#tagml1" type="crossref"/>). <figure
+                        xml:id="tagml1">
                         <graphic url="img/tagml1.png" width="1852px" height="70px"/>
                         <head type="legend">Example of TAGML, featuring different types of
                            annotation value.</head>
@@ -572,11 +575,17 @@
                      encoding complex textual features, TAGML is designed to make that modeling
                      process as natural as possible. The markup language has the same compactness as
                      XML and is independent of the user environment.<note>TAGML can be edited in any
-                        editor, but the open source text editor Sublime has <ref
-                           target="https://huygensing.github.io/tagml-sublime-syntax/"> a TAGML
-                           syntax highlighting package</ref>, and the <ref
-                           target="https://huygensing.github.io/alexandria/">reference
-                           implementation Alexandria</ref> can be used to parse and validate TAGML
+                        editor, but the open source text editor <ptr type="software" xml:id="R7"
+                           target="#sublime"/><rs type="soft.name" ref="#R7">Sublime</rs> has <ptr
+                           type="software" xml:id="R8" target="#sublimepackage"/><rs type="soft.url"
+                           ref="#R7"><ref
+                              target="https://huygensing.github.io/tagml-sublime-syntax/"> a <rs
+                                 type="soft.name" ref="#R8">TAGML syntax highlighting
+                              package</rs></ref></rs>, and the <ptr type="software" xml:id="R9"
+                           target="#alexandria"/><rs type="soft.url" ref="#R9"><ref
+                              target="https://huygensing.github.io/alexandria/">reference
+                              implementation <rs type="soft.name" ref="#R9"
+                           >Alexandria</rs></ref></rs> can be used to parse and validate TAGML
                         documents and store them as a TAG hypergraph.</note> Following the argument
                      of <ref type="bibl" target="#sper2008">Sperberg-McQueen and Huitfeldt</ref> and
                         <ref target="#per2009" type="bibl">Peroni and Vitali</ref>, we did not
@@ -1048,8 +1057,8 @@
                retrieve all quotes together. The first would not pose a problem for TEI XML, but
                retrieving the disjointed quotations as one (merged) utterance would only be possible
                with additional, vocabulary-specific coding. Processing the two <gi>q</gi> elements
-               as a single <gi>q</gi> requires a set of <ptr type="software" xml:id="XSLT"
-                  target="#XSLT"/><rs type="soft.name" ref="#XSLT">XSLT</rs> instructions that check
+               as a single <gi>q</gi> requires a set of <ptr type="software" xml:id="R1"
+                  target="#XSLT"/><rs type="soft.name" ref="#R1">XSLT</rs> instructions that check
                the values of the <att>xml:id</att> and the <att>next</att> and <att>prev</att>
                attributes in order to know which <gi>q</gi> elements should be stitched together. In
                TAGML, both scenarios would be equally straightforward. The hypergraph can be queried
@@ -1091,22 +1100,22 @@
                   <head type="legend">TEI transcription of <ptr target="#discont4" type="crossref"
                      /></head>
                </figure> To process the text of this fragment correctly, one needs to write a rather
-               complicated set of <ptr type="software" xml:id="XSLT" target="#XSLT"/><rs
-                  type="soft.name" ref="#XSLT">XSLT</rs> instructions. At the very least, these
+               complicated set of <ptr type="software" xml:id="R2" target="#XSLT"/><rs
+                  type="soft.name" ref="#R2">XSLT</rs> instructions. At the very least, these
                instructions need to match the values of the <att>xml:id</att> and <att>prev</att> in
                order to process the first part of the deletion, look for the second part of the
                deletion, and then concatenate their textual content. At the same time, one has to
                prevent the second part from being processed twice (first as the second part of the
                deletion, and the second time together with the regular <gi>del</gi> elements). After
-               some experimenting and consulting several <ptr type="software" xml:id="XSLT"
-                  target="#XSLT"/><rs type="soft.name" ref="#XSLT">XSLT</rs> specialists, we have
-               come to no less than three different sets of instructions.<note>The authors are
-                  grateful to Peter Boot, Vincent Neyt, and Frederike Neuber for sharing their
-                  expertise and invaluable insights.</note> And considering the ingenuity and
-               technical expertise of the TEI community, we are quite certain there are even more
-               ways. In short, it can be a challenging and time-consuming process to write and tweak
-               vocabulary-specific and schema-aware tools—a daunting task for any TEI XML user who
-               lacks a certain level of technical expertise. </p>
+               some experimenting and consulting several <ptr type="software" xml:id="R3"
+                  target="#XSLT"/><rs type="soft.name" ref="#R3">XSLT</rs> specialists, we have come
+               to no less than three different sets of instructions.<note>The authors are grateful
+                  to Peter Boot, Vincent Neyt, and Frederike Neuber for sharing their expertise and
+                  invaluable insights.</note> And considering the ingenuity and technical expertise
+               of the TEI community, we are quite certain there are even more ways. In short, it can
+               be a challenging and time-consuming process to write and tweak vocabulary-specific
+               and schema-aware tools—a daunting task for any TEI XML user who lacks a certain level
+               of technical expertise. </p>
          </div>
          <div xml:id="conclusion">
             <head>Conclusion</head>

diff --git a/evaluation/csv/citation-types-frequencies.csv b/evaluation/csv/citation-types-frequencies.csv
@@ -1,8 +1,8 @@
-Citation type,/2015/ abs. frequency (n=27),/2015/ rel. frequency (in %),/2016/ abs. frequency (n=45),/2016/ rel. frequency (in %),/2017/ abs. frequency (n=0),/2017/ rel. frequency (in %),/2018/ abs. frequency (n=74),/2018/ rel. frequency (in %),/2019/ abs. frequency (n=67),/2019/ rel. frequency (in %),/2020/ abs. frequency (n=53),/2020/ rel. frequency (in %),ALL / abs. frequency (n=266),ALL / rel. frequency (in %)
-Bib.Soft,1,3.7,0,.0,0,0,11,14.86,3,4.48,1,1.89,16,6.02
-Bib.Ref,5,18.52,14,31.11,0,0,11,14.86,17,25.37,15,28.3,62,23.31
-Name.Only,19,70.37,31,68.89,0,0,48,64.86,52,77.61,37,69.81,187,70.3
-Agent,1,3.7,0,.0,0,0,5,6.76,1,1.49,4,7.55,11,4.14
-URL,5,18.52,2,4.44,0,0,22,29.73,13,19.4,10,18.87,52,19.55
-PID,0,.0,0,.0,0,0,0,.0,1,1.49,0,.0,1,.38
-Ver,1,3.7,0,.0,0,0,6,8.11,2,2.99,0,.0,9,3.38
+Citation type, abs. frequency (n=119), rel. frequency (in %),ALL / abs. frequency (n=119),ALL / rel. frequency (in %)
+Soft.Bib,1,.84,1,.84
+Soft.Bib.Ref,0,.0,0,.0
+Soft.Name,119,100.0,119,100.0
+Soft.Agent,0,.0,0,.0
+Soft.URL,9,7.56,9,7.56
+Soft.PID,0,.0,0,.0
+Soft.Ver,0,.0,0,.0
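
Review note: the relative frequencies in the new CSV rows appear to be the absolute count divided by n=119, expressed in percent and rounded to two decimals. The sketch below re-derives three of the added rows as a sanity check. The function name `rel_frequency` and the zero-denominator behaviour are assumptions made for illustration; this is not the script that generated the file.

```python
def rel_frequency(abs_count: int, n: int) -> float:
    """Relative frequency in percent, rounded to two decimals.

    Returning 0.0 for n <= 0 is an assumption, chosen to avoid a
    division by zero when a group has no observations.
    """
    if n <= 0:
        return 0.0
    return round(abs_count / n * 100, 2)


# Absolute counts copied from the added CSV rows (n=119).
added_rows = {"Soft.Bib": 1, "Soft.Name": 119, "Soft.URL": 9}

for label, count in added_rows.items():
    print(label, count, rel_frequency(count, 119))
```

The recomputed values (0.84, 100.0, 7.56) agree with the figures in the added rows.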