Commit

corrected annotations and added to software list
daniel-jettka committed Feb 5, 2024
1 parent 1ccd144 commit 638589f
Showing 14 changed files with 80 additions and 66 deletions.
2 changes: 1 addition & 1 deletion data/JTEI/10_2016-19/jtei-10-haaf-source.xml
@@ -212,7 +212,7 @@
well as <ref target="http://www.deutschestextarchiv.de/dtaq/about">collaborative text
correction and annotation</ref><note rend="inside.parenthesis">See <bibl><title
level="a"><ptr type="software" xml:id="R3"
-target="#dtaq"/><rs type="soft.name" ref="R3">DTAQ: Kollaborative Qualitätssicherung im Deutschen Textarchiv</rs></title>
+target="#dtaq"/><rs type="soft.name" ref="#R3">DTAQ: Kollaborative Qualitätssicherung im Deutschen Textarchiv</rs></title>
(Collaborative Quality Assurance within the DTA), accessed January 28, 2017, <rs type="soft.url" ref="#R3"><ptr
target="http://www.deutschestextarchiv.de/dtaq/about"/></rs></bibl>. On the process of
quality assurance in the DTA, see, for example, <ref target="#haaf13" type="bibl">Haaf,
@@ -548,10 +548,9 @@
type="soft.name" ref="#R3">GitHub</rs>).</p>
<p>But the story did not end there. The freely available and processable collection of
abstracts inspired Peter Andorfer, a colleague of the editors at the Austrian Centre for
-Digital Humanities, to use this text collection to built an <ptr type="software" xml:id="R12"
-target="#existdbpoweredwebapplication"/><rs type="soft.name" ref="#R12">eXistdb-powered web
-application</rs> (<rs type="soft.bib.ref" ref="#R12"><ref type="bibl" target="#andorfer17">Andorfer and Hannesschläger
-2017</ref></rs>). In the context of licensing issues, it is important to mention that
+Digital Humanities, to use this text collection to built an eXistdb-powered web
+application (<ref type="bibl" target="#andorfer17">Andorfer and Hannesschläger
+2017</ref>). In the context of licensing issues, it is important to mention that
Andorfer was never approached by the editors or explicitly asked to process the TEI
files, and he only informed the editors about the web application that he was building
when it was already available online (as a <soCalled>work in progress</soCalled>, but
42 changes: 21 additions & 21 deletions data/JTEI/13_2020-22/jtei-cc-ra-parisse-182-source.xml
@@ -115,7 +115,7 @@
format. Backward conversion is possible in many cases, with limitations inherent in the
destination target format. <ptr type="software" xml:id="R8" target="#teicorpo"/>
<rs type="soft.name" ref="#R8">TEICORPO</rs> can run the <ptr type="software" xml:id="R9"
-target="#treetager"/>
+target="#treetagger"/>
<rs type="soft.name" ref="#R9">Treetagger</rs> part-of-speech tagger and the <ptr
type="software" xml:id="R10" target="#stanfordcorenlp"/>
<rs type="soft.name" ref="#R10">Stanford CoreNLP</rs> tools on TEI files and can export
@@ -231,15 +231,15 @@
<div xml:id="similarities">
<head>Similarities with and Differences from Other Approaches</head>
<p>Many software packages dedicated to editing spoken language transcription contain
-utilities that can convert many formats: for example, <ptr type="software" xml:id="15"
+utilities that can convert many formats: for example, <ptr type="software" xml:id="R15"
target="#exmaralda"/><rs type="soft.name" ref="#R15">EXMARaLDA</rs> (<rs
type="Bib.Ref" target="#R15"><ref type="bibl" target="#schmidt2004">Schmidt 2004</ref>
</rs>; see <rs type="URL" target="#R15"><ptr target="https://exmaralda.org"/></rs>),
-<ptr type="software" xml:id="16" target="#anvil"/>
+<ptr type="software" xml:id="R16" target="#anvil"/>
<rs type="soft.name" ref="#R16">Anvil</rs> (<rs type="Bib.Ref" target="#R16">
<ref type="bibl" target="#kipp2001">Kipp 2001</ref></rs>; see <rs type="URL"
target="#R16"><ptr target="https://www.anvil-software.org"/></rs>), and <ptr
-type="software" xml:id="17" target="#elan"/><rs type="soft.name" ref="#17">ELAN</rs>
+type="software" xml:id="R17" target="#elan"/><rs type="soft.name" ref="#R17">ELAN</rs>
(<rs type="bib.ref" target="#R17"><ref type="bibl" target="#wittenburg2006">Wittenburg
et al. 2006</ref></rs>; see <rs type="URL" target="#R17">
<ptr target="https://archive.mpi.nl/tla/elan"/></rs>). However, in all cases, the
@@ -257,7 +257,7 @@
<p>The list of tools that are considered in the two projects is nearly the same. The only
tools missing in the <ptr type="software" xml:id="R18" target="#teicorpo"/>
<rs type="soft.name" ref="#R18">TEICORPO</rs> approach are <ptr type="software"
-xml:id="19" target="#exmaralda"/><rs type="soft.name" ref="#R19">EXMARaLDA</rs> and
+xml:id="R19" target="#exmaralda"/><rs type="soft.name" ref="#R19">EXMARaLDA</rs> and
<ptr type="software" xml:id="R19" target="#folker"/>FOLKER (<rs type="bib.ref"
target="#R19"><ref type="bibl" target="#schmidts2010">Schmidt and Schütte
2010</ref></rs>; see <rs type="URL" target="#R19"><ptr
@@ -620,7 +620,7 @@
tools, a single-level annotation structure within the <gi>spanGrp</gi> elements is
insufficient to represent the complex organization that can be constructed with the
<ptr type="software" xml:id="R78" target="#elan"/><rs type="soft.name" ref="#R78"
->ELAN</rs> and <ptr type="software" xml:id="R78" target="#praat"/>
+>ELAN</rs> and <ptr type="software" xml:id="R79" target="#praat"/>
<rs type="soft.name" ref="#R79">Praat</rs> tools. <ptr type="software" xml:id="R80"
target="#elan"/><rs type="soft.name" ref="#R80">ELAN</rs> is a tool used by many
researchers to describe data of greater complexity than the data presented in the
@@ -792,7 +792,7 @@
<figure xml:id="fig4">
<graphic url="media/image2.PNG" width="620px" height="980px"/>
<head type="legend"><ptr type="software" xml:id="R98" target="#elan"/><rs
-type="soft.name" ref="#98">ELAN</rs> example of a temporal division</head>
+type="soft.name" ref="#R98">ELAN</rs> example of a temporal division</head>
</figure>
<figure xml:id="example_code_4">
<egXML xmlns="http://www.tei-c.org/ns/Examples">
@@ -851,7 +851,7 @@
corpora to be used with other editing tools, some of which are suited to specific
processing: for example, <ptr type="software" xml:id="R104" target="#praat"/>
<rs type="soft.name" ref="#R104">Praat</rs> for phonetics/phonology; <ptr
-type="software" xml:id="#R105" target="#transcriber"/>
+type="software" xml:id="R105" target="#transcriber"/>
<rs type="soft.name" ref="#R105">Transcriber</rs>/<ptr type="software" xml:id="R106"
target="#clan"/>
<rs type="soft.name" ref="#R106">CLAN</rs> for raw transcription; and <ptr
@@ -1076,7 +1076,7 @@
<rs type="soft.name" ref="#R126">CLAN</rs> , <ptr type="software" xml:id="R127"
target="#elan"/><rs type="soft.name" ref="#R127">ELAN</rs>, <ptr type="software"
xml:id="R128" target="#praat"/>
-<rs type="soft.name" ref="R128">Praat</rs>, <ptr type="software" xml:id="R129"
+<rs type="soft.name" ref="#R128">Praat</rs>, <ptr type="software" xml:id="R129"
target="#transcriber"/>
<rs type="soft.name" ref="#R129">Transcriber</rs>, nor of course in TEI format.</p>
<p><ptr type="software" xml:id="R130" target="#teicorpo"/>
@@ -1094,7 +1094,7 @@
<rs type="soft.name" ref="#R134">TEICORPO</rs>: <ptr type="software" xml:id="R135"
target="#treetagger"/>
<rs type="soft.name" ref="#R135">TreeTagger</rs> and <ptr type="software" xml:id="R136"
-target="#corenlp"/>
+target="#stanfordcorenlp"/>
<rs type="soft.name" ref="#R136">CoreNLP</rs>.</p>
<div xml:id="treetagger">
<head><ptr type="software" xml:id="R138" target="#treetagger"/>
@@ -1118,11 +1118,11 @@
<rs type="soft.name" ref="#R140">TEICORPO</rs> should be used to generate an annotated
file with lemma and POS information based on <ptr type="software" xml:id="R141"
target="#treetagger"/>
-<rs type="soft.name" ref="#141">TreeTagger</rs>. <ptr type="software" xml:id="142"
+<rs type="soft.name" ref="#R141">TreeTagger</rs>. <ptr type="software" xml:id="R142"
target="#treetagger"/>
-<rs type="soft.name" ref="#142">TreeTagger</rs> should be installed separately. The
-implementation of <ptr type="software" xml:id="143" target="#treetagger"/>
-<rs type="soft.name" ref="#143">TreeTagger</rs> in <ptr type="software" xml:id="R144"
+<rs type="soft.name" ref="#R142">TreeTagger</rs> should be installed separately. The
+implementation of <ptr type="software" xml:id="R143" target="#treetagger"/>
+<rs type="soft.name" ref="#R143">TreeTagger</rs> in <ptr type="software" xml:id="R144"
target="#teicorpo"/>
<rs type="soft.name" ref="#R144">TEICORPO</rs> includes the ability to use any
syntactic model. For French data, we used the PERCEO model (<ref type="bibl"
@@ -1150,7 +1150,7 @@
<gi>filename</gi></p></cell>
<cell><p><gi>filename</gi> is the full location of the <ptr type="software"
xml:id="R146" target="#treetagger"/>
-<rs type="soft.name" ref="#146">TreeTagger</rs> program, according to the system
+<rs type="soft.name" ref="#R146">TreeTagger</rs> program, according to the system
used (Windows, MacOS, or Linux).</p></cell>
</row>
<row>
@@ -1163,7 +1163,7 @@
<p>The environment variable TREE_TAGGER can be used to locate the model and the program.
If no <code>-program</code> option is used, the default name for the <ptr
type="software" xml:id="R147" target="#treetagger"/>
-<rs type="soft.name" ref="#147">TreeTagger</rs> program is used.</p>
+<rs type="soft.name" ref="#R147">TreeTagger</rs> program is used.</p>
<p>The <code>-model</code> parameter is mandatory.</p>
<p>The resulting filename ends with <code>.tei_corpo_ttg.tei_corpo.xml</code> or a
specific name provided by the user (option <code>-o</code>).</p>
@@ -1279,10 +1279,10 @@
</div>
<div xml:id="stanford">
<head><ptr type="software" xml:id="R148" target="#stanfordcorenlp"/>
-<rs type="soft.name" ref="#148">Stanford CoreNLP</rs></head>
+<rs type="soft.name" ref="#R148">Stanford CoreNLP</rs></head>
<p><ptr type="software" xml:id="R149" target="#stanfordcorenlp"/>
-<rs type="soft.name" ref="#149">The Stanford Core Natural Language Processing</rs><note>
-<p>Accessed March 11, 2021, <rs type="url" ref="#149"><ptr
+<rs type="soft.name" ref="#R149">The Stanford Core Natural Language Processing</rs><note>
+<p>Accessed March 11, 2021, <rs type="url" ref="#R149"><ptr
target="https://nlp.stanford.edu/software/"/></rs>.</p>
</note> (<ptr type="software" xml:id="R150" target="#stanfordcorenlp"/>
<rs type="soft.name" ref="#R150">CoreNLP</rs>) package is a suite of tools (<rs
@@ -1437,7 +1437,7 @@
recent developments (see <ref type="bibl" target="#badin2021">Badin et al. 2021</ref>)
made it possible to insert metadata stored in CSV files (including participant metadata)
into the TEI files. This makes it possible to achieve more powerful corpus analysis
-using a tool such as <ptr type="software" xml:id="R177" target="txm"/><rs
+using a tool such as <ptr type="software" xml:id="R177" target="#txm"/><rs
type="soft.name" ref="#R177">TXM</rs>.</p>
<p>Our approach is somewhat similar to what is suggested in the conclusion of Schmidt,
Hedeland, and Jettka (<ref type="bibl" target="#schmidt2017">2017</ref>), who describe a
@@ -1465,7 +1465,7 @@
<div xml:id="conclusion">
<head>Conclusion</head>
<p><ptr type="software" xml:id="R183" target="#teicorpo"/>
-<rs type="soft.name" ref="R183">TEICORPO</rs> is a functional tool, created by the CORLI
+<rs type="soft.name" ref="#R183">TEICORPO</rs> is a functional tool, created by the CORLI
network and ORTOLANG, that converts files created by software specializing in editing
spoken-language data into TEI format. The result is fully compatible with the most recent
developments in TEI, especially those that concern spoken-language material.</p>
2 changes: 1 addition & 1 deletion data/JTEI/13_2020-22/jtei-cc-ra-wittern-189-source.xml
@@ -574,7 +574,7 @@
usage has been increasing slowly but steadily.</p>
<div xml:id="kanripo">
<head>Kanripo Project Details</head>
-<p>All the texts are freely available on <rs type="soft.name" ref="">GitHub</rs> in their
+<p>All the texts are freely available on <rs type="soft.name" ref="#github">GitHub</rs> in their
source form. This repository of texts can be accessed through the <ref
target="https://www.kanripo.org/">kanripo.org</ref> website, but also through a module
of the Emacs editor called Mandoku. This allows users to query, access, clone, edit, and
2 changes: 1 addition & 1 deletion data/JTEI/14_2021-23/jtei-cc-ra-mylonas-202-source.xml
@@ -619,7 +619,7 @@
target="http://nomisma.org/">Nomisma</ref>, and <ref
target="http://www.cidoc-crm.org/crmtex/home-8">CRMtex</ref><note>CIDOC (International
Committee for Documentation) Conceptual <ptr type="software" xml:id="Reference"
-target="#Reference"/><rs type="soft.name" ref="#Reference">Reference</rs> Model,
+target="#omekareference"/><rs type="soft.name" ref="#Reference">Reference</rs> Model,
accessed July 4, 2022, <ptr target="http://www.cidoc-crm.org/"/>; Nomisma (knowledge
organization system for numismatics), accessed July 4, 2022, <ptr
target="http://nomisma.org/"/>; CRMtex model for the study of ancient texts (an
4 changes: 2 additions & 2 deletions data/JTEI/7_2014/jtei-7-dee-source.xml
@@ -734,9 +734,9 @@
<head>Integrated Resources</head>
<p>While initiatives such as TAPAS, TEICHI, and <ref
target="https://sites.google.com/site/cwrcwriterhelp/"><ptr type="software"
-xml:id="CWRC-Writer" target="#CWRC-Writer"/><rs type="soft.name" ref="#CWRC-Writer"
+xml:id="CWRC-Writer" target="#cwrcwriter"/><rs type="soft.name" ref="#CWRC-Writer"
>CWRC-Writer</rs></ref><note><p><title level="a">Welcome to CWRC Writer</title>,
-<ptr type="software" xml:id="CWRC-Writer" target="#CWRC-Writer"/><rs
+<ptr type="software" xml:id="CWRC-Writer" target="#cwrcwriter"/><rs
type="soft.name" ref="#CWRC-Writer">CWRC-Writer</rs> Help, accessed September 7,
2013, <ptr target="https://sites.google.com/site/cwrcwriterhelp/"/>.</p></note> have
begun to address to different aspects of these needs (<ref type="bibl"
6 changes: 3 additions & 3 deletions data/JTEI/8_2014-15/jtei-8-boschetti-source.xml
@@ -162,7 +162,7 @@
open-source general-purpose framework <ref target="http://cocoon.apache.org/"
>Cocoon</ref><note><ptr target="http://cocoon.apache.org/"/>.</note> and the native
XML database <ref target="http://exist-db.org/"><ptr type="software" xml:id="eXist-db"
-target="#eXist-db"/><rs type="soft.name" ref="#eXist-db">eXist-db</rs></ref><note><ptr
+target="#existdb"/><rs type="soft.name" ref="#eXist-db">eXist-db</rs></ref><note><ptr
target="http://exist-db.org/"/>.</note> deserve to be mentioned. Specifically for
TEI-annotated documents, <ref target="http://www.tustep.uni-tuebingen.de/tustep_eng.html"
>TUSTEP</ref>,<note><ptr target="http://www.tustep.uni-tuebingen.de/tustep_eng.html"
@@ -245,7 +245,7 @@
exposes methods that parse the XML file and create <ptr type="software" xml:id="Java"
target="#Java"/><rs type="soft.name" ref="#Java">Java</rs> objects. The resources are
stored and maintained in a native XML database management system (i.e., <ptr
-type="software" xml:id="eXist-db" target="#eXist-db"/><rs type="soft.name"
+type="software" xml:id="eXist-db" target="#existdb"/><rs type="soft.name"
ref="#eXist-db">eXist-db</rs>). The APIs and services provided by Lucene, a software
library developed and hosted by the Apache Foundation, have been used for indexing the
textual data.</p>
@@ -646,7 +646,7 @@
<p> The marshalling and unmarshalling process handles the serialization of the object
representation of the TEI document, in order to store and retrieve data on the filesystem
or in native XML databases, such as <ptr type="software" xml:id="eXist-db"
-target="#eXist-db"/><rs type="soft.name" ref="#eXist-db">eXist-db</rs>.</p>
+target="#existdb"/><rs type="soft.name" ref="#eXist-db">eXist-db</rs>.</p>
<p>Performance measurement tools such as JMeter will help to optimize the performance of the
library components.</p>
<p> Software currently under development will be available on <ptr type="software"
