From be1881396b93b82f2028d4bc7cbdf5b0b933e07f Mon Sep 17 00:00:00 2001
From: Anne Ferger
Date: Wed, 13 Mar 2024 16:56:42 +0100
Subject: [PATCH] some small fixes to the annotations

---
 .../10_2016-19/jtei-10-burghart-source.xml | 66 +-
 .../JTEI/10_2016-19/jtei-10-dumont-source.xml | 13 -
 .../JTEI/10_2016-19/jtei-10-emsley-source.xml | 13 -
 data/JTEI/10_2016-19/jtei-10-haaf-source.xml | 8 +-
 .../10_2016-19/jtei-10-homenda-source.xml | 13 -
 .../JTEI/10_2016-19/jtei-10-romary-source.xml | 20 +-
 .../10_2016-19/jtei-10-viglianti-source.xml | 13 -
 data/JTEI/11_2019-20/jtei-11-intro-source.xml | 13 -
 .../jtei-cc-ra-bermudez-sabel-137-source.xml | 24 +-
 ...jtei-cc-ra-hannessschlaeger-164-source.xml | 24 +-
 data/JTEI/12_2019-20/jtei-12-intro-source.xml | 13 -
 .../jtei-cc-ra-bauman-170-source.xml | 38 +-
 .../jtei-cc-ra-flanders-176-source.xml | 276 ++---
 .../jtei-cc-pn-kuhry-188-source.xml | 124 +--
 .../jtei-cc-ra-parisse-182-source.xml | 944 +++++++++---------
 .../jtei-cc-ra-winslow-186-source.xml | 6 +-
 .../jtei-cc-ra-wittern-189-source.xml | 64 +-
 .../jtei-barabuccietal-196-source.xml | 74 +-
 .../jtei-bleeker-et-al-199-source.xml | 22 +-
 ...tei-burnard-shoch-odebrecht-194-source.xml | 28 +-
 .../jtei-cc-pn-erjavec-195-source.xml | 46 +-
 .../jtei-cc-pn-holmes-193-source.xml | 34 +-
 .../jtei-cc-ra-mylonas-202-source.xml | 8 +-
 .../jtei-rioriande-torresallen-250-source.xml | 174 ++--
 data/JTEI/7_2014/jtei-7-carter-source.xml | 13 -
 data/JTEI/7_2014/jtei-7-dee-source.xml | 48 +-
 .../7_2014/jtei-7-pfannenschmidt-source.xml | 13 -
 data/JTEI/7_2014/jtei-7-schmidt-source.xml | 13 -
 .../7_2014/jtei-7-schreibman-intro-source.xml | 13 -
 data/JTEI/8_2014-15/jtei-8-barbero-source.xml | 13 -
 data/JTEI/8_2014-15/jtei-8-berti-source.xml | 64 +-
 data/JTEI/8_2014-15/jtei-8-blanco-source.xml | 38 +-
 .../8_2014-15/jtei-8-boschetti-source.xml | 184 ++--
 data/JTEI/8_2014-15/jtei-8-ciotti-source.xml | 13 -
 data/JTEI/8_2014-15/jtei-8-dumont-source.xml | 216 ++--
 data/JTEI/8_2014-15/jtei-8-iglesia-source.xml | 112 +--
 data/JTEI/8_2014-15/jtei-8-intro-source.xml | 10 +-
 data/JTEI/8_2014-15/jtei-8-moerth-source.xml | 10 +-
 data/JTEI/8_2014-15/jtei-8-munoz-source.xml | 14 +-
 .../JTEI/8_2014-15/jtei-8-rosselli-source.xml | 482 ++++-----
 .../JTEI/9_2016-17/jtei-9-armaselu-source.xml | 36 +-
 data/JTEI/9_2016-17/jtei-9-ciotti-source.xml | 8 +-
 data/JTEI/9_2016-17/jtei-9-turska-source.xml | 2 +-
 .../jtei-vagionakis-204-source.xml | 114 +--
 .../rolling_2022/jtei-mitiku-212-source.xml | 56 +-
 .../rolling_2022/jtei-teilex-207-source.xml | 4 +-
 46 files changed, 1689 insertions(+), 1845 deletions(-)

diff --git a/data/JTEI/10_2016-19/jtei-10-burghart-source.xml b/data/JTEI/10_2016-19/jtei-10-burghart-source.xml
index 3aab424..8e86a2f 100644
--- a/data/JTEI/10_2016-19/jtei-10-burghart-source.xml
+++ b/data/JTEI/10_2016-19/jtei-10-burghart-source.xml
@@ -195,8 +195,8 @@

- Towards a Toolbox + Towards a Toolbox

After this survey, I started writing my own scripts in order to support my editorial work, as a set of separate tools. It quickly occurred to me that those scripts could easily be grouped together in a toolbox that, with a little bit of @@ -233,7 +233,7 @@

The TEI Critical Apparatus Toolbox is an online application for the quick and easy visualization and processing of TEI XML critical editions. It is not meant to be a publication tool: the Critical Apparatus <ptr - type="software" xml:id="Toolbox" target="#Toolbox"/><rs type="soft.name" + type="software" xml:id="Toolbox" target="softw:Toolbox"/><rs type="cit:soft.name" ref="#Toolbox">Toolbox</rs> specifically targets the needs of editors during the preparation of their ongoing work, allowing them to perform quality controls on their TEI files and to display their work-in-progress text either in the style of a @@ -257,7 +257,7 @@ visualization for such encoding is not easy, because there might not be an identifiable critical text (yet), and the styles can be mixed in the apparatus. The Critical Apparatus <ptr type="software" xml:id="Toolbox" - target="#Toolbox"/><rs type="soft.name" ref="#Toolbox">Toolbox</rs> will + target="softw:Toolbox"/>Toolbox will display each style (lemma and reading[s], or readings only) differently: In each case, the content of lem and rdg are highlighted, with a white background. @@ -279,8 +279,8 @@

Displaying this code in the Critical - Apparatus <ptr type="software" xml:id="Toolbox" target="#Toolbox"/><rs - type="soft.name" ref="#Toolbox">Toolbox</rs> + Apparatus Toolbox
@@ -308,8 +308,8 @@
Displaying this code in the Critical - Apparatus <ptr type="software" xml:id="Toolbox" target="#Toolbox"/><rs - type="soft.name" ref="#Toolbox">Toolbox</rs> + Apparatus Toolbox
The use of reading groups is also supported: the content of each @@ -334,8 +334,8 @@
Displaying this code in the Critical - Apparatus <ptr type="software" xml:id="Toolbox" target="#Toolbox"/><rs - type="soft.name" ref="#Toolbox">Toolbox</rs> + Apparatus Toolbox
@@ -351,7 +351,7 @@ the user, and they are displayed in a more prominent fashion, as blocks with a thin blue line representing each break.

So far the Critical Apparatus <ptr type="software" xml:id="Toolbox" - target="#Toolbox"/><rs type="soft.name" ref="#Toolbox">Toolbox</rs> is not + target="softw:Toolbox"/>Toolbox is not very different from other TEI display tools, except perhaps that it can handle a great variety of encoding styles within the Parallel Segmentation method. But its most distinctive feature is the ability to perform automated controls of the encoding.

@@ -360,14 +360,14 @@ Controlling the Consistency of Your Encoding

The preparation of a critical edition involves many sessions of meticulous proofreading, especially to check the accuracy of the apparatus. If the Critical Apparatus <ptr type="software" xml:id="Toolbox" target="#Toolbox" - /><rs type="soft.name" ref="#Toolbox">Toolbox</rs> cannot replace the + level="m">Critical Apparatus Toolbox cannot replace the careful eye of the editor, it offers an efficient way to control the consistency of the encoding by detecting small inevitable mistakes, like a typo in the list of sigla or the failure to record the reading of a particular witness in an apparatus entry.

To perform those controls, the Critical Apparatus <ptr - type="software" xml:id="Toolbox" target="#Toolbox"/><rs type="soft.name" + type="software" xml:id="Toolbox" target="softw:Toolbox"/><rs type="cit:soft.name" ref="#Toolbox">Toolbox</rs> will scan the teiHeader and front sections of the TEI file for a listWit, and find all the sigla of the witnesses. Then, it will compare this list to the manuscripts appearing @@ -384,7 +384,7 @@ easier. It is the type of apparatus that allows for the most efficient consistency checks.

The Critical Apparatus <ptr type="software" xml:id="Toolbox" - target="#Toolbox"/><rs type="soft.name" ref="#Toolbox">Toolbox</rs> can: + target="softw:Toolbox"/>Toolbox can: Highlight apparatus entries that do not use all witnesses: the content of the incomplete app elements which do not explicitly give a text for each witness listed in the listWit will be highlighted in red. @@ -401,8 +401,8 @@

Displaying this code in the Critical - Apparatus <ptr type="software" xml:id="Toolbox" target="#Toolbox"/><rs - type="soft.name" ref="#Toolbox">Toolbox</rs> + Apparatus Toolbox
Highlight apparatus entries that do not use a specific witness: @@ -447,7 +447,7 @@ sure that the editor added the reading of each witness, since unmentioned manuscripts are assumed by default to correspond to the lemma.

In this case, the Critical Apparatus <ptr type="software" - xml:id="Toolbox" target="#Toolbox"/><rs type="soft.name" ref="#Toolbox" + xml:id="Toolbox" target="softw:Toolbox"/><rs type="cit:soft.name" ref="#Toolbox" >Toolbox</rs> can nevertheless highlight apparatus entries that mention a particular witness. Practically, this will highlight only apparatus entries where a witness’s reading differs from the lemma. Example:

Displaying this code in the Critical - Apparatus <ptr type="software" xml:id="Toolbox" target="#Toolbox"/><rs - type="soft.name" ref="#Toolbox">Toolbox</rs>, when highlighting all + Apparatus Toolbox, when highlighting all apparatus entries mentioning witness R2

@@ -470,7 +470,7 @@
Other Controls

The Critical Apparatus <ptr type="software" xml:id="Toolbox" - target="#Toolbox"/><rs type="soft.name" ref="#Toolbox">Toolbox</rs> can + target="softw:Toolbox"/>Toolbox can also highlight apparatus entries that contain a lem, or that contain only rdg elements.

@@ -509,7 +509,7 @@
Application Design

The Critical Apparatus <ptr type="software" xml:id="Toolbox" - target="#Toolbox"/><rs type="soft.name" ref="#Toolbox">Toolbox</rs> is an + target="softw:Toolbox"/>Toolbox is an online application built on a set of XSLT stylesheets served through PHP files, the output of which is made interactive thanks to Javascript and CSS. It makes use of some parts of TEI Boilerplate, most notably its web design. But @@ -540,7 +540,7 @@ >TEI P5 version 2.9.0 release notes

In the future, keeping up with the developments and evolutions of the Critical Apparatus module will be a priority. We hope that the Critical Apparatus <ptr type="software" - xml:id="Toolbox" target="#Toolbox"/><rs type="soft.name" ref="#Toolbox" + xml:id="Toolbox" target="softw:Toolbox"/><rs type="cit:soft.name" ref="#Toolbox" >Toolbox</rs> will be able to adapt to these evolutions: since the functions of the interface are powered by Javascript, updating the XSLT should be enough to adapt to new rules or elements in the module.

@@ -549,7 +549,7 @@
Future Developments

The beginning of the development of the Critical Apparatus <ptr - type="software" xml:id="Toolbox" target="#Toolbox"/><rs type="soft.name" + type="software" xml:id="Toolbox" target="softw:Toolbox"/><rs type="cit:soft.name" ref="#Toolbox">Toolbox</rs> was a lonely endeavor, but the project has since benefited from the collaboration of Magdalena Turska

Magdalena Turska created this prototype as part of her work as an Experienced Researcher Fellow of the (DIXIT) program (accessed December 4, 2016).

who wrote a prototype for the integration of the Toolbox into an oXygen + target="softw:Toolbox"/>Toolbox into an oXygen framework. Decisive help was also found via a collaboration with the Erasmus SP+ DEMM program (<ref target="http://www.digitalmanuscripts.eu">Digital Edition of Medieval Manuscripts</ref>).

Accessed December 4, 2016,

For the three years beginning in June 2015 DEMM is holding an annual hackathon event where the Critical - Apparatus <ptr type="software" xml:id="Toolbox" target="#Toolbox"/><rs type="soft.name" + Apparatus <ptr type="software" xml:id="Toolbox" target="softw:Toolbox"/><rs type="cit:soft.name" ref="#Toolbox">Toolbox</rs> is the base application that small, mixed teams of textual scholars and computer scientists try to enhance to meet their particular needs.

The first of these events took place at Queen Mary University London in @@ -573,7 +573,7 @@ Manuscripts, People, accessed December 4, 2016, ).

These events will play an important role in the future developments of the Toolbox, since they confront us directly with the real-life experience and needs of editors.

@@ -585,7 +585,7 @@ to parallel versions of a text with potentially different branches), and the last concentrated on the relationship between the edited text and images. These themes could serve as general directions for enhancing the Critical Apparatus <ptr - type="software" xml:id="Toolbox" target="#Toolbox"/><rs type="soft.name" + type="software" xml:id="Toolbox" target="softw:Toolbox"/><rs type="cit:soft.name" ref="#Toolbox">Toolbox</rs>: offering visualization options for named entities, from a simple index to more elaborate links to maps, when possible; @@ -595,7 +595,7 @@ display of parallel witnesses; adding some options to link the text to its representation, or to images generally. This poses the problem of access to the images: in the current state of - the Toolbox, users upload their TEI XML edition but not the other files potentially linked to it, like images. @@ -609,8 +609,8 @@ Tool.<ref target="http://tapor.uvic.ca/~mholmes/image_markup/">The UVic Image Markup Tool Project</ref>, accessed December 4, 2016, Even if the Critical Apparatus <ptr type="software" xml:id="Toolbox" target="#Toolbox" - /><rs type="soft.name" ref="#Toolbox">Toolbox</rs> is not a publication + level="m">Critical Apparatus Toolbox is not a publication application, such an output would provide users with a ready-to-use static version of their edition, a set of files (HTML, CSS, Javascript, etc.) that they could publish on their website or show in a demo session. While complex projects will always need a @@ -637,7 +637,7 @@ possible.

I am preparing a generic TEI-to-LaTeX and TEI-to-PDF conversion feature that will be implemented in the Critical Apparatus <ptr type="software" - xml:id="Toolbox" target="#Toolbox"/><rs type="soft.name" ref="#Toolbox" + xml:id="Toolbox" target="softw:Toolbox"/><rs type="cit:soft.name" ref="#Toolbox" >Toolbox</rs>. I chose LaTeX as an intermediary format because it offers all the desired options, thanks to the reledmac package especially designed for typesetting critical editions.

The reledmac package, maintained by @@ -683,7 +683,7 @@ target="#turskaCummingsRahtz2016" type="bibl">Turska, Cummings and Rahtz 2016

The Critical Apparatus <ptr type="software" xml:id="Toolbox" - target="#Toolbox"/><rs type="soft.name" ref="#Toolbox">Toolbox</rs> belongs in + target="softw:Toolbox"/>Toolbox belongs in this growing galaxy of lightweight, user-oriented tools. With features demonstrating the immediate benefits of TEI encoding, it is a good tool for TEI training and workshops. But its main purpose remains to facilitate the work of textual scholars, a complex task given diff --git a/data/JTEI/10_2016-19/jtei-10-dumont-source.xml b/data/JTEI/10_2016-19/jtei-10-dumont-source.xml index c161c2a..472fe05 100644 --- a/data/JTEI/10_2016-19/jtei-10-dumont-source.xml +++ b/data/JTEI/10_2016-19/jtei-10-dumont-source.xml @@ -47,19 +47,6 @@ humanities and social sciences, open to quality periodicals looking to publish full-text articles online.

- - -

In the context of this project, private URIs with the prefix softw point to software - items in the software-list.xml file, which are encoded with item elements and - identified in xml:id.

-
- -

In the context of this project, private URIs with the prefix cit point to - category elements in the citation-taxonomy.xml file.

-
-
diff --git a/data/JTEI/10_2016-19/jtei-10-emsley-source.xml b/data/JTEI/10_2016-19/jtei-10-emsley-source.xml index 11bc189..f4a3696 100644 --- a/data/JTEI/10_2016-19/jtei-10-emsley-source.xml +++ b/data/JTEI/10_2016-19/jtei-10-emsley-source.xml @@ -59,19 +59,6 @@ humanities and social sciences, open to quality periodicals looking to publish full-text articles online.

- - -

In the context of this project, private URIs with the prefix softw point to software - items in the software-list.xml file, which are encoded with item elements and - identified in xml:id.

-
- -

In the context of this project, private URIs with the prefix cit point to - category elements in the citation-taxonomy.xml file.

-
-
diff --git a/data/JTEI/10_2016-19/jtei-10-haaf-source.xml b/data/JTEI/10_2016-19/jtei-10-haaf-source.xml index 8de991a..d98ca1c 100644 --- a/data/JTEI/10_2016-19/jtei-10-haaf-source.xml +++ b/data/JTEI/10_2016-19/jtei-10-haaf-source.xml @@ -224,10 +224,10 @@ target="http://www.deutschestextarchiv.de/doku/software#cab"/>. as well as collaborative text correction and annotationSee <ptr type="software" xml:id="R3" target="#dtaq"/><rs type="soft.name" + level="a"><ptr type="software" xml:id="R3" target="softw:dtaq"/><rs type="cit:soft.name" ref="#R3">DTAQ: Kollaborative Qualitätssicherung im Deutschen Textarchiv</rs> (Collaborative Quality Assurance within the DTA), accessed - January 28, 2017, . On the process of quality assurance in the DTA, see, for example, Haaf, Wiegand, and Geyken 2013.) is a matter of supporting @@ -289,7 +289,7 @@ have been manually transcribed, annotated in TEI XML, and published via the DTA infrastructure. Most of these manuscripts were keyed manually by a vendor and published at an early stage in the web-based quality assurance platform DTAQ. There, the + xml:id="R2" target="softw:dtaq"/>DTAQ. There, the transcription as well as the annotation of each document was checked and corrected, if necessary; DTAQ also provided the means to add additional markup, such as the tagging of person names (persName), directly at page level. After the process of quality @@ -1226,7 +1226,7 @@ corpora. Our primary goal is to be as inclusive as possible, allowing for other projects to benefit from our resources (i.e., our comprehensive guidelines and documentation as well as the technical infrastructure that includes Schemas, ODDs, and XSLT scripts) and + xml:id="R1" target="softw:xslt"/>XSLT scripts) and contribute to our corpora. We also want to ensure interoperability of all data within the DTA corpora. The underlying TEI format has to be continuously maintained and adapted to new necessities with these two premises in mind.

diff --git a/data/JTEI/10_2016-19/jtei-10-homenda-source.xml b/data/JTEI/10_2016-19/jtei-10-homenda-source.xml index 60af5b3..6b98500 100644 --- a/data/JTEI/10_2016-19/jtei-10-homenda-source.xml +++ b/data/JTEI/10_2016-19/jtei-10-homenda-source.xml @@ -62,19 +62,6 @@ in the humanities and social sciences, open to quality periodicals looking to publish full-text articles online.

- - -

In the context of this project, private URIs with the prefix softw point to software - items in the software-list.xml file, which are encoded with item elements and - identified in xml:id.

-
- -

In the context of this project, private URIs with the prefix cit point to - category elements in the citation-taxonomy.xml file.

-
-
diff --git a/data/JTEI/10_2016-19/jtei-10-romary-source.xml b/data/JTEI/10_2016-19/jtei-10-romary-source.xml index 7409708..ef1d129 100644 --- a/data/JTEI/10_2016-19/jtei-10-romary-source.xml +++ b/data/JTEI/10_2016-19/jtei-10-romary-source.xml @@ -658,8 +658,8 @@ available at . In our proposal, the etym element has to be made recursive in order to allow the fine-grained representations we propose here. The corresponding ODD customization, together with - reference examples, is available on GitHub. and the fact that a change + reference examples, is available on GitHub. and the fact that a change occurred within the contemporary lexicon (as opposed to its parent language) is indicated by means of xml:lang on the source form.There may also be cases in which it is unknown whether a given etymological process occurred within the @@ -780,7 +780,7 @@ text.The interested reader may ponder here the possibility to also encode scripts by means of the notation attribute instead of using a cluttering of language subtags on xml:lang. For more on this issue, see the proposal in - the TEI GitHub (). This is why we have extended the notation attribute to orth in order to allow for better representation of both language @@ -999,7 +999,7 @@

The dateThe element date as a child of cit is another example which does not adhere to the current TEI standards. We have allowed this within our ODD document. A feature request proposal will be made on the GitHub page and this feature may or may not appear in future versions of the TEI Guidelines. element is listed within each etymon block; the values of attributes notBefore and notAfter specify the range of time @@ -1498,9 +1498,9 @@ extent of knowledge that is truly necessary to create an accurate model of metaphorical processes. In order to do this, it is necessary to make use of one or more ontologies, which could be locally defined within a project, and of external linked open data sources - such as DBpedia and Wikidata, or some combination thereof. Within TEI dictionary markup, URIs for existing ontological entries can be referenced in the sense, usg, and ref elements as the value of @@ -1510,8 +1510,8 @@ reference to the source entry’s unique identifier (if such an entry exists within the dataset). In such cases, the etymon pointing to the source entry can be assumed to inherit the source’s domain and sense information, and this information can be automatically - extracted with a fairly simple XSLT program; thus the encoders may choose to leave some + extracted with a fairly simple XSLT program; thus the encoders may choose to leave some or all of this information out of the etymon section. However, in the case that the dataset does not actually have entries for the source terms, or the encoder wants to be explicit in all aspects of the etymology, as mentioned above, the source domain and the @@ -1572,7 +1572,7 @@ URI is referenced in oRef and pRef as the value of corresp (@corresp="#animal").

In sense, the URI corresponding to the DBpedia entry for horse is + target="softw:dbpedia"/>DBpedia entry for horse is the value for the attribute corresp. Additionally, the date notBefore="…" element–attribute pairing is used to specify that the term has only been used for the horse since 1517 at maximum (corresponding to the first Spanish @@ -2502,7 +2502,7 @@

For the issues regarded as the most fundamentally important to creating a dynamic and sustainable model for both etymology and general lexicographic markup in TEI, we have submitted formal requests for changes to the TEI GitHub, and will continue to + target="softw:github"/>GitHub, and will continue to submit change requests as needed. While this work represents a large step in the right direction for those looking for means of representing etymological information, there are still a number of unresolved issues that will need to be addressed. These remaining issues diff --git a/data/JTEI/10_2016-19/jtei-10-viglianti-source.xml b/data/JTEI/10_2016-19/jtei-10-viglianti-source.xml index 274d931..8a48419 100644 --- a/data/JTEI/10_2016-19/jtei-10-viglianti-source.xml +++ b/data/JTEI/10_2016-19/jtei-10-viglianti-source.xml @@ -51,19 +51,6 @@ in the humanities and social sciences, open to quality periodicals looking to publish full-text articles online.

- - -

In the context of this project, private URIs with the prefix softw point to software - items in the software-list.xml file, which are encoded with item elements and - identified in xml:id.

-
- -

In the context of this project, private URIs with the prefix cit point to - category elements in the citation-taxonomy.xml file.

-
-
diff --git a/data/JTEI/11_2019-20/jtei-11-intro-source.xml b/data/JTEI/11_2019-20/jtei-11-intro-source.xml index f61d488..869a261 100644 --- a/data/JTEI/11_2019-20/jtei-11-intro-source.xml +++ b/data/JTEI/11_2019-20/jtei-11-intro-source.xml @@ -90,19 +90,6 @@ humanities and social sciences, open to quality periodicals looking to publish full-text articles online.

- - -

In the context of this project, private URIs with the prefix softw point to software - items in the software-list.xml file, which are encoded with item elements and - identified in xml:id.

-
- -

In the context of this project, private URIs with the prefix cit point to - category elements in the citation-taxonomy.xml file.

-
-
diff --git a/data/JTEI/11_2019-20/jtei-cc-ra-bermudez-sabel-137-source.xml b/data/JTEI/11_2019-20/jtei-cc-ra-bermudez-sabel-137-source.xml index 6553385..dc4e137 100644 --- a/data/JTEI/11_2019-20/jtei-cc-ra-bermudez-sabel-137-source.xml +++ b/data/JTEI/11_2019-20/jtei-cc-ra-bermudez-sabel-137-source.xml @@ -123,11 +123,11 @@ ways in which the variant taxonomy may be linked to the body of the edition.

Although this paper is TEI-centered, other XML technologies will be mentioned. includes a brief commentary on using XSLT to + type="software" xml:id="R1" target="softw:xslt"/>XSLT to transform a TEI-conformant definition of constraints into schema rules. However, the greatest attention to an additional technology is in , which discusses the use of XQuery to retrieve particular + target="softw:xquery"/>XQuery to retrieve particular loci critici and to deploy quantitative analyses.

@@ -225,10 +225,10 @@ neutralized.This statement is especially significant when dealing with corpora that have been compiled over a long period of time. As is clearly explained in the introduction to the Helsinki Corpus that Irma Taavitsainen and Päivi Pahta prepared for - the Corpus Resource Database (CoRD) (Corpus Resource Database (CoRD) (Placing the Helsinki Corpus Middle English Section Introduction into Context, I use - XSLT to process the feature structure declaration in order to create all required Schematron rules that will constrict the feature library accordingly. I am currently working on creating a more generic validator (see my Github repository, Github repository, ).
@@ -558,14 +558,14 @@ 2016, 12.2.3) seems to be a popular encoding technique for multi-witness editions, in terms of both the specific tools that have been created for this method and the number of projects that apply it.Tools include Versioning Machine, CollateX (both the Java and Python versions), and Java and Python versions), and Juxta. For representative projects using the parallel segmentation method see GitHub, but also in an GitHub, but also in an eXistdb-powered web application. This is by any standard a wonderful development for a collection of textual data—and one that would not have been possible had the abstracts not been published under an open license, especially since their authors come @@ -459,18 +459,18 @@
Submission of the Abstracts and the joys of ConfTool

As it is every year, the conference management software ConfTool ProConfTool Conference - Management Software, ConfTool GmbH, accessed August 23, 2019, . was used for the submission of the abstracts of the 2016 TEI conference. When the Vienna team received access to the ConfTool system, the instance for the 2016 conference had been equipped with default settings based on previous TEI conference settings. As ConfTool is not + xml:id="R7" target="softw:conftool"/>ConfTool is not the most intuitive system to handle for a first-time administrator,The chair of the 2017 TEI conference program committee Kathryn Tomasek has described the rather tricky structure of the system as the joys of ConfTool (email + target="softw:conftool"/>ConfTool (email message to author, April 11, 2017). one aspect was overlooked when setting up the system for the 2016 conference: the Copyright Transfer Terms and Licensing Policy that contributors had to agree to when submitting an abstract @@ -545,20 +545,20 @@ was made available via the conference website under the same license (Resch, Hannesschläger, and Wissik 2016b).

The page proofs that were transformed into this PDF had been created with Adobe + type="software" xml:id="R9" target="softw:indesign"/>Adobe InDesign. The real fun started when the InDesign file was exported to + target="softw:indesign"/>InDesign file was exported to XML and transformed back into single files (one file per abstract). These files were - edited with the Oxygen XML editor to become proper TEI files with extensive headers. Finally, they were published as a repository together with the TEI schema on GitHub (Hannesschläger and Schopper 2017), again under the same license. This allowed Martin Sievers, one of the abstract authors, to immediately correct a typing error in his abstract that the editors had overlooked (see history of Hannesschläger - and Schopper 2017 on GitHub).

+ and Schopper 2017 on GitHub).

But the story did not end there. The freely available and processable collection of abstracts inspired Peter Andorfer, a colleague of the editors at the Austrian Centre for Digital Humanities, to use this text collection to built an eXistdb-powered web diff --git a/data/JTEI/12_2019-20/jtei-12-intro-source.xml b/data/JTEI/12_2019-20/jtei-12-intro-source.xml index 17e84f1..d6d92f2 100644 --- a/data/JTEI/12_2019-20/jtei-12-intro-source.xml +++ b/data/JTEI/12_2019-20/jtei-12-intro-source.xml @@ -64,19 +64,6 @@ humanities and social sciences, open to quality periodicals looking to publish full-text articles online.

- - -

In the context of this project, private URIs with the prefix softw point to software - items in the software-list.xml file, which are encoded with item elements and - identified in xml:id.

-
- -

In the context of this project, private URIs with the prefix cit point to - category elements in the citation-taxonomy.xml file.

-
-
diff --git a/data/JTEI/12_2019-20/jtei-cc-ra-bauman-170-source.xml b/data/JTEI/12_2019-20/jtei-cc-ra-bauman-170-source.xml index decf682..739a827 100644 --- a/data/JTEI/12_2019-20/jtei-cc-ra-bauman-170-source.xml +++ b/data/JTEI/12_2019-20/jtei-cc-ra-bauman-170-source.xml @@ -388,8 +388,8 @@ - XSLT template that converts an + XSLT template that converts an lb into a space.

@@ -453,7 +453,7 @@ differentiate it from the main title. She plans to use TEI P5, so her first thought is Does the TEI title element have a type attribute? So she switches into oXygen, creates a + target="softw:oxygen"/>oXygen, creates a new tei_all document, puts her cursor immediately before the closing angle bracket of the title start-tag

The TAGC in SGML nomenclature.

and types a space. The result ( Choosing the type attribute from a - drop-down in oXygen + drop-down in oXygen
In this fictional—but completely believable—example the encoder has used the schema (tei_all) through a tool (oXygen) as a way of finding out about the markup language (TEI). Yes, she could just as well have read the demonstrates oXygen helping an encoder. Here oXygen is not just answering a common question: is the TEI element for a notes statement notesStmt or noteStmt? It is also requiring that the user enter only one of three elements allowed by the schema (or a comment, etc.).
Inserting an element in oXygen
By using an editor that understands the schema, an encoder can avoid making common mistakes (like misspelling an element name) at the time @@ -811,8 +811,8 @@ would ignore it, would not cause a problem if it were left.

Either way, in order to avoid this potential maintenance nightmare, tei_customization.odd is not a static file, but rather is - generated by running an XSLT program that reads as its input the source + generated by running an XSLT program that reads as its input the source to TEI P5

Remember, the TEI Guidelines are written in TEI. The source to all of P5 is a single TEI document, although for convenience it is split into well over 850 separate files.

and writes @@ -905,11 +905,11 @@
How to Get it and Use it -

The The XSLT program used to generate tei_customization.odd can be found in the TEI GitHub repository. It is currently called TEI-to-tei_customization.xslt. The generated tei_customization ODD file and the schemas generated from it can @@ -921,14 +921,14 @@ target="http://www.tei-c.org/Vault/P5/3.3.0/xml/tei/custom/schema/relaxng/tei_customization.rnc" />.

Furthermore, the current version of tei_customization is available - from within oXygen as part of the TEI + from within oXygen as part of the TEI oXygen framework. However, the RELAX NG schema (tei_customization.rng or tei_customization.rnc) has the behavior discussed in . While this is not a bug or broken in any way, it is likely to be confusing and problematic for most users of oXygen. The + xml:id="R12" target="softw:oxygen"/>oXygen. The TEI_Council is interested in finding a way around this difficulty.

@@ -1167,8 +1167,8 @@ customization that she has to do this. The consequences of failing to turn it off are severe, though: although completion pop-up boxes still work, validation (both automatic validation as you type and static validation, for example ⌘-⇧-V) completely - stops working. Furthermore, oXygen leaves this feature on by default for a + stops working. Furthermore, oXygen leaves this feature on by default for a reason. Even though ID/IDREF checking itself is of almost no use to a user working with TEI P5 documents,

Because P5 does not use the ID/IDREF mechanism, the only one of the three added constraints that is useful is (2), that the value @@ -1182,7 +1182,7 @@ several possible solutions to this problem, each of which has its drawbacks. The TEI Council will hopefully implement one of them soon, making use of tei_customization from the oXygen + target="softw:teioxygenframework"/>oXygen framework much less problematic.

diff --git a/data/JTEI/12_2019-20/jtei-cc-ra-flanders-176-source.xml b/data/JTEI/12_2019-20/jtei-cc-ra-flanders-176-source.xml index 5bd9b90..4ada771 100644 --- a/data/JTEI/12_2019-20/jtei-cc-ra-flanders-176-source.xml +++ b/data/JTEI/12_2019-20/jtei-cc-ra-flanders-176-source.xml @@ -141,7 +141,7 @@ pedagogy TAPAS - XSLT validation @@ -159,16 +159,16 @@
-

This paper focuses on recent work by the TEI Archiving, Preservation, and Access Service +

This paper focuses on recent work by the TEI Archiving, Preservation, and Access Service (TAPAS) on pedagogy with TEI, and specifically on a recent initiative called - TAPAS Classroom focused on exploring pedagogical uses of - TEI. We provide some background on TAPAS, describe several case studies involving - pedagogical partners who used TAPAS in teaching, and finally describe the TAPAS + TEI. We provide some background on TAPAS, describe several case studies involving + pedagogical partners who used TAPAS in teaching, and finally describe the TAPAS Classroom initiative and its outcomes.

@@ -176,8 +176,8 @@
Introduction and Background

TAPAS

TAPAS

.

originated with the needs of humanities scholars and specifically with the rise (in the first decade of the twenty-first century) of scholarly interest in text encoding as a core humanities method. With expertise supported by the proliferation @@ -188,26 +188,26 @@ (Flanders and Hamlin 2013), infrastructure for TEI publication was at that time (and still remains to some extent) challenging for individual scholars and small projects, because of the costs and logistics of maintaining - servers, developing and supporting XSLT stylesheets, and maintaining technical expertise + servers, developing and supporting XSLT stylesheets, and maintaining technical expertise for troubleshooting and longer term support. TAPAS was developed to provide an + target="softw:tapas"/>TAPAS was developed to provide an alternative in the form of a web-based service for publishing TEI data. It offers a growing infrastructure of TEI publishing tools, a publishing venue that highlights the value of using TEI, and a long-term guarantee of visibility and access to the TEI data. - TAPAS was originally hosted at Brown University and is now hosted at Northeastern University, which also provides a guarantee of long-term repository support for TAPAS - data. TAPAS + data. TAPAS has been generously funded by the TEI Consortium and by an initial planning grant from the Institute for Museum and Library Services, a digital humanities startup grant from the NEH, a research and development grant from the NEH, and most recently a digital humanities advancement grant which has supported TAPAS Classroom.

+ target="softw:tapas"/>TAPAS Classroom.

- TAPAS Classroom and TEI Pedagogy

What do we envision when we think of TEI in the classroom? Three scenarios are particularly prominent. The first involves students contributing to @@ -263,10 +263,10 @@ Maybe Maybe - During the development of TAPAS Classroom, several scholars were already - experimenting with using TAPAS pedagogically. The following case studies + During the development of TAPAS Classroom, several scholars were already + experimenting with using TAPAS pedagogically. The following case studies describe how these different scenarios played out in practice.

Scenario 1: A Course Designed Around Student Participation in a Faculty Research @@ -274,7 +274,7 @@

Karen Bourrier, Digital Dinah Craik ()

Karen Bourrier is a faculty member at the University of Calgary and has been using TAPAS for her project, Digital Dinah Craik; a presentation at TEI 2017 by her research assistant and project editor Kailey Fukushima described the project’s work in detail. This project includes close to four hundred TEI @@ -282,20 +282,20 @@ group of undergraduate and graduate students who typically have no TEI experience when they start work, and who transcribe, encode, and add enhanced markup to each text as it proceeds through the project’s workflow. They are trained to encode in TEI/XML (using - the Oxygen XML editor) and to use TAPAS once a completed text is ready to + the Oxygen XML editor) and to use TAPAS once a completed text is ready to publish. Professor Bourrier’s pedagogical objectives for the course include professionalization of students through involvement in long-term digital projects, and engaging students in collaborative relationships with faculty and scholars in their - field. (Kailey Fukushima reported that TAPAS publication constituted a useful addition + field. (Kailey Fukushima reported that TAPAS publication constituted a useful addition to students’ CVs.) Practically speaking, because of the students’ lack of prior experience with TEI, it was important to have easy-to-use tools so that students could focus on concepts and gain fluency, thereby streamlining the process of training students to work on the project.

Professor Bourrier’s pedagogical experimentation with TAPAS yielded several specific + target="softw:tapas"/>TAPAS yielded several specific design considerations. First, a close linkage between process and outcome is valuable in motivating students and focusing their effort, so a workflow that permits immediate publication (and in which the concept of publication and the @@ -317,7 +317,7 @@ schedule at )

Mackenzie Brooks is an assistant professor and digital humanities librarian at Washington and Lee University, who used TAPAS in two iterations of a + target="softw:tapas"/>TAPAS in two iterations of a one-credit lab course on scholarly text encoding with TEI, accompanying a course in French literature taught by Professor Stephen P. McCormick. The course covered the origins and theoretical orientations of text markup, document @@ -333,8 +333,8 @@ look like. Framing the digital component as a one-credit lab course, taught by library faculty, offered a way to integrate digital approaches more flexibly into the curriculum without creating a burden for disciplinary faculty. In this - context, a platform like TAPAS enables the instructors to focus on the + context, a platform like TAPAS enables the instructors to focus on the challenges of teaching text encoding and to reduce the technical overhead for both students and instructors so that questions relating to editorial theory and practice and literary interpretation can come to the fore.

@@ -345,21 +345,21 @@ Levy at Simon Fraser University and focused on creating a digital edition of Wordsworth’s and Coleridge’s Lyrical Ballads. In this case, all of the students collaborated on a single edition rather than working on their own. TAPAS was one of a range of tools the class experimented with, along with - visualization tools like Tableau, publication tools like Islandora, and - text analysis with Tableau, publication tools like Islandora, and + text analysis with R. Part of the design of the course was thus to experiment with the encoded TEI data through a plurality of tools and to produce a plurality of outcomes. This pedagogical orientation reinforces one of the core tenets (and challenges) of TAPAS’s design, which is to avoid creating orthodoxies in TEI encoding arising from display outcomes that appear authoritative. The diversification of view packages in - TAPAS, and TAPAS’s role as one of many possible venues for + TAPAS, and TAPAS’s role as one of many possible venues for publishing and displaying TEI data, can help encourage students of TEI to focus on the informational strength of their encoding rather than solely on the appearance of the output.

@@ -369,7 +369,7 @@ experimenting with building a program of digital humanities pedagogy using TEI. In spring 2016 she taught a course on Digital Editing, in which her ten undergraduate students created individual digital TEI editions which were published in TAPAS as a private collection visible only to members of the project. The course was aimed at English majors with no encoding background, with a central learning goal of developing students’ skill in XML and TEI. It also attracted students from History and @@ -384,7 +384,7 @@ cultural meaning. Isbell noted that the ability to see their work realized as a readable edition was a crucial motivator for students in pushing through the process of learning and debugging their TEI/XML encoding. Because TAPAS was in beta when Isbell + target="softw:tapas"/>TAPAS was in beta when Isbell ran the course, she resourcefully integrated bug discovery and reporting explicitly into the course, so that students had some exposure to the processes of digital humanities tool development and had a sense of participation in that process—an unintended but very @@ -402,10 +402,10 @@ Digital Scholarship Librarian at Boston College. For this project, Calhoun encoded a proof-of-concept segment of a nineteenth-century dictionary, together with proof-of-concept authority lists for persons, organizations, and historical places. TAPAS served as the data repository for the encoded files, which were also published through a project web site that included an interactive map using CartoDB.

For a more detailed account of the project, see Kijas (2016).

In addition to the outcomes for Calhoun’s own degree program and professional development, Kijas had some @@ -414,14 +414,14 @@ research, at both undergraduate and graduate level, and to introduce and support TEI as part of an institution-level initiative.

Although supporting student-led research was not on the TAPAS project’s + xml:id="R29" target="softw:tapas"/>TAPAS project’s radar as a very likely usage scenario during the initial planning and design process for - TAPAS Classroom, it is an important outcome that underscores the potential - benefits of having access to TAPAS at the institutional level (i.e., as a benefit + benefits of having access to TAPAS at the institutional level (i.e., as a benefit of TEI institutional membership). From the perspective of the library, one important - feature of the TAPAS service is that it provides a low-cost but durable infrastructure easy for novices to master. From the perspective of the student researcher, the important features are the longevity and stability of the data (enabling @@ -431,20 +431,20 @@

Goals and Desiderata -

Based on these experiments, the TAPAS development team set several important design - goals for the Based on these experiments, the TAPAS development team set several important design + goals for the TAPAS Classroom project, in supporting pedagogy with TEI: Make the process easier: For faculty who teach with TEI (either for its own sake or as part of a larger course in digital humanities or humanities), some - supporting infrastructure is needed. TAPAS can ideally streamline or eliminate the + supporting infrastructure is needed. TAPAS can ideally streamline or eliminate the process of getting access to shared server space, setting up student accounts, and ensuring that data is stored safely and can be accessed after the course is over. These functions are valuable for instructors at all levels of technical expertise but are particularly enabling for instructors familiar with TEI but not with its - supporting technologies (XSLT, XML databases, web publishing + supporting technologies (XSLT, XML databases, web publishing frameworks). Expose and demystify: The tool experimentation we saw in the Lyrical Ballads course and in Mary Isbell’s Digital Editing course @@ -452,69 +452,69 @@ publication apparatus as such, and to look under the surface: for instance by offering different stylesheets and viewing options and showing that the formatted display of TEI is independent of the underlying encoding. TAPAS can offer a multiplicity + target="softw:tapas"/>TAPAS can offer a multiplicity of viewing options and an interface that makes it easy to move between them, enabling students to make direct comparisons between different ways of handling the data and offering a clear causal view in which changes to the data have a visible effect on the output (thereby motivating students to experiment with the markup in greater detail). 
Support the encoding: In these scenarios, TAPAS was used as + xml:id="R38" target="softw:tapas"/>TAPAS was used as an end-point and a publication venue, but it is clear that there is pedagogical value in user-friendly TEI tools earlier in the classroom narrative as well. Mary Isbell, for instance, noted the potential value of student-level error messages, and Karen Bourrier observed how students learned from their encounter with the workflow - management aspects of TAPAS. Through user-friendly validation, + management aspects of TAPAS. Through user-friendly validation, interactive ways to inspect the TEI encoding, and other ways for students to explore their encoding in progress from many perspectives, TAPAS has potential value as a + target="softw:tapas"/>TAPAS has potential value as a source of approachable information that can inform the encoding process. - Under Under TAPAS Classroom, several major new pieces of TAPAS were developed + xml:id="R42" target="softw:tapas"/>TAPAS were developed that respond to these desiderata. First, we developed new stylesheets for viewing TEI files, aimed specifically at pedagogical uses; this grant also gave us an opportunity to develop a fully modular system for managing these view packages (about which more below). Second, we re-examined the ways in which validation functions in - TAPAS, and began developing tools that present validation to students in a specifically pedagogical context rather than as part of TAPAS’s own data management + target="softw:tapas"/>TAPAS’s own data management systems. Third, we developed a set of sample and template files to serve as starting points for students, recognizing that in short courses where the goal is to get students quickly oriented in a new skill (as for instance in the Crompton example described above), it could be helpful to have some pre-established starting points that are well contextualized and self-explanatory. 
And finally, we developed a community - workshops space within TAPAS that simplifies the process of setting up a + workshops space within TAPAS that simplifies the process of setting up a teaching environment, aimed particularly at instructors teaching shorter workshops that may be repeated over time. In what follows, we focus on two components in particular: the view packages and our initial forays into thinking about validation.

View Packages -

The display of TEI files through the TAPAS interface is handled—as with almost all - modern web display of TEI data—through XSLT stylesheets, and indeed one of The display of TEI files through the TAPAS interface is handled—as with almost all + modern web display of TEI data—through XSLT stylesheets, and indeed one of TAPAS’s most important functions is to enable users to transform and view their TEI - data without having to write or run XSLT on their own. TAPAS’s XSLT stylesheets do + data without having to write or run XSLT on their own. TAPAS’s XSLT stylesheets do not operate in isolation but as part of a more complex view package that includes several distinct components: - One or more One or more XSLT stylesheets which transform the source TEI data into another format (typically XHTML but potentially JSON or other formats that are particularly suited to some viewing application) A CSS stylesheet that provides styling information - Optional JavaScript code to produce features of user + Optional JavaScript code to produce features of user interactivity (such as mouse-overs or selection of specific viewing options) - Optional Optional XProc code which handles the sequential processing of the data by the individual components of the view package Optional RELAX NG or ISO Schematron files that formalize any specific data @@ -522,7 +522,7 @@ a bibliography might include a test to ascertain that the file contains a listBibl) Metadata about the view package that supports its use within the TAPAS system Supporting documentation of the view package, including a manifest that describes its components and the expectations concerning the kinds of data for which it is best @@ -532,22 +532,22 @@ that are appropriate for that scenario. For instance, a view package might focus on the display of heavily revised manuscripts and might offer readers the ability to view successive revision stages or choose which revisers’ work to make visible. 
(This interface - is not currently available in TAPAS but would be a great future project.) In this + is not currently available in TAPAS but would be a great future project.) In this way, view packages serve as reading environments rather than as individual stylesheets, - and they enable TAPAS to group together multiple viewing options that complement one another within a specific usage scenario, such as the ability to alternate between a normalized and a diplomatic view of a manuscript source. All view packages are maintained - on GitHub, and a future goal for TAPAS is to use GitHub as a collaborative + on GitHub, and a future goal for TAPAS is to use GitHub as a collaborative tool—enabling community members to propose and develop new view packages for inclusion in - TAPAS.

-

From the user perspective, TAPAS contributors can specify a default view package +

From the user perspective, TAPAS contributors can specify a default view package for a given record or collection, so that when a reader visits that material their initial view of the TEI data reflects the contributor’s choice. For instance, the creator of a collection of historic letters (such as the <ref @@ -555,16 +555,16 @@ choose a view package that handles manuscript features well, while the creator of a teaching collection aimed at sparking discussion of XML visualization might instead choose the Hieractivity view package (described below) as the default. However, whatever the - default setting is, <ptr type="software" xml:id="R60" target="#tapas"/><rs - type="soft.name" ref="#R60">TAPAS</rs> readers can select any view package option in the + default setting is, <ptr type="software" xml:id="R60" target="softw:tapas"/><rs + type="cit:soft.name" ref="#R60">TAPAS</rs> readers can select any view package option in the reading interface (even if it goes against the grain of the data), and this may offer readers additional insight into the TEI data that is shared via <ptr type="software" - xml:id="R61" target="#tapas"/><rs type="soft.name" ref="#R61">TAPAS</rs>.</p> + xml:id="R61" target="softw:tapas"/><rs type="cit:soft.name" ref="#R61">TAPAS</rs>.</p> <div xml:id="generic"> - <head><ptr type="software" xml:id="R62" target="#tapas"/><rs type="soft.name" ref="#R62" + <head><ptr type="software" xml:id="R62" target="softw:tapas"/><rs type="cit:soft.name" ref="#R62" >TAPAS</rs> Generic View Package</head> - <p>The basic goal of the <ptr type="software" xml:id="R63" target="#tapas"/><rs - type="soft.name" ref="#R63">TAPAS</rs> Generic view package is to handle almost any + <p>The basic goal of the <ptr type="software" xml:id="R63" target="softw:tapas"/><rs + type="cit:soft.name" ref="#R63">TAPAS</rs> Generic view package is to handle almost any TEI markup in a moderately functional way. 
It is not intended as a high-function or specialized display; the term <soCalled>generic</soCalled> signals the fact that it displays TEI data in a general-purpose manner and can serve as the default stylesheet @@ -608,61 +608,61 @@ </div> <div xml:id="other"> <head>Other View Packages</head> - <p><ptr type="software" xml:id="R64" target="#tapas"/><rs type="soft.name" ref="#R64" + <p><ptr type="software" xml:id="R64" target="softw:tapas"/><rs type="cit:soft.name" ref="#R64" >TAPAS</rs> currently also offers two other view packages: an XML view package which shows the XML code and a view that uses the open-source <ptr type="software" - xml:id="R98" target="#teiboilerplate"/><rs type="soft.name soft.url" ref="#R98"><ref + xml:id="R98" target="softw:teiboilerplate"/><rs type="cit:soft.name soft.url" ref="#R98"><ref target="http://dcl.ils.indiana.edu/teibp/index.html">TEI Boilerplate</ref></rs> stylesheets.<note><p>TEI Boilerplate: <ptr target="http://dcl.ils.indiana.edu/teibp/index.html"/>. See <rs - type="soft.bib.ref" ref="#R94"><ref target="#walsh2013" type="bibl">Walsh and + type="cit:soft.bib.ref" ref="#R94"><ref target="#walsh2013" type="bibl">Walsh and Simpson (2013)</ref></rs>.</p></note> The XML view is not remarkable in itself, but it is pedagogically useful within the overall ecology of the view packages because it allows students to see the direct connection between the markup and the different outcomes in the other view packages. It is also helpful for troubleshooting and for comparing results with other students. 
The <ptr type="software" xml:id="R95" - target="#teiboilerplate"/><rs type="soft.name" ref="#R95">TEI Boilerplate</rs> view + target="softw:teiboilerplate"/><rs type="cit:soft.name" ref="#R95">TEI Boilerplate</rs> view package was originally included because it offered an easy-to-install viewing option - early in the <ptr type="software" xml:id="R65" target="#tapas"/><rs type="soft.name" + early in the <ptr type="software" xml:id="R65" target="softw:tapas"/><rs type="cit:soft.name" ref="#R65">TAPAS</rs> development process, but it has shown its value in other ways as well: it offers several proof-of-concept viewing options that help demonstrate the versatility of display through simple changes in CSS, and in the future it might offer a - way for <ptr type="software" xml:id="R66" target="#tapas"/><rs type="soft.name" + way for <ptr type="software" xml:id="R66" target="softw:tapas"/><rs type="cit:soft.name" ref="#R66">TAPAS</rs> to accommodate user-contributed CSS and control over document display by that means. 
It also connects <ptr type="software" xml:id="R67" - target="#tapas"/><rs type="soft.name" ref="#R67">TAPAS</rs> to other strands of + target="softw:tapas"/><rs type="cit:soft.name" ref="#R67">TAPAS</rs> to other strands of research and development for TEI publication.</p> </div> </div> <div xml:id="validation"> <head>Validation</head> - <p>Validation was a key outcome for <ptr type="software" xml:id="R68" target="#tapas"/><rs - type="soft.name" ref="#R68">TAPAS</rs> Classroom and proved more challenging than we + <p>Validation was a key outcome for <ptr type="software" xml:id="R68" target="softw:tapas"/><rs + type="cit:soft.name" ref="#R68">TAPAS</rs> Classroom and proved more challenging than we anticipated, for reasons that have to do with the way <ptr type="software" xml:id="R69" - target="#tapas"/><rs type="soft.name" ref="#R69">TAPAS</rs> data is ingested into the - <ptr type="software" xml:id="R96" target="#fedora"/><rs type="soft.name" ref="#R96" + target="softw:tapas"/><rs type="cit:soft.name" ref="#R69">TAPAS</rs> data is ingested into the + <ptr type="software" xml:id="R96" target="softw:fedora"/><rs type="cit:soft.name" ref="#R96" >Fedora</rs> repository component and how it is handled once ingested. There are three basic functions validation can potentially play in <ptr type="software" xml:id="R70" - target="#tapas"/><rs type="soft.name" ref="#R70">TAPAS</rs>: <list> + target="softw:tapas"/><rs type="cit:soft.name" ref="#R70">TAPAS</rs>: <list> <item>To test files upon ingestion to ensure they are XML documents, and specifically, TEI; this is a gatekeeping function with a simple yes or no answer. 
<ptr - type="software" xml:id="R71" target="#tapas"/><rs type="soft.name" ref="#R71" + type="software" xml:id="R71" target="softw:tapas"/><rs type="cit:soft.name" ref="#R71" >TAPAS</rs> currently uses validation in this way.</item> <item>To provide users with a more detailed profile of their data once it gets into <ptr - type="software" xml:id="R72" target="#tapas"/><rs type="soft.name" ref="#R72" + type="software" xml:id="R72" target="softw:tapas"/><rs type="cit:soft.name" ref="#R72" >TAPAS</rs>. This function would go beyond the simple gatekeeping function to give users information about where their TEI data matches or fails to match a given schema. The first step in this detailed profiling would be to provide a more detailed validation report based on validation against the <ptr type="software" xml:id="R73" - target="#tapas"/><rs type="soft.name" ref="#R73">TAPAS</rs> schema (currently + target="softw:tapas"/><rs type="cit:soft.name" ref="#R73">TAPAS</rs> schema (currently <ident>tei_all</ident>), but a further (and extremely valuable) step would be to support this with more detailed validation reporting against user-supplied schemas. - <ptr type="software" xml:id="R74" target="#tapas"/><rs type="soft.name" ref="#R74" + <ptr type="software" xml:id="R74" target="softw:tapas"/><rs type="cit:soft.name" ref="#R74" >TAPAS</rs> is currently developing a validation reporting feature, described in more detail below.</item> <item>To test a given file or set of files against a specific functional scenario. 
For - instance, in the context of <ptr type="software" xml:id="R75" target="#tapas"/><rs - type="soft.name" ref="#R75">TAPAS</rs> view packages, validation could be used to + instance, in the context of <ptr type="software" xml:id="R75" target="softw:tapas"/><rs + type="cit:soft.name" ref="#R75">TAPAS</rs> view packages, validation could be used to discover and report on whether a given TEI file will work appropriately within a given view package, and if not, what is missing from the encoding.</item> </list> For pedagogical purposes, validation serves a very specific set of needs. First, @@ -679,20 +679,20 @@ settings, the schema may be accorded the status of a standard to be achieved, while in others it may function in more complex and dialogic ways as a hypothesis about texts to be tested and revised.</p> - <p>To support these pedagogical functions, <ptr type="software" xml:id="R76" target="#tapas" - /><rs type="soft.name" ref="#R76">TAPAS</rs> plans to treat validation in a somewhat + <p>To support these pedagogical functions, <ptr type="software" xml:id="R76" target="softw:tapas" + /><rs type="cit:soft.name" ref="#R76">TAPAS</rs> plans to treat validation in a somewhat novel way, using the view package technology to situate validation as a <soCalled>view</soCalled> of the document itself through the lens of a schema. 
Technically, the process involves: <list> - <item>running a validation process in <ptr type="software" xml:id="R97" target="#xproc" - /><rs type="soft.name" ref="#R97">XProc</rs></item> - <item>running an <ptr type="software" xml:id="R77" target="#xslt"/><rs type="soft.name" + <item>running a validation process in <ptr type="software" xml:id="R97" target="softw:xproc" + /><rs type="cit:soft.name" ref="#R97">XProc</rs></item> + <item>running an <ptr type="software" xml:id="R77" target="softw:xslt"/><rs type="cit:soft.name" ref="#R77">XSLT</rs> stylesheet that integrates the validation output with the TEI document</item> - <item>running an <ptr type="software" xml:id="R78" target="#xslt"/><rs type="soft.name" + <item>running an <ptr type="software" xml:id="R78" target="softw:xslt"/><rs type="cit:soft.name" ref="#R78">XSLT</rs> stylesheet that transforms the whole thing into HTML for - display (with <ptr type="software" xml:id="R79" target="#javascript"/><rs - type="soft.name" ref="#R79">JavaScript</rs> as needed to support interaction such as + display (with <ptr type="software" xml:id="R79" target="softw:javascript"/><rs + type="cit:soft.name" ref="#R79">JavaScript</rs> as needed to support interaction such as navigation between the error report and the document itself).</item> </list> Along the way, the process substitutes error messages that are more intelligible to novices and also more reassuring and detailed, and collapses multiple occurrences of a @@ -700,7 +700,7 @@ less as an authoritative and fearsome report of failure and more as an informational report that describes the relationship between the document and the schema that has been invoked. 
An initial version of the validation view package was developed for <ptr - type="software" xml:id="R80" target="#tapas"/><rs type="soft.name" ref="#R80">TAPAS</rs> + type="software" xml:id="R80" target="softw:tapas"/><rs type="cit:soft.name" ref="#R80">TAPAS</rs> Classroom and is scheduled for release in 2020.</p> </div> <div xml:id="conclusion"> @@ -709,20 +709,20 @@ TEI plays in a particular classroom setting. In some cases, the goal may be to learn TEI and to excite students about its potential as a digital humanities tool. In other cases, the goal may be to use TEI as a way of framing a conversation about digital editing, data - modeling, or data visualization. <ptr type="software" xml:id="R81" target="#tapas"/><rs - type="soft.name" ref="#R81">TAPAS</rs> (and the features developed through the <ptr - type="software" xml:id="R82" target="#tapas"/><rs type="soft.name" ref="#R82">TAPAS</rs> + modeling, or data visualization. <ptr type="software" xml:id="R81" target="softw:tapas"/><rs + type="cit:soft.name" ref="#R81">TAPAS</rs> (and the features developed through the <ptr + type="software" xml:id="R82" target="softw:tapas"/><rs type="cit:soft.name" ref="#R82">TAPAS</rs> Classroom project) offers a platform for TEI display and publication that allows the complexities of TEI encoding to take a back seat—through encoding templates and a simple upload and publication workflow—in cases where expertise in TEI is not the main goal of the class. But at the same time, it offers the ability to look beneath the surface of those systems at any time, to gain greater insight into the encoding, and to explore what happens if the code or the display choices are changed. 
<ptr type="software" xml:id="R83" - target="#tapas"/><rs type="soft.name" ref="#R83">TAPAS</rs> also supports varied roles + target="softw:tapas"/><rs type="cit:soft.name" ref="#R83">TAPAS</rs> also supports varied roles for students that give them different forms of scaffolded authority: as contributors to real-world publications, as creators of their own projects, as collaborators and - experimenters in public spaces. <ptr type="software" xml:id="R84" target="#tapas"/><rs - type="soft.name" ref="#R84">TAPAS</rs> thus supports a vision of digital humanities + experimenters in public spaces. <ptr type="software" xml:id="R84" target="softw:tapas"/><rs + type="cit:soft.name" ref="#R84">TAPAS</rs> thus supports a vision of digital humanities pedagogy that is experiential, experimental, and collaborative.</p> </div> </body> @@ -735,9 +735,9 @@ <biblScope unit="volume">24</biblScope>: <biblScope unit="page">467–481</biblScope>. <ptr target="https://doi.org/10.1080/10691316.2017.1326331"/>; <ptr target="https://repository.wlu.edu/handle/11021/33912"/>.</bibl> - <bibl xml:id="flanders2013"><ptr type="software" xml:id="R86" target="#tapas"/><rs - type="soft.bib.ref" ref="#R86"><author>Flanders, Julia</author>, and <author>Scott - Hamlin</author>. <date>2013</date>. <title level="a"><rs type="soft.name" ref="#R86" + <bibl xml:id="flanders2013"><ptr type="software" xml:id="R86" target="softw:tapas"/><rs + type="cit:soft.bib.ref" ref="#R86"><author>Flanders, Julia</author>, and <author>Scott + Hamlin</author>. <date>2013</date>. <title level="a"><rs type="cit:soft.name" ref="#R86" >TAPAS</rs>: Building a TEI Publishing and Repository Service. Journal of the Text Encoding Initiative 5. . . - Walsh, John, and Grant Leyton - Simpson. 2013. <rs type="soft.name" + <bibl xml:id="walsh2013"><ptr type="software" xml:id="R85" target="softw:teiboilerplate"/><rs + type="cit:soft.bib.ref" ref="#R85"><author>Walsh, John</author>, and <author>Grant Leyton + Simpson</author>. 
<date>2013</date>. <title level="a"><rs type="cit:soft.name" ref="#R85">TEI Boilerplate</rs>. Journal of Digital Humanities 2 (3). rdg or lem elements in the apparatus. Those needs are met by the - TEI - Critical Apparatus Toolbox, which she created.TEI + Critical Apparatus Toolbox, which she created.Marjorie Burkhart et al., TEI Critical Apparatus Toolbox, accessed June 3, 2021, . I would add another prior and @@ -396,15 +396,15 @@ described as the main difficulty confronted by TEI users, partly because of the lack of user-friendly tools (Burghart and Rehbein 2012). One can find tools to annotate images and transcribe sources,Such - tools include Transkribus (from READ-COOP SCE, accessed June 3, 2021, Transkribus (from READ-COOP SCE, accessed June 3, 2021, ) and Archetype - (previously called the DigiPal framework, now supported by King’s Digital Laboratory, King’s College London, - accessed June 3, 2021, Archetype + (previously called the DigiPal framework, now supported by King’s Digital Laboratory, King’s College London, + accessed June 3, 2021, ). but these tools do not enable the scholar to encode variants and the critical apparatus as specified in the TEI Guidelines. They are therefore purely document-centered tools, and we have seen above @@ -412,27 +412,27 @@ text-oriented digital scholarly editions designed to be critical. Collation of several manuscripts can also produce an automatically encoded critical apparatus using the TEI parallel segmentation method, by means of CollateX or CollateX or Juxta software, provided one already has the separate transcriptions at one’s - disposal.Interedition Development Group, + disposal.Interedition Development Group, CollateX – Software for Collating Textual Sources, accessed June 4, 2021, . For a history + type="cit:soft.url" ref="#R10">. For a history of collation tools and especially of the developments leading to CollateX, see + xml:id="R12" target="softw:collatex"/>CollateX, see Nury and Spadini (2020). 
Juxta is no + xml:id="R13" target="softw:juxta"/>Juxta is no longer maintained; it has been replaced by two programs both developed by Performant Software Solutions: Faircopy (accessed June 4, 2021, Performant Software Solutions: Faircopy (accessed June 4, 2021, ), a transcription interface producing - TEI encoding, and TextLab (accessed June 4, 2021, TextLab (accessed June 4, 2021, ), which seems to be more focused on genetic editing. However, a legacy version of Juxta, accessed - June 4, 2021, can be found at Juxta, accessed + June 4, 2021, can be found at . Roeder (2020) provides a survey of three web-based tools offering automated collation. As such, the result is not a proper @@ -445,30 +445,30 @@ encoding. So the second part of the project consists in creating a panel of tools for the encoding of ancient textual sources in TEI XML. These tools will enhance existing software, and not create it from scratch.

-

The tools include The tools include XSLT stylesheets that convert styled Word or LibreOffice documents to TEI XML.Sample transformations can already be made using OxGarage - (accessed June 4, 2021, OxGarage + (accessed June 4, 2021, ), and XSLT stylesheets can - be found in the TEI Consortium’s GitHub repository, accessed June 4, 2021,XSLT stylesheets can + be found in the TEI Consortium’s GitHub repository, accessed June 4, 2021,. Pre-encoding a text in a word processor can be useful, but frequently a deeper level of encoding is needed, which is difficult to reach working only on the text document. So a second category of tools is a series of frameworks or encoding environments made through customization of two widely used XML editors: XMLmind XML - Editor (XXE),Accessed June 4, 2021, XMLmind XML + Editor (XXE),Accessed June 4, 2021, . Customization of this editor is a central part of the work of the Document numérique team of Maison de la Recherche en Sciences Humaines (MRSH) de Caen (see the Métopes workflow, accessed August 14, 2021, ). XMLmind XML Editor was + target="softw:xmlmind"/>XMLmind XML Editor was used in the team’s collaboration with the Biblissima Equipex (Equipment of Excellence project funded by the French National Research Agency – ANR), for which an encoding environment for ancient library catalogues was issued: Digital @@ -480,19 +480,19 @@ stylesheets, and to automate and speed up the encoding tasks with custom commands. Oxygen XML - Editor,Accessed June 4, 2021, Oxygen XML + Editor,Accessed June 4, 2021, . which supports right-to-left scripts, unlike the current version of XXE (however, + xml:id="R23" target="softw:xmlmind"/>XXE (however, recent steps have been made in this direction by the developers of XXE). 
A framework similar to XXE’s was developed for - XXE’s was developed for + Oxygen’s Author mode, to be used by projects focusing on Hebrew or Arabic sources.Since version 15.1, Oxygen has handled RTL + target="softw:oxygen"/>Oxygen has handled RTL scripts quite effectively in author mode (but not text mode). See discussions about this issue on the Epidoc and TEI listservs: Hugh Cayless, RTL texts, MARKUP mailing list @@ -534,9 +534,9 @@ helpful for scholars with limited previous knowledge of XML and TEI, who could be confused by the myriad possibilities of the TEI All schema. Nevertheless, the raw encoding can be verified at any moment by switching to the text mode in Oxygen or to the tree view in XXE. + target="softw:xmlmind"/>XXE.

A framework can be configured to offer guidance for the use of elements and attributes, @@ -550,11 +550,11 @@ In the encodingDesc element of the teiHeader, defining a list of values for the place attribute of the note - element (Oxygen framework).

-

In the In the Oxygen framework, inside the text element, the insertion of tags for which the typology of attribute values has been described in the teiHeader (and registered as available in drop-down menus) generates a @@ -563,7 +563,7 @@ In the corpus, choosing one of the defined values for the place attribute of the note element (Oxygen + xml:id="R32" target="softw:oxygen"/>Oxygen framework).

@@ -581,25 +581,25 @@ type="bibl">Burnard 2019).

The challenge in developing encoding frameworks is not so much technical, as they use features available in each application. Frameworks consist of command configuration and - CSS files, sometimes XSLT files. They provide an economic, versatile way to + CSS files, sometimes XSLT files. They provide an economic, versatile way to provide scholars with customized tools for digital scholarly editing.The Ediarum framework for Oxygen has been developed at the Ediarum framework for Oxygen has been developed at the Berlin-Brandenburg Academy of Sciences and Humanities to help scholars create digital editions of historical documents in TEI XML. However, this framework does not enable one to encode variant readings and is therefore not designed - for critical editing of medieval sources. See Mertgens (2019) and the Ediarum website: Ediarum website: . The Caen team Pôle Document numérique has since published a framework for critical editing in TEI Parallel Segmentation based on - XMLmind XML Editor software: see Éditer des sources avec - apparat critique, accessed June 4, 2021, , accessed June 4, 2021, . The challenge is rather about ontology and modeling.I am taking part in a collective effort between scholarly editors and scientific publishing @@ -640,9 +640,9 @@ whether in print or in digital form, can be supported by existing tools which the scholar can configure.See, for example, the Print an edition tool in the TEI Critical Apparatus + target="softw:teicat"/>TEI Critical Apparatus Toolbox by M. Burghart, in which the sample XSLT + xml:id="R4" target="softw:xslt"/>XSLT stylesheet to transform TEI XML into LaTeX, to produce a printable PDF with a traditional critical apparatus, can be downloaded and modified: . Burghart ( The working interface of the XMLmind framework. The + target="softw:xmlmind"/>XMLmind framework. The central pane shows the CSS-customized view of the file containing the central text. The left pane shows the—almost—raw code of the same file. 
The + command designed in the CSS to be available on the lemma (element term type="lemme") opens a transformed version of the glosses’ thesaurus and allows one to choose the appropriate gloss archetype toward which the target attribute must point (XMLmind framework). + target="softw:xmlmind"/>XMLmind framework). In the file containing the glosses (the glosses’ library or thesaurus), each @@ -773,7 +773,7 @@

Display of the glosses’ thesaurus (XMLmind + xml:id="R40" target="softw:xmlmind"/>XMLmind framework).
@@ -800,8 +800,8 @@
Display Possibilities

A publication prototype has been created using XSLT stylesheets and XSLT stylesheets and Javascript to produce interactive HTML pages. The current version allows one to read the reconstructed text of each manuscript; diff --git a/data/JTEI/13_2020-22/jtei-cc-ra-parisse-182-source.xml b/data/JTEI/13_2020-22/jtei-cc-ra-parisse-182-source.xml index abcc3ae..29aa0da 100644 --- a/data/JTEI/13_2020-22/jtei-cc-ra-parisse-182-source.xml +++ b/data/JTEI/13_2020-22/jtei-cc-ra-parisse-182-source.xml @@ -109,33 +109,33 @@ need to be interoperable and reusable in order to improve research on themes such as phonology, prosody, interaction, syntax, and textometry. To help researchers reach this goal, CORLI has designed a pair of tools: - TEICORPO to assist in the conversion and use of spoken - language corpora, and - TEIMETA for metadata purposes. - TEICORPO is based on the principle of an underlying + target="softw:teicorpo"/> + TEICORPO to assist in the conversion and use of spoken + language corpora, and + TEIMETA for metadata purposes. + TEICORPO is based on the principle of an underlying common format, namely TEI XML as described in its specification for spoken language use (ISO 2016). This tool enables the conversion of transcriptions created with alignment - software such as - CLAN, - Transcriber, - Praat, or ELAN as well as common file formats + software such as + CLAN, + Transcriber, + Praat, or ELAN as well as common file formats (CSV, XLSX, TXT, or DOCX) and the TEI format, which plays the role of a lossless pivot format. Backward conversion is possible in many cases, with limitations inherent in the - destination target format. - TEICORPO can run the - Treetagger part-of-speech tagger and the - Stanford CoreNLP tools on TEI files and can export + destination target format. 
+ TEICORPO can run the + Treetagger part-of-speech tagger and the + Stanford CoreNLP tools on TEI files and can export the resulting files to textometric tools such as TXM, Le Trameur, or - TXM, Le Trameur, or + Iramuteq, making it suitable for spoken language corpora editing as well as for various research purposes.

@@ -203,8 +203,8 @@ limited coverage, even if the corpora involved are very large.

- The - TEICORPO Approach + The + TEICORPO Approach

The goal of the CORLI consortium is to make it easier to deposit, share, and reuse data. With this goal in mind, CORLI has always promoted the use of open public repositories and open formats. Our policy is to advocate for the use of a common single @@ -229,7 +229,7 @@ for the needs of linguists. This means that the goal of the project was not to create and promote new standards, but rather to emphasize the good practices of CORLI’s members and make data sharing with other researchers easier. The goal of the TEICORPO project can thus be summarized as:

creating a tool for conversion between the different pieces of software used for @@ -246,17 +246,17 @@ Similarities with and Differences from Other Approaches

Many software packages dedicated to editing spoken language transcription contain utilities that can convert many formats: for example, - EXMARaLDA ( + EXMARaLDA ( Schmidt 2004 - ; see ), - - Anvil ( - Kipp 2001; see ; see ), + + Anvil ( + Kipp 2001; see ), and ELAN - (Wittenburg et al. 2006; see + type="software" xml:id="R17" target="softw:elan"/>ELAN + (Wittenburg et al. 2006; see ). However, in all cases, the conversions are limited to the features implemented in the tool itself—for example, with a limited set of metadata—and they cannot always be used to prepare data to be used by @@ -270,62 +270,62 @@ TEI is used as a destination format.

The list of tools that are considered in the two projects is nearly the same. The only - tools missing in the - TEICORPO approach are EXMARaLDA and - FOLKER ( + TEICORPO approach are EXMARaLDA and + FOLKER (Schmidt and Schütte - 2010; see ; see ), but this was only because the - conversion tools from and to EXMARaLDA, - FOLKER, and TEI already exist. They are available - as EXMARaLDA, + FOLKER, and TEI already exist. They are available + as XSLT stylesheets in the open-source distribution of EXMARaLDA (EXMARaLDA (). The other common point is the use of the TEI format, and especially the more recent ISO version of TEI for spoken language (ISO/TEI; see ISO 2016). The TEI format produced by the EXMARaLDA and - - FOLKER software fit within the process chain of - + xml:id="R24" target="softw:EXMARaLDA"/>EXMARaLDA and + + FOLKER software fit within the process chain of + TEICORPO. This demonstrates the usefulness of a well-known and efficient format such as TEI.

There are, however, differences between the two projects that make them nonredundant but complementary, each project having specificities that can be useful or damaging depending on the user’s needs. One minor difference is that the - TEICORPO project is not a functionality of an + xml:id="R27" target="softw:teicorpo"/> + TEICORPO project is not a functionality of an editing tool, but is a standalone tool for converting data between one format and another. This had certain effects on the user interface and explains some of the choices made in the development of the two tools.

There are two major differences between - TEICORPO and Schmidt’s approach, which affected + target="softw:teicorpo"/> + TEICORPO and Schmidt’s approach, which affected both the design of the tools and how they can be used. The first difference is that in - developing TEICORPO, it was decided that the conversion between the original formats and TEI had to be lossless (or as lossless as possible) because we wanted to offer a means to store the research data for long-term conservation and dissemination in a standard XML format instead of in proprietary formats such as those used by - CLAN ( + CLAN (MacWhinney 2000; see ), ELAN, - Praat (), ELAN, + Praat (Boersma and van Heuven 2001; see ), and - - Transcriber (), and + + Transcriber (Barras et al. 2000; see and ). These proprietary formats are in XML or Unicode formats so that they can be conserved for the long term. However, they are not all well described or constrained, at least not in the @@ -335,28 +335,28 @@ safe to keep corpora in a format available only for a given tool that may disappear or fall into disuse.

The second major difference is that the - TEICORPO initiative does not target only spoken + target="softw:teicorpo"/> + TEICORPO initiative does not target only spoken language, but all types of annotation, including media of any type. This covers all spoken languages, vocal as well as sign languages, and also gesture and any type of - multimodal coding. The goal of - TEICORPO was not to advocate a linguistic mode of + multimodal coding. The goal of + TEICORPO was not to advocate a linguistic mode of coding spoken data as a transcription convention does, but rather to propose a research model for storing and sharing data about language and other modalities. Consequently, the focus of the work was not on how the spoken data were coded (i.e., the microstructure), nor on the standard that should be used for transcribing in - orthographic format. Instead, the - TEICORPO approach focused on how to integrate + orthographic format. Instead, the + TEICORPO approach focused on how to integrate multiple pieces of information into the TEI semantics (the macrostructure), as this is - possible with tools such as ELAN or - PRAAT. The goal was to be able to convert a file + possible with tools such as ELAN or + PRAAT. The goal was to be able to convert a file produced by these tools so that it can be saved in TEI format for long-term conservation.

-

Data in - PRAAT and ELAN formats can contain +

Data in + PRAAT and ELAN formats can contain information that is different from what is usually present in an ISO/TEI description, but that nonetheless remains within the structures authorized in the ISO/TEI. For example, the information is stored as described below in spanGrp, an element @@ -365,36 +365,36 @@ classical, we mean approaches based on an orthographic transcription represented as a list, as in the script of a play), it will be available for further processing by using the export features of - TEICORPO (see + TEICORPO (see and further below for export functionalities) but other types of information are also available. Compared to - PRAAT and ELAN, the integration of tools - such as - CLAN or - Transcriber was much more straightforward, as the + target="softw:praat"/> + PRAAT and ELAN, the integration of tools + such as + CLAN or + Transcriber was much more straightforward, as the organization of the files is less varied and more classical.

Choice of the Microstructure Representation

Processing of the microstructure, with the exception of information already available in the tools themselves (for example silence in - Transcriber), is not done during the conversion + target="softw:transcriber"/> + Transcriber), is not done during the conversion to TEI. The division into words or other elements such as morphemes or phonemes is not systematically done in any of the tools used by researchers in CORLI. When it exists, it is not included in the main transcription line but most often in dependent lines, as it represents an annotation with its own rules and guidelines. Division into words or other elements is part of the linguistic analysis rather than a simple storage operation.

-

- TEICORPO therefore preserves as long-term storage +

+ TEICORPO therefore preserves as long-term storage data both the original information that was created in the original software—the full unprocessed transcription—and the other linguistically processed transcriptions and - annotations. For - TEICORPO, microstructure processing, such as + annotations. For + TEICORPO, microstructure processing, such as division into words, or text standardization when necessary, belongs to the linguistic analysis of the corpora. Hence, the TEI data file can be used both for data exploration and for scientific purposes. For example, when a researcher needs to parse @@ -406,10 +406,10 @@

- The - TEICORPO Project -

The - TEICORPO project contains two different sets of + The + TEICORPO Project +

The + TEICORPO project contains two different sets of tools. One set focuses on conversion between various software packages used for spoken language coding and TEI. The other set focuses on using the TEI format for linguistic analyses (textometric or grammatical analyses).

@@ -424,25 +424,25 @@

Some common practices have been identified in our community but other uses of the same software are of course possible:

- - Transcriber is widely used in + + Transcriber is widely used in sociolinguistics; - - CLAN is widely used in language acquisition and + + CLAN is widely used in language acquisition and especially in the Talkbank project; Praat is more specialized for phonetic or phonological annotations; - ELAN is recommended for annotating video and particularly multimodality (for example, components such as gazes, gestures, and movements), and is often used for rare languages to describe the organization of the segments.

It should be pointed out here that whereas Transcriber and CLAN files nearly + xml:id="R193" target="softw:clan"/>CLAN files nearly always contain classical orthographic transcriptions, this is not - the case for Praat and ELAN files. As our goal is to provide a generic + the case for Praat and ELAN files. As our goal is to provide a generic solution for long-term conservation and use for any type of project, conversion of all types of files produced by the four tools cited above will be possible. It is up to the user to determine which part of a corpus can be used with a classical approach, which @@ -450,12 +450,12 @@

The list of tools reflects the uses and practices in the CORLI network, and is very similar to the list suggested by Schmidt (2011) with the exception of EXMARaLDA and - FOLKER. These two tools already have built-in + target="softw:exmaralda"/>EXMARaLDA and + FOLKER. These two tools already have built-in conversion features, so adding them to the - TEICORPO project would be easy at a later date.

+ target="softw:teicorpo"/> + TEICORPO project would be easy at a later date.

Alignment applications deal with two main types of data presentation and organization. The presentation of the data has direct consequences for how the data are exploited, and therefore on the design of the tools that are used.

@@ -477,46 +477,46 @@ the production within the same tier sorted by timeline.

No tool offers both types of presentation. ELAN offers some alternatives to + target="softw:elan"/>ELAN offers some alternatives to editing or displaying data with the partition format, but none of the existing tools offer full-fledged list format editing. It is possible to represent the two structures within a similar model, as demonstrated by Bird and Liberman (2001). However, this is not the case for the four tools listed above: each of them represents the data in a unique underlying data structure. - Transcriber and CLAN are organized in list format; Praat and ELAN have a + target="softw:praat"/>Praat and ELAN have a partition format.
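As a rough illustration of the two presentation formats contrasted above, the following Python sketch models a partition (one tier per speaker, each holding timed segments) and flattens it into a list ordered by timeline. The names `Segment` and `partition_to_list`, the tier names, and the sample timings are hypothetical; they are not taken from TEICORPO or from any of the four tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the beginning of the recording
    end: float
    text: str


def partition_to_list(tiers):
    """Flatten {tier_name: [Segment, ...]} into (tier_name, Segment) rows sorted by time."""
    rows = [(name, seg) for name, segs in tiers.items() for seg in segs]
    return sorted(rows, key=lambda row: (row[1].start, row[1].end))


# Partition view: each speaker has an independent tier (invented data).
partition = {
    "MOT": [Segment(0.0, 1.5, "look at the tree !"), Segment(3.0, 4.0, "yes")],
    "CHI": [Segment(1.6, 2.8, "tree")],
}

# List view: one production per row, in temporal order, as in the script of a play.
for speaker, seg in partition_to_list(partition):
    print(f"{seg.start:>5.1f} {seg.end:>5.1f} {speaker}: {seg.text}")
```

The sketch only goes from partition to list; the opposite direction would need the tier of every row, which is why the list rows keep the tier name.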

Each presentation format has its own pros and cons. Because of the possibilities offered by the presentation formats, and because the same software, even within the same presentation models, rarely provides a solution for all the needs of all users, researchers often have to use two or more pieces of software.

The use of multiple tools is quite common. For example, - Praat and - Transcriber cannot be used when working on video + xml:id="R60" target="softw:praat"/> + Praat and + Transcriber cannot be used when working on video recordings because these programs are limited to audio formats. But if researchers need to conduct spectral analysis for some purpose, they will have to use the - Praat software and convert not only the + type="software" xml:id="R62" target="softw:praat"/> + Praat software and convert not only the transcription, but also the media. In the field of language acquisition, where the - CLAN software is generally used to describe both + type="software" xml:id="R63" target="softw:clan"/> + CLAN software is generally used to describe both the child productions and the adult productions, when researchers are interested in - gestures, they use the ELAN software, importing the CLAN file to add - gesture tiers, as ELAN software, importing the CLAN file to add + gesture tiers, as ELAN is more suitable for the fine-grained analysis of visual data. Another common practice consists in first doing a rapid transcription using only - orthographic annotations in Transcriber and then in a second stage + orthographic annotations in Transcriber and then in a second stage annotating some more interesting excerpts in greater detail including new information. In this case researchers will import the first transcription file into other tools such - as Praat or Praat or ELAN and annotate them partially. It is therefore necessary to import or export files in different formats if researchers need to use different tools for different parts of their work.

@@ -533,8 +533,8 @@ CORLI consortium and in collaboration with the ORTOLANG infrastructure to design a common tool that could be used by the whole linguistic community. The goal was to make open-source software with proper maintenance freely available on - + .

@@ -551,33 +551,33 @@
Basic Structures

Converting the metadata is straightforward, as the four tools ( - CLAN, ELAN, - Praat, and - Transcriber) do not enable a large amount of + xml:id="R68" target="softw:clan"/> + CLAN, ELAN, + Praat, and + Transcriber) do not enable a large amount of metadata to be edited. Most of the metadata available concerns the content of the sequence; some user metadata is also available, especially in - CLAN . The insertion of metadata follows the + xml:id="R72" target="softw:clan"/> + CLAN . The insertion of metadata follows the indications of the ISO/TEI 24624:2016 standard (ISO 2016).

Moreover, some tools, such as - Transcriber, include information about silences, + target="softw:transcriber"/> + Transcriber, include information about silences, pauses, and events in their XML format. This information is also processed within - TEICORPO, once again following the + type="software" xml:id="R73" target="softw:teicorpo"/> + TEICORPO, once again following the recommendations of the ISO/TEI standard.

Conversion of the main data, the transcription and the annotations, cannot always be done solely on the basis of the description provided in the ISO/TEI guidelines. These guidelines do, however, suffice to fully describe the content of the - CLAN and - Transcriber software. We took advantage of the + type="software" xml:id="R74" target="softw:clan"/> + CLAN and + Transcriber software. We took advantage of the new annotationBlock element, which codes several annotation levels, a function that is commonly required in spoken-language annotations.

The annotationBlock contains two major elements: the u element, @@ -587,8 +587,8 @@ elements as required. All span elements have the same type of content, as indicated in the parent spanGrp element. and provide an - example of conversion from a - CLAN file to illustrate how a production + example of conversion from a + CLAN file to illustrate how a production annotated on different levels (orthography, morphosyntax, dependencies) is represented in TEI with a first main utterance element u to which two spanGrps are linked, one for each annotation level, in our case one spanGrp for @@ -606,14 +606,14 @@ mor and gra attribute values represent grammatical knowledge. Using the content of these elements to produce advanced grammatical representation in more elaborate TEI and XML formats is of course possible, but would be a tailored task - which is beyond the scope of the - TEICORPO project.

+ which is beyond the scope of the + TEICORPO project.

*MOT: look at the tree ! 2263675_2265197 %mor: v|look prep|at det|the n|tree ! %gra: 1|0|ROOT 2|1|JCT 3|4|DET 4|2|POBJ 5|1|PUNCT - - CLAN representation of data (first three + + CLAN representation of data (first three lines)
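The mapping described above (one `u` element for the main line, one `spanGrp` per dependent tier) can be sketched in Python for the three lines shown. This is a minimal sketch under stated assumptions, not TEICORPO's actual implementation: the function name `chat_to_annotation_block` is hypothetical, each dependent tier is stored as a single `span`, and raw millisecond values are written directly into `start` and `end`, whereas a real ISO/TEI file would point to entries in a timeline.

```python
import re
import xml.etree.ElementTree as ET

# The three lines of the example, as shown in the figure.
CHAT_LINES = [
    "*MOT: look at the tree ! 2263675_2265197",
    "%mor: v|look prep|at det|the n|tree !",
    "%gra: 1|0|ROOT 2|1|JCT 3|4|DET 4|2|POBJ 5|1|PUNCT",
]

# Trailing "start_end" time code on the main line (milliseconds).
TIMECODE = re.compile(r"\s*(\d+)_(\d+)\s*$")


def chat_to_annotation_block(lines):
    """Build an annotationBlock element from one main line and its dependent tiers."""
    main, *dependents = lines
    speaker, utterance = main[1:].split(":", 1)
    block = ET.Element("annotationBlock", {"who": speaker.strip()})
    match = TIMECODE.search(utterance)
    if match:
        # Simplification: raw milliseconds instead of pointers to a timeline.
        block.set("start", match.group(1))
        block.set("end", match.group(2))
        utterance = utterance[: match.start()]
    u = ET.SubElement(block, "u")
    u.text = utterance.strip()
    for line in dependents:
        tier, content = line[1:].split(":", 1)
        # One spanGrp per annotation level; its type attribute carries the tier name.
        group = ET.SubElement(block, "spanGrp", {"type": tier.strip()})
        span = ET.SubElement(group, "span")
        span.text = content.strip()
    return block


block = chat_to_annotation_block(CHAT_LINES)
print(ET.tostring(block, encoding="unicode"))
```

Keeping the `mor` and `gra` content as opaque text mirrors the point made in the paragraph above: the conversion stores the annotation levels without interpreting their grammatical content.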
@@ -646,22 +646,22 @@

Although the presentation described above can represent the data of many corpora and tools, a single-level annotation structure within the spanGrp elements is insufficient to represent the complex organization that can be constructed with the - ELAN and - Praat tools. ELAN is a tool used by many + ELAN and + Praat tools. ELAN is a tool used by many researchers to describe data of greater complexity than the data presented in the ISO/TEI guidelines. As the goal of the - TEICORPO project was to convert all types of + target="softw:teicorpo"/> + TEICORPO project was to convert all types of structure used in the spoken language community, including ELAN and - Praat, it was necessary to extend the description + xml:id="R82" target="softw:elan"/>ELAN and + Praat, it was necessary to extend the description method presented in .

-

In ELAN and - Praat, the multitiered annotations can be +

In ELAN and + Praat, the multitiered annotations can be organized in a structured manner. These tools take advantage of the partition presentation of the data, so that the relationship between a parent tier and a child tier can be precisely organized. There are two main types of organization: symbolic @@ -679,8 +679,8 @@ links.

- ELAN annotation with symbolic structures + ELAN annotation with symbolic structures

In temporal division, the association between the main tier and the dependent tiers @@ -688,8 +688,8 @@ is included in the parent if the starting and end points are within the time limits of the starting and end points of the parent tier. provides an example of such an organization created using - Praat. In this example, the tier at the top of + xml:id="R201" target="softw:praat"/> + Praat. In this example, the tier at the top of the representation contains phonemes that are included in the second tier, which contains syllables (for example, S and a are included in Sa). Then the syllables are included in turn in the word-level tier @@ -715,31 +715,31 @@ this type of data is produced by members of the CORLI consortium, it needs to be preserved. Encoding the data in TEI using a standard tool makes the process reproducible, which is one of the goals of - TEICORPO.

+ target="softw:teicorpo"/> + TEICORPO.
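The temporal inclusion rule described above (a child segment belongs to a parent when its start and end points fall within the parent's time limits) can be sketched as follows. The names `Interval`, `is_within`, and `group_by_parent` are hypothetical, and the timings and the second syllable are invented for illustration; only the labels S, a, and Sa come from the example discussed in the text.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: float  # seconds
    end: float
    label: str


def is_within(child, parent):
    """True when the child's time span lies inside the parent's time span."""
    return parent.start <= child.start and child.end <= parent.end


def group_by_parent(parents, children):
    """Attach to each parent interval the child intervals it temporally contains."""
    return [(p, [c for c in children if is_within(c, p)]) for p in parents]


# Invented timings: a syllable tier and the phoneme tier it dominates.
syllables = [Interval(0.00, 0.30, "Sa"), Interval(0.30, 0.55, "py")]
phonemes = [
    Interval(0.00, 0.12, "S"),
    Interval(0.12, 0.30, "a"),
    Interval(0.30, 0.41, "p"),
    Interval(0.41, 0.55, "y"),
]

for syllable, parts in group_by_parent(syllables, phonemes):
    print(syllable.label, "->", [p.label for p in parts])
```

Applying the same function again, with a word tier as parents and the syllable tier as children, would give the next level of the hierarchy.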

Although this type of data is not described in the ISO/TEI guidelines, it is in fact possible to store it in TEI format using current TEI features. TEI provides a general mechanism for storing hierarchically structured data by using the spanGrp and span mechanism. Moreover, the span and spanGrp tags have attributes that can point to other elements or to timelines. Using this coding schema, it is therefore possible to store any type of structure, symbolic and/or temporal, - that can be generated with ELAN or - PRAAT, as described above.

+ that can be generated with ELAN or + PRAAT, as described above.

To do this, each element which is in a symbolic or temporal relation is represented by a spanGrp element of the TEI. The spanGrp element contains as many span elements as necessary to store all the elements present in the ELAN or - PRAAT representation. The parent element of a + type="software" xml:id="R90" target="softw:elan"/>ELAN or + PRAAT representation. The parent element of a spanGrp is the main annotationBlock element when the division in - ELAN or PRAAT is the first division of a main element. The parent element is another span element when the division in ELAN or - PRAAT is a subdivision of another element which + target="softw:elan"/>ELAN or + PRAAT is a subdivision of another element which is not a main element. This XML structure is complemented by explicit information as allowed in TEI. The span elements are linked to the element they depend on, either with a symbolic link using the target attribute of the span @@ -748,7 +748,7 @@

Two examples of how this is displayed in a TEI file are given below. The first example (see and ) corresponds to the ELAN example above (see ELAN example above (see , ). The TEI encoding represents the words of the sentence from left to right (from gahwat to endi in our example). The detail of the @@ -760,8 +760,8 @@ and -DET.

- ELAN example of a symbolic division + ELAN example of a symbolic division
@@ -806,8 +806,8 @@

The second example is structured using time references. This example (see and ) corresponds to the - Praat example above (see ) corresponds to the + Praat example above (see , ). In this case, each part of the transcription is represented according to the timeline, but there is also a hierarchy which is represented by the spanGrp @@ -820,8 +820,8 @@ s37).

- ELAN example of a temporal division + ELAN example of a temporal division
@@ -849,19 +849,19 @@

The spanGrp and span offer a generic representation of data coming from relatively unconstrained representations produced by partition software. The - names of the tiers used in the ELAN and - Praat tools are given in the content of the + names of the tiers used in the ELAN and + Praat tools are given in the content of the type attribute. These names are not used to provide structural information, the structure being represented only by the spanGrp and span hierarchy. However, the organization into spanGrp and span is not always sufficient to represent all the details of the tier organization of each software feature. This is the case for some of the ELAN structures, which can specify the nature of span elements further than in the TEI feature. For example, the timediv - ELAN property specifies that only contiguous temporal division is allowed, whereas the incl property allows non-contiguous elements. It was therefore necessary to include the type of organization in the header of the TEI file, @@ -873,18 +873,18 @@

Exporting to Research Tools -

In the - TEICORPO approach, no modification is made to the +

In the + TEICORPO approach, no modification is made to the original format and conversion remains as lossless as possible. This allows for all types of corpora to be stored for long-term preservation purposes. It also allows the corpora to be used with other editing tools, some of which are suited to specific - processing: for example, - Praat for phonetics/phonology; - Transcriber/ - CLAN for raw transcription; and + Praat for phonetics/phonology; + Transcriber/ + CLAN for raw transcription; and ELAN for gesture and visual coding.

However, a large proportion of scientific research and applications done using corpora requires further processing of the data. For example, although querying or using raw @@ -896,8 +896,8 @@ structure. This microstructure is integrated in Schmidt’s approach, in which the TEI file can contain standardized information about words, specific spoken language information, and sometimes even POS information.

-

This approach was not adopted in - TEICORPO for several reasons. First, we had to +

This approach was not adopted in + TEICORPO for several reasons. First, we had to deal with a large variety of coding approaches, which makes it difficult to conduct work similar to that done in CHILDES (MacWhinney 2000; see ). Second, there was no @@ -912,8 +912,8 @@ span elements without modifying the original u element information. Second, we decided to design another category of tools for processing or making it possible to process the spoken language corpus, and to use powerful tools in corpus - analysis. This part of the - TEICORPO library is described in the next + analysis. This part of the + TEICORPO library is described in the next section.

@@ -938,22 +938,22 @@ Basic Import and Export Functions

The command-line interface (see ) can perform conversions between TEI and the formats used by the following programs: - CLAN, ELAN, - Praat, and - Transcriber. The conversions can be performed on + type="software" xml:id="R110" target="softw:clan"/> + CLAN, ELAN, + Praat, and + Transcriber. The conversions can be performed on single files or on whole directories or on a file tree. The command-line interface is suited to automatic processing in offline environments. The online interface (see - ) + type="software" xml:id="R114" target="softw:teiconvert"/> + ) can convert one or several files selected by the user, but not whole directories. Results appear in the user’s download folder.

In addition to the conversion to and from the alignment software, the online version of - - TEICORPO offers import and export in common + + TEICORPO offers import and export in common spreadsheet formats (.xlsx and .csv) and word processing formats (.docx and .txt). Importing data is useful to create new data, and exporting is used to make reports or examples for a publication and for end users not familiar with transcription tasks or @@ -1006,33 +1006,33 @@ transcription.

Other features are available in both types of interface (command line and web service). - - TEICORPO allows the user to exclude some tiers, + + TEICORPO allows the user to exclude some tiers, for example adult tiers in acquisition research where the user wants to study child production only, or comment tiers which are not necessary for some studies.

Export to Specialized Software

Another kind of export concerns textometric software. - TEICONVERT makes spoken language data available - for TXM (Heiden 2010; see + TEICONVERT makes spoken language data available + for TXM (Heiden 2010; see ), - Le Trameur ( + Le Trameur (Fleury and Zimina 2014; see ), and - Iramuteq (see and ), and + Iramuteq (see and de Souza et al. 2018), providing a dedicated TEI export for these tools. For example, for the TXM software, the + xml:id="R121" target="softw:txm"/>TXM software, the export includes a text element made of utterance elements including age and speaker attributes. presents an example for the - TXM software.

@@ -1064,12 +1064,12 @@ Example of XML for the TXM software + target="softw:txm"/>TXM software
-

An export has been developed for Lexico and Le Trameur textometric +

An export has been developed for Lexico and Le Trameur textometric software with a simple SGML file without timelines (see ).

@@ -1078,23 +1078,23 @@ <loc=MOT>you have to rest now ? <loc=CHI>yes . <loc=MOT>from your big singing extravaganza ? <loc=CHI>yes that was a party . <loc=MOT>woof Example of export for the Lexico or Lexico or Le Trameur software

Likewise, another export is available for the textometric tool Iramuteq + xml:id="R124" target="softw:iramuteq"/>Iramuteq without timelines (see ).

**** -*MOT you have to rest now ? -*CHI yes . -*MOT from your big singing extravaganza ? -*CHI yes that was a party . -*MOT woof . -*MOT that was a party that sure was some party . Example of export for the IRAMUTEQ software + target="softw:iramuteq"/>IRAMUTEQ software
-

In all these cases, - TEICORPO is able to provide an export file and to +

In all these cases, + TEICORPO is able to provide an export file and to remove unnecessary information from the TEI pivot format. This is useful, for example, with textometric software, which works only with orthographic tiers without a timeline or dependent information.

@@ -1107,39 +1107,39 @@ linguistic research. A present difficulty with these grammatical analyzers is that most often they run only on raw orthographic material, excluding other information. Moreover, their results are not always in a format that can be used with traditional spoken - language software such as - CLAN , ELAN, - Praat, - Transcriber, nor of course in TEI format.

-

- TEICORPO provides a way to solve this problem by + language software such as + CLAN , ELAN, + Praat, + Transcriber, nor of course in TEI format.

+

+ TEICORPO provides a way to solve this problem by running analyzers and putting the results from the analysis back into TEI format. Once the TEI format has been enriched with grammatical information, it is possible to use the - results and convert them back to ELAN or - Praat and use the grammatical information in these + results and convert them back to ELAN or + Praat and use the grammatical information in these spoken language software packages. It is also possible to export to TXM and to use the + xml:id="R133" target="softw:txm"/>TXM and to use the grammatical information in the textometric software. Two grammatical analyzers have been - implemented in - TEICORPO: - TreeTagger and - CoreNLP.

+ implemented in + TEICORPO: + TreeTagger and + CoreNLP.

- - TreeTagger -

- TreeTagger -

Accessed March 11, 2021, + TreeTagger +

+ TreeTagger +

Accessed March 11, 2021, .

- (Schmid 1994; (Schmid 1994; 1995) is a tool for annotating text with part-of-speech and lemma information. The software is freely available for research, education, and evaluation. It is available in twenty-five languages, provides @@ -1147,29 +1147,29 @@ done for instance by Benzitoun, Fort, and Sagot (2012) in the PERCEO project. They defined a syntactic model suitable for spoken language corpora, using the training feature of - TreeTagger and an iterative process including + type="software" xml:id="R207" target="softw:treetagger"/> + TreeTagger and an iterative process including manual corrections to improve the results of the automatic tool.

-

The command-line version of - TEICORPO should be used to generate an annotated +

The command-line version of + TEICORPO should be used to generate an annotated file with lemma and POS information based on - TreeTagger. - TreeTagger should be installed separately. The - implementation of - TreeTagger in - TEICORPO includes the ability to use any + target="softw:treetagger"/> + TreeTagger. + TreeTagger should be installed separately. The + implementation of + TreeTagger in + TEICORPO includes the ability to use any syntactic model. For French data, we used the PERCEO model (Benzitoun, Fort, and Sagot 2012).

-

The command line to be used is: - java -cp - TEICORPO.jar fr.ortolang. - TEICORPO. Tei TreeTagger +

The command line to be used is: + java -cp + TEICORPO.jar fr.ortolang. + TEICORPO. Tei TreeTagger filenames... with additional parameters:

@@ -1182,16 +1182,16 @@

-model filename

filename is the full name of the - TreeTagger syntactic model. In our case, + xml:id="R145" target="softw:treetagger"/> + TreeTagger syntactic model. In our case, we use the PERCEO model.

-program filename

filename is the full location of the - TreeTagger program, according to the + xml:id="R146" target="softw:treetagger"/> + TreeTagger program, according to the system used (Windows, MacOS, or Linux).

@@ -1203,8 +1203,8 @@

The environment variable TREE_TAGGER can be used to locate the model and the program. If no -program option is used, the default name for the - TreeTagger program is used.

+ type="software" xml:id="R147" target="softw:treetagger"/> + TreeTagger program is used.

The -model parameter is mandatory.

The resulting filename ends with .tei_corpo_ttg.tei_corpo.xml or a specific name provided by the user (option -o).

@@ -1278,8 +1278,8 @@ possibilities offered by the TEI spanGrp and span elements. It is very powerful as it enables the user to insert as many description levels as necessary (ten levels exist in the current version of CONLL). The implementation of - - CoreNLP (see below) takes full advantage of + + CoreNLP (see below) takes full advantage of these possibilities.
@@ -1320,55 +1320,55 @@
- - Stanford CoreNLP -

- The Stanford Core Natural Language Processing -

Accessed March 11, 2021, + Stanford CoreNLP +

+ The Stanford Core Natural Language Processing +

Accessed March 11, 2021, .

- ( - CoreNLP) package is a suite of tools (Manning et + ( + CoreNLP) package is a suite of tools (Manning et al. 2014) that can be used under a GNU General Public License. The suite provides several tools such as a tokenizer, a POS tagger, a parser, a named entity recognizer, temporal tagging, and coreference resolution. All the tools are available for English, but only some of them are available for all languages. All software libraries are integrated into Java JAR files, so all that is + target="softw:Java"/>Java JAR files, so all that is required is to download JAR files from the CoreNLP website + target="softw:stanfordcorenlp"/>CoreNLP website

Accessed May 5, 2021, .

-
to use them with - TEICORPO. Using the analyzer is similar to using - - TreeTagger . The -model and -syntaxformat +
to use them with + TEICORPO. Using the analyzer is similar to using + + TreeTagger . The -model and -syntaxformat parameters can be used in a similar way to specify the grammatical model to be used and the output format. A command line example is:

-

- java -cp " - teicorpo.jar:directory_for_SNLP/*" - fr.ortolang. - teicorpo.TeiSNLP -syntaxformat svalue -model +

+ java -cp " + teicorpo.jar:directory_for_SNLP/*" + fr.ortolang. + teicorpo.TeiSNLP -syntaxformat svalue -model filename.tei_corpo.xml

The directory_for_SNLP is the name of the location on a computer where - all the - CoreNLP JAR files can be found. Note that using - the - CoreNLP software makes heavy demands on the + all the + CoreNLP JAR files can be found. Note that using + the + CoreNLP software makes heavy demands on the computer’s memory resources and it is necessary to instruct the Java software to + xml:id="R154" target="softw:Java"/>Java software to use a large amount of memory (for example to insert parameter -mx5g before parameter -cp to indicate that 5 GB of memory will be used for a full English analysis).

The -model parameter can take three values: english (use the full English grammar), french (use the full French grammar), or the name of a - CoreNLP parameter file which specifies any type + type="software" xml:id="R214" target="softw:stanfordcorenlp"/> + CoreNLP parameter file which specifies any type of analysis that is available in - CoreNLP.

+ target="softw:stanfordcorenlp"/> + CoreNLP.

The -syntaxformat parameter can take four values: conll (a full analysis with all possible tools: ten levels are produced in this case), dep (a syntactic analysis using a dependency grammar), and ref @@ -1378,24 +1378,24 @@

Exporting the Grammatical Analysis

The results from the grammatical analysis can be used in transcription files such as - those used by - Praat and ELAN. A partition-like visual + those used by + Praat and ELAN. A partition-like visual presentation of data is very handy to represent a part of speech or a CONLL result. The orthographic line will appear at the top with divisions into words, divisions into parts of speech, and other syntactic information below. As the result of the analysis can contain a large number of tiers (each speaker will have as many tiers as there are elements in the grammatical analysis: for example, word, POS, and lemma for - Tree Tagger; ten tiers for - CoreNLP full analysis), it is helpful to limit the + type="software" xml:id="R216" target="softw:treetagger"/> + Tree Tagger; ten tiers for + CoreNLP full analysis), it is helpful to limit the number of visible tiers, either using the -a option of - TEICORPO, or limiting the display with the + xml:id="R157" target="softw:teicorpo"/> + TEICORPO, or limiting the display with the annotation tool.

-

An example is presented below in the ELAN tool (see An example is presented below in the ELAN tool (see ). The original utterance was si c’est comme ça je m’en vais (if that’s how it is, I’m leaving). It is displayed in the first line, highlighted in pink. The analysis into words (second line, consisting of numbers), @@ -1405,22 +1405,22 @@ (is).

- Example of - TreeTagger analysis representation in a + Example of + TreeTagger analysis representation in a partition software program

Export can be done from TEI into a format used by textometric software (see ). This is the case for TXM, -

See the Textométrie website, last updated June 29, 2020, TXM, +

See the Textométrie website, last updated June 29, 2020, .

a textometric software application. In this case, instead of using a partition representation, the information from the grammatical analysis is inserted at the word level in an XML structure. For example, in the case below, the TXM export includes - - TreeTagger annotations in POS, adding + xml:id="R161" target="softw:txm"/>TXM export includes + + TreeTagger annotations in POS, adding lemma and pos attributes to the word element w.

@@ -1452,69 +1452,69 @@ - Example of - TreeTagger analysis representation that can be - imported into Example of + TreeTagger analysis representation that can be + imported into TXM
Comparison with Other Software Suites

The additional functionalities available in the - TEICORPO suite are close to those available in the - - Weblicht web services ( + TEICORPO suite are close to those available in the + + Weblicht web services ( Hinrichs, Hinrichs, and Zastrow 2010). To a certain extent, the two suites of tools ( - Weblicht and - TEICORPO) have the same purpose and + type="software" xml:id="R167" target="softw:weblicht"/> + Weblicht and + TEICORPO) have the same purpose and functionalities. They can import data from various formats, run similar processes on the data, and export the data for scientific uses. In some cases, the services could - complement each other or - TEICORPO could be integrated in the - Weblicht services. This is the case, for example, + complement each other or + TEICORPO could be integrated in the + Weblicht services. This is the case, for example, for handling the CHILDES format, which at the time of writing is more functional in - TEICORPO than in - Weblicht.

+ type="software" xml:id="R171" target="softw:teicorpo"/> + TEICORPO than in + Weblicht.

A major difference between the two suites is in the way they can be used and in the - type of data they target. - TEICORPO is intended to be used not as an + type of data they target. + TEICORPO is intended to be used not as an independent tool, but as a utility tool that helps researchers to go from one type of data to another. For example, the syntactic analysis is intended to be used as a first step before being used in tools such as - Praat, ELAN, or TXM. Our more + target="softw:praat"/> + Praat, ELAN, or TXM. Our more recent developments (see Badin et al. 2021) made it possible to insert metadata stored in CSV files (including participant metadata) into the TEI files. This makes it possible to achieve more powerful corpus analysis - using a tool such as TXM.

+ using a tool such as TXM.

Our approach is somewhat similar to what is suggested in the conclusion of Schmidt, Hedeland, and Jettka (2017), who describe a mechanism that makes it possible to use the power of - Weblicht to process their files that are in the + target="softw:weblicht"/> + Weblicht to process their files that are in the ISO/TEI format. A similar mechanism could be used within - TEICORPO to take advantage of the tools that are - implemented in - Weblicht. However, Schmidt, Hedeland, and Jettka + xml:id="R179" target="softw:teicorpo"/> + TEICORPO to take advantage of the tools that are + implemented in + Weblicht. However, Schmidt, Hedeland, and Jettka (2017) suggest in their conclusion that it would be more interesting to work directly on ISO/TEI files because they contain a richer format. This is exactly what we did in - TEICORPO. Our suggestion would be to use the tools + target="softw:teicorpo"/> + TEICORPO. Our suggestion would be to use the tools created by Schmidt, Hedeland, and Jettka (2017) directly with the - TEICORPO files, so that their work would + >2017) directly with the + TEICORPO files, so that their work would complement ours. Moreover, in this way, the two projects would be compatible and provide either new functionalities when the projects have clearly different goals, or data variants when the goals are closer.

@@ -1522,42 +1522,42 @@
Conclusion -

- TEICORPO is a functional tool, created by the CORLI +

+ TEICORPO is a functional tool, created by the CORLI network and ORTOLANG, that converts files created by software specializing in editing spoken-language data into TEI format. The result is fully compatible with the most recent developments in TEI, especially those that concern spoken-language material.

The TEI files can also be converted back to the original formats or to other formats used in spoken-language editing to take advantage of their functionalities. This makes TEI a - useful pivot format. Moreover, - TEICORPO allows conversion to formats used by tools + useful pivot format. Moreover, + TEICORPO allows conversion to formats used by tools dedicated to corpus exploration and browsing.

-

- TEICORPO exists as a command-line interface as well +

+ TEICORPO exists as a command-line interface as well as a web service. It can thus be used by novice as well as advanced users, or by developers of linguistic software. The tool is free and open source so it can be further used and developed in other projects.

-

- TEICORPO is intended to be part of a large set of +

+ TEICORPO is intended to be part of a large set of tools using TEI for linguistic corpus research. It can be used in parallel with or as a complement to other tools such as Weblicht or the EXMARaLDA tools (see EXMARaLDA tools (see Schmidt, Hedeland, and Jettka 2017). A - specificity of - TEICORPO is that it is more suitable for processing + specificity of + TEICORPO is that it is more suitable for processing extended forms of TEI data (especially forms which are not inside the main u - element in the TEI code). - TEICORPO is also linked to - TEIMETA, a flexible tool for describing spoken + element in the TEI code). + TEICORPO is also linked to + TEIMETA, a flexible tool for describing spoken language corpora in a web interface generated from an ODD file (Etienne, Liégeois, and Parisse, accepted). As TEI enables metadata and data to be stored in the same file, sharing this format will promote metadata sharing and will keep metadata linked to their data during the life cycle of the data.

Potential further developments could provide wider coverage of different formats such as CMDI or linked data for editing or data exploration purposes; allow - TEICORPO to work with other external tools such as + xml:id="R191" target="softw:teicorpo"/> + TEICORPO to work with other external tools such as grammatical analyzers; or enable the visualization of multilevel annotations.

@@ -1573,13 +1573,13 @@ Corpus 22. doi:10.4000/corpus.5752. - - - Barras, - Claude, Edouard - Geoffrois, Zhibiao - Wu, and Mark - Liberman. 2000. <rs type="soft.name" + <ptr type="software" xml:id="R224" target="softw:transcriber"/> + <rs type="cit:soft.bib.ref" ref="#R224"> + <bibl xml:id="barras2000"><rs type="cit:soft.agent" ref="#R224"><author>Barras, + Claude</author></rs>, <rs type="cit:soft.agent" ref="#R224"><author>Edouard + Geoffrois</author></rs>, <rs type="cit:soft.agent" ref="#R224"><author>Zhibiao + Wu</author></rs>, and <rs type="cit:soft.agent" ref="#R224"><author>Mark + Liberman</author></rs>. <date>2000</date>. <title level="a"><rs type="cit:soft.name" ref="#R224">Transcriber</rs>: Development and Use of a Tool for Assisting Speech Corpora Production. In Speech Annotation and Corpus Tools, edited by Steven Bird and Jonathan @@ -1601,10 +1601,10 @@ issue, Speech Communication 33 (1–2): 23–60. doi:10.1016/S0167-6393(00)00068-6. - - - Boersma, - Paul. 2001. <rs type="soft.name" + <ptr type="software" xml:id="R225" target="softw:praat"/> + <rs type="cit:soft.bib.ref" ref="#R225"> + <bibl xml:id="boersma2001"><rs type="cit:soft.agent" ref="#R225"><author>Boersma, + Paul</author></rs>. <date>2001</date>. <title level="a"><rs type="cit:soft.name" ref="#R225">Praat</rs>, A System for Doing Phonetics by Computer. Glot International 5 (9/10): . - - - Etienne, - Carole, Loïc - Liégeois, and Christophe - Parisse. Accepted. <rs type="soft.name" ref="#R226" + <ptr type="software" xml:id="R226" target="softw:teimeta"/> + <rs type="cit:soft.bib.ref" ref="#R226"> + <bibl xml:id="etienne"><rs type="cit:soft.agent" ref="#R226"><author>Etienne, + Carole</author></rs>, <rs type="cit:soft.agent" ref="#R226"><author>Loïc + Liégeois</author></rs>, and <rs type="cit:soft.agent" ref="#R226"><author>Christophe + Parisse</author></rs>. Accepted. 
<title level="a"><rs type="cit:soft.name" ref="#R226" >TEIMETA</rs>: An Evolutive Solution to Describe Spoken Language Corpora in a Web Interface Generated from an ODD File. - - - Fleury, - Serge, and Maria - Zimina. 2014. <rs type="soft.name" + <ptr type="software" xml:id="R227" target="softw:letrameur"/> + <rs type="cit:soft.bib.ref" ref="#R227"> + <bibl xml:id="fleury2014"><rs type="cit:soft.agent" ref="#R227"><author>Fleury, + Serge</author></rs>, and <rs type="cit:soft.agent" ref="#R227"><author>Maria + Zimina</author></rs>. <date>2014</date>. <title level="a"><rs type="cit:soft.name" ref="#R227">Trameur</rs>: A Framework for Annotated Text Corpora Exploration. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations, edited by @@ -1633,10 +1633,10 @@ >57–61. N.p.: Dublin City University and Association for Computational Linguistics. . - - - Heiden, - Serge. 2010. The <rs type="soft.name" + <ptr type="software" xml:id="R228" target="softw:txm"/> + <rs type="cit:soft.bib.ref" ref="#R228"> + <bibl xml:id="heiden2010"><rs type="cit:soft.agent" ref="#R228"><author>Heiden, + Serge</author></rs>. <date>2010</date>. <title level="a">The <rs type="cit:soft.name" ref="#R228">TXM</rs> Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation (PACLIC24), edited by @@ -1646,12 +1646,12 @@ for Digital Enhancement of Cognitive Development, Waseda University. . - - - Hinrichs, - Erhard, Marie - Hinrichs, and Thomas - Zastrow. 2010. 
<rs type="soft.name" + <ptr type="software" xml:id="R229" target="softw:weblicht"/> + <rs type="cit:soft.bib.ref" ref="#R229"> + <bibl xml:id="hinrichs2010"><rs type="cit:soft.agent" ref="#R229"><author>Hinrichs, + Erhard</author></rs>, <rs type="cit:soft.agent" ref="#R229"><author>Marie + Hinrichs</author></rs>, and <rs type="cit:soft.agent" ref="#R229"><author>Thomas + Zastrow</author></rs>. <date>2010</date>. <title level="a"><rs type="cit:soft.name" ref="#R229">WebLicht</rs>: Web-Based LRT Services for German. In Proceedings of the ACL 2010 System Demonstrations, 25–29. . @@ -1660,10 +1660,10 @@ 2016. Language Resource Management — Transcription of Spoken Language. ISO 24624:2016. . - - - Kipp, - Michael. 2001. <rs type="soft.name" + <ptr type="software" xml:id="R230" target="softw:anvil"/> + <rs type="cit:soft.bib.ref" ref="#R230"> + <bibl xml:id="kipp2001"><rs type="cit:soft.agent" ref="#R230"><author>Kipp, + Michael</author></rs>. <date>2001</date>. <title level="a"><rs type="cit:soft.name" ref="#R230">Anvil</rs>: A Generic Annotation Tool for Multimodal Dialogue. In Proceedings of the 7th European Conference on Speech Communication and Technology (Eurospeech 2001 Scandinavia), vol. 2, edited by Paul Dalsgaard, @@ -1684,15 +1684,15 @@ byJoan C. Beal, Karen P. Corrigan, and Hermann L. Moisl, 163–80. Houndmills, Basingstoke, Hampshire: Palgrave-Macmillan. - - - Manning, - Christopher D., Mihai - Surdeanu,John - Bauer, Jenny - Finkel, Steven J. - Bethard, and David - McClosky. 2014. 
<rs type="soft.name" + <ptr type="software" xml:id="R231" target="softw:stanfordcorenlp"/> + <rs type="cit:soft.bib.ref" ref="#R231"> + <bibl xml:id="manning2014"><rs type="cit:soft.agent" ref="#R231"><author>Manning, + Christopher D.</author></rs>, <rs type="cit:soft.agent" ref="#R231"><author>Mihai + Surdeanu</author></rs>,<rs type="cit:soft.agent" ref="#R231"><author>John + Bauer</author></rs>, <rs type="cit:soft.agent" ref="#R231"><author>Jenny + Finkel</author></rs>, <rs type="cit:soft.agent" ref="#R231"><author>Steven J. + Bethard</author></rs>, and <rs type="cit:soft.agent" ref="#R231"><author>David + McClosky</author></rs>. <date>2014</date>. <title level="a"><rs type="cit:soft.name" ref="#R231">The Stanford CoreNLP Natural Language Processing Toolkit</rs>. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, edited by Kalina Bontcheva and @@ -1725,13 +1725,13 @@ the ACL SIGDAT Workshop. Author’s version available at . - + - - Schmidt, + + Schmidt, Thomas 2004. Transcribing and Annotating Spoken Language with - <rs type="soft.name" ref="#R232">EXMARaLDA</rs>. In Proceedings of the + EXMARaLDA. In Proceedings of the LREC-Workshop on XML-based Richly Annotated Corpora. Paris: ELRA. Author’s version available at . @@ -1750,39 +1750,39 @@ Electronic Conference Proceedings 136. Linköping, Sweden: LiU Electronic Press. ; . - - - Schmidt, - Thomas, and Wilfried - Schütte. 2010. <rs type="soft.name" + <ptr type="software" xml:id="R233" target="softw:folker"/> + <rs type="cit:soft.bib.ref" ref="#R233"> + <bibl xml:id="schmidts2010"><rs type="cit:soft.agent" ref="#R233"><author>Schmidt, + Thomas</author></rs>, and <rs type="cit:soft.agent" ref="#R233"><author>Wilfried + Schütte</author></rs>. <date>2010</date>. <title level="a"><rs type="cit:soft.name" ref="#R233">FOLKER</rs>: An Annotation Tool for Efficient Transcription of Natural, Multi-party Interaction. 
In Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC’10), 2091–96. N.p.: European Language Resources Association (ELRA). . - - - Souza, Marli - Aparecida Rocha de, Marilene Loewen Wall, + + Souza, Marli + Aparecida Rocha de, Marilene Loewen Wall, Andrea Cristina de Morais Chaves Thuler, Ingrid Margareth Voth Lowen, and - Aida Maris Peres. - 2018. The Use of <rs type="soft.name" ref="#R234" + type="cit:soft.agent" ref="#R234"><author>Ingrid Margareth Voth Lowen</author></rs>, and + <rs type="cit:soft.agent" ref="#R234"><author>Aida Maris Peres</author></rs>. + <date>2018</date>. <title level="a">The Use of <rs type="cit:soft.name" ref="#R234" >IRAMUTEQ</rs> Software for Data Analysis in Qualitative Research. Revista da Escola de Enfermagem da USP 52, e03353. doi:10.1590/S1980-220X2017015003353. - - - Wittenburg, - Peter, Hennie - Brugman, Albert - Russel, Alex - Klassmann, and Han - Sloetjes. 2006. <rs type="soft.name + <ptr type="software" xml:id="R235" target="softw:ELAN"/> + <rs type="cit:soft.bib.ref" ref="#R235"> + <bibl xml:id="wittenburg2006"><rs type="cit:soft.agent" ref="#R235"><author>Wittenburg, + Peter</author></rs>, <rs type="cit:soft.agent" ref="#R235"><author>Hennie + Brugman</author></rs>, <rs type="cit:soft.agent" ref="#R235"><author>Albert + Russel</author></rs>, <rs type="cit:soft.agent" ref="#R235"><author>Alex + Klassmann</author></rs>, and <rs type="cit:soft.agent" ref="#R235"><author>Han + Sloetjes</author></rs>. <date>2006</date>. <title level="a"><rs type="cit:soft.name " ref="#R235">ELAN</rs>: A Professional Framework for Multimodality Research. In Proceedings of LREC 2006, Fifth International Conference on Language Resources and Evaluation, See the CEI2TEI GitHub repository, accessed June 25, + >CEI2TEI GitHub repository, accessed June 25, 2021, . 
is simple (attList items suppressed for brevity: they follow the @@ -589,7 +589,7 @@ proposed vocabulary, provided in SKOS (Simple Knowledge Organization System) format (as part of the project’s - GitHub repository,Accessed July 13, 2021, . at The most recent version of the whole textual database is available in the CBETA XML P5 GitHub repository, accessed April 20, 2020, .

@@ -502,9 +502,9 @@

By the year 2010, the practice of using separate text files for different witnesses of a text had become well established in our workflow. For tracking changes to these files, we had used version control tools from the start. At some point, we realized that the modern - distributed variety of these tools, Git and GitHub, not only had the potential + distributed variety of these tools, Git and GitHub, not only had the potential to solve the problem of keeping track of changes made to a file, but could also be used to hold all witnesses of a text in one repository, each of them represented as a branch. (In the terminology of version control software, a branch @@ -517,13 +517,13 @@ version of a text at least as versatile as a printed scholarly edition. For me, this also included taking ownership of one specific copy of such an edition and tracking the work by adding marginal notes, comments, and references directly into the book. With GitHub - as a repository for texts and Git as a means to control the various maintenance tasks, + type="software" xml:id="R4" target="softw:github"/>GitHub + as a repository for texts and Git as a means to control the various maintenance tasks, researchers interested in a text could clone the text, add their own marginal notes, then make their version of the text available to us or any other researcher to integrate, if we so chose.

-

A Git +

A Git workflow can use any kind of digital material, but it works better with textual material as opposed to images or videos, and even better for texts that use lines as a structural element. This again is where the plain text we used in the Daozang @@ -541,10 +541,10 @@

As described in that talk (published as Wittern 2013), the text format used here is not simply plain text, but rather an extended form of the text format used in the Emacs - Orgmode,Accessed May 18, 2020, . in spirit + target="softw:emacs"/>Emacs + Orgmode,Accessed May 18, 2020, . in spirit comparable to the much more frequently seen Markdown, but better. The defining difference here is the more elegant and functional choice of markup elements, and the fact that the format was originally conceived as the base for a note-taking and scheduling application, @@ -552,7 +552,7 @@ the development of the software (which is itself community driven) informs the choices and considerations for markup constructs. For the DZJY project, we added a few more conventions, to accommodate our specific needs, but without changing any of the essential - features. Org mode uses what I called an implicit markup, which is exactly the opposite of XML. Org mode’s markup is as short as possible and in many cases derived from context. An asterisk * followed by a space at the @@ -564,7 +564,7 @@

From the beginning, the DZJY was in my view itself a pilot project for a much larger project, on which preparatory work started in earnest in 2012: the Kanseki Repository (GitHub username + target="softw:github"/>GitHub username @kanripo).Accessed June 24, 2020, and . @@ -576,7 +576,7 @@ tradition of scholarly editing and its distinction between documentary edition and interpretative edition. These two types are distinguished through naming conventions for the Git branches. Documentary editions + target="softw:git"/>Git branches. Documentary editions are also represented through digital facsimiles, which can be called up to be displayed side by side with the transcribed text. Interpretative editions may normalize the characters used to modern forms, add punctuation, and also make it possible to add @@ -586,13 +586,13 @@ titles to be included in a first phase of the project; this catalog is also being supplemented by users who deposit whatever texts they are interested in into the repository. Since the initial publication on GitHub in September 2015, and the + target="softw:github"/>GitHub in September 2015, and the launch of a dedicated website in March 2016, usage has been increasing slowly but steadily.

Kanripo Project Details

All the texts are freely available on GitHub in their source form. + target="softw:github"/>GitHub in their source form. This repository of texts can be accessed through the kanripo.org website, but also through a module of the Emacs editor called Mandoku. This allows users to query, access, clone, edit, and @@ -606,7 +606,7 @@

demonstrate the concept and functions of the Kanseki Repository. On the website, users can search for texts or browse the catalog. Once a text is found, the webserver reads it from the GitHub repository and serves it to the user. For most texts, there are different editions to choose from; usually both documentary and interpretative versions exist. For many texts, there is also a digital facsimile, which can be called up alongside the @@ -618,15 +618,15 @@ A text in the Kanseki Repository

In the screenshot in , there is a link at the - top of the page labeled GitHub, from which the source of the text can + top of the page labeled GitHub, from which the source of the text can be directly accessed. A user who wishes to make changes to the text, by correcting, annotating, or even translating it, can transfer a copy of this text from the public @kanripo account, either by cloning it to their own account on GitHub, or by downloading it locally.

The user can also log in to the Kanripo website with their Github + xml:id="R151" target="softw:github"/>Github credentials. When this is done for the first time, the user has to grant the Kanseki Repository access to their repositories. In addition, a new repository KR-Workspace is created; some settings related to the use of the @@ -648,25 +648,25 @@ distant reading, text analysis, and similar purposes, a separate account @kr-shadowAccessed June 24, 2020, has been - created on Github. You will find here the texts of the master branch, which is usually the normalized and edited version of the text in a form that makes it easy to download the whole archive at once.

- Mandoku

As mentioned, the texts can also be accessed from the text editor Emacs, which is + xml:id="R18" target="softw:emacs"/>Emacs, which is available on all major platforms. This is intended for people who work intensely with a text, for example as the topic for a PhD thesis. The Emacs module Emacs module MandokuAccessed May 18, 2020, . provides ways to search the KR, clone texts, create new branches, and many other functions. All other Emacs extensions and modules can also be used. shows an example of a text with its digital facsimile, and shows the same poems, rearranged by line, with a @@ -676,8 +676,8 @@

A text from the Kanseki Repository, side by side with a facsimile, - displayed using the Emacs module Mandoku + displayed using the Emacs module Mandoku
@@ -687,7 +687,7 @@
The text with translation, now pulled from the user’s GitHub account
diff --git a/data/JTEI/14_2021-23/jtei-barabuccietal-196-source.xml b/data/JTEI/14_2021-23/jtei-barabuccietal-196-source.xml index eb03a71..afd959c 100644 --- a/data/JTEI/14_2021-23/jtei-barabuccietal-196-source.xml +++ b/data/JTEI/14_2021-23/jtei-barabuccietal-196-source.xml @@ -126,10 +126,10 @@

This paper describes how we dealt with the encoding and transformation of the punctuation in the Early New High German edition of Marco Polo’s travel account. Technically, we implemented a set of general rules (as XSLT templates) + xml:id="R1" target="softw:xslt"/>XSLT templates) plus various exceptions (as descriptive instructions in XML attributes), and applied - them in an automated fashion (using XProc pipelines). In addition to this, we + them in an automated fashion (using XProc pipelines). In addition to this, we discuss the philological foundation of this method and, contextually, we address the topic of the transformation of a single original source into different transcriptions: from a highly diplomatic edition to an interpretative one, going @@ -161,7 +161,7 @@ the master TEI file became too big and its structure too complex, thus too hard to navigate and maintain, even when using advanced XML editors such as Oxygen XML; normalizing punctuation revealed itself as a complex task that required profound changes to the structure of the edited text. @@ -203,8 +203,8 @@ approach addresses both issues. In particular we show how normalizing punctuation represents a dramatic step beyond the more classical normalization of words. The current implementation of our approach, based on XProc and XProc and XSLT, is also presented in section 3.

Moving to an editorial workflow with such a level of automation requires a reevaluation of the role of the editor, from wordsmith to formalizer of rules (and @@ -274,11 +274,11 @@ text are presented in such a way that the reader is granted more informed access to them.

The edition will be published online using a specifically tailored version of EVT (Edition Visualization TechnologyA light-weight, open source tool specifically designed to create digital editions from XML-encoded texts - ((Rosselli Del Turco et al. 2013).) and will present, on the one hand, each witness in its continuum from facsimile to multiple levels of normalization and, on the other hand, the three main witnesses @@ -802,7 +802,7 @@ Multiple editions will be generated automatically from the master TEI file, with no manual intervention on the resulting files. The generated editions files will conform to the TEI subset understood by - EVT.

Some of these desiderata clash with each other. For instance, the desire to @@ -868,7 +868,7 @@ target="#delturcond">Roberto Rosselli Del Turco (n.d.): here two levels of edition are offered, a diplomatic and a more interpretative one. The user can compare the two editions visualizing them synoptically in the EVT software used for the edition.

@@ -899,9 +899,9 @@ project, are ephemeral and never modified directly.

The implementation consists of a series of XSLT transformations, each + target="softw:xslt"/>XSLT transformations, each representing and implementing a single rule, coordinated by three different XProc pipelines, one for each level of edition. The source code is available at .

This methodology contrasts with the established editorial practice of mingling @@ -955,14 +955,14 @@ system that does not allow for this interaction to happen is not able to deal properly with normalization in general and punctuation in particular.

Each rule is implemented as a small and self-contained XSLT + xml:id="R8" target="softw:xslt"/>XSLT transformation. At the time of writing, the ENHG Marco Polo project comprises about a hundred rules, grouped in twenty macro categories. On average, the core of each rule is implemented in less than three lines of XSLT.

+ xml:id="R9" target="softw:xslt"/>XSLT.

To give the readers an impression of the simplicity of the rule implementation, we - show here the main parts of the XSLT that implement one of the example + show here the main parts of the XSLT that implement one of the example rules described above.

Example: Rule to Join Words Split at the End of a Line @@ -970,7 +970,7 @@ used to mark that a word has been split at the end of a line. In the diplomatic rendition we want to preserve this word division and the forced line break, while in other renditions we want to reconstruct the complete word.

-

The The XSLT excerpt in Example 3 shows how split words are joined when a middle double oblique hyphen is found. The joining is performed in a lossless way: all @@ -1014,8 +1014,8 @@ - XSLT implementation of the rule Join + XSLT implementation of the rule Join words split with a double oblique hyphen.

The rule in Example 3 is @@ -1107,14 +1107,14 @@ identify level-specific steps.

Each pipeline is implemented as an XProc pipeline. All the + target="softw:xproc"/>XProc pipeline. All the pipelines are simple linear flows (i.e., the output of a rule is the input for the next rule). From a methodological point of view, the XProc + xml:id="R14" target="softw:xproc"/>XProc pipeline is a record of all the operations that the scholar performs on the transcription. The creation of an edition level is equivalent to replaying this record. Example 6 shows an excerpt - of the XProc pipeline used to generate the semidiplomatic edition.

It is important to note that pipelines comprise three kinds of steps:

@@ -1163,12 +1163,12 @@ Excerpt of the XProc pipeline used to + target="softw:xproc"/>XProc pipeline used to generate the semi-diplomatic edition. Steps marked A are steps that implement rules; the step marked B takes care of exceptions.

The fact that the editorial workflows for all the editions are formalized in XProc pipelines makes it possible, for instance, to compare these pipelines and see in detail (and with utmost precision) how they differ and what is, in this project, the difference between the processes needed to establish a @@ -1217,19 +1217,19 @@ like to experiment with creating declarative rule generators. Many rules are repetitive in their nature (for example, the normalization of single characters) and it should be possible to express them in a declarative fashion. These abstract - rules would then be translated into XSLT transformations. Another aspect we + rules would then be translated into XSLT transformations. Another aspect we would like to reflect on is how the transformation process directed by the pipelines influences the various levels of abstraction of the document being transformed, drawing parallels with stratified document models such as CMV+P (Barabucci 2019). A final thing we would like to test - is the replacement of the XProc pipelines with pure XSLT pipelines + is the replacement of the XProc pipelines with pure XSLT pipelines (Birnbaum 2017). Replacing XProc with XSLT pipelines would reduce the number of + type="software" xml:id="R21" target="softw:xproc"/>XProc with XSLT pipelines would reduce the number of technologies that other scholars have to be familiar with in order to understand the editorial process in its entirety.

Another future development that we envision is the deconstruction of the @@ -1317,7 +1317,7 @@ Leo S. Olschki. Birnbaum, David J. 2017. Patterns and Antipatterns in <ptr type="software" xml:id="R23" - target="#xslt"/><rs type="soft.name" ref="#R23">XSLT</rs> + target="softw:xslt"/><rs type="cit:soft.name" ref="#R23">XSLT</rs> Micropipelining. In Proceedings of Balisage: The Markup Conference 2017. Balisage Series on Markup Technologies @@ -1486,11 +1486,11 @@ The Digital Vercelli Book. Beta version. Accessed October 22, 2021. . - Rosselli Del Turco, - Roberto, et al. + Rosselli Del Turco, + Roberto, et al. 2013. Edition Visualization Technology. - Accessed April 19, 2021. Accessed April 19, 2021.. Stella, Francesco, ed. 2020. Corpus Rhythmorum Musicum. Last modified July diff --git a/data/JTEI/14_2021-23/jtei-bleeker-et-al-199-source.xml b/data/JTEI/14_2021-23/jtei-bleeker-et-al-199-source.xml index 17a217b..b1dbc73 100644 --- a/data/JTEI/14_2021-23/jtei-bleeker-et-al-199-source.xml +++ b/data/JTEI/14_2021-23/jtei-bleeker-et-al-199-source.xml @@ -391,7 +391,7 @@ type="bibl">Huitfeldt and Sperberg-McQueen 2003). In GODDAG, all children of the markup nodes are typically ordered, but TexMECS provides a notation to mark certain markup nodes as unordered. The GODDAG + xml:id="R5" target="softw:goddag"/>GODDAG processor ignores the default linear order of these elements’ children, and therefore TexMECS supports the representation of nonlinear structures. No known working implementation of TexMECS, however, is currently available. At @@ -558,7 +558,7 @@ TAGML is more expressive. For instance, in XML all annotation values are of type string, but TAGML offers data-typing of annotations. These data types are expressed in UTF-8 and interpreted by the TAGML parser as + target="softw:tagmlparser"/>TAGML parser as different data types. Encoders can distinguish between integer, string, or Boolean values ().

@@ -589,15 +589,15 @@ process as natural as possible. The markup language has the same compactness as XML and is independent of the user environment.TAGML can be edited in any editor, but the open source text editor Sublime has Sublime has a TAGML syntax highlighting + type="cit:soft.name" ref="#R8">TAGML syntax highlighting package, and the reference - implementation Alexandria can be used to parse and validate TAGML documents and store them as a TAG hypergraph. Following the argument of Sperberg-McQueen and Huitfeldt and @@ -1071,7 +1071,7 @@ retrieving the disjointed quotations as one (merged) utterance would only be possible with additional, vocabulary-specific coding. Processing the two q elements as a single q requires a set of XSLT instructions that check + target="softw:XSLT"/>XSLT instructions that check the values of the xml:id and the next and prev attributes in order to know which q elements should be stitched together. In TAGML, both scenarios would be equally straightforward. The hypergraph can be queried @@ -1113,15 +1113,15 @@ TEI transcription of
To process the text of this fragment correctly, one needs to write a rather - complicated set of XSLT instructions. At the very least, these + complicated set of XSLT instructions. At the very least, these instructions need to match the values of the xml:id and prev in order to process the first part of the deletion, look for the second part of the deletion, and then concatenate their textual content. At the same time, one has to prevent the second part from being processed twice (first as the second part of the deletion, and the second time together with the regular del elements). After some experimenting and consulting several XSLT specialists, we have come + target="softw:XSLT"/>XSLT specialists, we have come to no less than three different sets of instructions.The authors are grateful to Peter Boot, Vincent Neyt, and Frederike Neuber for sharing their expertise and invaluable insights. And considering the ingenuity and technical expertise diff --git a/data/JTEI/14_2021-23/jtei-burnard-shoch-odebrecht-194-source.xml b/data/JTEI/14_2021-23/jtei-burnard-shoch-odebrecht-194-source.xml index 2868bb0..29d2507 100644 --- a/data/JTEI/14_2021-23/jtei-burnard-shoch-odebrecht-194-source.xml +++ b/data/JTEI/14_2021-23/jtei-burnard-shoch-odebrecht-194-source.xml @@ -214,12 +214,12 @@ for one: Projects Using the TEI, accessed May 17, 2021, . More recently, the TEIhub project lists more than 12,500 GitHub-hosted TEI + target="softw:GitHub"/>GitHub-hosted TEI projects (last updated May 11, 2021, ); an associated bot called TEI Pelican provides a daily twitter feed of new - TEI Pelican provides a daily twitter feed of new + GitHub repositories containing a TEI header. 
We are unaware of any systematic analysis of the application types indicated by these data sources, but a glance gives the impression that traditional editorial and resource-building @@ -243,7 +243,7 @@ issues of sampling and balance were prepared for discussion and approval by the members of WG1, and remain available from the Working Group’s website. These and other documents are available from the Action’s GitHub page, + xml:id="R3" target="softw:GitHub"/>GitHub page, accessed May 17, 2021, .

@@ -668,7 +668,7 @@ components used by any ELTeC schema at any level. This ODD also contains documentation and specifies usage constraints applicable across every schema. This base ODD is then processed using the TEI standard odd2odd + target="softw:odd2odd"/>TEI standard odd2odd stylesheet to produce a stand-alone set of TEI specifications which we call eltec-library. Three different ODDs, eltec-0, eltec-1, and eltec-2, then derive specific schemas and documentation for each of the three ELTeC levels, using this @@ -678,12 +678,12 @@ ODDs, we are then able to produce documentation and formal schemas which reflect exactly the scope of each encoding level.

The ODD sources and their outputs are maintained on GitHub and are also GitHub and are also published on Zenodo (Zenodo (Odebrecht et al. 2019) along with the - ELTeC corpora.The GitHub repository for the ELTeC collection + ELTeC corpora.The GitHub repository for the ELTeC collection (last updated May 17, 2021) is found at ; the Zenodo community within which it is being published (last updated April 11, 2021) lives at , which includes - links to the individual GitHub repositories for each corpus.

+ links to the individual GitHub repositories for each corpus.

As well as continuing to expand the collection, and continuing to fine-tune its composition, we hope to improve the consistency and reliability of the metadata associated with each text, as far as possible automatically. For example, we have @@ -740,8 +740,8 @@ History. Ann Arbor, MI: University of Michigan Press. Burnard, Lou. 2016. ODD Chaining for Beginners. TEI Council Technical Working - Paper. TEI GitHub IO Repository. Available at GitHub IO Repository. Available at . Burnard, Lou. 2019. What Is TEI Conformance, and Why Should You Care? diff --git a/data/JTEI/14_2021-23/jtei-cc-pn-erjavec-195-source.xml b/data/JTEI/14_2021-23/jtei-cc-pn-erjavec-195-source.xml index 56f3d92..1250636 100644 --- a/data/JTEI/14_2021-23/jtei-cc-pn-erjavec-195-source.xml +++ b/data/JTEI/14_2021-23/jtei-cc-pn-erjavec-195-source.xml @@ -221,7 +221,7 @@ The Parla-CLARIN Schema

Parla-CLARIN is written as a TEI ODD document, consisting of the prose guidelines and the schema specification, on the basis of which it is possible, using the standard TEI XSLT stylesheets, to derive an XML schema expressed either as a RelaxNG schema, a DTD, or a W3C schema, which is then used for formal validations of a Parla-CLARIN parliamentary corpus.

@@ -339,20 +339,20 @@ Presentation of Parla-CLARIN

Like the TEI Guidelines, the Parla-CLARIN recommendations are available on GitHub, as a + target="softw:github"/>GitHub, as a projectTomaž Erjavec and Andrej Pančur, Parla-CLARIN project GitHub site, last updated March 17, 2021, . of the CLARIN ERIC collection. The project contains a folder for the schema (i.e., the Parla-CLARIN - ODD document and XML schemas derived from it), a folder for the programs that convert the ODD into the XML schemas and to the HTML of the prose and schema definitions, and a folder for examples, which contains an artificial but fully worked out example of a Parla-CLARIN document and subfolders with various example resources, where each should contain: a sample of a corpus in its source encoding; - XSLT script to convert it into Parla-CLARIN; + XSLT script to convert it into Parla-CLARIN; and the output of the conversion. @@ -512,12 +512,12 @@ especially as the primary encoding standard used by various legislative bodies, so some of AKN’s solutions were used in developing the Parla-CLARIN proposal, in particular the typology of divisions of a document. Also developed was a partial, but non-trivial, conversion from AKN to Parla-CLARIN, which covers several AKN example documents. As mentioned in , the example documents and conversion script can be found in the Examples folder of the Parla-CLARIN Git repository. The akn2tei.xsl script attempts to preserve the IDs of the source AKN document, converts the AKN addressee, role, and questions and answers to Parla-CLARIN, and maps FRBR data (which distinguishes a work from @@ -591,10 +591,10 @@ parliamentary proceedings meant for scholarly investigations. This scheme is currently a straightforward customization of the TEI Guidelines, with the majority of the effort having gone into the writing of the prose guidelines of the Parla-CLARIN recommendations - and into developing the conversion from Akoma Ntoso to Parla-CLARIN. We + and into developing the conversion from Akoma Ntoso to Parla-CLARIN. 
We have not included examples of the encoding, as these are readily available on the GitHub + type="software" xml:id="R3" target="softw:github"/>GitHub documentation page of the project, and large Parla-CLARIN encoded corpora are openly available.

Apart from the siParl 2.0 corpus mentioned above (As we wanted to have corpora that are not only interchangeable but interoperable as well, we created a bespoke ParlaMint XML schema directly in RelaxNG – the schema is compatible with Parla-CLARIN as it validates a subset of documents that would be validated against - Parla-CLARIN. We produced common scripts that can convert any of the four corpora + Parla-CLARIN. We produced common scripts that can convert any of the four corpora to plain text, to CoNLL-U format as used by the Universal Dependencies project, and to - vertical format as used by the CWBThe IMS Open Corpus Workbench - (CWB), last modified March 30, 2021, CWBThe IMS Open Corpus Workbench + (CWB), last modified March 30, 2021, . and Sketch - EngineAccessed January 13, 2022, Sketch + EngineAccessed January 13, 2022, . (Kilgarriff et al. + type="cit:soft.bib.ref" ref="#R15">Kilgarriff et al. 2014) concordancers, as well as to extract complete speech metadata into TSV files.

In order for Parla-CLARIN to achieve its goal of becoming a widely recognized encoding @@ -645,7 +645,7 @@ specification from the default ones in the TEI Guidelines to ones taken or adapted from the collected parliamentary corpora.

Second, as we have already done for ParlaMint, we plan to add to the GitHub Parla-CLARIN + xml:id="R4" target="softw:github"/>GitHub Parla-CLARIN project more down-conversion scripts with which we would increase the usability of the Parla-CLARIN corpora. As mentioned, work also needs to be done to develop a conversion to RDF.

@@ -817,7 +817,7 @@ Jan Bušta, Miloš Jakubíček, Vojtěch Kovář, Jan Michelfeit, Pavel Rychlý, and Vít Suchomel. 2014. The Sketch Engine: Ten Years On. Lexicography: Journal of ASIALEX 1 (1): diff --git a/data/JTEI/14_2021-23/jtei-cc-pn-holmes-193-source.xml b/data/JTEI/14_2021-23/jtei-cc-pn-holmes-193-source.xml index d824ae9..0c171d8 100644 --- a/data/JTEI/14_2021-23/jtei-cc-pn-holmes-193-source.xml +++ b/data/JTEI/14_2021-23/jtei-cc-pn-holmes-193-source.xml @@ -91,15 +91,15 @@ forms of data when building project outputs. This article discusses the Digital Victorian Periodical Poetry (DVPP) project, where metadata on about 15,000 poems from nineteenth-century periodicals is captured in a MySQL database, and periodically + target="softw:MySQL"/>MySQL database, and periodically exported to create a TEI file for each poem. Many of the poems are then transcribed and encoded. The canonical source of metadata is the RDB, while the canonical source of textual data is the TEI file. Metadata in the TEI files must be periodically updated from the RDB, without disturbing the textual encoding. Changes to the RDB data may result in changes to the id and filename of the related TEI file, so any existing TEI data is migrated to a new file, and the Subversion repository must be appropriately updated. All - of this is done with XSLT and Ant.

+ of this is done with XSLT and Ant.

The project described in this paper is supported by a ; in other words, it allows for multiple hierarchies over the same dataset. However, beginning with the work of E. F. Codd in the 1970s and the rise of SQL, + type="software" xml:id="SQL" target="softw:SQL"/>SQL, the relational database model familiar today became dominant, and remained so until the relatively recent popularity of NoSQL approaches.

In modeling humanities datasets, both relational databases and XML have notable @@ -136,7 +136,7 @@ terms of enforceable constraints on linking and data integrity as well as speed, members of the TEI and related communities favor XML, pointing to its flexibility and extensibility. In recent years, the speed and power of XSLT and XQuery tools, the + target="softw:XSLT"/>XSLT and XQuery tools, the development of a rich array of schema and validation tools, and the appearance of XML databases have all but eradicated the traditional advantages claimed for relational databases.

@@ -174,8 +174,8 @@ poem, line, and stanza in the middle of its referential domain of study (para 8). Gibson (2012) describes a similar scenario with mixed RDB and - XML data, and how he used Saxonʼs SQL extension functions to overcome the problem.

+ XML data, and how he used Saxonʼs SQL extension functions to overcome the problem.

However, storing XML data in RDB fields is suboptimal. Most serious encoding projects make use of version-control systems such as Git or Subversion, for very good reasons: in a project with many transcribers and encoders, where multiple waves of encoding and @@ -199,8 +199,8 @@ project. This project began life many years ago as a pure-metadata project, capturing information about tens of thousands of poems that appeared in British periodicals during the nineteenth century. At that time, an RDB system seemed a natural and sufficient tool - for the job, so a MySQL database, along with a data-entry interface, + for the job, so a MySQL database, along with a data-entry interface, was set up for the researchers, and data collection proceeded rapidly (). However, after some years the project gained an additional research focus, and more recently funding from the Social Sciences and Research @@ -212,7 +212,7 @@ A record in the relational database. -

The The MySQL database is relatively straightforward. The main table is the Poems table, in which each record corresponds to a specific poem appearing in a given periodical on a given date. Another table, Organs, contains the list of periodicals, and @@ -221,7 +221,7 @@ authors, translators, or illustrators; people are linked in one-to-many relationships through role tables, so that one poem may have multiple illustrators, and the author of one poem may be the translator of another. The database front end is written in PHP.

Our long-term plan is for the entire dataset to be in the form of TEI XML files, but for the first few years of the project, data will continue to be added to the RDB system, @@ -257,8 +257,8 @@ target="https://hcmc.uvic.ca/svn/dvpp/buildTEI.xml">Subversion repository, and details of the process can be found in our project - documentation. and XSLT (). + documentation. and XSLT ().

A simple representation of the metadata integration process. @@ -267,7 +267,7 @@

In the initial part of the process, the current state of the database is dumped into an XML file (the application mysqldump can provide data in XML format). This file is stored in the Subversion repository, giving us at least a semblance of version control over the - SQL data, albeit in a rather impoverished fashion. Each poem record in the database is matched against an equivalent XML file if there is one. If there is no matching file, then one is created. If there is a matching file and no changes are @@ -365,9 +365,9 @@ Conference, Graz, Austria, 19 September 2019. . Gibson, Matthew. 2012. Using <ptr type="software" xml:id="XSLT" target="#XSLT"/><rs - type="soft.name" ref="#XSLT">XSLT</rs>’s <ptr type="software" xml:id="SQL" - target="#SQL"/><rs type="soft.name" ref="#SQL">SQL</rs> Extension with Encyclopedia + level="a">Using <ptr type="software" xml:id="XSLT" target="softw:XSLT"/><rs + type="cit:soft.name" ref="#XSLT">XSLT</rs>’s <ptr type="software" xml:id="SQL" + target="softw:SQL"/><rs type="cit:soft.name" ref="#SQL">SQL</rs> Extension with Encyclopedia Virginia. Code{4}lib Journal 16. diff --git a/data/JTEI/14_2021-23/jtei-cc-ra-mylonas-202-source.xml b/data/JTEI/14_2021-23/jtei-cc-ra-mylonas-202-source.xml index 9de8aef..60850bf 100644 --- a/data/JTEI/14_2021-23/jtei-cc-ra-mylonas-202-source.xml +++ b/data/JTEI/14_2021-23/jtei-cc-ra-mylonas-202-source.xml @@ -632,7 +632,7 @@ target="http://nomisma.org/">Nomisma, and CRMtexCIDOC (International Committee for Documentation) Conceptual Reference Model, + target="softw:omekareference"/>Reference Model, accessed July 4, 2022, ; Nomisma (knowledge organization system for numismatics), accessed July 4, 2022, ; CRMtex model for the study of ancient texts (an @@ -679,8 +679,8 @@ governing body which might provide the certification, and in fact the Guidelines are, as indicated in their name, not a standard. 
However, if the TEI consortium and its members recommend how to fulfill the FAIR principles by using the teiHeader as discussed - for each metric above, and provide XSLT and Schematron files for validation, the output + for each metric above, and provide XSLT and Schematron files for validation, the output of that file could indicate compliance. As always, it is the responsibility of the encoder and project to make sure that the metadata are accurate and detailed.

@@ -694,7 +694,7 @@ predictable and machine-readable form. Specifically, the TEI Guidelines and schema indicate where and how to encode licensing information, metadata formats, documentation, and identifiers and their presence can be verified using XSLT, XPath, and + xml:id="XSLT" target="softw:XSLT"/>XSLT, XPath, and Schematron. Overall, the affordances of the TEI, best practices of the EpiDoc community, and IIP archival format decisions have resulted in a set of documents that measure up to the requirements of FAIR metrics. Furthermore, the best practices adopted by the IIP diff --git a/data/JTEI/16_2023_spa/jtei-rioriande-torresallen-250-source.xml b/data/JTEI/16_2023_spa/jtei-rioriande-torresallen-250-source.xml index b8beb96..8aaa311 100644 --- a/data/JTEI/16_2023_spa/jtei-rioriande-torresallen-250-source.xml +++ b/data/JTEI/16_2023_spa/jtei-rioriande-torresallen-250-source.xml @@ -181,11 +181,11 @@ este grupo de trabajo en: fecha de consulta 16 de julio de 2023, .
en primer lugar, el desarrollo de una nueva infraestructura, TranslateTEI,TranslateTEI ha sido - desarrollada por Hugh Cayless (Duke + type="software" xml:id="R1" target="softw:translatetei"/>TranslateTEI,TranslateTEI ha sido + desarrollada por Hugh Cayless (Duke University) y está disponible en: fecha de consulta 16 de julio de 2023, para mejorar la experiencia de usuario al colaborar con traducciones multilingües de las especificaciones de la TEI;Las especificaciones de la TEI @@ -252,9 +252,9 @@ investigación que trabajan con la codificación o edición de textos en español con TEI desde cualquier país o institución.

El software utilizado para crear y distribuir la encuesta fue Qualtrics,Qualtrics, fecha de - consulta 16 de julio de 2023, Qualtrics,Qualtrics, fecha de + consulta 16 de julio de 2023, y la licencia de uso fue proporcionada por la Universidad de Miami. 135 participantes iniciaron la encuesta, aunque sólo 107 respondieron a todas las secciones y preguntas. 77 de estas 107 @@ -475,12 +475,12 @@ sus proyectos, proponiéndoles respuestas múltiples que incluían: 1. Personalizaciones de los módulos TEI, 2. Utilización de esquemas (ODD, RelaxNG, etc.), 3. Bases de datos XML, 4. Bases de datos relacionales (MySQL, MySQL, PostgreSQL, etc.), 5. Transformaciones XSLT, 6. - Transformaciones XQuery, 7. Vocabularios controlados o tesauros, + xml:id="R5" target="softw:xslt"/>XSLT, 6. + Transformaciones XQuery, 7. Vocabularios controlados o tesauros, 8. Tecnologías sobre Sistemas de Información Geográfica (SIG), 9. Tecnologías de la información y la comunicación, 9. Tecnologías sobre Procesamiento del Lenguaje Natural (PLN), 10. Tecnologías sobre web semántica, 11. Visualización de datos, @@ -495,26 +495,26 @@ parece haber un mayor uso de bases de datos XML (23) por delante de las bases de datos relacionales más tradicionales (18). Esto está en consonancia con la evolución reciente y la llegada de productos de bases de datos XML de código - abierto como eXist.eXist.Exist Database, fecha de consulta 16 de julio de 2023, + type="cit:soft.url" ref="#R7">

En lo que respecta a la transformación y renderizado de XML, el lenguaje más utilizado parece seguir siendo el ya veterano XSLT (29),XSLT, fecha de consulta 16 de julio de 2023, - XSLT (29),XSLT, fecha de consulta 16 de julio de 2023, + . mientras que las transformaciones XQuery - (11)XQuery, fecha de consulta 16 - de julio de 2023, XQuery + (11)XQuery, fecha de consulta 16 + de julio de 2023, . se utilizan menos. Esto no parece coincidir del todo con la pregunta anterior sobre el uso de bases de datos XML, ya que la mayoría de las bases de datos XML nativas utilizan XQuery como principal herramienta de recuperación de datos, en lugar de - XSLT. Entre los participantes existe una curiosa mezcla entre las formas antiguas y las nuevas.

Otras prácticas que los participantes adoptan cuando trabajan con TEI son la @@ -523,9 +523,9 @@ (11), y el Procesamiento del Lenguaje Natural (12%). En Otros, algunos participantes explicaron que utilizaban scripts de interoperabilidad entre distintos esquemas (DCAT, DDI-CDI), anotación lingüística de corpus, XSLT - LaTex - PDF y Cocoon. + target="softw:apachecocoon"/>Cocoon. Lamentablemente, el 9% eligió otros sin especificar más.

@@ -550,20 +550,20 @@ los participantes para publicar sus archivos TEI. El objetivo era controlar si había alguna plataforma que se destacara por su uso. Por ello, propusimos las siguientes: 1. Infraestructura web creada ad hoc (por ejemplo, XML, - XSLT, PHP, Python, etc.), 2. - Generadores web estáticos (Jekyll, Gatsby, etc.), 3. XSLT, PHP, Python, etc.), 2. + Generadores web estáticos (Jekyll, Gatsby, etc.), 3. Boilerplate, 4. eXist; 5. eXist; 5. TEI Publisher, 6. CETEIcean, 7. CETEIcean, 7. Edition Visualization Technology, y añadimos una opción 8. Otros.

No es sorprendente que la puntuación más alta correspondiera a las infraestructuras creadas ad hoc (43). Así pues, la gran @@ -572,80 +572,80 @@ características del texto y publicación. La relevancia de estos datos radica en la escasez de plataformas que respondan a las necesidades de los profesionales. Sin embargo, entre las plataformas más utilizadas, los participantes eligen TEI Publisher (14),TEI - Publisher, fecha de consulta 16 de julio de 2023, TEI Publisher (14),TEI + Publisher, fecha de consulta 16 de julio de 2023, una herramienta diseñada en 2004 para crear repositorios basados en XML a través de una base de datos XML nativa con eXist que emplea además + target="softw:existdb"/>eXist que emplea además una biblioteca de motor de búsqueda de texto llamada Lucene.Exist Database, - fecha de consulta 16 de julio de 2023, Lucene.Exist Database, + fecha de consulta 16 de julio de 2023, . De hecho, los proyectos que se apoyan en la solución de base de datos eXist (9) se encuentran + target="softw:existdb"/>eXist (9) se encuentran entre las primeras posiciones. A continuación, los generadores web estáticos aparecen como una opción viable y en alza, como Jekyll o Gatsby (11).Para más información véase sobre Jekyll o Gatsby (11).Para más información véase sobre Jekyll, fecha de consulta 16 de julio de 2023, y Gatsby, fecha de consulta 16 de julio de - 2023, y Gatsby, fecha de consulta 16 de julio de + 2023, . El hecho de que los generadores web estáticos formen parte de las opciones mejor rankeadas responde probablemente a dos hechos: primero, el auge de la computación mínima que permite la creación de infraestructuras sin necesidad de servidores comerciales o institucionales (por ejemplo, los sitios estáticos pueden vivir en servicios gratuitos como GitHub y - GitHub Pages)GitHub - Pages, fecha de consulta 16 de julio de 2023, GitHub y + GitHub Pages)GitHub + Pages, fecha de consulta 16 de julio de 2023, . 
y segundo, la falta de acceso a infraestructuras digitales especialmente en América Latina.Para más información sobre el movimiento de la computación mínima, véase la página del grupo Minimal Computing, fecha de consulta 16 de julio de 2023, . Además, entre las otras plataformas para publicar archivos TEI algunos participantes - respondieron estar usando CETEIcean (6),CETEIcean, fecha de - consulta 16 de julio de 2023, CETEIcean (6),CETEIcean, fecha de + consulta 16 de julio de 2023, . - Edition Visualization Technology (5),Edition Visualization Technology (5),EVT, fecha de consulta 16 de julio de 2023, . mientras que sólo uno marcó la hoy bastante anticuada Boilerplate (1).TEI - Boilerplate, fecha de consulta 16 de julio de 2023, Boilerplate (1).TEI + Boilerplate, fecha de consulta 16 de julio de 2023, . Entre los que respondieron con la opción Otros, los participantes añadieron que reconocen su falta de conocimientos acerca de las tecnologías de transformación y publicación de archivos TEI (4), mientras que otros mencionaron estar utilizando otras opciones como Kiln,Kiln, fecha de consulta 16 de julio de - 2023, Kiln,Kiln, fecha de consulta 16 de julio de + 2023, . - TEITOK,TEITOK, fecha - de consulta 16 de julio de 2023, TEITOK,TEITOK, fecha + de consulta 16 de julio de 2023, . - R scripts,R Scripts, fecha de - consulta 16 de julio de 2023, R scripts,R Scripts, fecha de + consulta 16 de julio de 2023, . o tener en mente para el futuro el uso de Textual CommunitiesTextual CommunitiesTextual Communities, fecha de consulta 16 de julio de 2023, . o TEI Publisher.

+ type="cit:soft.url" ref="#R38">
. o TEI Publisher.

Plataformas y herramientas para transformar y/o publicar @@ -746,8 +746,8 @@ mejorar la enseñanza y el aprendizaje de la TEI? Algunos de los participantes insistieron en la necesidad de recursos y materiales de formación a todos los niveles y en español, incluyendo además temas específicos (por ejemplo, - transformaciones con XSLT), así como otros tipos de recursos, como + transformaciones con XSLT), así como otros tipos de recursos, como referencias bibliográficas. La necesidad de formación formal e informal dentro y fuera del mundo académico surge también como otra preocupación legítima. Entre las demandas de formación aflora el desconocimiento de las etapas finales del proceso @@ -781,7 +781,7 @@ ordenador. El consorcio de la TEI está insuflado de un espíritu de apertura y colaboración que hace que los propios usuarios puedan proponer mejoras y modificaciones a través de su repositorio en GitHub. Este espíritu de + target="softw:github"/>GitHub. Este espíritu de colaboración y difusión se manifiesta también a través de un diálogo continuo mediante una lista de discusión en línea y la organización anual de un congreso internacional. Por último, el consorcio es un gran paraguas que da lugar a diferentes @@ -842,23 +842,23 @@ de forma audible otras dos preocupaciones, en primer lugar, la necesidad de materiales de formación para todo el proceso de labor editorial, con especial insistencia en la transformación del XML-TEI (XSLT, XQuery) y + target="softw:xslt"/>XSLT, XQuery) y publicación de los archivos TEI. Esto significa que la comunidad ya ha superado las fases iniciales de familiarización con la TEI, y que ahora urgen temas más avanzados. En segundo lugar, han aparecido varias voces que defienden la necesidad de contar con editores XML gratuitos. El hecho de que el software más popular sea propietario desanima a algunos usuarios. 
Ni que decir tiene que ya se han dado algunos pasos para adaptar software de código abierto para trabajar con archivos TEI, como Visual Studio Code, que se ha beneficiado del desarrollo del - plugin Scholarly XML, que permite una codificación básica pero rigurosa - en XML-TEI.Este trabajo se debe a Raffaele + en XML-TEI.Este trabajo se debe a Raffaele Viglianti (Maryland Institute for Technology in the Humanities) y el plugin puede descargarse en línea en: fecha de consulta 16 de julio de 2023, .

diff --git a/data/JTEI/7_2014/jtei-7-carter-source.xml b/data/JTEI/7_2014/jtei-7-carter-source.xml index fffe1af..a19fd08 100644 --- a/data/JTEI/7_2014/jtei-7-carter-source.xml +++ b/data/JTEI/7_2014/jtei-7-carter-source.xml @@ -39,19 +39,6 @@ humanities and social sciences, open to quality periodicals looking to publish full-text articles online.

- - -

In the context of this project, private URIs with the prefix softw point to software - items in the software-list.xml file, which are encoded with item elements and - identified in xml:id.

-
- -

In the context of this project, private URIs with the prefix cit point to - category elements in the citation-taxonomy.xml file.

-
-
diff --git a/data/JTEI/7_2014/jtei-7-dee-source.xml b/data/JTEI/7_2014/jtei-7-dee-source.xml index db3ab1d..2180d19 100644 --- a/data/JTEI/7_2014/jtei-7-dee-source.xml +++ b/data/JTEI/7_2014/jtei-7-dee-source.xml @@ -369,7 +369,7 @@ mostly; Creating digital editions Creating digital collections sic. Sixty-two percent intended to transform their data with an independently-developed stylesheet, 24% with a TEI stylesheet, and only + target="softw:teistylesheets"/>TEI stylesheet, and only 10% plan to leave them as TEI XML.

The respondents from the mailing list gave less consistent answers than the expert users, yet still appeared to be using the TEI primarily in service of interoperability @@ -480,7 +480,7 @@ resources to be coupled with transformation technologies such as XSLT in the manner of Bridget Almas’s NEH Tutorial. As will be discussed later in this subsection, certain users find that the absence of integrated XSLT resources on the <rs type="cit:soft.name" ref="#R3">XSLT</rs> resources on the <title level="m">TEI by Example site falls short of their needs.

@@ -490,9 +490,9 @@ list. Two had reached out to a member of the community for guidance, and both had met their mentors in person, one through a summer school TEI training and one through a university course. Five had used the TEI website; two had not. Online tools they - claimed to have used included OxGarage, Roma, TEI by + claimed to have used included <ptr type="software" xml:id="R4" target="softw:oxgarage"/><rs + type="cit:soft.name" ref="#R4">OxGarage</rs>, <ptr type="software" xml:id="R5" + target="softw:roma"/><rs type="cit:soft.name" ref="#R5">Roma</rs>, <title level="m">TEI by Example, and the TEI Guidelines, as well as the university-hosted resources published by the Brown University Women Writers Project (Bauman and Flanders 2013) and Humboldt University in @@ -563,7 +563,7 @@ Very detailed, there’s a lot to read>too time-consuming Important features clearly marked>very helpful. TEI by example very useful for thinking about which tags we need for our project. The XSLT resources are + xml:id="R6" target="softw:xslt"/>XSLT resources are pretty unhelpful, though; The TEI Guidelines are very comprehensive and provide good examples—but no best practices in areas where there is more than one solution. 
Unfortunately TEI By @@ -751,18 +751,18 @@ </div> <div xml:id="integratedresources"> <head>Integrated Resources</head> - <p>While initiatives such as <ptr type="software" xml:id="R7" target="#tapas"/><rs - type="soft.name" ref="#R7">TAPAS</rs>, <ptr type="software" xml:id="R8" - target="#teichi"/><rs type="soft.name" ref="#R8">TEICHI</rs>, and <ptr type="software" - xml:id="R9" target="#cwrcwriter"/><rs type="soft.url" ref="#R9"><ref - target="https://sites.google.com/site/cwrcwriterhelp/"><rs type="soft.name" + <p>While initiatives such as <ptr type="software" xml:id="R7" target="softw:tapas"/><rs + type="cit:soft.name" ref="#R7">TAPAS</rs>, <ptr type="software" xml:id="R8" + target="softw:teichi"/><rs type="cit:soft.name" ref="#R8">TEICHI</rs>, and <ptr type="software" + xml:id="R9" target="softw:cwrcwriter"/><rs type="cit:soft.url" ref="#R9"><ref + target="https://sites.google.com/site/cwrcwriterhelp/"><rs type="cit:soft.name" ref="#R9">CWRC-Writer</rs></ref></rs><note><p><title level="a">Welcome to <rs - type="soft.name" ref="#R9">CWRC Writer</rs>, CWRC-Writer Help, accessed September 7, 2013, CWRC Writer, CWRC-Writer Help, accessed September 7, 2013, .

have begun to address to different aspects of these needs (Flanders and - Hamlin 2013; Flanders and + Hamlin 2013; Pape, Schöch, and Wegner 2013; Crane 2010), there has yet to be a deeply comprehensive resource intimately linked to the TEI Guidelines themselves. New technical @@ -815,10 +815,10 @@ level="a">Give us Editors! Re-inventing the Edition and Re-thinking the Humanities. OpenStax CNX, May 13. . - Flanders, - Julia, and Scott - Hamlin. 2013. <rs type="soft.name" + <bibl xml:id="flanders13"><ptr type="software" xml:id="R10" target="softw:tapas"/><rs + type="cit:soft.bib.ref" ref="#R10"><rs type="cit:soft.agent" ref="#R10"><author>Flanders, + Julia</author></rs>, and <rs type="cit:soft.agent" ref="#R10"><author>Scott + Hamlin</author></rs>. <date>2013</date>. <title level="a"><rs type="cit:soft.name" ref="#R10">TAPAS</rs>: Building a TEI Publishing and Repository Service. Journal of the Text Encoding Initiative 5. . @@ -849,11 +849,11 @@ level="m">Introduction to Digital Textual Editing: An UNOFFICIAL Guide to the Value of TEI. Slidecast posted June 30, 2013. . - Pape, - Sebastian, Christof - Schöch, and Lutz - Wegner. 2012. <rs type="soft.name" + <bibl xml:id="pape12"><ptr type="software" xml:id="R11" target="softw:teichi"/><rs + type="cit:soft.bib.ref" ref="#R11"><rs type="cit:soft.agent" ref="#R11"><author>Pape, + Sebastian</author></rs>, <rs type="cit:soft.agent" ref="#R11"><author>Christof + Schöch</author></rs>, and <rs type="cit:soft.agent" ref="#R11"><author>Lutz + Wegner</author></rs>. <date>2012</date>. <title level="a"><rs type="cit:soft.name" ref="#R11">TEICHI</rs> and the Tools Paradox: Developing a Publishing Framework for Digital Editions. 
Journal of the Text Encoding Initiative diff --git a/data/JTEI/7_2014/jtei-7-pfannenschmidt-source.xml b/data/JTEI/7_2014/jtei-7-pfannenschmidt-source.xml index c78f09f..e7b65ef 100644 --- a/data/JTEI/7_2014/jtei-7-pfannenschmidt-source.xml +++ b/data/JTEI/7_2014/jtei-7-pfannenschmidt-source.xml @@ -44,19 +44,6 @@ humanities and social sciences, open to quality periodicals looking to publish full-text articles online.

- - -

In the context of this project, private URIs with the prefix softw point to software - items in the software-list.xml file, which are encoded with item elements and - identified in xml:id.

-
- -

In the context of this project, private URIs with the prefix cit point to - category elements in the citation-taxonomy.xml file.

-
-
diff --git a/data/JTEI/7_2014/jtei-7-schmidt-source.xml b/data/JTEI/7_2014/jtei-7-schmidt-source.xml index ea37512..0f4b342 100644 --- a/data/JTEI/7_2014/jtei-7-schmidt-source.xml +++ b/data/JTEI/7_2014/jtei-7-schmidt-source.xml @@ -36,19 +36,6 @@ humanities and social sciences, open to quality periodicals looking to publish full-text articles online.

- - -

In the context of this project, private URIs with the prefix softw point to software - items in the software-list.xml file, which are encoded with item elements and - identified in xml:id.

-
- -

In the context of this project, private URIs with the prefix cit point to - category elements in the citation-taxonomy.xml file.

-
-
diff --git a/data/JTEI/7_2014/jtei-7-schreibman-intro-source.xml b/data/JTEI/7_2014/jtei-7-schreibman-intro-source.xml index f620795..fa83a67 100644 --- a/data/JTEI/7_2014/jtei-7-schreibman-intro-source.xml +++ b/data/JTEI/7_2014/jtei-7-schreibman-intro-source.xml @@ -32,19 +32,6 @@ articles online.

- - -

In the context of this project, private URIs with the prefix softw point to software - items in the software-list.xml file, which are encoded with item elements and - identified in xml:id.

-
- -

In the context of this project, private URIs with the prefix cit point to - category elements in the citation-taxonomy.xml file.

-
-
diff --git a/data/JTEI/8_2014-15/jtei-8-barbero-source.xml b/data/JTEI/8_2014-15/jtei-8-barbero-source.xml index f197e68..10d8224 100644 --- a/data/JTEI/8_2014-15/jtei-8-barbero-source.xml +++ b/data/JTEI/8_2014-15/jtei-8-barbero-source.xml @@ -55,19 +55,6 @@ articles online.

- - -

In the context of this project, private URIs with the prefix softw point to software - items in the software-list.xml file, which are encoded with item elements and - identified in xml:id.

-
- -

In the context of this project, private URIs with the prefix cit point to - category elements in the citation-taxonomy.xml file.

-
-
diff --git a/data/JTEI/8_2014-15/jtei-8-berti-source.xml b/data/JTEI/8_2014-15/jtei-8-berti-source.xml index 4981c20..4d348ea 100644 --- a/data/JTEI/8_2014-15/jtei-8-berti-source.xml +++ b/data/JTEI/8_2014-15/jtei-8-berti-source.xml @@ -157,7 +157,7 @@ the study and advancement of the field of Classical textual fragmentary heritage and the development of a collaborative environment for crowdsourced annotations. These goals are being achieved by implementing the Perseids Platform and by + target="softw:perseidsplatform"/>Perseids Platform and by encoding the Fragmenta Historicorum Graecorum, one of the most important and comprehensive collections of fragmentary authors.

@@ -217,11 +217,11 @@ and to produce born-digital editions of fragmentary works . In order to achieve these goals, LOFTS is implementing a Fragmentary - Texts Editor within the Perseids PlatformFor a prototype interface, - see . (Fragmentary + Texts Editor within the Perseids PlatformFor a prototype interface, + see . (Almas and Berti 2013) and is producing a digital edition of the Fragmenta Historicorum Graecorum edited by Karl Müller in the nineteenth century (Müller 1878–85; DFHG @@ -249,11 +249,11 @@ target="#berti14">Berti and Stoyanova 2014).

The present paper is divided into two parts. The first part () describes the implementation of a fragmentary texts - editor within Perseids, the editorial platform developed by the + target="softw:fragmentarytextseditor"/>fragmentary texts + editor within Perseids, the editorial platform developed by the Perseus Project for collaborative annotation of classical source documents. Perseids facilitates the annotation of text reuses, the production of syntactic annotations, the alignment of multiple texts, and the production of digital commentaries on fragmentary works. This section also elaborates on the complex nature of fragmentary @@ -264,18 +264,18 @@ Perseus catalog.

- Perseids Platform -

The Perseids +

The Perseids Platform supports collaborative editing, annotation, and publication - of born-digital editions of source documents in the classics.. - Perseids is not one single application but an integrated environment built from a loose coupling of heterogeneous tools and services from a variety of sources. - The development of the Perseids platform was inspired and motivated by the + The development of the Perseids platform was inspired and motivated by the work of several pre-existing projects: the Tufts Miscellany Collection at Tufts University,. and the Papyri.info project. (Almas - and Beaulieu 2013). The Son of SUDA Online (SoSOL) application sits at the core - of the Perseids platform. SoSOL is a ). The Son of SUDA Online (SoSOL) application sits at the core + of the Perseids platform. SoSOL is a Ruby on Rails. + type="cit:soft.url" ref="#R15">. application, originally developed by the Papyri.info project, that serves as front end for - a GitGit. repository of documents, metadata, and annotations. It includes a workflow engine that enables documents and data of different types to pass through flexible review and approval processes. The SoSOL application + xml:id="R17" target="softw:sosol"/>SoSOL application includes user interfaces for editing XML documents, metadata, and annotations. While it does not include a full-featured XML editor, it supports alternative text-based input of XML markup, and can enforce XML schema validation rules on the documents being edited.

@@ -870,7 +870,7 @@

The editorial board provides the final review for each file. EpiDoc-encoded DFHG files are being progressively added to the DFHG GitHub repository. for everyone to download, improve, and share in accordance with our 28(4): 493–503. doi:10.1093/llc/fqt046. - Almas, - Bridget, and Monica + Almas, + Bridget, and Monica Berti. 2013. Perseids Collaborative Platform for Annotating Text Re-uses of Fragmentary Authors. In DH-Case 2013. Proceedings of the 1st International Workshop on diff --git a/data/JTEI/8_2014-15/jtei-8-blanco-source.xml b/data/JTEI/8_2014-15/jtei-8-blanco-source.xml index 0bb15ad..0aaeae7 100644 --- a/data/JTEI/8_2014-15/jtei-8-blanco-source.xml +++ b/data/JTEI/8_2014-15/jtei-8-blanco-source.xml @@ -422,31 +422,31 @@ RBDMS model, seeing that most of the online repertoires are built on similar systems. The problem lay in choosing a combined system to integrate both types of information: E-R relational structures and text marked up with XML. We compared two of the most popular E-R - database systems: <ptr type="software" xml:id="MySQL" target="#MySQL"/><rs - type="soft.name" ref="#MySQL">MySQL</rs> and Oracle9i XML. Both of them offered the + database systems: <ptr type="software" xml:id="MySQL" target="softw:MySQL"/><rs + type="cit:soft.name" ref="#MySQL">MySQL</rs> and Oracle9i XML. Both of them offered the possibility of adding an XMLType data column, thus offering the possibility of introducing hierarchical structures inside the relational model via the provision of XML nesting structures. 
The combination of both systems offers great advantages, such as the possibility of making combined queries with <ptr type="software" xml:id="SQL" - target="#SQL"/><rs type="soft.name" ref="#SQL">SQL</rs> and XPath languages.<note>The - disadvantage of <ptr type="software" xml:id="MySQL" target="#MySQL"/><rs - type="soft.name" ref="#MySQL">MySQL</rs> compared to Oracle is that <ptr - type="software" xml:id="MySQL" target="#MySQL"/><rs type="soft.name" ref="#MySQL" + target="softw:SQL"/><rs type="cit:soft.name" ref="#SQL">SQL</rs> and XPath languages.<note>The + disadvantage of <ptr type="software" xml:id="MySQL" target="softw:MySQL"/><rs + type="cit:soft.name" ref="#MySQL">MySQL</rs> compared to Oracle is that <ptr + type="software" xml:id="MySQL" target="softw:MySQL"/><rs type="cit:soft.name" ref="#MySQL" >MySQL</rs> does not offer validation for XML fields, as it considers them just as text fields, and there are no added XSD schemas or any kind of transformational - stylesheets, like <ptr type="software" xml:id="XSLT" target="#XSLT"/><rs - type="soft.name" ref="#XSLT">XSLT</rs>. TEI code has to be composed and validated + stylesheets, like <ptr type="software" xml:id="XSLT" target="softw:XSLT"/><rs + type="cit:soft.name" ref="#XSLT">XSLT</rs>. TEI code has to be composed and validated outside the database by using XML editors (we use Oxygen), since the <ptr - type="software" xml:id="MySQL" target="#MySQL"/><rs type="soft.name" ref="#MySQL" + type="software" xml:id="MySQL" target="softw:MySQL"/><rs type="cit:soft.name" ref="#MySQL" >MySQL</rs> RDBMS exploitation of XML technologies is limited to XPath and can only be recovered through the <ident>extractValue</ident> function of <ptr type="software" - xml:id="MySQL" target="#MySQL"/><rs type="soft.name" ref="#MySQL">MySQL</rs>, which + xml:id="MySQL" target="softw:MySQL"/><rs type="cit:soft.name" ref="#MySQL">MySQL</rs>, which shows important limitations. 
Oracle Database 9i and 10g, however, are much more powerful in that sense, as they combine their E-R nature with XML technologies, resulting in a much more advanced tool to work with both systems, hierarchical and relational, at the same time.</note></p> <p>This first trial model of ReMetCa, combining the <ptr type="software" xml:id="MySQL" - target="#MySQL"/><rs type="soft.name" ref="#MySQL">MySQL</rs> RDBMS and TEI XML fields, + target="softw:MySQL"/><rs type="cit:soft.name" ref="#MySQL">MySQL</rs> RDBMS and TEI XML fields, is already working online and can be visited at <ptr target="http://ww.remetca.uned.es"/>. To access the database, it is necessary to log in to <soCalled>Área de trabajo</soCalled> and then into <soCalled>base de datos</soCalled>, using a username and password that can @@ -455,8 +455,8 @@ designing the search engine.</p> <p>Oracle databases contain fields defined as <ident>XMLType</ident>, thanks to the XDB component. This component is usually installed by default, but its status can be checked - by using the following <ptr type="software" xml:id="SQL" target="#SQL"/><rs - type="soft.name" ref="#SQL">SQL</rs> statement: <eg><![CDATA[ + by using the following <ptr type="software" xml:id="SQL" target="softw:SQL"/><rs + type="cit:soft.name" ref="#SQL">SQL</rs> statement: <eg><![CDATA[ SQL> select comp_name, status from dba_registry where comp_name=’Oracle XML Database’; COMP_NAME ---------------------------------------------------------------------------------------------- @@ -468,19 +468,19 @@ </p> <p>In the case of ReMetCa, this <ident>XMLType</ident> field needs to appear only once, in the table <ident>Poema</ident>, which contains <gi>lg</gi> elements and their content, as - described above. Oracle <ptr type="software" xml:id="SQL" target="#SQL"/><rs - type="soft.name" ref="#SQL">SQL</rs> Developer also allows updates to contents with its + described above. 
Oracle <ptr type="software" xml:id="SQL" target="softw:SQL"/><rs + type="cit:soft.name" ref="#SQL">SQL</rs> Developer also allows updates to contents with its unique record view.</p> <figure xml:id="figure3"> <graphic url="images/jtei-8-blanco-figure-03-oracle-editor.png" height="555px" width="626px"/> - <head type="legend">Oracle <ptr type="software" xml:id="SQL" target="#SQL"/><rs - type="soft.name" ref="#SQL">SQL</rs> Developer</head> + <head type="legend">Oracle <ptr type="software" xml:id="SQL" target="softw:SQL"/><rs + type="cit:soft.name" ref="#SQL">SQL</rs> Developer</head> </figure> <p>Oracle stores XSD schemas into its system, so the XML introduced is validated. To test this property, we tried introducing an XSD schema for this <gi>lg</gi> element, adding attributes and constraints, and as figure 4 shows, it was perfectly validated. The system - also accepts <ptr type="software" xml:id="XSLT" target="#XSLT"/><rs type="soft.name" + also accepts <ptr type="software" xml:id="XSLT" target="softw:XSLT"/><rs type="cit:soft.name" ref="#XSLT">XSLT</rs> stylesheets. Oracle registers stylesheets and can apply them to any XMLType field.</p> <figure xml:id="figure4"> @@ -508,7 +508,7 @@ ontology.</p> <p>Although we have adopted Oracle because of its power to work with XML technologies and TEI, there are other open-source systems like <ptr type="software" xml:id="MySQL" - target="#MySQL"/><rs type="soft.name" ref="#MySQL">MySQL</rs> or PostgreSQL which offer + target="softw:MySQL"/><rs type="cit:soft.name" ref="#MySQL">MySQL</rs> or PostgreSQL which offer enough functions to implement a mixed system like the one proposed in this paper. 
Actually, it is predictable that both systems will include similar tools in the future to work with XMLType fields, as XML is becoming more and more widely used.</p> diff --git a/data/JTEI/8_2014-15/jtei-8-boschetti-source.xml b/data/JTEI/8_2014-15/jtei-8-boschetti-source.xml index 4ee11fd..abb456b 100644 --- a/data/JTEI/8_2014-15/jtei-8-boschetti-source.xml +++ b/data/JTEI/8_2014-15/jtei-8-boschetti-source.xml @@ -100,23 +100,23 @@ collaborative philology, which concerns the social activity of scholars focused on shared philological tasks. We discuss the technologies related to XML markup languages and the processing of marked-up documents. We describe the method used to design and implement the - <ptr type="software" xml:id="R41" target="#teicophilib"/><rs type="soft.name" ref="#R41" + <ptr type="software" xml:id="R41" target="softw:teicophilib"/><rs type="cit:soft.name" ref="#R41" >TeiCoPhiLib</rs>, outlining the design patterns as well as discussing general benefits of the overall architecture. 
Finally, we present case studies in which some components of - our library currently implemented in <ptr type="software" xml:id="R1" target="#Java"/><rs - type="soft.name" ref="#R1">Java</rs> have been used.</p> + our library currently implemented in <ptr type="software" xml:id="R1" target="softw:java"/><rs + type="cit:soft.name" ref="#R1">Java</rs> have been used.</p> </div> </front> <body> <div xml:id="introduction"> <head>Introduction</head> - <p>The <ptr type="software" xml:id="R42" target="#teicophilib"/><rs type="soft.name" + <p>The <ptr type="software" xml:id="R42" target="softw:teicophilib"/><rs type="cit:soft.name" ref="#R42">TeiCoPhiLib</rs> library is a collection of components currently implemented - in <ptr type="software" xml:id="R2" target="#Java"/><rs type="soft.name" ref="#R2" - >Java</rs> (<rs type="soft.ver" ref="#R2">JSR 270</rs>), which parses documents encoded + in <ptr type="software" xml:id="R2" target="softw:java"/><rs type="cit:soft.name" ref="#R2" + >Java</rs> (<rs type="cit:soft.ver" ref="#R2">JSR 270</rs>), which parses documents encoded according to a basic subset of TEI tags defined in an ODD file<note>The ODD files currently available can be downloaded from <ptr type="software" xml:id="R3" - target="#GitHub"/><rs type="soft.name" ref="#R3">GitHub</rs>: <ptr + target="softw:github"/><rs type="cit:soft.name" ref="#R3">GitHub</rs>: <ptr target="https://github.com/CoPhi"/>. The TEI schema we intend eventually to adopt conforms to the EpiDoc vocabulary, following the policy of the Perseus Catalog (<ref type="bibl" target="#crane14">Crane et al. 2014</ref>).</note> and creates an @@ -124,16 +124,16 @@ The overall architecture is based on the well known Model-View-Controller (MVC) pattern, which separates the representation of data from the rendering and management of the content for the sake of flexibility and reusability. 
<ptr type="software" xml:id="R43" - target="#teicophilib"/><rs type="soft.name" ref="#R43">TeiCoPhiLib</rs> maps the + target="softw:teicophilib"/><rs type="cit:soft.name" ref="#R43">TeiCoPhiLib</rs> maps the structured document onto an aggregation of objects. The library enables the visualization through a web browser by instantiating a collection of widgets rendered on the client through standard web technologies. Specifically, the server-side environment jointly processes data and visualization templates,<note>Facelets XML templates are used under the - <ptr type="software" xml:id="R4" target="#Java"/><rs type="soft.name" ref="#R4" + <ptr type="software" xml:id="R4" target="softw:Java"/><rs type="cit:soft.name" ref="#R4" >Java</rs> - <ptr type="software" xml:id="R40" target="#serverfaces"/><rs type="soft.name" ref="#R40" + <ptr type="software" xml:id="R40" target="softw:serverfaces"/><rs type="cit:soft.name" ref="#R40" >Server Faces</rs> - <rs type="soft.ver" ref="#R40">2.0</rs> specification.</note> and generates HTML pages + <rs type="cit:soft.ver" ref="#R40">2.0</rs> specification.</note> and generates HTML pages rendered on the client. Special components are devoted to monitoring the behavior and interactions among the objects generated from the input TEI documents.</p> <p>In distributed and collaborative environments, the maintenance of links and relations @@ -154,10 +154,10 @@ >Crane, Seales, and Terras 2009</ref>). 
In order to face this challenge, our approach exploits software engineering techniques illustrated in <ptr type="crossref" target="#designpatterns"/>, which explains the <ptr type="software" xml:id="R44" - target="#teicophilib"/><rs type="soft.name" ref="#R44">TeiCoPhiLib</rs> design + target="softw:teicophilib"/><rs type="cit:soft.name" ref="#R44">TeiCoPhiLib</rs> design patterns.</p> - <p>For this reason, the design of <ptr type="software" xml:id="R45" target="#teicophilib" - /><rs type="soft.name" ref="#R45">TeiCoPhiLib</rs> widely leverages the stand-off + <p>For this reason, the design of <ptr type="software" xml:id="R45" target="softw:teicophilib" + /><rs type="cit:soft.name" ref="#R45">TeiCoPhiLib</rs> widely leverages the stand-off approaches provided by the TEI Guidelines, that is, both the reference to plain text offsets and the reference to nodes denoted by the <att>xml:id</att> unique identifiers (<ref type="bibl" target="#tei15">TEI Consortium 2015</ref>; <ref type="bibl" @@ -186,27 +186,27 @@ open-source general-purpose framework <ref target="http://cocoon.apache.org/" >Cocoon</ref><note><ptr target="http://cocoon.apache.org/"/>.</note> and the native XML database <ref target="http://exist-db.org/"><ptr type="software" xml:id="R5" - target="#existdb"/><rs type="soft.name" ref="#R5">eXist-db</rs></ref><note><rs - type="soft.url" ref="#R5"> + target="softw:existdb"/><rs type="cit:soft.name" ref="#R5">eXist-db</rs></ref><note><rs + type="cit:soft.url" ref="#R5"> <ptr target="http://exist-db.org/"/></rs>.</note> deserve to be mentioned. 
Specifically for TEI-annotated documents, <ptr type="software" xml:id="R6" - target="#tustep"/> - <ref target="http://www.tustep.uni-tuebingen.de/tustep_eng.html"><rs type="soft.name" - ref="#R6">TUSTEP</rs></ref>,<note><rs type="soft.url" ref="#R6"><ptr + target="softw:tustep"/> + <ref target="http://www.tustep.uni-tuebingen.de/tustep_eng.html"><rs type="cit:soft.name" + ref="#R6">TUSTEP</rs></ref>,<note><rs type="cit:soft.url" ref="#R6"><ptr target="http://www.tustep.uni-tuebingen.de/tustep_eng.html"/></rs>.</note> - <ptr type="software" xml:id="R7" target="#teiboilerplate"/> - <ref target="http://dcl.ils.indiana.edu/"><rs type="soft.name" ref="#R7" + <ptr type="software" xml:id="R7" target="softw:teiboilerplate"/> + <ref target="http://dcl.ils.indiana.edu/"><rs type="cit:soft.name" ref="#R7" >TEIBoilerplate</rs> - </ref>,<note><rs type="soft.url" ref="#R7"><ptr target="http://dcl.ils.indiana.edu/" + </ref>,<note><rs type="cit:soft.url" ref="#R7"><ptr target="http://dcl.ils.indiana.edu/" /></rs>.</note> - <ptr type="software" xml:id="R8" target="#txm"/> + <ptr type="software" xml:id="R8" target="softw:txm"/> <ref target="http://sourceforge.net/projects/txm/"> - <rs type="soft.name" ref="#R8">TXM</rs></ref>, <note><rs type="soft.url" ref="#R8"><ptr + <rs type="cit:soft.name" ref="#R8">TXM</rs></ref>, <note><rs type="cit:soft.url" ref="#R8"><ptr target="http://sourceforge.net/projects/txm/"/></rs>. 
</note> and <ptr - type="software" xml:id="R9" target="#tapas"/><ref target="http://tapasproject.org/"><rs - type="soft.name" ref="#R9">TAPAS</rs></ref> + type="software" xml:id="R9" target="softw:tapas"/><ref target="http://tapasproject.org/"><rs + type="cit:soft.name" ref="#R9">TAPAS</rs></ref> <note> - <rs type="soft.url" ref="#R9"><ptr target="http://tapasproject.org/"/></rs>.</note> are + <rs type="cit:soft.url" ref="#R9"><ptr target="http://tapasproject.org/"/></rs>.</note> are prominent projects.</p> <p> For all of these initiatives, the transformation from an XML document structure to another format by XSLT can be considered the focal point.</p> @@ -251,16 +251,16 @@ <item>results are validated by domain expert collaborations and test-driven development (both unit tests and acceptance tests)</item> </list>. The continuous integration and release are supported by open source Integrated - Development Environments (IDEs) like <ptr type="software" xml:id="R10" target="#eclipse"/> - <rs type="soft.name" ref="#R10">Eclipse</rs> or <ptr type="software" xml:id="R11" - target="#netbeans"/> - <rs type="soft.name" ref="#R11">NetBeans</rs> and by a software configuration management - tool such as <ptr type="software" xml:id="R13" target="#svn"/> - <rs type="soft.name" ref="#R13">SVN</rs> or <ptr type="software" xml:id="R12" - target="#git"/> - <rs type="soft.name" ref="#R12">Git</rs> for versioning and revision control.</p> + Development Environments (IDEs) like <ptr type="software" xml:id="R10" target="softw:eclipse"/> + <rs type="cit:soft.name" ref="#R10">Eclipse</rs> or <ptr type="software" xml:id="R11" + target="softw:netbeans"/> + <rs type="cit:soft.name" ref="#R11">NetBeans</rs> and by a software configuration management + tool such as <ptr type="software" xml:id="R13" target="softw:svn"/> + <rs type="cit:soft.name" ref="#R13">SVN</rs> or <ptr type="software" xml:id="R12" + target="softw:git"/> + <rs type="cit:soft.name" ref="#R12">Git</rs> for versioning and 
revision control.</p> <p>The aforementioned paradigm is applied in the <ptr type="software" xml:id="R46" - target="#teicophilib"/><rs type="soft.name" ref="#R46">TeiCoPhiLib</rs> library by + target="softw:teicophilib"/><rs type="cit:soft.name" ref="#R46">TeiCoPhiLib</rs> library by <list rend="inline ordered"> <item>the implementation of a flexible importing and normalization module in the pre-processing phase, which ensures a coherent abstraction model of the @@ -284,18 +284,18 @@ </list>. It is important to point out that the new data structure is the result of transformations (by XSLT DOM transformations or SAX event-driven transformations) managed during the parsing process. Thus, the current implementation of the <ptr - type="software" xml:id="R47" target="#teicophilib"/><rs type="soft.name" ref="#R47" + type="software" xml:id="R47" target="softw:teicophilib"/><rs type="cit:soft.name" ref="#R47" >TeiCoPhiLib</rs> exposes methods that parse the XML file and create <ptr - type="software" xml:id="R14" target="#Java"/><rs type="soft.name" ref="#R14">Java</rs> + type="software" xml:id="R14" target="softw:java"/><rs type="cit:soft.name" ref="#R14">Java</rs> objects. The resources are stored and maintained in a native XML database management - system (i.e., <ptr type="software" xml:id="R15" target="#existdb"/><rs type="soft.name" + system (i.e., <ptr type="software" xml:id="R15" target="softw:existdb"/><rs type="cit:soft.name" ref="#R15">eXist-db</rs>). 
The APIs and services provided by <ptr type="software" - xml:id="R55" target="#lucene"/><rs type="soft.name" ref="#R55">Lucene</rs>, a software + xml:id="R55" target="softw:lucene"/><rs type="cit:soft.name" ref="#R55">Lucene</rs>, a software library developed and hosted by the Apache Foundation, have been used for indexing the textual data.</p> <p>For instance, the information conveyed by the following TEI snippet is distributed - among the appropriate <ptr type="software" xml:id="R16" target="#Java"/><rs - type="soft.name" ref="#R16">Java</rs> objects that handle the four levels described + among the appropriate <ptr type="software" xml:id="R16" target="softw:java"/><rs + type="cit:soft.name" ref="#R16">Java</rs> objects that handle the four levels described above: <egXML xmlns="http://www.tei-c.org/ns/Examples" valid="true"> <div type="chapter" n="1" style="font-variant:normal"> [...] <p xml:lang="ita"> <lb n="1"/>Io nacqui veneziano ai 18 ottobre del 1775, giorno <lb n="2" @@ -317,7 +317,7 @@ <item><emph>Style</emph>. The style is managed by separated renderers, which point to textual positions affected by stylistic features. For instance, the information extracted from the <att>style</att> attribute is used to instantiate the <ptr - type="software" xml:id="R17" target="#Java"/><rs type="soft.name" ref="#R17" + type="software" xml:id="R17" target="softw:java"/><rs type="cit:soft.name" ref="#R17" >Java</rs> objects devoted to managing the rendering information.</item> <item><emph>Behavior</emph>. Behaviors are handled by objects that process textual resources according to the current state of the data structure and the rules to manage @@ -353,7 +353,7 @@ of the most suitable algorithm for the current task. The general idea of object-oriented patterns is to encapsulate functionality and data inside an efficient and flexible collection of classes. 
The current implementation of the prototype exploits the <ptr - type="software" xml:id="R17" target="#Java"/><rs type="soft.name" ref="#R17">Java</rs> + type="software" xml:id="R17" target="softw:java"/><rs type="cit:soft.name" ref="#R17">Java</rs> programming language technologies.</p> <list rend="ordered"> <item>The <emph>Model-View-Controller</emph> (MVC) pattern (<ref type="bibl" @@ -406,7 +406,7 @@ <figure xml:id="figure1"> <graphic url="images/jtei-8-boschetti_01.png" width="3176px" height="1196px"/> <head type="legend">Class diagram of the Observer pattern designed for the <ptr - type="software" xml:id="R48" target="#teicophilib"/><rs type="soft.name" + type="software" xml:id="R48" target="softw:teicophilib"/><rs type="cit:soft.name" ref="#R48">TeiCoPhiLib</rs></head> </figure> <p>Several modules of the library need synchronized data. In particular, annotations @@ -424,13 +424,13 @@ </list>. These two entities provide the flexibility to implement a decoupled notification mechanism. The Subject provides a registration procedure for the Observer object, and the Observer object provides a standard method allowing the - Subject to notify it. <ptr type="software" xml:id="R49" target="#teicophilib"/><rs - type="soft.name" ref="#R49">TeiCoPhiLib</rs> defines objects that can change, such + Subject to notify it. <ptr type="software" xml:id="R49" target="softw:teicophilib"/><rs + type="cit:soft.name" ref="#R49">TeiCoPhiLib</rs> defines objects that can change, such as the Document data type, and objects that need to be notified, such as the Annotation or the Comment data types. Consequently, the Document class implements the Subject interface, whereas the Annotation and Comment classes implement the Observer interface. 
The following simplified <ptr type="software" xml:id="R18" - target="#Java"/><rs type="soft.name" ref="#R18">Java</rs> snippet illustrates this + target="softw:java"/><rs type="cit:soft.name" ref="#R18">Java</rs> snippet illustrates this concept programmatically. <eg> Observer annotation = new Annotation(); Observer comment = new Comment(); Subject teiDocument = new Document(); teiDocument.subscribe(ObserverType.ANNOTATION, annotation); @@ -450,13 +450,13 @@ <figure xml:id="figure2"> <graphic url="images/jtei-8-boschetti_02.png" height="1775px" width="3759px"/> <head type="legend">Class diagram of the Visitor pattern designed for the <ptr - type="software" xml:id="R50" target="#teicophilib"/><rs type="soft.name" + type="software" xml:id="R50" target="softw:teicophilib"/><rs type="cit:soft.name" ref="#R50">TeiCoPhiLib</rs></head> </figure> <p>The hierarchical nature of the document representation facilitates the data structure traversal in a flexible and customizable way. The Visitor pattern provides a mechanism to extend the functionality of the <ptr type="software" xml:id="R51" - target="#teicophilib"/><rs type="soft.name" ref="#R51">TeiCoPhiLib</rs> by + target="softw:teicophilib"/><rs type="cit:soft.name" ref="#R51">TeiCoPhiLib</rs> by allowing components to perform a client-supplied operation on each node of the document hierarchy. <ptr type="crossref" target="#figure2"/> shows how a client of the data model can traverse the document tree in order to write its textual content. @@ -469,11 +469,11 @@ </item> </list> <p>An example should clarify the aforementioned architecture. The client application that - uses <ptr type="software" xml:id="R52" target="#teicophilib"/><rs type="soft.name" + uses <ptr type="software" xml:id="R52" target="softw:teicophilib"/><rs type="cit:soft.name" ref="#R52">TeiCoPhiLib</rs> APIs invokes the building method of the abstract Builder class. 
Moreover, the resulting document object is a concretization of an abstract class representing the current structure of the TEI-encoded resource, as illustrated in the - following <ptr type="software" xml:id="R19" target="#Java"/><rs type="soft.name" + following <ptr type="software" xml:id="R19" target="softw:java"/><rs type="cit:soft.name" ref="#R19">Java</rs> statement: <eg> Document teiDocument = AbstractBuilderFactory.buildDocument(new File("features.properties"),new File("teiDocument.xml")); </eg> @@ -511,11 +511,11 @@ <p>The case studies illustrated below have been implemented with the components already developed for our library.</p> <div xml:id="euporia"> - <head><ptr type="software" xml:id="R23" target="#euporiawebapp"/> - <rs type="soft.name" ref="#R23">Euporia</rs>: Visualization, Editing, and Annotation of + <head><ptr type="software" xml:id="R23" target="softw:euporiawebapp"/> + <rs type="cit:soft.name" ref="#R23">Euporia</rs>: Visualization, Editing, and Annotation of Parallel Texts for Didactic Purposes</head> - <p><ptr type="software" xml:id="R24" target="#euporiawebapp"/> - <rs type="soft.name" ref="#R24">Euporia</rs> is a project aimed at visualizing, editing, + <p><ptr type="software" xml:id="R24" target="softw:euporiawebapp"/> + <rs type="cit:soft.name" ref="#R24">Euporia</rs> is a project aimed at visualizing, editing, and annotating bilingual texts displayed in parallel. 
The original digital resources are stored and maintained in authoritative digital libraries available online, such as the Biblioteca Italiana and the Perseus Digital Library, or they are downloaded from social @@ -616,19 +616,19 @@ </row> </table> <p>Parallel texts are visualized and managed through <ptr type="software" xml:id="R20" - target="#euporiawebapp"/> - <rs type="soft.name" ref="#R20">EuporiaWebApp</rs> (<ptr type="crossref" + target="softw:euporiawebapp"/> + <rs type="cit:soft.name" ref="#R20">EuporiaWebApp</rs> (<ptr type="crossref" target="#figure4"/>), which is a server-side <ptr type="software" xml:id="R21" - target="#Java"/><rs type="soft.name" ref="#R21">Java</rs> web application compliant - with the <rs type="soft.ver" ref="#R21">JSR 314</rs> specification intended for + target="softw:java"/><rs type="cit:soft.name" ref="#R21">Java</rs> web application compliant + with the <rs type="cit:soft.ver" ref="#R21">JSR 314</rs> specification intended for educational purposes. Students, the end users of <ptr type="software" xml:id="R22" - target="#euporiawebapp"/> - <rs type="soft.name" ref="#R22">Euporia</rs>, are allowed to query texts, both jointly + target="softw:euporiawebapp"/> + <rs type="cit:soft.name" ref="#R22">Euporia</rs>, are allowed to query texts, both jointly and independently, through multilingual or monolingual keywords.</p> <figure xml:id="figure4"> <graphic url="images/jtei-8-boschetti_04.png" height="788px" width="1303px"/> - <head type="legend"><ptr type="software" xml:id="R25" target="#euporiawebapp"/> - <rs type="soft.name" ref="#R25">Euporia</rs></head> + <head type="legend"><ptr type="software" xml:id="R25" target="softw:euporiawebapp"/> + <rs type="cit:soft.name" ref="#R25">Euporia</rs></head> </figure> <p>Annotations can also be associated with linked chunks of text (such as a sentence and its translation: see <ptr type="crossref" target="#figure5"/>) or with single, @@ -639,18 +639,18 @@ <figure xml:id="figure5"> <graphic 
url="images/jtei-8-boschetti_05.png" height="575px" width="998px"/> <head type="legend">Annotation in <ptr type="software" xml:id="R26" - target="#euporiawebapp"/> - <rs type="soft.name" ref="#R26">Euporia</rs></head> + target="softw:euporiawebapp"/> + <rs type="cit:soft.name" ref="#R26">Euporia</rs></head> </figure> </div> <div xml:id="aporia"> - <head><ptr type="software" xml:id="R27" target="#aporia"/> - <rs type="soft.name" ref="#R27">Aporia</rs>: Adapting the Parallel Text Framework to + <head><ptr type="software" xml:id="R27" target="softw:aporia"/> + <rs type="cit:soft.name" ref="#R27">Aporia</rs>: Adapting the Parallel Text Framework to Specific Scientific Requirements</head> - <p><ptr type="software" xml:id="R28" target="#aporia"/> - <rs type="soft.name" ref="#R28">Aporia</rs> is the enhanced version of <ptr - type="software" xml:id="R29" target="#euporiawebapp"/> - <rs type="soft.name" ref="#R29">Euporia</rs>, intended for research purposes. + <p><ptr type="software" xml:id="R28" target="softw:aporia"/> + <rs type="cit:soft.name" ref="#R28">Aporia</rs> is the enhanced version of <ptr + type="software" xml:id="R29" target="softw:euporiawebapp"/> + <rs type="cit:soft.name" ref="#R29">Euporia</rs>, intended for research purposes. Accordingly, the Parallel Text framework has been adapted and extended to meet specific scientific requirements (<ref type="bibl" target="#bozzi13">Bozzi 2013</ref>). 
An experimental case study has been performed on Theodor Mommsen’s edition of the <title @@ -672,32 +672,32 @@ application.</p> <figure xml:id="figure6"> <graphic url="images/jtei-8-boschetti_06.png" height="414px" width="1390px"/> - <head type="legend"><ptr type="software" xml:id="R30" target="#aporia"/> - <rs type="soft.name" ref="#R30">Aporia</rs></head> + <head type="legend"><ptr type="software" xml:id="R30" target="softw:aporia"/> + <rs type="cit:soft.name" ref="#R30">Aporia</rs></head> </figure> </div> <div xml:id="saussure"> - <head><ptr type="software" xml:id="R32" target="#saussure"/> - <rs type="soft.name" ref="#R32">Saussure Project</rs>: Supporting Genetic + <head><ptr type="software" xml:id="R32" target="softw:saussure"/> + <rs type="cit:soft.name" ref="#R32">Saussure Project</rs>: Supporting Genetic Criticism</head> - <p>The <ptr type="software" xml:id="R33" target="#saussure"/> - <rs type="soft.name" ref="#R33">Saussure Project</rs> exploits the flexibility of <ptr - type="software" xml:id="R31" target="#aporia"/> - <rs type="soft.name" ref="#R31">Aporia</rs> in order to adapt the system to the study of + <p>The <ptr type="software" xml:id="R33" target="softw:saussure"/> + <rs type="cit:soft.name" ref="#R33">Saussure Project</rs> exploits the flexibility of <ptr + type="software" xml:id="R31" target="softw:aporia"/> + <rs type="cit:soft.name" ref="#R31">Aporia</rs> in order to adapt the system to the study of Saussurean autographs, making author’s variants searchable and creating multilingual indexes of ancient terms studied by the linguist.</p> <p>Instead of showing linked texts in parallel, the system shows the image of the manuscript and the related transcription (<ptr type="crossref" target="#figure7"/>).</p> <figure xml:id="figure7"> <graphic url="images/jtei-8-boschetti_07.png" height="743px" width="1430px"/> - <head type="legend"><ptr type="software" xml:id="R34" target="#saussure"/> - <rs type="soft.name" ref="#R34">Saussure 
Project</rs></head> + <head type="legend"><ptr type="software" xml:id="R34" target="softw:saussure"/> + <rs type="cit:soft.name" ref="#R34">Saussure Project</rs></head> </figure> </div> </div> <div xml:id="conclusion"> <head>Conclusion</head> - <p>The <ptr type="software" xml:id="R53" target="#teicophilib"/><rs type="soft.name" + <p>The <ptr type="software" xml:id="R53" target="softw:teicophilib"/><rs type="cit:soft.name" ref="#R53">TeiCoPhiLib</rs> is a work in progress focused on the creation of a library of software components aimed at managing a limited subset of TEI tags used in the domain of collaborative philology. Because of the increasing complexity of annotations and the @@ -710,18 +710,18 @@ <p>Reusable software components promote the management of stand-off annotation at any level (such as editing, searching, or visualizing), improving the experience of the annotation and use of TEI documents.</p> - <p>The document parsing in the current <ptr type="software" xml:id="R35" target="#Java"/><rs - type="soft.name" ref="#R35">Java</rs> implementation takes place on the server side, - where the <ptr type="software" xml:id="R36" target="#Java"/><rs type="soft.name" + <p>The document parsing in the current <ptr type="software" xml:id="R35" target="softw:java"/><rs + type="cit:soft.name" ref="#R35">Java</rs> implementation takes place on the server side, + where the <ptr type="software" xml:id="R36" target="softw:java"/><rs type="cit:soft.name" ref="#R36">Java</rs> virtual machine runs within the web application environment.</p> <p> The marshalling and unmarshalling process handles the serialization of the object representation of the TEI document, in order to store and retrieve data on the filesystem - or in native XML databases, such as <ptr type="software" xml:id="R37" target="#existdb" - /><rs type="soft.name" ref="#R37">eXist-db</rs>.</p> + or in native XML databases, such as <ptr type="software" xml:id="R37" target="softw:existdb" + /><rs 
type="cit:soft.name" ref="#R37">eXist-db</rs>.</p> <p>Performance measurement tools such as JMeter will help to optimize the performance of the library components.</p> <p> Software currently under development will be available on <ptr type="software" - xml:id="R38" target="#GitHub"/><rs type="soft.name" ref="#R38">GitHub</rs> at <ptr + xml:id="R38" target="softw:github"/><rs type="cit:soft.name" ref="#R38">GitHub</rs> at <ptr target="https://github.com/CoPhi/cophilib"/>.</p> </div> </body> @@ -736,9 +736,9 @@ Environment: Metadata, Vocabularies and Techniques in the Digital Humanities, article no. 11. New York: ACM. doi:10.1145/2517978.2517990. - - - Bozzi, + + + Bozzi, Andrea. 2013. G2A: A Web Application to Study, Annotate and Scholarly Edit Ancient Texts and Their Aligned Translations. Studia graeco-arabica diff --git a/data/JTEI/8_2014-15/jtei-8-ciotti-source.xml b/data/JTEI/8_2014-15/jtei-8-ciotti-source.xml index 6645de9..e2fda20 100644 --- a/data/JTEI/8_2014-15/jtei-8-ciotti-source.xml +++ b/data/JTEI/8_2014-15/jtei-8-ciotti-source.xml @@ -79,19 +79,6 @@ humanities and social sciences, open to quality periodicals looking to publish full-text articles online.

- - -

In the context of this project, private URIs with the prefix softw point to software - items in the software-list.xml file, which are encoded with item elements and - identified in xml:id.

-
- -

In the context of this project, private URIs with the prefix cit point to - category elements in the citation-taxonomy.xml file.

-
-
diff --git a/data/JTEI/8_2014-15/jtei-8-dumont-source.xml b/data/JTEI/8_2014-15/jtei-8-dumont-source.xml index 0c99d96..d24c835 100644 --- a/data/JTEI/8_2014-15/jtei-8-dumont-source.xml +++ b/data/JTEI/8_2014-15/jtei-8-dumont-source.xml @@ -81,7 +81,7 @@ bottom up usability graphical user interface - ediarum @@ -94,19 +94,19 @@

This article shows how, with a focus on user-friendliness and a bottom-up approach, the - digital work environment ediarum was successfully developed. With ediarum was successfully developed. With ediarum, researchers can comfortably encode and edit in TEI, as well as publish their results in an online or print edition. This solution, developed by the TELOTA initiative of Berlin-Brandenburg Academy of Sciences and Humanities, is based on three - software components: eXistdb, Oxygen XML Author, and eXistdb, Oxygen XML Author, and ConTeXt. These components are combined, supplemented with additional functions, and tailored to fit a project’s needs. After a pilot run, ediarum has been implemented for + target="softw:ediarum"/>ediarum has been implemented for multiple internal and external research projects. The experience that was gained and our self developed program components are available to the Digital Humanities community.

@@ -123,7 +123,7 @@ accessibility and reusability. Furthermore, both XML and TEI are well-established technologies—the TEI Guidelines having been in use for over 25 years. Nevertheless, in Germany as well as internationally, edition projects continue to use programs such as - Microsoft Word to edit their texts. These texts are used to create print editions and may be digitized in a simple form such as PDF.

There is thus a gap between the available, useful technologies and their actual @@ -143,19 +143,19 @@ of them support only the publication of TEI documents and not their creation (see, e.g., Pape, Schöch, and Wegner 2012). Tools that aim to support the actual transcription and editing of texts are less common. One - such tool is from the - Teuchos project, which provides a basic user - interface for certain parts of the manuscript description (Vertan and Reimers 2012). + such tool is from the + Teuchos project, which provides a basic user + interface for certain parts of the manuscript description (Vertan and Reimers 2012). In this case the data is not edited directly in XML but saved in a text format. In addition, it is not possible to enter inline markup. A more functional solution is the - - CWRCWriter, which still requires the user to have - a good knowledge of TEI ( + CWRCWriter, which still requires the user to have + a good knowledge of TEI ( - Rockwell et al. 2012 or Rockwell et al. 2012 or ). A factor contributing to the lack of such a user interface could be the high complexity of a TEI-encoded edition.

@@ -222,39 +222,39 @@

To develop such a solution from scratch requires significant resources. To reduce time and effort of development it is preferable to build on existing software. After evaluating the available software, the team chose the following programs: - - eXistdbeXistdb. for the database in which the TEI files would be stored. A decisive factor was the ability to retrieve and - run XQuery and XSLT scripts from within the - eXistdb and not just externally. + run XQuery and XSLT scripts from within the + eXistdb and not just externally. - Oxygen XML Author - + Oxygen XML Author + . as the user interface for entering and editing the centrally stored TEI files. Decisive were the - extensive functions of - Oxygen XML Author that allowed visual editing of + extensive functions of + Oxygen XML Author that allowed visual editing of XML for persons without technical knowledge. ConTeXt - ConTeXt + . for the formatting language to generate an automatic print-ready proof from the XML files. Among its benefits were its use of the long-established formatting language TeX, the possibility of processing XML directly, and the option to add one’s own functions with the help of the programming language Lua.

-

Although eXistdb and ConTeXt are open source software, this is not the case - for - Oxygen XML Author. A completely open-source solution +

Although eXistdb and ConTeXt are open source software, this is not the case + for + Oxygen XML Author. A completely open-source solution would have been preferable, but there is no software currently available that offers such extensive functions for end users.

Because of the solution’s modular organization, the software components can be replaced @@ -271,40 +271,40 @@ generally suitable for the research project, but customized to increase the user-friendliness of the work environment despite the complexity behind a critical edition. Besides the creation of multiple TEI P5–compatible schemata, the functions and - toolbar of the - Oxygen XML Author were also configured and + toolbar of the + Oxygen XML Author were also configured and supplemented with additional custom functions. XQuery and XSLT scripts were + target="softw:xquery"/>XQuery and XSLT scripts were written for the Web publication and its presentation was designed in HTML and CSS. The automatic generation of the print edition was programmed with the formatting language ConTeXt.

A Central Database Enables Collaborative Work

The digital work environment uses the open-source XML database eXistdb as its + xml:id="R21" target="softw:existdb"/>eXistdb as its central repository for the TEI documents. The database is installed on a server and available online. This gives all the researchers on the project access to the same data collection, allowing them to work collaboratively. The contents of the database are directly available and searchable for users as a collection of files in - Oxygen XML Author . Collaborative access is made + type="software" xml:id="R22" target="softw:oxygenauthor"/> + Oxygen XML Author . Collaborative access is made possible through the WebDAV protocol, which locks files that are in use to avoid conflicts and unintended overwrites.

A User-Friendly Environment with - Oxygen XML Author + target="softw:oxygenauthor"/> + Oxygen XML Author -

- Oxygen XML Author was chosen to create the user +

+ Oxygen XML Author was chosen to create the user interface with which the editors transcribe and edit in the work environment. To this end we used Oxygen Frameworks: that is to say, we developed additional frameworks within the context offered by the program. - Oxygen XML delivers a standard framework for TEI + xml:id="R25" target="softw:oxygenauthor"/> + Oxygen XML delivers a standard framework for TEI documents that provides a toolbar with a few basic TEI elements. However, because of the large number of TEI elements and attributes, the development of an all-inclusive TEI Framework would not be practical. The editor would be overwhelmed with such a @@ -318,7 +318,7 @@ index elements.

- Toolbar for the Schleiermacher edition + Toolbar for the Schleiermacher edition with functions for inserting TEI markup

Above all, however, the end user can enter markup into the manuscript or into the TEI @@ -347,28 +347,28 @@ of the person (as an attribute) to be entered. The index with the actual references to the correct text passages is then automatically created for both the print and Web publications from the marked-up TEI document. This index function is created through a - specially-programmed - Java operation for - Oxygen XML Author and with various + Java operation for + Oxygen XML Author and with various XQuery scripts in the database. When programming the - Java operation, we used parameters and not + xml:id="R28" target="softw:java"/> + Java operation, we used parameters and not constants in the source code. Thus, the parameter values could be comfortably entered in - Oxygen XML Author and changed if necessary.

- Customized - Oxygen XML Author with the index function’s + Customized + Oxygen XML Author with the index function’s dialogue box opened
Website

In addition to the work environment in Oxygen XML Author , we + target="softw:oxygenauthor"/>Oxygen XML Author , we also created a website for the project. On this website the researchers can easily browse through or search the current data collection through a live connection with the database. Access to the website can be either limited to the project team or open to the @@ -378,19 +378,19 @@ Website for the Schleiermacher edition

From a technical viewpoint, the website consists primarily of - XQuery scripts, whose request results are presented + xml:id="R32" target="softw:xquery"/> + XQuery scripts, whose request results are presented as HTML5, through an XSL transformation. The scripts and transformations are requested - from eXist through a REST interface and processed with the eXistdb internal parser—a great advantage of this database. The website is thus generated completely server-side.

Print Edition

A further output option is the print edition, implemented with the help of ConTeXt , which automatically generates a PDF (at any point in the workflow) from a TEI document. With the correct configuration, the format and presentation of the PDF can meet the research project’s exact needs for printed editions. Each TEI element @@ -402,7 +402,7 @@

Print edition for the Schleiermacher edition generated via ConTeXt
@@ -451,28 +451,28 @@ on the complexity and diversity of TEI-encoded editions.

With every new implementation, the work environment developed further and was supplemented with new functions. This applied not just to the Java operations but - also to the Java operations but + also to the XQuery scripts, which are responsible for various fundamental tasks such as making the index file available for the Java operations. For these - frequently used XQuery and XSLT scripts and for further configurations, an Java operations. For these + frequently used XQuery and XSLT scripts and for further configurations, an eXistdb applicationFor further information about apps in eXistdb , see Getting Started with Web Application Development, . was developed in 2013 that allows for a simple and fast installation into the database. - This eXistdb app will soon be available online. New functions are then, when suitable, integrated into the already existing work environment so that after the launch of the work environment the operability continually improves.

After we published a report about report about ediarum,Stefan Dumont and Martin Fechner, Digitale Arbeitsumgebung für das Editionsvorhaben <q>Schleiermacher in Berlin 1808–1834</q>, .</bibl></note> it caught the attention of other research institutions. During this time TELOTA introduced and explained the environment to interested institutions in various presentations. As a result, for example, the Academy of Science and Literature Mainz - adopted the concept of <ptr type="software" xml:id="R56" target="#ediarum"/><rs - type="soft.name" ref="#R56">ediarum</rs> and built their own digital work environment + adopted the concept of <ptr type="software" xml:id="R56" target="softw:ediarum"/><rs + type="cit:soft.name" ref="#R56">ediarum</rs> and built their own digital work environment based on it. Other interested persons can read about the concept in a series of blog entries that are written as tutorials.<note><bibl><author>Stefan Dumont</author>, <title level="a">Tutorial: Wie baue ich ein eigenes Framework für Oxygen XML?, @@ -491,25 +491,25 @@ Geisteswissenschaften (October 30, 2013), , and Stefan Dumont, Tutorial: - Indexfunktionen für <ptr type="software" xml:id="R43" target="#oxygen"/><rs - type="soft.name" ref="#R43">Oxygen XML</rs> Frameworks, <rs + type="cit:soft.name" ref="#R43">Oxygen XML</rs> Frameworks, digiversity: Webmagazin für Informationstechnologie in den Geisteswissenschaften(December 16, 2013), . In addition, the custom Java functions were made available + target="softw:java"/>Java functions were made available by TELOTA on GitHub for the TEI community.; for documentation, see . There the functions can be downloaded and used directly in Oxygen Frameworks or, if necessary, changed in accordance with theGNU LGPL (Lesser General Public License).

At the end of 2013, TELOTA began implementing ediarum for the ediarum for the historical-critical edition of Jeremias Gotthelf.Jeremias Gotthelf: Historisch-kritische Gesamtausgabe (HKG), Forschungsstelle Jeremias Gotthelf, @@ -517,13 +517,13 @@ />. In this collaboration with Bern University, new functions are being developed and existing ones improved. For instance, we plan to build a function to allow end users to insert overlapping markup (using, e.g., anchor elements and spanTo) - in Oxygen XML Author .

Summary and Prospects -

The bottom-up development of ediarum with its strong focus on user-friendliness +

The bottom-up development of ediarum with its strong focus on user-friendliness brings with it certain implications.

It has required a lot of work during the initial implementation and customization of the environment, especially to generate the website and print edition. TELOTA is now @@ -531,10 +531,10 @@ needed for programming. The goal, however, will be not to develop a complete plug-and-play solution. That would be impossible, considering the complexity and diversity of critical editions. The aim for a future version of ediarum will be rather to create + target="softw:ediarum"/>ediarum will be rather to create a solution for encoding editions in TEI which involves the lowest possible amount of time and effort for programming while being customized to the specific needs of a project.

-

The concept of The concept of ediarum’s development with its tailored solution has various advantages: Time and effort for programming is spared through the consistent use of existing @@ -550,7 +550,7 @@

Thus, through addressing the needs of a specific project, we found a solution that can be - used by many: ediarum helps bridge the gap between hesitant users and the many possibilities and advantages of TEI encoding.

@@ -568,25 +568,25 @@ Nielsen, Jakob. 1993. Usability Engineering. Boston: Academic Press. - - - Pape, - Sebastian, Christof - Schöch, and Lutz - Wegner. 2012. <rs type="soft.name" + <ptr type="software" xml:id="R48" target="softw:teichi"/> + <rs type="cit:soft.bib.ref" ref="#R48"> + <bibl xml:id="pape12"><rs type="cit:soft.agent" ref="#R48"><author>Pape, + Sebastian</author></rs>, <rs type="cit:soft.agent" ref="#R48"><author>Christof + Schöch</author></rs>, and <rs type="cit:soft.agent" ref="#R48"><author>Lutz + Wegner</author></rs>. <date>2012</date>. <title level="a"><rs type="cit:soft.name" ref="#R48">TEICHI</rs> and the Tools Paradox. Journal of the Text Encoding Initiative 2 (February). . doi:10.4000/jtei.432. - - - Rockwell, - Geoffrey, Susan - Brown, James - Chartrand, and Susan - Hesemeier. 2012. <rs type="soft.name" + <ptr type="software" xml:id="R49" target="softw:cwrc"/> + <rs type="cit:soft.bib.ref" ref="#R49"> + <bibl xml:id="rockwell12"><rs type="cit:soft.agent" ref="#R49"><author>Rockwell, + Geoffrey</author></rs>, <rs type="cit:soft.agent" ref="#R49"><author>Susan + Brown</author></rs>, <rs type="cit:soft.agent" ref="#R49"><author>James + Chartrand</author></rs>, and <rs type="cit:soft.agent" ref="#R49"><author>Susan + Hesemeier</author></rs>. <date>2012</date>. <title level="a"><rs type="cit:soft.name" ref="#R49">CWRC-Writer</rs>: An In-Browser XML Editor. In Digital Humanities 2012: Conference Abstracts, edited by Jan Christoph Meister, 508–11. diff --git a/data/JTEI/8_2014-15/jtei-8-iglesia-source.xml b/data/JTEI/8_2014-15/jtei-8-iglesia-source.xml index a1d0c5e..0c53096 100644 --- a/data/JTEI/8_2014-15/jtei-8-iglesia-source.xml +++ b/data/JTEI/8_2014-15/jtei-8-iglesia-source.xml @@ -105,8 +105,8 @@ them online, mostly under free licenses. Scholars all over the world are now able to use huge datasets for further research. There are now many digital editions available, but only a few tools to analyze them. 
This article explores how web technologies (XML and - related technologies as well as JavaScript) can be used to enrich the forthcoming + related technologies as well as JavaScript) can be used to enrich the forthcoming edition of Theodor Fontane’s notebooks with data-driven visualizations of named entities and how at the same time applications can be built on these visualizations which are reusable for other edition projects in the TEI world. Because of the density and @@ -142,13 +142,13 @@ term hybrid edition indicates that it is going to be published both online, in open access form, and as a printed book by Walter de Gruyter. The term Virtual Research Environment in the title of the project refers to TextGrid,TextGrid: A - Virtual Research Environment for the Humanities, TextGrid,TextGrid: A + Virtual Research Environment for the Humanities, which we use in combination with the oXygen XML EditorSyncro Soft, to + target="http://www.oxygenxml.com/">oXygen XML EditorSyncro Soft, to produce our TEI-encoded transcription as well as to store both the digital images and the XML files permanently. The project is funded by the German Research Foundation (DFG) and carried out by the Theodor Fontane Research Centre at Göttingen University, in cooperation @@ -162,7 +162,7 @@ (e.g., sourceDoc and surface) specified by the metadata specialist. This TEI code is then XSL-transformed into HTML and integrated into the project’s website by the IT specialist. This transformation can be invoked from within the TextGrid environment + xml:id="R4" target="softw:textgrid"/>TextGrid environment at any time. Thus, the HTML file not only is used for publication on the website, but also supports the editors in transcribing and encoding the text of the notebooks by providing an on-the-fly visualization of their work which is often easier to check for errors than @@ -296,14 +296,14 @@ historical figures? In the latter case, in which period did they live? 
Answering these questions will give us an idea of the different historical strata treated in this notebook. Our tool for this purpose will be the Timeline WidgetTimeline WidgetTimeline: Web Widget for Visualizing Temporal Data, . from the SIMILE (Semantic Interoperability of Metadata and Information in unLike Environments) collection of open source data visualization tools, - originally developed at MIT. A SIMILE Timeline consists of events, associated with + originally developed at MIT. A SIMILE Timeline consists of events, associated with either a point in time or a duration, which are plotted on a chronological, in this case horizontal, axis. If we want to visualize the persons mentioned in the notebook on such a timeline, and use their lifespans as durations, where do we get their birth and death @@ -317,9 +317,9 @@ rather than individuals, such as the German emperors or the Thuringian landgraves. Although the GND does provide records for some of these entities, they do not contain much useful information for further investigation.

-

The The SIMILE Timeline widget consists of an HTML document that uses JavaScript to process an XML file written in a simple XML markup language specific to SIMILE.
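The kind of event file the widget reads can be sketched as follows. This is a minimal illustration, not the project's stylesheet: it is written in Python rather than XSLT, and it assumes the commonly documented SIMILE layout of a `data` root with `event` children carrying `start`, `end`, `isDuration`, and `title` attributes. The function name `build_timeline_xml` is hypothetical, and how the widget parses bare years depends on its date-format settings, which are not verified here.

```python
import xml.etree.ElementTree as ET


def build_timeline_xml(persons):
    """Serialize (name, birth_year, death_year) tuples as a SIMILE-style event list.

    Assumption: the target format is <data><event .../></data> with
    start/end/isDuration/title attributes.
    """
    data = ET.Element("data")
    for name, birth, death in persons:
        ET.SubElement(
            data,
            "event",
            {
                "start": str(birth),     # lifespan start (birth year)
                "end": str(death),       # lifespan end (death year)
                "isDuration": "true",    # plot as a band, not a point
                "title": name,
            },
        )
    return ET.tostring(data, encoding="unicode")


# One person entity whose lifespan is used as the duration.
timeline_xml = build_timeline_xml([("Theodor Fontane", 1819, 1898)])
print(timeline_xml)
```

In the workflow described here, the birth and death years would come from the authority record looked up for each person entity rather than from a hard-coded list.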

@@ -331,8 +331,8 @@ Example of SIMILE XML.
-

We have written an XSLT stylesheet in order to produce the HTML document +

We have written an XSLT stylesheet in order to produce the HTML document and at the same time generate the required XML data from the TEI code of our notebook. During the transformation, this stylesheet picks up the GND identifiers of all person entities in the TEI code and looks up the corresponding GND record online at the German @@ -374,8 +374,8 @@ than a notebook, but both contain a sufficient number of references to person entities that we can display on a timeline. To narrow the material down, we selected a single year’s worth of diary entries, for the year 1835, which is the last complete year within - the scope of Godwin’s diaries. The XSLT code to create the timeline has to be adjusted + the scope of Godwin’s diaries. The XSLT code to create the timeline has to be adjusted to the Godwin TEI code, though only slightly. The main difference is that birth and death dates are already contained within the person elements, and do not need to be retrieved from elsewhere. Again, a filter was applied for missing birth and death @@ -386,8 +386,8 @@ of writing.)

- SIMILE Timeline for William Godwin’s diary entries + SIMILE Timeline for William Godwin’s diary entries of the year 1835

The comparison of the two timelines shows a marked difference: while Fontane @@ -412,10 +412,10 @@ automatic systems to identify the historical places would not be feasible, as they are sometimes referred to in the notebooks by uncommon phrases which require human interpretation to match them to corresponding modern-day identifiers. Another XSLT + type="software" xml:id="R12" target="softw:XSLT"/>XSLT script is used to transform the TEI dataset to a Keyhole Markup Language (KML) file, the typical input format for geospatial visualization tools. The XSLT resolves the + xml:id="R13" target="softw:XSLT"/>XSLT resolves the IDs and retrieves the coordinates from the respective database. OSM is able to deliver polygons instead of geographic coordinates from a single URL in the resulting file: for example, for All Saints’ Church in Wittenberg or the Saint Augustine Monastery at @@ -423,14 +423,14 @@ information—KML—is also expressed as XML. This approach tries to provide a possible combination of place names that appear in the neighborhood of date elements. The geospatial-temporal visualization tool of our choice is the DARIAH + xml:id="R14" target="softw:dariahgeobrowser"/>DARIAH Geo-Browser,DARIAH-DE Konsortium, . which offers a timeline with a map interface together with different features for selecting data.
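The shape of such a KML file can be sketched in a few lines. The example below is an illustration in Python, not the XSLT used in the project: it writes one `Placemark` per place with a `TimeStamp` and a `Point`, following the KML 2.2 convention that coordinates are given as longitude,latitude. The function name `build_kml`, the year, and the approximate coordinates for Wittenberg are illustrative assumptions, not values taken from the edition.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def build_kml(places):
    """Build a KML document from (name, year, latitude, longitude) tuples."""
    ET.register_namespace("", KML_NS)  # serialize with a default namespace

    def tag(local_name):
        return f"{{{KML_NS}}}{local_name}"

    kml = ET.Element(tag("kml"))
    document = ET.SubElement(kml, tag("Document"))
    for name, year, lat, lon in places:
        placemark = ET.SubElement(document, tag("Placemark"))
        ET.SubElement(placemark, tag("name")).text = name
        # A date found near the place name supplies the temporal dimension.
        timestamp = ET.SubElement(placemark, tag("TimeStamp"))
        ET.SubElement(timestamp, tag("when")).text = str(year)
        # KML orders coordinates as longitude,latitude.
        point = ET.SubElement(placemark, tag("Point"))
        ET.SubElement(point, tag("coordinates")).text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Illustrative input only: approximate coordinates, assumed year.
kml_text = build_kml([("Wittenberg", 1517, 51.87, 12.65)])
print(kml_text)
```

The mandatory data mentioned for the Geo-Browser, at least one place name with latitude and longitude, corresponds to the `name` and `Point` elements in this sketch.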

A notable feature is the use of historical maps to provide better context. The selection of historical maps in the Geo-Browser is still + target="softw:dariahgeobrowser"/>Geo-Browser is still limited, but it is also possible to load one’s own overlays. Furthermore, the data are presented in tabular form with a search function. In addition to the mandatory data—at least one place name with latitude and longitude—HTML code can be inserted in the KML @@ -439,10 +439,10 @@ hyper-references will appear directly in the geographical information system. We integrated an embedded version of this tool via html:iframe into our website which is built on an eXist - database.eXist + database.. An XQuery script executes the + target="softw:xquery"/>XQuery script executes the transformation, stores the KML file in the database, and generates the required html:iframe element. The html:src attribute value contains the parameters to control the Geo-Browser. A URL-encoded string which points to the @@ -461,8 +461,8 @@ terms like the Holy Roman Empire (HRR) or Thuringia, and the authority files we use are not able to provide the required data, especially not for the desired time.

The dates returned by this algorithm are also used to select a background map provided - by the Geo-Browser. The arithmetic mean of the temporal data + by the Geo-Browser. The arithmetic mean of the temporal data in this example is 1451.25 CE and 11 out of 17 dates with spatial reference specify the early sixteenth century. A suitable background map with the borders from 1492 can be selected via parameter @@ -472,8 +472,8 @@ selection for notebook C07.

- DARIAH-DE-Geo-Browser’s timeline with the data from + DARIAH-DE-Geo-Browser’s timeline with the data from notebook C07

Placenames corresponding to a selection in the timeline are displayed in a table below, @@ -493,14 +493,14 @@ visualizing social networks extracted from a TEI-encoded corpus (p. 271) consisting of biographic data. The interface is realized with a proprietary plug-in built upon the - PrefuseUC Berkeley Visualization - Lab, . + PrefuseUC Berkeley Visualization + Lab, . software library. One of our goals is to implement the aggregations within the digital edition, and for this we would like to use web technologies only. The D3.js (Data Driven Documents Javascript - library) created by Mike Bostock provides a + target="http://d3js.org/">D3.js (Data Driven Documents Javascript + library) created by Mike Bostock provides a framework for different visualizations. The list of examples. is a good starting @@ -510,14 +510,14 @@ target="http://bl.ocks.org/mbostock/4062045"/>. and the Hierarchical Edge Bundling. - examples. Again we use an XSL transformation to extract the data from our + examples. Again we use an XSL transformation to extract the data from our source.

The transformation script matches all entities and generates the required documents. The first document is the HTML file, which contains the needed JavaScript and - a reference to the external D3.js library. The second is a JSON file, which + xml:id="R23" target="softw:JavaScript"/>JavaScript and + a reference to the external D3.js library. The second is a JSON file, which contains one object per entity and one associated array per object that includes a list of connected entities. The tree-like structure of XML allows the transformation of any document to a network graph by selecting elements that share the same ancestor. The only @@ -529,8 +529,8 @@ the surface elements that are direct children of sourceDoc, because more than one surface may be part of a single page, for example where there are glued-in newspaper articles.
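The co-occurrence rule described above, one object per entity with an array of the entities that share a page with it, can be sketched as follows. This is a Python illustration under stated assumptions rather than the project's XSL transformation: the key names `name` and `connected`, the function `build_network`, and the placeholder entity and page identifiers are all hypothetical.

```python
import json


def build_network(pages):
    """Link every pair of entities that are referenced on the same page.

    `pages` maps a page identifier to the list of entity names found on it.
    Returns one object per entity with a sorted list of connected entities.
    """
    links = {}
    for entities in pages.values():
        unique = set(entities)  # ignore repeated mentions on one page
        for entity in unique:
            links.setdefault(entity, set()).update(unique - {entity})
    return [
        {"name": name, "connected": sorted(neighbours)}
        for name, neighbours in sorted(links.items())
    ]


# Placeholder data: two pages sharing one entity.
pages = {
    "page_1": ["entity_a", "entity_b"],
    "page_2": ["entity_a", "entity_c"],
}
network = build_network(pages)
network_json = json.dumps(network, indent=2)
print(network_json)
```

Replacing the page as the unit of co-occurrence with another division of the text only changes how the `pages` mapping is built; the linking step stays the same.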

-

As in the Geo-Browser example above, we assume that the +

As in the Geo-Browser example above, we assume that the proximity of two references to entities suggests a semantic connection. Naturally, such a connection may also exist between two entity occurrences separated by a page break. Therefore, better criteria for connectedness could be proposed, such as co-occurrence @@ -558,8 +558,8 @@ headline is Thüringens Geschichte (History of Thuringia), which is also the topic of the following pages. The benefit of the network is that a major topic can be identified with a single view.

-

The output of this D3.js application is an SVG graphic which can be +

The output of this D3.js application is an SVG graphic which can be further transformed. svg:title elements are used to store the node names, which modern browsers should display on mouseover. To get a better overview of the entities in the notebook, the node names should actually be inserted as nodes, but since there is @@ -572,8 +572,8 @@ attribute contains the leaf number with a letter r for recto and v for a verso side and this attribute is transformed in a html:id, so we can go back from a single entity to the leaf of its first occurrence by generating a - hyperlink with the help of JavaScript. If this part is left out, the objects + hyperlink with the help of JavaScript. If this part is left out, the objects will be sorted in alphabetical order and the network will contain more edges to link those entities that co-occur on one page. Applying the hierarchy allows these edges to be deleted, because the categorization lets the nodes appear together and a bigger gap @@ -594,7 +594,7 @@ way to integrate this entity into the network might be the use of another apportionment, as the fact that two entities are referenced on the same page may be regarded as artificial. Only minor changes would have to be made to the XSLT code to use + xml:id="R28" target="softw:xslt"/>XSLT code to use other divisions of text, such as chapters. As notebooks are not typically organized in chapters, we can use paragraphs (encoded with milestone here) instead. Instead of one specific element, a distinctive number of blank surfaces in series can be @@ -628,12 +628,12 @@ and services. The additional effort required to make them work with the TEI data from the forthcoming edition of Theodor Fontane’s notebooks (and from the edition of William Godwin’s Diary) was minimal. The necessary scripting (mainly XSLT) and code + xml:id="R29" target="softw:xslt"/>XSLT) and code customization were easily carried out in addition to our regular work within the Fontane edition project. 
These efforts were facilitated by a spirit of openness shared by all - parties involved: both the D3.js library and the SIMILE Timeline widget are + parties involved: both the D3.js library and the SIMILE Timeline widget are open-source software released under a BSD license; the data sources GND, GeoNames, and OpenStreetMap have permissive licenses—Creative Commons Zero (CC0), Creative Commons Attribution (CC BY), and Open Data Commons Open Database License (ODbL), respectively; and diff --git a/data/JTEI/8_2014-15/jtei-8-intro-source.xml b/data/JTEI/8_2014-15/jtei-8-intro-source.xml index 6ed117b..f9199ff 100644 --- a/data/JTEI/8_2014-15/jtei-8-intro-source.xml +++ b/data/JTEI/8_2014-15/jtei-8-intro-source.xml @@ -186,11 +186,11 @@ Iglesia and Göbel’s work uses XML and related Web technologies to process semantic entities and produce data-driven visualizations of the forthcoming edition of Theodor Fontane’s notebooks as well as comparisons with other edition projects using TEI. Rosselli - Del Turco et al. describe EVT (Edition Visualization Technology), a TEI based + Del Turco et al. describe EVT (Edition Visualization Technology), a TEI based scholarly publishing tool under development since about 2010. Initially designed as a specific solution for the Digital Vercelli Book project, EVT has + type="software" xml:id="R2" target="softw:evt"/>EVT has evolved into a flexible tool to create Web-based digital editions incorporating transcription files encoded in TEI XML and digital facsimiles of the source texts. Portela and Rito Silva’s article presents the rationale and the technical approaches adopted for @@ -201,9 +201,9 @@ paper illustrates the user-friendly and bottom-up design principles that have guided the development of the ediarum TEI editing and publishing platform. 
Similarly, Boschetti and Del Grosso describe the design and development of <ptr type="software" xml:id="R3" target="#teicophilib"/><rs type="soft.name" + level="m"><ptr type="software" xml:id="R3" target="softw:teicophilib"/><rs type="cit:soft.name" ref="#R3">TeiCoPhiLib</rs>, a library of Java software components devoted to + target="softw:java"/>Java software components devoted to editing, processing, and visualising TEI documents in the domain of philological studies. This tool is particularly suited to fostering collaborative philological work. This cluster is closed by Dalmau and Hawkins’s article, the focus of which is not on specific diff --git a/data/JTEI/8_2014-15/jtei-8-moerth-source.xml b/data/JTEI/8_2014-15/jtei-8-moerth-source.xml index 268b686..c52f8af 100644 --- a/data/JTEI/8_2014-15/jtei-8-moerth-source.xml +++ b/data/JTEI/8_2014-15/jtei-8-moerth-source.xml @@ -334,7 +334,7 @@ sentences were originally located directly inside the body, we have now settled for a clearer distinction between the two. Since most of our dictionary data is created and maintained using the Viennese Lexicographic + target="softw:vienneseeditor"/>Viennese Lexicographic Editor and held in a relational database (described in Budin, Majewski, and Mörth 2012), format changes like this can be applied transparently on all of our dictionaries, making it easy to ensure @@ -819,10 +819,10 @@

The query element contains the query string. It should also have a type attribute indicating the applied query language. In the example - above, CQP (Corpus Query Processor) refers to the query language of the IMS Institut für Maschinelle Sprachverarbeitung, Stuttgart. - Corpus + Corpus Workbench. The element evalMode can be filled with either none, which implies that the data was retrieved automatically, or manual, which should be applied when some kind of postprocessing has been @@ -861,8 +861,8 @@ offering functions to embed them in the markup. Until implementations have reached this level of integration, we have to rely on a combination of components to support this functionality. An example for this is the commercial product Sketch - Engine, Sketch + Engine,.which includes a web interface for querying language corpora. In particular, it offers the ability to download the resulting frequency data (as well as concordances) in its own, diff --git a/data/JTEI/8_2014-15/jtei-8-munoz-source.xml b/data/JTEI/8_2014-15/jtei-8-munoz-source.xml index 379b43d..fa1861f 100644 --- a/data/JTEI/8_2014-15/jtei-8-munoz-source.xml +++ b/data/JTEI/8_2014-15/jtei-8-munoz-source.xml @@ -356,16 +356,16 @@ while producing data, use of the milestone strategy for multiple hierarchies (ontologies) decreases the reusability of the textual data produced. The project relies on an automatic process to convert the document-focused encoding into a text-focused one. This process - consists of - a - set of XSLT transformations authored by + a + set of XSLT transformations authored by Wendell Piez, who served as a consultant to the S-GA project in 2013. 
These transformations are structured as a pipeline—progressively remodeling the document-focused TEI data to a more familiar text-focused TEI.The automated transformation workflow - was originally managed using XProc to compose the various stylesheets but was - converted to an Apache Cocoon block for ease of maintenance by project + was originally managed using XProc to compose the various stylesheets but was + converted to an Apache Cocoon block for ease of maintenance by project staff at MITH. Some of the stages involved in this process include, for example, identifying chapter boundaries that span across multiple surfaces (which for convenience are maintained in separate files), and then combining the content of these diff --git a/data/JTEI/8_2014-15/jtei-8-rosselli-source.xml b/data/JTEI/8_2014-15/jtei-8-rosselli-source.xml index ec46578..49c951f 100644 --- a/data/JTEI/8_2014-15/jtei-8-rosselli-source.xml +++ b/data/JTEI/8_2014-15/jtei-8-rosselli-source.xml @@ -130,10 +130,10 @@ the constant search for an effective price/result ratio and the local availability of technical skills, have led to a remarkable fragmentation: publishing solutions range from simple HTML pages produced using the - TEI stylesheets (or the - TEI Boilerplate software) to very complex frameworks + target="softw:teistylesheets"/> + TEI stylesheets (or the + TEI Boilerplate software) to very complex frameworks based on CMS and SQL search engines. Researchers of the Digital Vercelli Book project started looking into a simple, user-friendly solution and eventually decided to build their own: EVT (Edition Visualization Technology) has been under development since about @@ -174,35 +174,35 @@ />.

in favor of a web-based publication. While this decision was critical in that it allowed us to select the most supported and widely-used medium, we soon discovered that it did not make choices any simpler. On the one hand, the XSLT stylesheets provided by TEI are great for HTML rendering, but do not include support for image-related features (such as the text-image linking available thanks to the P5 version of the TEI schema) and tools (including zoom in/out, magnifying lens, and hot spots) that represent a significant part of a digital facsimile and/or diplomatic edition; other features, such as an XML search engine, would have to be integrated separately, in any case. On the other hand, there are powerful frameworks - based on CMS

The Omeka framework (

The Omeka framework () supports publishing TEI documents; see - also Drupal () and TEICHI ( + />‎) and TEICHI ( ).

and other web - technologies

Such as the - eXist XML database,

Such as the + eXist XML database, .

which looked far too complex and expensive, particularly when considering future maintenance needs, for our project’s purposes. Other solutions, such as the - + EPPT software

Edition Production and Presentation Technology, - .

developed by K. Kiernan - or the + .

developed by K. Kiernan + or the - Elwood viewer

Elwood Viewer, + Elwood viewer

Elwood Viewer, .

created by G. Lyman, either were not yet ready or were unsuitable for other reasons (proprietary software, user interface @@ -223,8 +223,8 @@
First Experiments -

At first, however, - EVT was more an experimental research project for +

At first, however, + EVT was more an experimental research project for students at the Informatica Umanistica course of the University of Pisa

BA course, .

than a real @@ -246,12 +246,12 @@
- The Current - EVT Version + The Current + EVT Version
- - EVT - v. 2.0: Rebooting the Project + + EVT + v. 2.0: Rebooting the Project

To get out of the impasse we decided to completely reboot the project, removing secondary features and giving priority to fundamental ones. We also found a solution for the data-loading problem: instead of finding a way to load the data into the software we @@ -259,20 +259,20 @@ starting point means that the editor can focus on his work, marking up the transcription text, with very little configuration needed to create the edition. This approach also allowed us to quickly test XML files belonging to other edition projects, to check if - EVT could go beyond being a project-specific tool. The inspiration for these changes came from work done in similar projects developed within the TEI community, - namely + namely - TEI Boilerplate,

TEI Boilerplate, - + TEI Boilerplate,

TEI Boilerplate, + .

- - John A. Walsh’s collection of + John A. Walsh’s collection of XSLT stylesheets,

- tei2html - , XSLT stylesheets,

+ tei2html + , .

and Solenne Coutagne’s work for the Berliner @@ -280,9 +280,9 @@ level="m">Briefe und Texte aus dem intellektuellen Berlin um 1800, .

Through this approach, we achieved two important results: first, usage of - EVT is quite simple—the user applies an XSLT + type="software" xml:id="R13" target="softw:EVT"/> + EVT is quite simple—the user applies an XSLT stylesheet to their already marked-up file(s), and when the processing is finished they are presented with a web-ready edition; second, the web edition that is produced is based on a client-only architecture and does not require any additional kind of server @@ -291,10 +291,10 @@ public).

To ensure that it will be working on all the most recent web browsers, and for as long as possible on the World Wide Web itself, - EVT is built on open and standard web technologies - such as HTML, CSS, and JavaScript. Specific features, such as the magnifying + target="softw:EVT"/> + EVT is built on open and standard web technologies + such as HTML, CSS, and JavaScript. Specific features, such as the magnifying lens, are entrusted to jQuery plug-ins, again chosen from the best-supported open-source ones to reduce the risk of future incompatibilities. The general architecture of the software, in any case, is modular, so that any component which may cause trouble or turn @@ -304,20 +304,20 @@ How it Works

Our ideal goal was to have a simple, very user-friendly drop-in tool, requiring little work and/or knowledge of anything beyond XML from the editor. To reach this goal, - EVT is based on a modular structure where a single + type="software" xml:id="R109" target="softw:evt"/> + EVT is based on a modular structure where a single stylesheet (evt_builder.xsl) starts a chain of XSLT - 2.0 transformations calling in turn all the other + xml:id="R17" target="softw:XSLT"/>XSLT + 2.0 transformations calling in turn all the other modules. The latter belong to two general categories: those devoted to building the HTML site, and the XML processing ones, which extract the edition text lying between folios using the pb element and format it according to the edition level. All XSLT + type="software" xml:id="R18" target="softw:XSLT"/>XSLT modules live inside the builder_pack folder, in order to have a clean and well-organized directory hierarchy.

- The - EVT + The + EVT builder_pack directory structure.

@@ -331,18 +331,18 @@ evt_builder-conf.xsl, to specify for example the number of edition levels or presence of images; you can then apply the evt_builder.xsl stylesheet to your TEI XML - document using the Oxygen XML editor or another XSLT - 2–compliant engine. + document using the Oxygen XML editor or another XSLT + 2–compliant engine.
- The - EVT data directory structure. + The + EVT data directory structure.

-

When the When the XSLT processing is finished, the starting point for the edition is the index.html file in the root directory, and all the HTML pages resulting from the transformations will be stored in the output_data folder. You @@ -351,25 +351,25 @@ the assigned places.

- The The XSLT stylesheets

The transformation chain has two main purposes: generate the HTML files containing the edition and create the home page which will dynamically recall the other HTML files.

-

The - EVT builder’s transformation system is composed of - a modular collection of XSLT +

The + EVT builder’s transformation system is composed of + a modular collection of XSLT 2.0 stylesheets: these modules are designed to permit scholars to freely add their own stylesheets and to manage the different desired levels of the edition without influencing other parts of the system, for instance the generation of the home page.

The transformation is performed applying a specific XSLT stylesheet + target="softw:XSLT"/>XSLT stylesheet (evt_builder.xsl) which includes links to all the other stylesheets that are part of the transformation chain and that will be applied to the TEI XML document containing the transcription.

-

- EVT can be used to create image-based editions with +

+ EVT can be used to create image-based editions with different edition levels starting from a single encoded text. The text of the transcription must be divided into smaller parts to recreate the physical structure of the manuscript. Therefore, it is essential that paginated XML documents are marked using @@ -398,8 +398,8 @@ described in the software documentation. Adding another edition level requires providing the corresponding stylesheet.

Once the XML file is ready and the parameters are set, the - EVT builder’s transformation system uses a + xml:id="R111" target="softw:EVT"/> + EVT builder’s transformation system uses a collection of stylesheets to divide the XML file containing the text of the transcription into smaller portions, each one corresponding to the content of a folio, recto or verso, of the manuscript. For each of these text fragments it creates as many @@ -413,19 +413,19 @@ xsl:apply-templates select="current-group()" mode="dipl" instruction before its content is inserted into the diplomatic output file.
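The splitting step can be illustrated outside XSLT. The sketch below is a Python analogy, not EVT's own code: it groups the children of a flat `body` into folios, starting a new group at each `pb` milestone, which is roughly what grouping on `pb` followed by `current-group()` achieves. It assumes that the folio label sits in the `n` attribute and that every `pb` is a direct child of `body`; the function name `split_by_pb` and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def split_by_pb(body):
    """Group the children of `body` by the <pb/> milestone that precedes them.

    Returns a dict mapping each folio label (the assumed @n value) to the
    list of elements up to the next <pb/>. Content before the first <pb/>
    is skipped in this sketch.
    """
    folios = {}
    current = None
    for child in body:
        if child.tag == f"{{{TEI_NS}}}pb":
            current = child.get("n")
            folios[current] = []
        elif current is not None:
            folios[current].append(child)
    return folios


sample = (
    f'<body xmlns="{TEI_NS}">'
    '<pb n="104v"/><l>line one</l><l>line two</l>'
    '<pb n="105r"/><l>line three</l>'
    "</body>"
)
folios = split_by_pb(ET.fromstring(sample))
for label, elements in folios.items():
    print(label, [element.text for element in elements])
```

Each group would then be rendered once per edition level, which is the role the mode-specific templates play in the stylesheets.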

-

Using Using XSLT modes it is possible to separate the rules for the different transformations of a TEI element and to recall other XSLT stylesheets in order to + target="softw:XSLT"/>XSLT stylesheets in order to manage the transformations or send different parts of a document to different parts of the transformation chain. This permits the extraction of different texts for different edition levels (diplomatic, diplomatic-interpretative) processing the same XML file, and to save them in the HTML site structure, which is available as a separate XSLT + type="software" xml:id="R30" target="softw:XSLT"/>XSLT module.

The use of modes also allows users to separate template rules for the different transformations of a TEI element and to place them in different XSLT files or in + xml:id="R31" target="softw:XSLT"/>XSLT files or in different parts of a single stylesheet. So templates such as the following and personalize the edition generation parameter as shown above; - copy their own XSLT files containing the template rules to + copy their own XSLT files containing the template rules to generate the desired edition levels in the directory that contains the stylesheets used for TEI element transformation (builder_pack/modules/elements); @@ -459,26 +459,26 @@

For the time being, this kind of customization has to be done by hand-editing the configuration files, but in a future version of - EVT we plan to add a more user-friendly way to + target="softw:EVT"/> + EVT we plan to add a more user-friendly way to configure the system.

Features -

At present, - EVT can be used to create image-based editions with +

At present, + EVT can be used to create image-based editions with two possible edition levels: diplomatic and diplomatic-interpretative; this means that a transcription encoded using elements belonging to the appropriate TEI module

See chapter 11, Representation of Primary Sources, in the TEI Guidelines.

should already be - compatible with - EVT, or require only minor changes to be made + compatible with + EVT, or require only minor changes to be made compatible. The Vercelli Book transcription schema is based on the standard TEI schema, with no custom elements or attributes added: our tests with similarly encoded texts showed a high grade of compatibility. A critical edition level is currently being researched and it will be added in the future.

-

When the website produced by - EVT is loaded in a browser, the viewer will be +

When the website produced by + EVT is loaded in a browser, the viewer will be presented with the manuscript image on the left side, and the corresponding text on the right: this is the default view, but on the main toolbar at the top right corner of the browser window there are icons to access all the available views: @@ -505,8 +505,8 @@ required by the editor. The only necessary requirement at the encoding level, in fact, is that the editor should encode folio numbers by means of the pb element including r and v letters to mark recto - and verso pages, respectively. - EVT will take care of automatically associating + and verso pages, respectively. + EVT will take care of automatically associating each folio to the images copied in the input_data/images folder using a verso-recto naming scheme (for example: 104v-105r.png). It is of course possible that in some cases the @@ -516,16 +516,16 @@ the transformation process is started and can be customized by the editor.

Although the different views access different kinds of content, such as single side and double side images, the navigation algorithms used by - EVT allow the user to move from one view to another + target="softw:EVT"/> + EVT allow the user to move from one view to another without losing the current browsing position.

All content is shown inside HTML frames designed to be as flexible as possible. No matter what view one is currently in, one can expand the desired frame to focus on its specific content, temporarily hiding the other components of the user interface. It is furthermore possible to collapse the frame toolbars to increase the space devoted to content visualization; it is important to notice, however, that we recommend using - EVT in full-screen mode to see images and text at + type="software" xml:id="R112" target="softw:EVT"/> + EVT in full-screen mode to see images and text at the maximum possible screen resolution. The collapse and restore actions are triggered by icons embedded in the interface, but one can also press the Esc key to instantly return to the default layout.

@@ -549,14 +549,14 @@

The image-text feature is inspired by - Martin Holmes’s - Image Markup Tool

The UVic Image Markup Tool Project, + Martin Holmes’s + Image Markup Tool

The UVic Image Markup Tool Project, .

- software and was implemented in XSLT and CSS; all the other features are achieved by + software and was implemented in XSLT and CSS; all the other features are achieved by using jQuery plug-ins.

In the text frame tool bar you can see three drop-down menus which are useful for choosing texts, specific folios, and edition levels, and an icon that triggers the @@ -567,25 +567,25 @@

A First Use Case

On December 24, 2013, after extensive testing and bug fixing work, the - EVT team published a beta version of the + EVT team published a beta version of the Digital Vercelli Book edition,

Full announcement on the project blog, . The beta edition is directly accessible at .

soliciting feedback from all interested parties. Shortly afterwards, the version of the - - EVT software we used, improved by more bug fixes + + EVT software we used, improved by more bug fixes and small enhancements, was made available for the academic community on the project’s SourceForge site.

Edition Visualization Technology: Digital edition visualization - software, .

The Digital Vercelli Book edition based on - EVT + xml:id="R42" target="softw:EVT"/> + EVT v. 0.1.48. Image-text linking is active.

@@ -593,15 +593,15 @@
Future Developments -

- EVT development will continue during 2014 to fix bugs +

+ EVT development will continue during 2014 to fix bugs and to improve the current set of features, but there are also several important features that will be added or that we are currently considering for inclusion in - EVT. Some of the planned features will require + type="software" xml:id="R44" target="softw:EVT"/> + EVT. Some of the planned features will require fundamental changes to the software architecture to be implemented effectively: this is - probably the case for the - Digital Lightbox (see + Digital Lightbox (see ), which requires a client-server architecture (), instead of the current client-only model, to perform some of the existing and planned actions. The currently developed search engine ( New Layout

One important aspect that has been introduced in the current version of - EVT is a completely revised layout: the current + type="software" xml:id="R45" target="softw:EVT"/> + EVT is a completely revised layout: the current user interface includes all the features which were deemed necessary for the Digital Vercelli Book beta, but it also is ready to accept the new features planned for the short and medium terms. Note that nontrivial changes to the general appearance and @@ -623,22 +623,22 @@

Search Engine -

The - EVT search engine is already working and being +

The + EVT search engine is already working and being tested in a separate development branch of the software; merging into the main branch is expected as soon as the user interface is finalized. It was implemented with the goal of keeping it simple and usable for both academics and the general public.

To achieve this goal we began by studying various solutions that could be used as a basis for our efforts. In the first phases of this study we looked at the principal XML - databases, such as of - BaseX, - eXist, etc., and we found a solution by envisioning - - EVT as a distributed application using the + databases, such as of + BaseX, + eXist, etc., and we found a solution by envisioning + + EVT as a distributed application using the client-server architecture. For this test we selected the - + eXist

eXist-db, .

open source XML database, and in a relatively short time we created, sometimes by trial-and-error, a prototype that @@ -647,18 +647,18 @@ felt that it was not sufficiently user-friendly, which is a critical goal of the entire project. In fact, forcing the editor to install and configure specific server software is a cumbersome requirement. Moreover, thanks to its client-only architecture, up to - this point - EVT could work as a desktop application or an + this point + EVT could work as a desktop application or an off-line web application that could be accessed anywhere, and possibly distributed in optical formats (CD or DVD). Forcing the prerequisites of an Internet connection and of dependency on a server-based XML database would have undermined our original goal. Going the database route was no longer an option for a client-only - EVT and we immediately felt the need to go back to + xml:id="R51" target="softw:EVT"/> + EVT and we immediately felt the need to go back to our original architecture to meet this standard. This sudden turnaround marked another chapter in the research process and brought us to the current implementation of - EVT Search.

+ type="software" xml:id="R115" target="softw:EVT"/> + EVT Search.

However, this new vision also had obvious limitations and issues. An XML database could provide us with crucial functionality typical of every information retrieval system: an indexer, a powerful and flexible server-side language (XQuery), and some useful built-in @@ -669,57 +669,57 @@ expected by the user. Essentially, we found that at least two of them were needed in order to make a functional search engine: free-text search and keyword highlighting. To implement them we looked at existing search engines and plug-ins programmed in the most - popular client-side web language: JavaScript. In the end, our search produced two - answers: - Tipue Search and DOM + popular client-side web language: JavaScript. In the end, our search produced two + answers: + Tipue Search and DOM manipulation.

- - Tipue Search + + Tipue Search

- Tipue search

- Tipue Search, + Tipue search

+ Tipue Search, .

is a jQuery plug-in search engine released under the MIT license and aimed at indexing and searching large collections of web pages. It can function both offline and online, and it does not necessarily require a web server or a server-side programming/query language (such as SQL, PHP, or Python) in order to work. While technically a plug-in, its architecture is quite interesting and versatile: - Tipue uses a combination of client-side + Tipue uses a combination of client-side JavaScript for the actual bulk of the work, and JSON (or JavaScript + xml:id="R58" target="softw:JavaScript"/>JavaScript object literal) for storing the content. By accessing the data structure, this engine is able to search for a relevant term and bring back the matches.

-

- Tipue Search operates in three modes: + Tipue Search operates in three modes: in Static mode, - Tipue Search operates without a web server by + target="softw:tipuesearch"/> + Tipue Search operates without a web server by accessing the contents stored in a specific file (tipuedrop_content.js); these contents are presented in JSON format; in Live mode, - Tipue Search operates with a web server by + target="softw:tipuesearch"/> + Tipue Search operates with a web server by indexing the web pages included in a specific file (tipuesearch_set.js); in JSON mode, - Tipue Search operates with a web server by + target="softw:tipuesearch"/> + Tipue Search operates with a web server by using AJAX to load JSON data stored in specific files (as defined by the user).

This plug-in suited our needs very well, but had to be modified slightly in order to accommodate the requirements of the entire project. Before using - Tipue to handle the search we needed to generate + xml:id="R63" target="softw:tipuesearch"/> + Tipue to handle the search we needed to generate the data structure that was going to be used by the engine to perform the queries. We explored some existing XSL stylesheets aimed at TEI to JSON transformation, but we found them too complex for the task at hand. So we modified our own stylesheets to @@ -732,24 +732,24 @@

These files are produced by including two templates in the overall flow of XSLT transformations that extract crucial data from the TEI documents and format them with JSON syntax. The procedure complements well the entire logic of automatic self-generation that characterizes - EVT.

+ target="softw:EVT"/> + EVT.

After we managed to extract the correct data structure, we began to include the - search functionality in - EVT. By using the logic behind - Tipue JSON mode, we implemented a trigger (under + search functionality in + EVT. By using the logic behind + Tipue JSON mode, we implemented a trigger (under the shape of a select tag) that loaded the desired JSON data structure to handle the search (diplomatic or facsimile, as mentioned above) and a form that managed the query strings and launched the search function. Additionally, we decided to provide the user with a simple virtual keyboard composed of essential keys related to the Anglo-Saxon alphabet used in the Vercelli Book.

-

The performance of - Tipue Search was deemed acceptable and our tests +

The performance of + Tipue Search was deemed acceptable and our tests showed that even large collections of data did not pose any particular problem.

@@ -760,16 +760,16 @@ Keyword Highlighting through DOM Manipulation

The solution to keyword highlighting was found while searching many plug-ins that deal with this very problem. All these plug-ins use JavaScript and DOM + target="softw:JavaScript"/>JavaScript and DOM manipulation in order to wrap the HTML text nodes that match the query with a specific tag (a span or a user-defined tag) and a CSS class to manage the style of the highlighting. While this implementation was very simple and self-explanatory, making use of simple recursive functions on relevant HTML nodes has proved to be very difficult to apply to the textual contents handled by - EVT.

-

HTML text within - EVT is represented as a combination of text nodes + xml:id="R70" target="softw:EVT"/> + EVT.

+

HTML text within + EVT is represented as a combination of text nodes and span elements. These spans are used to define the characteristics of the current selected edition. They contain both philological information about the inner workings of the text and information about its visual representation. Very often the @@ -818,15 +818,15 @@ information about the image, but is placed inside a zone element, which defines two-dimensional areas within a surface, and is transcribed using one or more line elements.

-

Originally - EVT could not handle this particular encoding - method, since the XSLT stylesheets could only process TEI XML +

Originally + EVT could not handle this particular encoding + method, since the XSLT stylesheets could only process TEI XML documents encoded according to the traditional transcription method. Since we think that this is a concrete need in many cases of study (mainly epigraphical inscriptions, but also manuscripts, at least in some specific cases), we recently added a new - feature that will allow - EVT to handle texts encoded according to the + feature that will allow + EVT to handle texts encoded according to the embedded transcription method. This work was possible due to a small grant awarded by EADH.

See EADH Small Grant: Call for Proposals, Support for Critical Edition

One important feature whose development will start at some point this year is the support for critical editions, since at the present moment - EVT allows dealing only with diplomatic and + xml:id="R76" target="softw:EVT"/> + EVT allows dealing only with diplomatic and interpretative ones. We aim not only to offer full support for the TEI Critical Apparatus module, but also to find an innovative layout that can take advantage of the digital medium and its dynamic properties to go beyond the traditional, static, @@ -865,8 +865,8 @@

Some of the problems related to this approach are related to the user interface and the way it should be designed in order to be usable and useful: how to conceive and where to place the graphical widgets holding the critical apparatus, how to integrate - these UI elements in - EVT, how to contextualize the variants and + these UI elements in + EVT, how to contextualize the variants and navigate through the witnesses’ texts, and more. There are other problems, for instance scalability issues (how to deal with very big textual traditions that count tens or even hundreds of witnesses?) or the handling of texts produced by collation @@ -878,16 +878,16 @@

- - Digital Lightbox + + Digital Lightbox

Developed first at the University of Pisa, and then at King’s College London as part of the DigiPal

DigiPal: Digital Resource and Database of Palaeography, Manuscript Studies and Diplomatic, .

project, the - Digital Lightbox

A beta version is - available at + Digital Lightbox

A beta version is + available at .

is a web-based visualization framework which aims to support historians, paleographers, art historians, and others in analyzing and studying digital reproductions of cultural heritage objects. @@ -899,8 +899,8 @@ may be obtained at this time are still significantly less precise (with regard to specific image features, at least) than those produced through human interpretation.

Initially developed exclusively for paleographic research, the - Digital Lightbox may be used with any type of image + xml:id="R80" target="softw:digitallightbox"/> + Digital Lightbox may be used with any type of image because it includes a set of general graphic tools. Indeed, the application allows a detailed and powerful analysis of one or more images, arranged in up to two available workspaces, providing tools for manipulation, management, comparison, and transformation @@ -924,8 +924,8 @@

Collaboration is a very important characteristic of - Digital Lightbox: what makes this tool stand apart + target="softw:digitallightbox"/> + Digital Lightbox: what makes this tool stand apart from all the image-editing applications available is the possibility of creating and sharing the work done using the software framework. First, you can create collections of images and then export them to the local disk as an XML file; this feature not only @@ -938,56 +938,56 @@ effective and easy. Thanks to a new HTML5 feature, it is possible to support the importing of images from the local disk to the application without any server-side function.

-

- Digital Lightbox has been developed using some of +

+ Digital Lightbox has been developed using some of the latest web technologies available, such as HTML5, CSS3, the front-end framework Bootstrap,

Bootstrap, .

and the JavaScript (ECMAScript 6) programming language, - in combination with the jQuery + target="softw:Bootstrap"/>Bootstrap, .

and the JavaScript (ECMAScript 6) programming language, + in combination with the jQuery library.

.

The code architecture has been designed to be modular and easily extensible by other developers or third parties: indeed, it has been released as open - source software on GitHub,

- Digital Lightbox, GitHub,

+ Digital Lightbox, .

and is freely available to be downloaded, edited, and tinkered with.

-

The - Digital Lightbox represents a perfect complementary - feature for the - EVT project: a graphic-oriented tool to explore, +

The + Digital Lightbox represents a perfect complementary + feature for the + EVT project: a graphic-oriented tool to explore, visualize, and analyze digital images of manuscripts. While - EVT provides a rich and usable interface to browse + xml:id="R91" target="softw:EVT"/> + EVT provides a rich and usable interface to browse and study manuscript texts together with the corresponding images, the tools offered by - the - Digital Lightbox allow users to identify, gather, + the + Digital Lightbox allow users to identify, gather, and analyze visual details which can be found within the images, and which are important for inquiries relating, for instance, to the style of the handwriting, decorations on manuscript folia, or page layout.

An effort to adapt and integrate the - Digital Lightbox into - EVT is already underway, making it available as a + target="softw:digitallightbox"/> + Digital Lightbox into + EVT is already underway, making it available as a separate, image-centered view, but there is a major hurdle to overcome: some of the - DL features are only possible within a - client-server architecture. Since - EVT or, more precisely, a separate version of - EVT will migrate to this architecture, at some + type="software" xml:id="R118" target="softw:digitallightbox"/> + DL features are only possible within a + client-server architecture. Since + EVT or, more precisely, a separate version of + EVT will migrate to this architecture, at some point in the future it will be possible to integrate a full version of the - DL. Plans for the current, client-only version + type="software" xml:id="R119" target="softw:digitallightbox"/> + DL. Plans for the current, client-only version envision implementing all those features that do not depend on server software: even if this means giving up interesting features such as collaborative work and annotation, we believe that even a subset of the available tools will be an invaluable help for @@ -1000,33 +1000,33 @@ target="http://claviusontheweb.it/web/index.php">Clavius on the Web project

See . A preliminary test using a previous version of - EVT is available at + EVT is available at .

to discuss a possible use of - - EVT in order to visualize the documents that they + + EVT in order to visualize the documents that they are collecting and encoding; the main goal of the project is to produce a web-based edition of all the correspondence of this important sixteenth–seventeenth-century mathematician.

Currently preserved at the Archivio della Pontificia Università Gregoriana.

The integration of - EVT with another web framework used in the project, + xml:id="R97" target="softw:EVT"/> + EVT with another web framework used in the project, the eXist XML database, will require a very important change in how the software works: - as mentioned above, everything from XSLT processing to browsing of the resulting + as mentioned above, everything from XSLT processing to browsing of the resulting website has been done on the client side, but the integration with - eXist will require a move to the more complex + xml:id="R99" target="softw:existdb"/> + eXist will require a move to the more complex client-server architecture. A version of - EVT based on this architecture would present + target="softw:EVT"/> + EVT based on this architecture would present several advantages, not only the integration of a powerful XML database, but also the implementation of a full version of the - Digital Lightbox. We will try to make the move as + target="softw:digitallightbox"/> + Digital Lightbox. We will try to make the move as painless as possible and to preserve the basic simplicity and flexibility that has been - a major feature of - EVT so far. The client-only version will not be + a major feature of + EVT so far. The client-only version will not be abandoned, though for quite some time there will be parallel development with features trickling from one version to the other, with the client-only one being preserved as a subset of the more powerful one.

@@ -1038,14 +1038,14 @@ to the publishing of TEI-encoded digital editions, this software has grown to the point of being a potentially very useful tool for the TEI community: since it requires little configuration, and no knowledge of programming languages or web frameworks except for what - is needed to apply an XSLT stylesheet, it represents a user-friendly method + is needed to apply an XSLT stylesheet, it represents a user-friendly method for producing image-based digital editions. Moreover, its client-only architecture makes it very easy to test the edition-building process (one has only to delete the output folders and start anew) and publish preliminary versions on the web (a shared folder on any cloud-based service such as Dropbox is all that is needed).

-

While - EVT has been under development for 3–4 years, it was +

While + EVT has been under development for 3–4 years, it was thanks to the work and focus required by the Digital Vercelli Book release at end of 2013 that we now have a solid foundation on which to build new features and refine the existing ones. Some of the future expansions also pose important research questions: this is the @@ -1063,8 +1063,8 @@ 2005 and 2013, Rosselli Del Turco, forthcoming.

The collaborative work features of the - Digital Lightbox are also critical to the way modern + target="softw:digitallightbox"/> + Digital Lightbox are also critical to the way modern scholars interact and share their research findings. Finally, designing a user interface capable of hosting all the new features, while remaining effective and user-friendly, will itself be very challenging.

@@ -1072,14 +1072,14 @@
- The - EVT Team + The + EVT Team Roberto Rosselli Del Turco Roberto Rosselli Del Turco, Julia Kenny, and Raffaele Masotti - Julia Kenny and Raffaele Masotti @@ -1089,8 +1089,8 @@ Giancarlo Buomprisco -

- EVT website: + EVT website:

diff --git a/data/JTEI/9_2016-17/jtei-9-armaselu-source.xml b/data/JTEI/9_2016-17/jtei-9-armaselu-source.xml index 2f6cdaf..5dcc965 100644 --- a/data/JTEI/9_2016-17/jtei-9-armaselu-source.xml +++ b/data/JTEI/9_2016-17/jtei-9-armaselu-source.xml @@ -637,7 +637,7 @@

The so called decoding phase, for corpus analysis and interpretation, consisted of importing and processing the TEI XML annotated documents within a specialized platform, TXM (Heiden 2010),. that allows the analysis of a large body of texts by means of lexicometrical and statistical methods. The previous @@ -645,11 +645,11 @@ or structural elements needed for analysis.

Importing -

Since Since TXM supports XSLT transformation at the moment of import (XML/w+CSV option), an XSLT stylesheet was created to accommodate particular formats or conversions required by the software. Therefore, it was not necessary to store different versions of the corpus, - one for TXM analysis, the other for Web publication.

First, a lowercase conversionAll the examples of analysis presented in the paper will consequently be displayed in lowercase. was provided for consistency @@ -667,7 +667,7 @@ cons and tituant as the software would treat it without a w tag).

Part-of-speech tagging via the TreeTagger module integrated into TXM was also applied + xml:id="TXM" target="softw:TXM"/>TXM was also applied to the corpus at import in order to allow lemma and part-of-speech statistics and queries.

@@ -677,15 +677,15 @@ contained 6,512 items (unique words) with 76,558 occurrences in the text.A subcorpus based on the @lang="fr" property (an attribute of the text element) was created in TXM for the analysis of the + target="softw:TXM"/>TXM for the analysis of the documents’ content, excluding the data from the teiHeader. The whole corpus (teiHeader included) comprised 7,015 items and 105,897 occurrences.

Partitioning

Given the identification and annotation of different semantic and structural elements - in the encoding phase, TXM allows the creation of partitions (TXM allows the creation of partitions (Textométrie 2014, section Construire une partition) by selecting a Structure unit and a corresponding Property (i.e., an XML element and one of its attributes) from the list @@ -723,13 +723,13 @@ comparison of the vocabularies: what is specific (either as overuse or deficit) in a part of a partition, as compared with the parent corpus and a certain threshold.In TXM, it is called the banality threshold, fixed by default at the value of +/- 2.0 for positive and negative specificities scores, respectively. In , the banality thresholds are rendered by (red) horizontal lines. The feature is based on a probabilistic model (Lafon 1980) used in TXM to compute a + xml:id="TXM" target="softw:TXM"/>TXM to compute a log10 specificity score of a word property (e.g., word form, lemma, or part of speech) for a given part. In the analysis of the WEU-Diplo corpus, it was assumed that the specificity score may draw attention to forms specific to the @@ -1103,8 +1103,8 @@

Results Discussion -

The TEI XML encoding and TXM analysis related to the research questions on +

The TEI XML encoding and TXM analysis related to the research questions on arms design, production, and control within the WEU have enabled a set of more or less predictable results, the latter needing further examination. Among the former, we can mention those referring to the SAC and ACA roles. Arms production and control was a @@ -1134,7 +1134,7 @@ Commonwealth Office, Western Organisations Department: Registered Files (W and WD Series). Western European Union (WEU). Future of Standing Armaments Committee of Western European Union. 01/01/1975–31/12/1975, FCO 41/1749 (Former Reference Dep: WDU 11/1 PART B). The interpretation of the less predictable results is not straightforward, since they may have been determined by an under- or overrepresentation of certain elements in the discourse, @@ -1145,11 +1145,11 @@ controls and the need to avoid making statements on the subject. Since the size of the corpus was relatively small, and not all the information for the documents on the selected topic and their types in the WEU archive was available, extrapolations about - the TXM probabilistic model and the observed linguistic patterns at a larger scale than the pilot sample should be avoided at this stage.

-

The TEI XML combined with the TXM analysis tools can also reveal inconsistencies +

The TEI XML combined with the TXM analysis tools can also reveal inconsistencies which may draw attention to the need for further encoding and testing additional documents. On the other hand, it is also important to take into consideration how far (or how well) the researcher/user knows the content of the documents, as a lack of @@ -1238,7 +1238,7 @@ Chicago: University of Chicago Press. Excerpt: . Heiden, Serge. 2010. The <ptr type="software" xml:id="TXM" target="#TXM"/><rs type="soft.name" + level="a">The <ptr type="software" xml:id="TXM" target="softw:TXM"/><rs type="cit:soft.name" ref="#TXM">TXM</rs> Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. In Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation, @@ -1296,8 +1296,8 @@ and Pascaline Winand, 187–234. Brussels: PIE-Peter-Lang. Textométrie. 2014. Manuel de <ptr type="software" xml:id="TXM" target="#TXM"/><rs - type="soft.name" ref="#TXM">TXM</rs>, Version 0.7. Accessed March 6, 2016. + level="m">Manuel de TXM, Version 0.7. Accessed March 6, 2016. . Thornborrow, Joanna Sarah. 2002 Power Talk: Language and Interaction in Institutional diff --git a/data/JTEI/9_2016-17/jtei-9-ciotti-source.xml b/data/JTEI/9_2016-17/jtei-9-ciotti-source.xml index da8faea..f9e8d21 100644 --- a/data/JTEI/9_2016-17/jtei-9-ciotti-source.xml +++ b/data/JTEI/9_2016-17/jtei-9-ciotti-source.xml @@ -664,7 +664,7 @@ <p>In order to finalize the model from an LOD cloud perspective—as regards the collection of TEI-based documents—various methods will have to be explored, beginning with the creation of an RDF triple store by converting some pertinent elements of the refined TEI XML files - into RDF through <ptr type="software" xml:id="XSLT" target="#XSLT"/><rs type="soft.name" + into RDF through <ptr type="software" xml:id="XSLT" target="softw:XSLT"/><rs type="cit:soft.name" ref="#XSLT">XSLT</rs>. 
Experiments in converting XML files into RDF have already been undertaken: <quote source="#quoteref7">a transformation to RDF has to create the URIs of its resources and connect them through the RDF triple structure consisting of subject, @@ -677,7 +677,7 @@ The topic is difficult and we are now trying to address this complexity. A first approach we are attempting is the following. In general we can assert that: TEI elements are <ident>rdf:description</ident> about a node id (e.g., through an <att>ref</att>) that we - could manage in <ptr type="software" xml:id="XSLT" target="#XSLT"/><rs type="soft.name" + could manage in <ptr type="software" xml:id="XSLT" target="softw:XSLT"/><rs type="cit:soft.name" ref="#XSLT">XSLT</rs> for transforming the <att>ref</att> value into a URI. This approach yields: <eg> SUBJECT = rdf:description about a TEI element (the @ref value) PREDICATE = an attribute of the element in the subject (for managing cross-references) @@ -691,7 +691,7 @@ </egXML> That is: an entity (person) with a value (a fragment) referring to a <att>xml:id</att> (<val>persona01</val>) to be converted in dereferenceable URI (e.g., <ident>http://www.person.it/about#persona01</ident>) through <ptr type="software" - xml:id="XSLT" target="#XSLT"/><rs type="soft.name" ref="#XSLT">XSLT</rs>, a predicate + xml:id="XSLT" target="softw:XSLT"/><rs type="cit:soft.name" ref="#XSLT">XSLT</rs>, a predicate corresponding to the child element (<ident>persname</ident>) and a literal as the object (Vespasiano). Another fundamental issue is the identification of pertinent authorities for the data matching (e.g., VIAF, Geonames, Worldcat, SNAP, or DBpedia). In order for the @@ -753,7 +753,7 @@ />.</bibl> <bibl xml:id="breitling09"><author>Breitling, Frank</author>. <date>2009</date>. <title level="a">A Standard Transformation from XML to RDF via <ptr type="software" - xml:id="XSLT" target="#XSLT"/><rs type="soft.name" ref="#XSLT">XSLT</rs>. + xml:id="XSLT" target="softw:XSLT"/>XSLT. 
Astronomische Nachrichten 330(7): 755–60. SARIT or Buddhist Stonesutras or experiments with EEBO-TCPEarly English Books Online eXist-db app, accessed February 11, 2016, . are more than promising (see, for example, Wicentowski and diff --git a/data/JTEI/rolling_2021/jtei-vagionakis-204-source.xml b/data/JTEI/rolling_2021/jtei-vagionakis-204-source.xml index 9b0e388..46142f2 100644 --- a/data/JTEI/rolling_2021/jtei-vagionakis-204-source.xml +++ b/data/JTEI/rolling_2021/jtei-vagionakis-204-source.xml @@ -86,14 +86,14 @@

The paper presents the database Cretan Institutional Inscriptions, which was created as part of a PhD research project carried out at the University of Venice Ca’ Foscari. The database, built using the EpiDoc Front-End Services (EFES) platform, collects the EpiDoc editions of six hundred inscriptions that shed light on the institutions of the political entities of Crete from the seventh to the first century BCE. The aim of the paper is to outline the main issues addressed during the creation of the database and the encoding of the inscriptions and to illustrate the core features of the database, with an emphasis on the advantages deriving from the combined use of the TEI-EpiDoc standard and of the EFES platform.

@@ -137,15 +137,15 @@ document. The editions of these inscriptions, along with a collection of the most relevant literary sources, have been collected in the database Cretan Institutional Inscriptions, which I created using the EpiDoc Front-End Services (EFES) platform. To facilitate consulting the epigraphic records, the database also includes, in addition to the ancient sources, two catalogs providing information about the Cretan political entities and the institutional elements considered.

The aim of this paper is to illustrate the main issues tackled during the creation of the database and to examine the choices made, focusing on the advantages offered by - the use of EpiDoc and EFES.

+ the use of EpiDoc and EFES.

Cretan Epigraphy and Cretan Institutions @@ -259,7 +259,7 @@
Towards the Creation of a Born-Digital Epigraphic Collection with EFES

Once the relevant material had been defined, another major issue that I had to face was to decide how to deal efficiently with it. While I was in the process of starting @@ -282,21 +282,21 @@ collection of editions of the previously selected six hundred inscriptions to creating it as a born-digital epigraphic collection because of another event that also happened in 2017: the appearance of a powerful new tool for digital epigraphy, - EpiDoc Front-End Services (EFES).

GitHub repository, accessed July - 21, 2021, EFES).

GitHub repository, accessed July + 21, 2021, .

Although I was already aware of the many benefits deriving from a semantic markup of the inscriptions,

On which see and .

what really persuaded me to adopt a TEI-based approach for the creation of my epigraphic editions was actually the great facilitation that EFES offered in using + target="softw:efes"/>EFES offered in using TEI-EpiDoc, which I will discuss in the following section.

- The Benefits of Using EpiDoc and EFES + The Benefits of Using EpiDoc and EFES

I was already familiar with the epigraphic subset of the TEI standard, EpiDoc,

EpiDoc: Epigraphic Documents in TEI XML, accessed July 21, 2021,

This is particularly true for the creation of publishable output of the encoded - inscriptions. The EpiDoc Reference XSLT Stylesheets, created for + inscriptions. The EpiDoc Reference XSLT Stylesheets, created for transformation of EpiDoc XML files into HTML,

Accessed July 21, 2021, - .

require relatively advanced knowledge of XSLT to use them to produce a satisfying HTML edition for online publication or to generate a printable PDF. Not to mention the @@ -351,66 +351,66 @@ their research work on a collection of ancient documents, without aiming at the publication of the encoded inscriptions. The querying of a set of EpiDoc inscriptions is possible to some extent even without technical support: in some advanced XML - editors, particularly - Oxygen, it is possible to perform XPath queries + editors, particularly + Oxygen, it is possible to perform XPath queries that allow the identification of all the occurrences of specific features in the epigraphic collection according to their markup. The XPath queries in an advanced XML editor also allow the creation of lists of specific elements mentioned in the inscriptions, but to my knowledge the creation of proper indexes—before EFES—was almost impossible to achieve without the help of an IT expert.

Thus, despite the many benefits that EpiDoc encoding potentially offers, epigraphists might often be discouraged from adopting it by the amount of time that such an approach requires, combined with the fact that in many cases these benefits become tangible only at the end of the work, and only if one has IT support.

In light of these limitations, it is easy to understand how deeply the release of - EFES has transformed the field of digital epigraphy. EFES, developed + xml:id="R13" target="softw:efes"/> EFES, developed at the Institute of Classical Studies of the School of Advanced Study of the University of London as the epigraphic specialization of the - Kiln platform ,

New + xml:id="R14" target="softw:kiln"/> + <rs type="cit:soft.name" ref="#R14">Kiln platform</rs> ,<note><p><title level="a">New Digital Publishing Tool: EpiDoc Front-End Services, September 1, 2017, ; see also the Kiln GitHub repository, accessed July 21, 2021, + type="cit:soft.url" ref="#R14"> .

is a platform that simplifies the creation and management of databases of inscriptions encoded following the EpiDoc Guidelines. More specifically, - EFES was developed to make it easy for EpiDoc + xml:id="R15" target="softw:efes"/> + EFES was developed to make it easy for EpiDoc users to view a publishable form of their inscriptions, and to publish them online in a full-featured searchable database, by easily ingesting EpiDoc texts and providing formatting for their display and indexing through the EpiDoc + xml:id="R16" target="softw:epidocxslt"/> EpiDoc reference XSLT stylesheets. The ease of configuration of the XSLT transformations, and the possibility of already having, during construction, an immediate front-end visualization of the desired final outcome of the TEI-EpiDoc marked-up documents, allow smooth creation of an epigraphic database even without a large team or in-depth IT skills. Beyond this, EFES is also remarkable for + target="softw:efes"/>EFES is also remarkable for the ease of creation and display of the indexes of the various categories of marked-up terms, which significantly simplifies comparative analysis of the data - under consideration. EFES is thus proving to be an extremely useful + under consideration. EFES is thus proving to be an extremely useful tool not only for publishing inscriptions online, but also for studying them before their publication or even without the intention of publishing them, especially when dealing with large collections of documents and data sets.

See Bodard and Yordanova (2020).

-

Some of these useful features of EFES are common to other existing tools, - such as TEI Publisher,

Accessed July 21, 2021, +

Some of these useful features of EFES are common to other existing tools, + such as TEI Publisher,

Accessed July 21, 2021, .

- TAPAS,

Accessed July 21, 2021, .

or Kiln itself, which is - EFES’s direct ancestor. What makes EFES unique, however, is the + target="softw:efes"/>EFES unique, however, is the fact that it is the only one of those tools to have been designed specifically for epigraphic purposes and to be deeply integrated with the EpiDoc Schema/Guidelines and with its reference stylesheets. Not only does it use, by default, the EpiDoc @@ -426,21 +426,21 @@ abbreviations, and uninterpreted text fragments. New facets and indexes can easily be added even without mastering XSLT, along the lines of the existing ones and by following the detailed instructions provided in the EFES Wiki + target="softw:efes"/>EFES Wiki documentation.

Accessed July 21, 2021, . Creation of new facets, last updated April 11, 2018: . Creation of new indexes, last updated May 27, 2020: .

- Furthermore, EFES makes it possible to create an epigraphic concordance of the various editions of each inscription and to add information pages as TEI XML files (suitable for displaying both information on the database itself and potential additional accompanying information).

Against this background, the combined use of the EpiDoc encoding and of the - EFES tool seemed to be a promising approach for + type="software" xml:id="R26" target="softw:efes"/> + EFES tool seemed to be a promising approach for the purposes of my research project, and so it was.

I initially aimed to create updated digital editions of the inscriptions mentioning Cretan institutional elements that could be used to facilitate a comparative analysis @@ -449,7 +449,7 @@ inscriptions in EpiDoc, totally met my needs, and helped me very much in the identification of recurring patterns. As I was expected to submit my doctoral thesis in PDF format, I also needed to convert the epigraphic editions into PDF, and by - running EFES locally I have been able to view their transformed HTML versions on a browser and to naively copy and paste them into a Microsoft Word file.

I am very grateful to Pietro Maria Liuzzo for teaching me how to @@ -457,10 +457,10 @@ directly from the raw XML files. The use of XSL-FO, however, requires some additional skills that are not needed in the copy-and-paste-from-the-browser process.

Although I had not planned it from the beginning, EFES also proved to be useful in the (online) publication of the results of - my research. The ease with which EFES allows the creation of a searchable + my research. The ease with which EFES allows the creation of a searchable epigraphic database, in fact, spontaneously led me to decide to publish it online once completed, making available not only the HTML editions—which can also be downloaded as printable PDFs—but also the raw XML files for reuse. The aim of the @@ -471,8 +471,8 @@
Cretan Institutional Inscriptions: An Overview of the Database -

The core of the EFES-based database Cretan + <p>The core of the <ptr type="software" xml:id="R30" target="softw:efes"/><rs + type="cit:soft.name" ref="#R30">EFES</rs>-based database <title level="m">Cretan Institutional Inscriptions consists of the EpiDoc editions of the previously selected six hundred inscriptions, which can be exported both in PDF and in their original XML format. Each edition is composed of an essential descriptive @@ -511,7 +511,7 @@ level="a">Literary sources, and Bibliographic references, have been added to the database as pages generated from TEI XML files, which could be natively included in EFES.

+ target="softw:efes"/>EFES.

As mentioned above, the database also includes several thematic indexes listing the marked-up terms along with the references to the inscriptions in which they occur, divided into institutions, toponyms and ethnic adjectives, lemmata (both of @@ -687,8 +687,8 @@ type="crossref"/> (I.Cret. II 23 5). -

Given the markup described above, EFES was able to generate detailed indexes +

Given the markup described above, EFES was able to generate detailed indexes having the appearance of rich tables, where each piece of information is displayed in a dedicated column and can easily be combined with the other ones at a glance.

In the most complex case, that of the institutions, the index displays for each @@ -726,8 +726,8 @@ An excerpt from the prosopographical index.

In addition to the more tabular institutional and - prosopographical indexes, EFES facilitated the creation of other more + prosopographical indexes, EFES facilitated the creation of other more traditional indexes, including the indexed terms and the references to the inscriptions that mention them. The encoding of the most significant words with w lemma="" led to the creation of a word index of relevant @@ -745,7 +745,7 @@

Conclusions

In conclusion, I would like to emphasize how particularly efficient the combined use - of EpiDoc and EFES has proven to be for the creation of a thematic database like Cretan Institutional Inscriptions. By collecting in a searchable database all the inscriptions pertaining to the Cretan institutions, records that were hitherto @@ -779,13 +779,13 @@ London: Ubiquity Press. https://doi.org/10.5334/bat.d. - - + + - Bodard, Gabriel, Yordanova, Polina. + Bodard, Gabriel, Yordanova, Polina. 2020. Publication, Testing and Visualization - with <rs type="soft.name" ref="#R35">EFES</rs>: A Tool for All Stages of the + with <rs type="cit:soft.name" ref="#R35">EFES</rs>: A Tool for All Stages of the EpiDoc XML Editing Process. Studia Universitatis Babeș-Bolyai Digitalia, no. 1: 17–35. Transkribus sofware by READ + type="software" xml:id="Transkribus" target="softw:transkribus"/>Transkribus sofware by READ Coop.Accessed February 2, 2022, .

The transcription matches and complements the cataloguing efforts of the project, @@ -164,7 +164,7 @@ an historical catalogue that involves copying from the former cataloguer transcription. Having a new transcription, based on autopsy or at least on the images of the manuscript would be preferable and technology as Transkribus allows one to obtain this transcription in an almost entirely automated way. Additionally, most of the internal referencing within a manuscript is done with the indication of the ranges of folios, and in TEI with @@ -215,7 +215,7 @@

The following steps have been taken to carry out an investigation of the possibilities for the automated production of text transcriptions based on images of manuscripts, before we opted for Transkribus + target="softw:transkribus"/>Transkribus and its integration in the workflow to make texts available in the Beta maṣāḥǝft research environment.

@@ -304,8 +304,8 @@ one script.

- Transkribus + Transkribus

This software is freely accessible and has a subscription model based on credits. The platform was created within the framework of the EU projects tranScriptorium and READ (Recognition and Enrichment of Archival Documents - @@ -317,8 +317,8 @@ platform. The Pattern Recognition and Human Language Technology (PRHLT) group of the Universitat Politècnica de València and the CITlab group of the University of Rostock should be mentioned in particular.

-

Transkribus comes as an +

Transkribus comes as an expert tool in its downloadable version and its online version,Accessed February 2, 2022, . and it allows to upload @@ -350,13 +350,13 @@ Ethiopic handwriting character recognition.

Thus, the first stage for developing a model was gathering the data and preparing an initial dataset. Also for this aspect, Transkribus + target="softw:transkribus"/>Transkribus proved superior to all other options offering support also for this step. Colleagues whom we called to contribute could be added to a collection, share their images without publishing them and add their transcriptions in the tool with a very mild learning curve.

-

Within Transkribus we have trained a model +

Within Transkribus we have trained a model called Manuscripts from Ethiopia and Eritrea in Classical Ethiopic (Gǝʿǝz).See, accessed February 2, 2022,

Training a model in Transkribus

Gathering data to train an HTR model in Transkribus + target="softw:transkribus"/>Transkribus was not easy. Researchers were directly asked to contribute images of which they had already done the correct transcription. Sets of images with the relative transcription were thus obtained thanks to the generosity of contributors listed @@ -450,7 +450,7 @@ for the available time of the colleagues to fix the work of the machine, since we intended to train the model again. After three months with a full-time dedicated person, we had more than 50k words in the Transkribus + target="softw:transkribus"/>Transkribus expert tool, and we could train a model which could be made public, since this is the unofficial threshold to make a model available to everyone.

The features of the final model can be seen in

Adding transcriptions to Beta maṣāḥǝft from Transkribus

Even if a user already worked through each page of a manuscript to produce a transcription, doing it again with Transkribus + target="softw:transkribus"/>Transkribus and checking it has many advantages, chiefly the alignment of the text regions and lines on the base image to the transcription. Guidelines are provided for this step to the users in the project Guidelines, @@ -483,7 +483,7 @@ />.

With the transcribed images, produced either by hand with the help of the tool or using the HTR model, the export functionalities of the Transkribus tool allow users to download a TEI-encoded version of this transcription, in which we encourage users to use Line Breaks (lb) instead of l and to preserve the coordinates of the boxes.

@@ -497,10 +497,10 @@ manuscript and not of the image set. Most of this can be fixed by preparing the image set accurately, but we assume in most real-life use cases this will not be the case.

-

We have then prepared a bespoke XSLT transformation which can be used to +

We have then prepared a bespoke XSLT transformation which can be used to transform the rich TEI from Transkribus, + target="softw:transkribus"/>Transkribus, called transkribus2Beta maṣāḥǝft.xsl. This transformation, given a few parameters, @@ -523,8 +523,8 @@

Conclusions -

Working with Transkribus for the Beta maṣāḥǝft project +

Working with Transkribus for the Beta maṣāḥǝft project gives the community of users a way to support the process of transcribing to the text on source manuscripts without typing it down. This is not intended to substitute the work of the editor of a text, but to support it, producing a transcription that still @@ -557,12 +557,12 @@ 2018–. Beta Maṣāḥǝft Guidelines. . - - Wick, - Christoph, Christian Reul, and + Wick, + Christoph, Christian Reul, and Frank Puppe. 2020. <rs type="soft.name" ref="#R1">Calamari</rs> − A High-Performance + level="a"><rs type="cit:soft.name" ref="#R1">Calamari</rs> − A High-Performance Tensorflow-based Deep Learning Package for Optical Character Recognition. Digital Humanities Quarterly, , Herbert Wurster, and Konstantinos Zagoris. 2019. Transforming scholarship in the archives through handwritten text recognition: <ptr type="software" - xml:id="Transkribus" target="#transkribus"/><rs type="soft.name" + xml:id="Transkribus" target="softw:transkribus"/><rs type="cit:soft.name" ref="#Transkribus">Transkribus</rs> as a case study. Journal of Documentation, 75 (5) ), an initiative launched in 2016 under the auspices of the DARIAH Working Group on Lexical Resources, which aims to define a pivot format for the integration and querying of heterogeneous TEI-based lexical resources.See the - project’s GitHub repository, accessed June 17, 2022, .

@@ -1192,7 +1192,7 @@

In the following examples, we have tried to illustrate interesting cases of etymological processes that show how TEI Lex-0 Etym can seamlessly take into account a variety of situations. All examples have been validated and included in the TEI - Lex-0 GitHub environment.