Initial commit
Apmaranca committed Feb 26, 2024
0 parents, commit 1b3640a
Showing 9 changed files with 501 additions and 0 deletions.
19 changes: 19 additions & 0 deletions .github/workflows/text-validation.yml
@@ -0,0 +1,19 @@
name: Text validation

on: [push, pull_request]

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v1
    - name: Set up Python 3.8
      uses: actions/setup-python@v1
      with:
        python-version: 3.8
    - name: Run text
      run: |
        pip install text-validator
        validate-text text-validator.toml text/*.txt
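To run the same check as the workflow's "Run text" step before pushing, a minimal Python sketch could build the identical validate-text command locally. It assumes `pip install text-validator` has already put `validate-text` on PATH; the helper names `build_command` and `run_validation` are hypothetical and not part of text-validator.

```python
import glob
import subprocess
from typing import List

CONFIG = "text-validator.toml"
PATTERN = "text/*.txt"


def build_command(config: str = CONFIG, pattern: str = PATTERN) -> List[str]:
    """Return the argument list for the workflow's validate-text call.

    In CI the shell expands text/*.txt; here glob.glob does the expansion,
    and the matches are sorted because glob.glob returns them in arbitrary order.
    """
    return ["validate-text", config, *sorted(glob.glob(pattern))]


def run_validation(config: str = CONFIG, pattern: str = PATTERN) -> int:
    """Run validate-text locally and return its exit code.

    Assumes validate-text is installed and on PATH (hypothetical helper).
    """
    cmd = build_command(config, pattern)
    if len(cmd) == 2:
        # Only the program name and config: nothing matched the pattern.
        raise FileNotFoundError(f"no files match {pattern!r}")
    return subprocess.run(cmd).returncode
```

Calling `run_validation()` from the repository root should then mirror the CI step, assuming validate-text follows the usual convention of exiting with a non-zero status when it finds problems.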
427 changes: 427 additions & 0 deletions LICENSE

21 changes: 21 additions & 0 deletions README.md
@@ -0,0 +1,21 @@
# {{ NAME OF TEXT }}

{{ describe what is being done, the process being followed, and who is involved in the work }}

This text is being prepared as part of the [Greek Learner Texts Project](https://greek-learner-texts.org/).

## Contributors

* {{ list of people who have contributed to this repo }}

## Source

{{ indicate original source(s) of text: scans or existing transcriptions }}

## Progress

{{ indicate progress, or remove entire section if done }}

## License

This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/).
1 change: 1 addition & 0 deletions analysis/README
@@ -0,0 +1 @@
Token-level analysis such as lemmatisation or part-of-speech tagging can go here.
1 change: 1 addition & 0 deletions docs/README
@@ -0,0 +1 @@
Generated HTML versions of the texts should go here.
1 change: 1 addition & 0 deletions orig/README
@@ -0,0 +1 @@
Any original files (scans and transcriptions) can be placed here.
1 change: 1 addition & 0 deletions scripts/README
@@ -0,0 +1 @@
Any scripts used in the preparation of the texts can go here.
28 changes: 28 additions & 0 deletions text-validator.toml
@@ -0,0 +1,28 @@
["text_validator.plugins.whitespace"]
CHECK_CRLF = true
CHECK_TABS = true
CHECK_TRAILING_WHITESPACE = true
CHECK_NO_EOF_NEWLINE = true

["text_validator.plugins.unicode"]
CONFIRM_UTF_8_NFC = true

["text_validator.plugins.ref_line_format"]
REF_REGEX = "\\d+\\.\\d+$"

["text_validator.plugins.characters"]
REPLACE_CHARS = [
    # bad character, suggested replacement
    ["\u02BC", "\u2019"],
    ["\u1FBF", "\u2019"],
    ["\u037E", "\u003B"],
    ["\u0387", "\u00B7"],
    ["\u0374", "\u02B9"],
    ["\u03D5", "\u03C6"],
    ["\u03D1", "\u03B8"],
]
TOKEN_REGEXES = [
    # each whitespace-separated token must match one of these regexes
    "\\d+\\.\\d+$",
    "[«(]*[\u0370-\u03FF\u1F00-\u1FFF]+\u2019?[.,:;»)·]*$",
]
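The config above only names the checks. To illustrate what the whitespace, unicode and characters settings ask of a text file, here is a rough Python re-implementation. It is a hypothetical sketch written from the option names and values in text-validator.toml, not text-validator's own code, so its messages and edge-case behavior will differ; `check_text` is a made-up name.

```python
import re
import unicodedata
from typing import List

# Values copied from text-validator.toml (TOML "\uXXXX" escapes mean the same in Python).
# Bad character -> suggested replacement.
REPLACE_CHARS = {
    "\u02BC": "\u2019",  # modifier letter apostrophe -> right single quotation mark
    "\u1FBF": "\u2019",  # Greek psili -> right single quotation mark
    "\u037E": "\u003B",  # Greek question mark -> semicolon
    "\u0387": "\u00B7",  # Greek ano teleia -> middle dot
    "\u0374": "\u02B9",  # Greek numeral sign -> modifier letter prime
    "\u03D5": "\u03C6",  # Greek phi symbol -> small letter phi
    "\u03D1": "\u03B8",  # Greek theta symbol -> small letter theta
}
# Each whitespace-separated token must match one of these (re.match anchors the start, $ the end).
TOKEN_REGEXES = [
    re.compile(r"\d+\.\d+$"),
    re.compile("[«(]*[\u0370-\u03FF\u1F00-\u1FFF]+\u2019?[.,:;»)·]*$"),
]


def check_text(raw: bytes) -> List[str]:
    """Return a list of problems found in one file's raw bytes (illustrative only)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return ["file is not valid UTF-8"]
    problems = []
    if unicodedata.normalize("NFC", text) != text:
        problems.append("file is not NFC-normalized")
    if "\r\n" in text:
        problems.append("CRLF line endings")
    if not text.endswith("\n"):
        problems.append("no newline at end of file")
    for lineno, line in enumerate(text.split("\n"), start=1):
        body = line.rstrip("\r")  # CRLF is already reported above
        if "\t" in body:
            problems.append(f"line {lineno}: tab character")
        if body != body.rstrip():
            problems.append(f"line {lineno}: trailing whitespace")
        for ch in body:
            if ch in REPLACE_CHARS:
                problems.append(
                    f"line {lineno}: U+{ord(ch):04X} should be U+{ord(REPLACE_CHARS[ch]):04X}"
                )
        for token in body.split():
            if not any(rx.match(token) for rx in TOKEN_REGEXES):
                problems.append(f"line {lineno}: unexpected token {token!r}")
    return problems
```

Several of the listed replacements (U+037E, U+0387, U+0374) are also changed by NFC normalization itself, so a file containing them would fail both the unicode and the characters checks in this sketch.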
2 changes: 2 additions & 0 deletions text/README
@@ -0,0 +1,2 @@
* the prepared text, in our textpart-per-line format with a dotted reference (such as 1.1) for each line
* there can be multiple *.txt files; if there is an inherent order to the files, it should be reflected in the sort order of the filenames
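A minimal sketch of reading the textpart-per-line format, assuming each line is a dotted ref matching the REF_REGEX from text-validator.toml, then a single space, then the text. The functions `parse_textparts` and `ref_key` are hypothetical. The final lines show why filename order needs care: plain string sorting puts 10.txt before 2.txt, so zero-padded names keep an inherent order intact.

```python
import re
from typing import List, Tuple

# REF_REGEX from text-validator.toml. Assumption: every line is
# "<ref> <text>", i.e. the dotted ref is the first space-separated field.
REF_REGEX = re.compile(r"\d+\.\d+$")


def parse_textparts(text: str) -> List[Tuple[str, str]]:
    """Split a textpart-per-line file into (ref, text) pairs."""
    parts = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line:
            continue  # choice made for this sketch: ignore blank lines
        ref, _, body = line.partition(" ")
        if not REF_REGEX.match(ref):
            raise ValueError(f"line {lineno}: {ref!r} is not a dotted ref")
        parts.append((ref, body))
    return parts


def ref_key(ref: str) -> Tuple[int, ...]:
    """Sort key comparing refs numerically, so 1.10 comes after 1.9."""
    return tuple(int(n) for n in ref.split("."))


# Plain string sorting (as sorted() does) puts "10.txt" before "2.txt";
# zero-padding the numbers keeps the intended file order.
assert sorted(["1.txt", "2.txt", "10.txt"]) == ["1.txt", "10.txt", "2.txt"]
assert sorted(["01.txt", "02.txt", "10.txt"]) == ["01.txt", "02.txt", "10.txt"]
```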
