cleanup of ARIE source code #274

Open
10 tasks
arlogriffiths opened this issue Apr 4, 2024 · 3 comments

@arlogriffiths
Collaborator

arlogriffiths commented Apr 4, 2024

I am starting to take notes on this subject:

  • many explicit hyphens should be replaced by <lb break="no"/>, e.g. "Vaiśā-kha" and "French Settle-ments" in https://erc-dharma.github.io/arie/docs/arie_073_1954-1955.html
  • same in GitHub/arie-corpus/ARIE_1960-2019/arie_094_1975-1976.html
  • I have also seen non-ASCII characters that clearly meant something to Malten at the time of data entry but do not correspond to anything that could have been printed (and that should be displayed) as such. Alas, I can't find a specimen now, but I am sure @michaelnmmeyer can identify all non-ASCII characters in the datasets we have
  • there are unconverted characters like E1lēśvaram in https://erc-dharma.github.io/arie/docs/arie_073_1954-1955.html
  • in cases like this, "Uggavalah2", I presume h2 stands for ḥ
  • ELLAPPUḌAIYĀN6PAṬṬI in GitHub/arie-corpus/ARIE_1960-2019/arie_094_1975-1976.html: where N6 should be Ṉ
  • in GitHub/arie-corpus/ARIE_1960-2019/arie_098_1979-1980.html the name of appendix B is, undesirably, spelled out in full
    only B is required: "ARIE/1979-1980/B/1979-1980/232 (page 87)"
  • ibidem in ARIE 1891-1892 [screenshot: Arie_clean]
  • "h2" to be replaced by "ḥ" most of the time (but beware: it is also found in Arabic terms, for which "ḥ" might not be relevant).
  • <arie n="080" ref="ARIE1961-1962"> not displayed as not in XML source file. @manu: check TXT source file.
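
To make the character-related items above easier to audit, here is a minimal sketch in Python. It is not part of the ARIE tooling; all names in it (`KNOWN_CODES`, `non_ascii_inventory`, `leftover_codes`, `apply_known_codes`) are hypothetical. It assumes the input is plain text already extracted from the HTML files, since markup such as an `<h2>` tag would otherwise be mistaken for a code. The mapping contains only the two codes mentioned in the list, and "h2" → "ḥ" is only presumed; codes such as "E1" are reported but left untouched because their intended value is not known here.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical mapping, limited to the two codes named in the notes above.
# "h2" -> U+1E25 (ḥ) is only presumed and may be wrong in Arabic terms;
# "N6" -> U+1E48 (Ṉ) is taken from the ELLAPPUḌAIYĀN6PAṬṬI example.
KNOWN_CODES = {"h2": "\u1e25", "N6": "\u1e48"}

# An ASCII letter immediately followed by a digit, e.g. "h2", "N6", "E1".
CODE_RE = re.compile(r"[A-Za-z]\d")


def non_ascii_inventory(text):
    """Count every non-ASCII character, keyed by (character, Unicode name)."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    return {(ch, unicodedata.name(ch, "UNKNOWN")): n for ch, n in counts.items()}


def leftover_codes(text):
    """List (code, word) pairs for words that still contain a letter+digit code.

    This over-reports on purpose (e.g. "ARIE1961" yields "E1"); the result is
    meant for human review against the printed volumes, not for blind fixing.
    """
    text = unicodedata.normalize("NFC", text)
    return [
        (match.group(), word)
        for word in re.findall(r"\w+", text)
        for match in CODE_RE.finditer(word)
    ]


def apply_known_codes(text, mapping=KNOWN_CODES):
    """Replace only the codes present in `mapping`; leave all others as they are."""
    return CODE_RE.sub(lambda m: mapping.get(m.group(), m.group()), text)
```

A possible workflow would be to run `non_ascii_inventory` and `leftover_codes` over each file first, review the reports, and only then call `apply_known_codes` on passages where the replacement has been checked, given the caveat about Arabic terms.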

@michaelnmmeyer
Member

For this I need to consult the printed version. Do we have PDFs somewhere?

@arlogriffiths
Collaborator Author

arlogriffiths commented Apr 4, 2024

I didn't mean that the work needs to start immediately. Please consult with @manufrancis first on when and how to plan, then execute, this work.

Yes, we have PDFs on Sharedocs: in PDF Library, then J-ARIE.

I will keep adding tasks to the list above as I find new things that need correction.

@manufrancis
Collaborator

manufrancis commented Apr 4, 2024

@ Arlo:
Thanks for setting up this list, which I will also use.

@ Michaël:
Let us have a first look at this next week, when we meet on Tuesday.
