
Characters that are not used for Arabic/Persian #277

Open
xfq opened this issue May 19, 2024 · 2 comments
Labels
question s:arab Arabic script (Used for arb + pes)

Comments

@xfq
Member

xfq commented May 19, 2024

https://www.w3.org/TR/alreq/#h_character_tables_punctuation_and_symbols

Some characters are marked as not used for Arabic (such as U+0020 SPACE and U+002A ASTERISK), and some are marked as not used for Persian (such as U+0022 QUOTATION MARK). What criteria were used to select these characters?

@xfq xfq added the question label May 19, 2024
@shervinafshar shervinafshar self-assigned this May 20, 2024
@shervinafshar
Contributor

For Persian, there is a standard available: ISIRI-9147 (pp. 17-19 of the PDF). For Arabic, we couldn't find such a document and, if I recall correctly, we relied on CLDR data; marking U+0020 as not used for Arabic seems to be an error. We probably need to revisit this section for Arabic.

Also, in case you were unaware, we provisionally recorded our non-normative references in a spreadsheet here, with the objective of eventually migrating the content to the document. I added #278.

@r12a r12a added the s:arab Arabic script (Used for arb + pes) label Jun 29, 2024
@avidseeker

"The following tables list Unicode characters used for Arabic script."

What does this mean? Does it mean that these characters are available on Arabic keyboard layouts, or that they are commonly used in Arabic texts online? And which variety of Arabic is meant: Classical Arabic or Modern Standard Arabic (MSA)? This needs to be clarified.

  • For example, U+0671 ARABIC LETTER ALEF WASLA is used in the Quran and in Classical Arabic manuscripts, but not in MSA. The same goes for U+0653-U+0670.

Until such a standard for Arabic is published, it is safe to dismiss the claim that U+0020 SPACE and U+002A ASTERISK are not used. It is easy to verify that those characters are used in numerous Arabic books and web pages.
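
One way to check this claim against a given text is to count how often each questioned code point occurs. A minimal Python sketch, assuming the text is already available as a string; the function name `count_candidates` and the sample sentence are hypothetical illustrations, not part of any W3C tooling or real corpus:

```python
import unicodedata
from collections import Counter

# Code points whose use in Arabic text is questioned in this thread.
CANDIDATES = ("\u0020", "\u002A")  # SPACE, ASTERISK


def count_candidates(text: str, candidates=CANDIDATES) -> dict:
    """Return {"U+XXXX NAME": count} for each candidate character in text."""
    counts = Counter(ch for ch in text if ch in candidates)
    # Counter returns 0 for characters that never occurred.
    return {
        f"U+{ord(ch):04X} {unicodedata.name(ch)}": counts[ch]
        for ch in candidates
    }


if __name__ == "__main__":
    # Hypothetical sample sentence, not taken from any corpus.
    sample = "مرحبا بالعالم * ملاحظة"
    print(count_candidates(sample))
```

Running the same function over real Arabic books or scraped web pages would give the kind of evidence the comment refers to.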

As for why Persian doesn't use U+0022 QUOTATION MARK: quotation marks differ depending on the locale (see this Wikipedia table). They also need not be used in Arabic text, but since U+0022 is on the default keyboard layout, it is used as a semantic quote rather than a typographic one (e.g., see jgm/pandoc#10013).
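
The difference between semantic and typographic quotes can be illustrated with a naive conversion that maps paired U+0022 QUOTATION MARK characters to locale-specific quotes. This is a sketch under stated assumptions, not pandoc's algorithm: the function name is hypothetical, quotes are assumed to be balanced and unnested, and the defaults U+00AB/U+00BB (opening/closing in logical order) are an assumption about the target locale that should be checked against the Wikipedia table or ISIRI-9147.

```python
def straight_to_locale_quotes(
    text: str, open_q: str = "\u00AB", close_q: str = "\u00BB"
) -> str:
    """Replace paired U+0022 QUOTATION MARK with locale-specific quotes.

    Naive heuristic (assumption): the 1st, 3rd, 5th... U+0022 opens a
    quotation and the 2nd, 4th, 6th... closes it; nesting is not handled.
    """
    out = []
    opening = True
    for ch in text:
        if ch == '"':
            out.append(open_q if opening else close_q)
            opening = not opening
        else:
            out.append(ch)
    return "".join(out)
```

The quote characters are parameters because, as noted above, the correct pair depends on the locale; for example, passing "\u201C" and "\u201D" yields English-style curly quotes instead.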
