Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add extractor configurations for Amharic Language #759

Conversation

Meti-Adane
Copy link
Contributor

No description provided.

Copy link
Member

@jimkont jimkont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great 👍 , except for a minor comment on the date time parser

"ሰኔ" -> 10,
"ሐምሌ" -> 11,
"ነሐሴ" -> 12,
"ጳጉሜ" -> 13
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 13 would be an invalid month value here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in addition, google translate provides this as a translation to the above

"September" -> 1,
 "October" -> 2,
 "November" -> 3,
 "December" -> 4,
 "January" -> 5,
 "February" -> 6,
 "March" -> 7,
 "April" -> 8,
 "May" -> 9,
 "June" -> 10,
 "July" -> 11,
 "August" -> 12,
 "Pagume" -> 13

not sure how accurate this translation is or if there is a calendar difference but, in the general case January should be mapped to 1 not September (and accordingly the other values)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Ethiopian calendar system has 13 months. we use a different calendar system :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I thought so... :)
the thing is that this needs to map to the Gregorian calendar. Values with month 13 will fail to get parsed and we will loose that triple, maybe map that to 12 or skip that line? Not sure what would result in fewer errors

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Among other things...

google translate provides this as a translation

It would seem that this ought to result in a bug report (from/by a properly knowledgeable reporter such as @Meti-Adane) against Google Translate, as the 13 Amharic month names (with twelve lengths being 30 days and one length being 5 or 6 days; and with New Year's Day on September 11 or 12, depending on leap years) obviously cannot be directly translated to the 12 Gregorian month names (with lengths of 28, 29, 30, or 31 days; and with New Year's Day always on January 1).

As to the mapping here, the handling of Hebrew months (closely tied to the lunar cycle; twelve months of 29 or 30 days, plus a thirteenth leap month every few years) might provide some valuable hints.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimkont — "this needs to map to the Gregorian calendar"... but why? It seems to me that this PR reveals a significant flaw in a number of systems. There ought to be a way to express a date in any calendar, which some systems might offer to translate to one or more other calendars, including Gregorian, Amharic, Hebrew, etc. Is translation among some of these easy? Lots of systems will support them. Is translation among some of these hard? Fewer systems will support them, or there will be a few libraries produced to handle such translations.

Net of all this -- I think there ought to be an issue about non-Gregorian calandar ingestion and preservation. This is not too different from the (ongoing) efforts to losslessly handle data using multiple geocoordinate systems for Earth plus other celestial bodies (Moon, Mars, etc.).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the prompt @TallTed , I created an issue here: #761

@Meti-Adane we could still merge this PR and possibly only remove the date mappings until the issue is resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimkont Well noted. I have implemented a date converter for Ethiopian calendar (Ethiopian to Gregorian). I will push the change list in a separate PR to make the review easier.

@Meti-Adane Meti-Adane marked this pull request as draft July 11, 2024 10:15
@Meti-Adane Meti-Adane closed this Aug 13, 2024
@Meti-Adane Meti-Adane deleted the Amharic-Chapter-Add-Extractor-Configs branch August 13, 2024 19:22
@Meti-Adane
Copy link
Contributor Author

@jimkont @TallTed I have disintegrated this PR to smaller ones.

I have created this PR to handle nuances in the Ethiopian Calendar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants