Overhaul the search infrastructure, add stress underlining, and add RN support #133

Draft
wants to merge 59 commits into base: master

Willem3141
Owner

@Willem3141 Willem3141 commented Mar 24, 2024

Until now, searching was done using a single JS object indexed by word:type strings. That means that to search for a noun we'd do two lookups into this object: query:n and query:n:pr. To search for a verb, we'd do many lookups (query:v:in, query:v:tr, ...). Worse, to search for anything else, we'd just do a linear search through the list. This is clearly suboptimal, but it grew this way for historical reasons.

This PR is a big refactor to improve this. Now, the word database is simply an array. A word can be referenced in two ways: (1) by its index in the array, and (2) by a word/type pair like before, but now we do a preprocessing step to create an object that maps words (independently of their type) to a list of array indexes. This way we can find a word in essentially O(1) time instead of O(n).
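A minimal sketch of the index described above, not the actual Reykunyu code. The entries, the field names other than "na'vi", and the helper names `buildIndex` and `lookup` are assumptions for illustration, and a `Map` stands in for the plain object.

```javascript
// Illustrative word list; the entries and the `type` field are made up for this sketch.
const words = [
  { "na'vi": "toruk", type: "n" },
  { "na'vi": "toruk", type: "n:pr" }, // made-up second entry: one word, several indexes
  { "na'vi": "taron", type: "v:tr" },
];

// Preprocessing step: map each word (independently of its type)
// to the list of array indexes at which it occurs.
function buildIndex(words) {
  const index = new Map();
  words.forEach((word, i) => {
    const key = word["na'vi"].toLowerCase();
    if (!index.has(key)) {
      index.set(key, []);
    }
    index.get(key).push(i);
  });
  return index;
}

// One map lookup, followed by a filter over the (short) list of hits.
// If `type` is omitted, all entries for the word are returned.
function lookup(words, index, query, type) {
  const ids = index.get(query.toLowerCase()) || [];
  return ids
    .filter((i) => type === undefined || words[i].type === type)
    .map((i) => words[i]);
}

const index = buildIndex(words);
console.log(lookup(words, index, "toruk"));      // both "toruk" entries
console.log(lookup(words, index, "toruk", "n")); // only the entry of type "n"
```

With this layout a search for a noun no longer needs one lookup per type string: the word is looked up once and the type is checked on the few entries that come back.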

A secondary benefit (actually the one that finally made me start this big refactor) is that, now that we no longer use the "na'vi" field for searching, it becomes possible to put some extra data in that field. Specifically, Reykunyu has pronunciation and infix data, but these are stored in fields separate from the "na'vi" field, so it is not (easily) possible to show infix dots or underline the stressed syllable in the lemma itself. This PR allows indicating the syllable boundaries with "/" and the stressed syllable with "[...]"; this is then shown in the lemma display, without affecting searches. Also, we can now put ù in the field so that we can automatically create Reef Na'vi (RN) forms, if the user wants to see RN. As such, this PR also adds an (as yet non-functional) setting to the UI to switch between dialect modes: Forest Na'vi (FN), RN, or a combined mode.
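To make the markup concrete, here is a rough sketch of how such a field could be turned into a display form and a search key. The function names, the `<u>` tag for underlining, and the rule "ù is written u in FN" are assumptions for illustration; the real implementation may differ.

```javascript
// Assumed markup: "/" separates syllables, "[...]" wraps the stressed syllable,
// and "ù" (precomposed character) marks a vowel that FN spells as "u".
function parseLemma(marked) {
  let stressed = -1;
  const syllables = marked.split("/").map((syllable, i) => {
    if (syllable.startsWith("[") && syllable.endsWith("]")) {
      stressed = i;
      return syllable.slice(1, -1);
    }
    return syllable;
  });
  return { syllables, stressed };
}

// Display form: underline the stressed syllable; keep "ù" only in RN mode.
function render(marked, dialect) {
  const { syllables, stressed } = parseLemma(marked);
  return syllables
    .map((syllable, i) => {
      const text = dialect === "FN" ? syllable.replace(/ù/g, "u") : syllable;
      return i === stressed ? `<u>${text}</u>` : text;
    })
    .join("");
}

// Search key: strip all markup, so the extra data does not affect searches.
function searchKey(marked) {
  return marked.replace(/[\[\]\/]/g, "").replace(/ù/g, "u");
}

console.log(render("[to]/rùk", "FN")); // <u>to</u>ruk
console.log(render("[to]/rùk", "RN")); // <u>to</u>rùk
console.log(searchKey("[to]/rùk"));    // toruk
```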

Right now all of this is very experimental. Many features (e.g., the all words page and the editor) don't work. I still have to reconsider how exactly to deal with the FN/RN distinction. For example, which dialect should be used in word links? I don't want to have to write [[to]/rùk:n] for a word link to [toruk:n]. So, should word links just use FN spelling?

This PR also adds help and API pages, and custom HTTP error pages.

When this all works, this should address #18, #70, #77, #85, #105, and #135. Eventually, it could also enable addressing #36, #42, and #57.

Willem3141 and others added 7 commits May 24, 2024 23:27
Word data that is independent of the user query is now processed when
the dictionary file is loaded, on Reykunyu startup, instead of every
time the user does a query.

Fixes #138.
Move word data processing out of the postprocessing step
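A sketch of the idea behind this commit, under stated assumptions: `preprocessWord`, `Dictionary`, and the `searchKey` field are hypothetical names, and the counter exists only to show that the query-independent work runs once per word rather than once per query.

```javascript
let preprocessCalls = 0;

// Hypothetical query-independent processing of a single word.
function preprocessWord(raw) {
  preprocessCalls++;
  return { ...raw, searchKey: raw["na'vi"].toLowerCase() };
}

class Dictionary {
  constructor(rawWords) {
    // Done once, when the dictionary file is loaded at startup.
    this.words = rawWords.map((raw) => preprocessWord(raw));
  }

  // Per-query work only reads the preprocessed data.
  search(query) {
    const key = query.toLowerCase();
    return this.words.filter((word) => word.searchKey === key);
  }
}

const dictionary = new Dictionary([{ "na'vi": "Toruk" }, { "na'vi": "taron" }]);
dictionary.search("toruk");
dictionary.search("taron");
console.log(preprocessCalls); // 2: one call per word, regardless of the number of queries
```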
@Willem3141 Willem3141 changed the title from "Overhaul the search infrastructure" to "Overhaul the search infrastructure, add stress underlining, and add RN support" May 27, 2024
And show only the pronunciation IPA of the selected dialect.
@Willem3141 Willem3141 linked an issue May 27, 2024 that may be closed by this pull request
@Willem3141
Owner Author

New plan: it would take a very long time to implement all the intricacies of RN, so let's move that out of this PR. Instead, let's just merge the current limited RN support, and show a warning when the user selects RN mode.

@Willem3141
Owner Author

Willem3141 commented Jun 9, 2024

New to-do list:

  • Add warning to the RN mode
  • Make the rhyme search work again
  • Make the editor work again
  • Make the source and etymology editors work again
  • Ask people to do beta testing
  • Figure out why nouns ending in -ng aren't found
  • Figure out why Aonungit isn't found
  • Figure out why proper names don't have a conjugation table anymore (#148: Conjugation tables don't get shown for proper nouns in the web frontend)
  • Make sure the Discord bot doesn't completely break

Because the dialect was only set after the on-page-load search was
triggered, the search would always use the combined dialect.
This way, clicking them doesn't result in a page reload.
Fixes #148. The proper noun doesn't properly get capitalized yet, but
that's another issue.
The problem was an overzealous parsing rule. If a noun ends in -g (after
other affixes, such as -ìl, have already been removed), the rule replaced
the -g with -kx, and because that's the only possible option, I didn't
even have it check the variant with -g. That doesn't work, however, for
nouns ending in -ng, because the -ng would become -nkx, which obviously
doesn't make sense.

Fixed this by also checking the unchanged variant (exactly how all the
other rules work).
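The difference between the old and the fixed rule can be sketched roughly as follows. Both function names and the example stem are hypothetical; this only illustrates the described behavior and is not the parser's actual code.

```javascript
// Old behavior: a stem ending in -g was replaced by its -kx variant only.
function oldFinalGCandidates(stem) {
  return stem.endsWith("g") ? [stem.slice(0, -1) + "kx"] : [stem];
}

// Fixed behavior: always keep the unchanged stem, and add the -kx variant
// as an extra candidate when the stem ends in -g.
function finalGCandidates(stem) {
  const candidates = [stem];
  if (stem.endsWith("g")) {
    candidates.push(stem.slice(0, -1) + "kx");
  }
  return candidates;
}

// A stem ending in -ng, after a suffix such as -it has been removed.
console.log(oldFinalGCandidates("aonung")); // [ 'aonunkx' ]
console.log(finalGCandidates("aonung"));    // [ 'aonung', 'aonunkx' ]
```

With the old rule the only candidate for an -ng stem was the nonsensical -nkx form, so the dictionary lookup could never succeed; the fixed rule tries the unchanged stem as well.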
Tatlam ke lengu oe tsùlfätu lì'fyaye wione.