Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tidy up sais module #53

Merged
merged 2 commits into from
Feb 6, 2025
Merged

Tidy up sais module #53

merged 2 commits into from
Feb 6, 2025

Conversation

ajalab
Copy link
Owner

@ajalab ajalab commented Feb 5, 2025

Related: #48

This PR

  • moves sais module under suffix_array module
  • renames sais() function to build_suffix_array for better code readability.

Also, some functions in sais module like count_chars and get_bucket_start_pos are not specific to SA-IS algorithm nor suffix array. FMIndex::create uses these functions to build cs, but this misleads readers that the cs structure construction is related to SA-IS. Maybe these functions should reside in characters.rs or another module.

This part mentioned in #48 has not been addressed in this PR yet. If we move count_chars and get_bucket_start_pos to characters module, a circular dependency occurs:

  • characters module depends on converter module, as count_chars requires Converter.
  • converter module depends on character module, as Converter requires Character.

While this might be handled by Rust language well, it'd be better to avoid this kind of complexity if possible. One option is to reorganize character module like below.

Module Consists of Depends on
character Character -
character::count count_chars, get_bucket_start_pos character::Character
character::converter Converter character::Character

Anyway, let me postpone relocating count_charts and get_bucket_start_pos for now.

@ajalab ajalab requested a review from faassen February 5, 2025 00:27
@ajalab ajalab self-assigned this Feb 5, 2025
@ajalab ajalab linked an issue Feb 5, 2025 that may be closed by this pull request
@faassen
Copy link
Collaborator

faassen commented Feb 5, 2025

Just as a small comment: I haven't had problems with circular dependencies in Rust. But of course from a design perspective it makes sense to have things be sense contained. I like the organization in your table, as it's logical converter is related to character. I'm not a big fan of putting code other than use and pub statements in a mod.rs, so I'd be inclined to have a character::core module that contains Character. And then the mod.rs just exports everything relevant to the rest of the crate.

@@ -52,7 +53,7 @@ where
fn create(text: Vec<T>, converter: C, get_sample: impl Fn(&[u64]) -> S) -> Self {
let text = prepare_text(text);
let cs = sais::get_bucket_start_pos(&sais::count_chars(&text, &converter));
let sa = sais::sais(&text, &converter);
let sa = sais::build_suffix_array(&text, &converter);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is definitely a lot easier to read! It took me a while to figure out that this was a suffix array and now it's immediately obvious.

@ajalab ajalab merged commit fe5eb54 into master Feb 6, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tidy up sais.rs
2 participants