Possible Enhancement: Ensure non-empty <title> tags in HTML front articles contain valid strings #462
@kazuhidelee Please open the issue corresponding to this PR. There is no reason pure alphanumeric titles should not work.

I've opened a new issue and modified the code to allow pure alphanumeric titles.

Please commit only the code necessary to fix the issue. Don't change the code indentation.
a27cc29 to b22f478
@kazuhidelee Who says that points (1) and (2) lead to invalid titles? Why?
I initially made the decision to treat short or purely numeric titles as invalid based on two things:
With that being said, I’d love your input here, and I’m happy to adjust accordingly!
If anything is not working well with a short title, then a dedicated issue should be opened, because this sounds like a bug. The only invalid scenario to me is the empty title, and I would like to know:
Then I don't understand why the current behaviour (without your patch) is not correct!? Can you please be very precise and list cases with the DOM title, the filename, and the resulting ZIM title?
b22f478 to 363e9f3
Fixes #463
Description: This PR introduces a potential enhancement to the validation and handling of the <title> tag in HTML front articles. It ensures that the <title> tag is neither empty nor composed entirely of numeric or special characters, which could cause issues with the Kiwix suggestion system. If the <title> tag contains an invalid string, a default title is generated instead.
Changes: Added the isValidTitle function to validate that the <title> tag:
Has a minimum length.
Is not purely numeric.
Contains at least one alphanumeric character.
Updated the parseAndAdaptHtml method to handle missing or invalid <title> tags by generating a default title.
Added warning logs when an invalid title is detected, to make debugging easier.
Added unit tests covering the isValidTitle function.
This enhancement improves the reliability of the Kiwix suggestion system and ensures that all HTML front articles will have a valid title tag to prevent any unexpected behavior.
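For reviewers, the checks listed above could look roughly like the sketch below. It is an illustration only, not the code in this PR: the `minLength` parameter and its default, the `titleOrFallback` helper, the warning text, and the choice to count non-ASCII bytes as letters (so UTF-8 titles in non-Latin scripts are not rejected) are all assumptions, since the description does not state them.

```cpp
#include <cctype>
#include <cstddef>
#include <iostream>
#include <string>

// Hypothetical sketch of the three checks from the description.
// minLength is an assumed parameter; the PR text gives no concrete value.
bool isValidTitle(const std::string& title, std::size_t minLength = 1)
{
    if (title.size() < minLength) {
        return false;  // too short (covers the empty title when minLength >= 1)
    }

    bool hasAlnum = false;
    bool allDigits = true;
    for (unsigned char c : title) {
        // Assumption: bytes >= 0x80 are part of a UTF-8 multi-byte character
        // and are counted as letters, so non-Latin titles stay valid.
        const bool nonAscii = c >= 0x80;
        if (nonAscii || std::isalnum(c)) {
            hasAlnum = true;
        }
        if (nonAscii || !std::isdigit(c)) {
            allDigits = false;
        }
    }
    // Valid only if at least one alphanumeric character is present
    // and the title is not made of digits alone.
    return hasAlnum && !allDigits;
}

// Hypothetical helper: keep the DOM title if it passes the checks, otherwise
// log a warning and use a caller-supplied fallback (for example the file name).
std::string titleOrFallback(const std::string& domTitle, const std::string& fallback)
{
    if (isValidTitle(domTitle)) {
        return domTitle;
    }
    std::cerr << "Warning: invalid <title> \"" << domTitle
              << "\", using \"" << fallback << "\" instead\n";
    return fallback;
}
```

With these assumptions, `isValidTitle("12345")` and `isValidTitle("!!!")` are rejected while `isValidTitle("A1")` is accepted. Whether purely numeric or very short titles should be rejected at all is exactly what is being questioned in the conversation, so the rules in this sketch would need to follow whatever the maintainers decide.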
Possible Enhancements:
The minimum length and the other criteria for a valid title can be adjusted to better fit the Kiwix system.
Explore additional validation rules for the <title> tag to account for other edge cases.