Improper handling of national characters #433

ArturRuta · 2025-02-07T12:20:20Z

Occasionally, when processing articles, wallabag does not handle national characters properly.
The behavior is deterministic in the sense that for a given page it is always the same: the page is either always processed properly or never.

This URL points to a page that is always processed improperly: error example

You will find that the article title is handled properly even though it contains national characters; for example, it contains the word: más
The article content, on the other hand, is not handled properly. Very early in the text you can see, for example, the word automÃ³vil, which is wrong. It should read automóvil instead.
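The garbled form automÃ³vil matches a classic pattern: UTF-8 bytes decoded as ISO-8859-1 (Latin-1). The letter ó is stored in UTF-8 as the two bytes C3 B3, and Latin-1 reads those bytes as Ã and ³. The sketch below reproduces the symptom and the reverse repair. It assumes the page is served as UTF-8 and that the bytes were misread as Latin-1 somewhere during fetching or parsing. That is a hypothesis consistent with the symptom, not something the attached log confirms.

```python
def garble(text: str) -> str:
    """Reproduce the suspected bug: UTF-8 bytes misread as ISO-8859-1 (assumption)."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Undo it: recover the raw bytes via Latin-1, then decode them as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


print(garble("automóvil"))   # automÃ³vil
print(repair("automÃ³vil"))  # automóvil
```

A Windows-1252 misread gives the same result for ó. It only differs for bytes in the 0x80 to 0x9F range.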

Surprisingly enough, some articles are handled properly. This URL from the same site also contains national characters but is handled properly: correct sample

I've done some research.

Looking into the PostgreSQL tables where content is stored, I see that the content in the entry table is already garbled. Therefore it is not a matter of how the articles are rendered or shown; the problem arises earlier, when the article is parsed.
I've tested the problem URL at the site f43.me and the problem reproduces there: the text shows incorrect character combinations wherever national characters are present. When I enable debug on that site... well, no errors are reported. Curiously enough, the language is properly identified as es (which stands for Spanish).
Finally, I've enabled graby debug logs, collected them during article parsing, and will attach them to this issue.
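To check stored entries for this pattern, as was done manually in the entry table, a rough heuristic is to test whether the text round-trips through Latin-1 into different, valid UTF-8. The function name below is hypothetical. How the content is read from the database is left out and assumed.

```python
def looks_double_encoded(text: str) -> bool:
    """Heuristic: True if text looks like UTF-8 bytes that were decoded as ISO-8859-1."""
    try:
        # Recover the bytes as if they had been misread as Latin-1, then try UTF-8.
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in Latin-1, or the bytes are not valid UTF-8:
        # the text does not fit the double-encoding pattern.
        return False
    # Plain ASCII round-trips unchanged, so only flag text that actually changes.
    return repaired != text


print(looks_double_encoded("automÃ³vil"))  # True
print(looks_double_encoded("automóvil"))   # False
```

This is not exhaustive. Text that legitimately contains sequences such as Ã followed by ³ would be a false positive.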

ArturRuta · 2025-02-07T12:24:21Z

Please find the graby log below
graby.log

As far as I can see, no error is reported, and the log shows that the content already contains garbage characters... but moving from there to a possible solution is really beyond my capabilities.

Thanks a lot in advance for any help.
