Unsupported language/site (vk.com) #56

Open
Dzhuks opened this issue Aug 30, 2024 · 4 comments
Labels
bug Something isn't working unsupported-site Sites that reject requests from Slurp or use scripts that prevent Slurp from working

Comments

@Dzhuks

Dzhuks commented Aug 30, 2024

I ran into a problem while extracting text from articles on VK, a Russian social media site. The articles were not processed correctly: the Russian text came out garbled and unrecognizable. An example is the article at this URL: Escape from Google Translate.

[Screenshots: Slurp output | Original page]

Initially, I suspected that the problem was due to the Russian language itself. However, I tested the extraction process on an article from a Russian news site, and it worked perfectly. Here's an example article that was processed correctly: How to Transfer Money to Kazakhstan from Russia in 2023-2024.

[Screenshots: Slurp output | Original page]

This indicates that the issue is specific to the VK platform rather than the Russian language as a whole.

@inhumantsar
Owner

interesting! thanks for digging into whether it was a language or site issue. it is strange that VK produces that kind of garbled text, since that would usually indicate an unsupported encoding. it's unlikely they would be using an older standard, like ISO-8859-5 or Windows-1251.
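
To illustrate the encoding hypothesis above, here is a minimal sketch of how Cyrillic text turns into garbage when bytes written in Windows-1251 are decoded as UTF-8. The sample word and byte values are illustrative assumptions, not data taken from VK, and nothing here confirms that this is what happens inside Slurp. It only uses the standard `TextDecoder` API.

```typescript
// Bytes for the word "Привет" as encoded in Windows-1251 (one byte per letter).
// This sample is an assumption for illustration; it is not taken from vk.com.
const cp1251Bytes = new Uint8Array([0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2]);

// Decoding with the wrong charset: these bytes do not form valid UTF-8
// sequences, so the decoder substitutes U+FFFD replacement characters.
const garbled = new TextDecoder("utf-8").decode(cp1251Bytes);

// Decoding with the charset the bytes were actually written in.
const readable = new TextDecoder("windows-1251").decode(cp1251Bytes);

console.log(garbled);
console.log(readable); // Привет
```

If the garbled output in the screenshots looks like a run of replacement characters or unrelated symbols, a charset mismatch of this kind would be one candidate explanation.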

I've got a few bugs like this queued up and will hopefully be writing fixes for them in the next week or two. I'll look for a cause this morning tho and will comment if I find it.

thanks for the report!

inhumantsar added the bug and unsupported-site labels Aug 30, 2024
@inhumantsar
Owner

no obvious cause but Firefox's reader view displays the page correctly, so it is likely something to do with Slurp or Obsidian
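
One way to narrow down whether the decoding step is involved is to decode the raw response bytes using the charset the page declares, rather than assuming UTF-8. The sketch below is hypothetical: `sniffCharset` and `decodeHtml` are names invented for this example, it is not Slurp's actual code, and whether Slurp or Obsidian ever handles raw bytes this way is an assumption.

```typescript
// Hypothetical helper: pick a charset label from a Content-Type header value,
// fall back to a <meta> charset declaration near the start of the HTML,
// and finally default to UTF-8.
function sniffCharset(contentType: string | null, bytes: Uint8Array): string {
  const fromHeader = /charset\s*=\s*"?([\w-]+)/i.exec(contentType ?? "");
  if (fromHeader) return fromHeader[1].toLowerCase();

  // Charset declarations are plain ASCII, so a single-byte decode of the
  // first 1024 bytes is enough to find one without knowing the real charset.
  const head = new TextDecoder("windows-1252").decode(bytes.subarray(0, 1024));
  const fromMeta = /<meta[^>]+charset\s*=\s*["']?([\w-]+)/i.exec(head);
  if (fromMeta) return fromMeta[1].toLowerCase();

  return "utf-8";
}

// Hypothetical helper: decode an HTML body with the sniffed charset.
// Note that TextDecoder throws a RangeError for labels it does not recognize.
function decodeHtml(contentType: string | null, bytes: Uint8Array): string {
  return new TextDecoder(sniffCharset(contentType, bytes)).decode(bytes);
}
```

For example, `decodeHtml("text/html; charset=windows-1251", bytes)` would decode the body as Windows-1251, while a missing header and missing meta declaration fall back to UTF-8. Comparing this result against what the plugin produces on the same page could show whether the difference between platforms comes from decoding or from a later step.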

@inhumantsar
Owner

interestingly, it slurped fine on my android device.

Screenshot_20240830-093457.png

can you provide some detail on your setup? OS version and Obsidian version especially

@Dzhuks
Author

Dzhuks commented Sep 4, 2024

OS version: Windows 11
Obsidian: 1.67
Slurp: 0.1.12


2 participants