Cast XML bytes to str to avoid encoding issues #31

Merged (2 commits) on Dec 10, 2022

vgambier
Collaborator

Intended to fix #29
Haven't tested it yet, wanted to double-check with you first because of the comments discouraging messing with the encoding.
An alternative would be to add something like xml = xml.decode("utf-8")
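
For illustration, here is a minimal sketch of the error from #29 and of the decode approach mentioned above. It is not the code from this PR: the sample payload and the extract_title helper are hypothetical, and UTF-8 is assumed as the encoding.

```python
import re

# Hypothetical sample payload (assumption); a real body would come from the wiki.
xml_bytes = "<page><title>Café</title></page>".encode("utf-8")

TITLE_PATTERN = r"<title>(.*?)</title>"  # a str pattern, not a bytes pattern


def extract_title(xml):
    """Return the first <title> text, accepting either bytes or str input."""
    if isinstance(xml, bytes):
        # Decode explicitly so the str pattern can be applied.
        # UTF-8 is an assumption here, not something the thread confirms.
        xml = xml.decode("utf-8")
    match = re.search(TITLE_PATTERN, xml)
    return match.group(1) if match else None


# Applying a str pattern directly to bytes raises the TypeError from the issue.
try:
    re.search(TITLE_PATTERN, xml_bytes)
    raised_type_error = False
except TypeError:
    raised_type_error = True

title = extract_title(xml_bytes)

# str() with an explicit encoding decodes the bytes, same as .decode().
decoded_with_str = str(xml_bytes, "utf-8")

# str() without an encoding returns the repr of the bytes object instead.
repr_like = str(b"abc")
```

Note that calling str() on bytes without an encoding argument does not decode them, so the explicit form (or .decode()) is the safer choice whichever way the cast is written.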

@elsiehupp
Member

The unit tests are still broken in the python3 branch, so I can't test it that way, but I'll try running the command from this comment in the original issue. (I'll do this on elementary OS, since it's the closest thing I have to Linux Mint.)

@robkam
Member

robkam commented Dec 9, 2022

> The unit tests are still broken in the python3 branch,

Are these the failures mentioned in #7, and are they what's blocking progress on Wikiteam3?

@elsiehupp
Member

> Are these the failures mentioned in #7, and are they what's blocking progress on Wikiteam3?

Among other things, yes. I'm also just super not on top of things. 😬 It doesn't help that I don't work in Python very much, so I'm extremely rusty.

I had been doing some work in the prepare-for-publication branch, but it has become increasingly out of date with the ad-hoc bug-fixes in the python3 branch. (The big thing I was doing was introducing mypy type-checking, which involved substantial rewrites. I mean you can fiddle around with that branch if you'd like...)

Anyway I just pushed a commit in this Pull Request that seems to fix the encoding issues, and I have repeatedly tested it using the following command:

dumpgenerator http://wiki.othing.xyz --xml

It works on my computer with elementary OS; could either of you—@vgambier or @robkam—try testing it on Windows? If it works on Windows at this point, then we can safely merge it!

As before, to check out this commit, you can do:

git pull && git checkout fix-bytes-regex

@elsiehupp
Member

Hi @yzqzss, @Dss0 & @RedSparr0w—could you give this Pull Request a try? Thanks!

(Also apologies that I have been extremely slow to respond here...)

@elsiehupp changed the title from "use bytes regex to avoid string/bytes conflict" to "Cast XML bytes to str to avoid encoding issues" on Dec 9, 2022
@elsiehupp
Member

By the way, I have no idea why the Pre-Commit CI is failing. It looks like it's a configuration issue. I will have to dig into it further.

@vgambier
Collaborator Author

vgambier commented Dec 9, 2022

I'm sorry, I don't have access to a Windows machine. Thank you for the response :)

@RedSparr0w

This PR seems to be working well for me on the following:
Python 3.9.6 - Debian 11

dumpgenerator --xml
dumpgenerator --xml --curonly

@RedSparr0w

RedSparr0w commented Dec 10, 2022

Not sure if it's just this PR or not, but if I'm not using --xmlrevisions it seems to take a lot longer to fetch all the data.
With the --xmlrevisions arg, it takes about 10 to 15 minutes.
Without it, it has been running for about an hour and is maybe 20% done.

@robkam
Member

robkam commented Dec 10, 2022

dumpgenerator http://wiki.othing.xyz --xml works fine on Windows.
Thank you for supporting Windows!

@robkam
Member

robkam commented Dec 10, 2022

I failed to get pip install --force-reinstall dist/*.whl to work with Python 3.11.1 from Python.org on Windows 10, in Git Bash in Windows Terminal. It worked when I used Python 3.10 from the Microsoft Store, in Command Prompt in Windows Terminal.

@robkam merged commit 58baa10 into python3 on Dec 10, 2022
@yzqzss deleted the fix-bytes-regex branch on February 9, 2023 17:44
Development

Successfully merging this pull request may close these issues.

TypeError: cannot use a string pattern on a bytes-like object
4 participants