UTF 8 file parsing failed #6

Open
anupshaw50 opened this issue Mar 10, 2022 · 5 comments

Comments

@anupshaw50

Hi,

In _parsing.py, the function read_file(path: str) fails to read a file exported from PubMed.
Replacing line 20 with the following line fixes it:
with open(path, mode='r', encoding='utf-8') as f:

A sample file that fails parsing is attached; please remove the .txt extension from the end of the file name.

test.nbib.txt

@holub008
Owner

Hi @anupshaw50 - when encoding is not specified to open, it uses your platform default. Could you share your operating system and what the below reports, for my edification?

import sys
sys.getdefaultencoding()

I hesitate to hardcode the encoding as you're suggesting, although given that I've written this lib to focus on exports from PubMed itself, it may make sense to do so.
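For anyone checking their own setup, here is a minimal sketch that prints both values that matter. As far as I know, open() without an encoding argument falls back to locale.getpreferredencoding(False), whereas sys.getdefaultencoding() reports 'utf-8' on Python 3 regardless of platform, so the two can differ on Windows. The helper name report_encodings is hypothetical and not part of nbib.

```python
import locale
import sys


def report_encodings() -> dict:
    """Collect the two encodings that are easy to confuse (hypothetical helper)."""
    return {
        # Encoding used for str <-> bytes conversions inside the interpreter.
        "sys_default": sys.getdefaultencoding(),
        # Encoding open() falls back to when no encoding argument is given.
        "locale_preferred": locale.getpreferredencoding(False),
    }


for name, value in report_encodings().items():
    print(f"{name}: {value}")
```

If locale_preferred is something other than UTF-8 (for example a Windows code page), that would be consistent with the failure reported above.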

@holub008
Owner

FYI, as a temporary workaround, you may use nbib.read, instead of nbib.read_file, which is a thin wrapper.
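A rough sketch of that workaround, under the assumption that nbib.read accepts the file contents as a string: decode the file explicitly as UTF-8 and hand the text over. The helper name read_nbib_text is hypothetical, and the nbib call is left as a comment so the snippet runs without the library installed.

```python
from pathlib import Path


def read_nbib_text(path: str) -> str:
    """Hypothetical helper: return the file's text, decoded explicitly as UTF-8."""
    return Path(path).read_text(encoding="utf-8")


# Usage, assuming nbib.read takes the contents as a string:
# import nbib
# refs = nbib.read(read_nbib_text("test.nbib"))
```

Because the encoding is passed explicitly, the platform default no longer plays a role in how the file is decoded.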

@anupshaw50
Author

Hi, it's utf-8. I am using Windows.

@holub008
Owner

If sys.getdefaultencoding() is UTF-8, then I wouldn't expect you to encounter an issue with parsing a PubMed-generated nbib file (which would in fact be UTF-8 encoded). Right?
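To illustrate what a mismatch would look like, here is a small sketch that decodes UTF-8 bytes with cp1252. That code page is only an assumption about a typical Windows default; the actual default on the reporter's machine is not known. The names and the helper decode_as are made up for the example.

```python
from typing import Optional


def decode_as(data: bytes, encoding: str) -> Optional[str]:
    """Decode bytes with the given encoding; return None if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Illustrative author names, encoded the way a UTF-8 export would store them.
garbled = "José".encode("utf-8")   # decodes under cp1252, but to the wrong text
failing = "Jānis".encode("utf-8")  # contains byte 0x81, which cp1252 leaves undefined

print(decode_as(garbled, "cp1252"))  # silently garbled text
print(decode_as(failing, "cp1252"))  # None, because decoding raised an error
print(decode_as(failing, "utf-8"))   # the original name
```

So depending on the characters in the file, a non-UTF-8 default can either raise an error or quietly produce garbled text.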

@holub008
Owner

Could you share the code you ran and the error you get on your system when parsing the nbib file attached to this issue? Thanks
