UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 2270: invalid continuation byte #41
Comments
Hmm, I think using Python 2.7 will solve this, or try
@da03, oh, it worked!
Hi @da03, I want to confirm whether the processing in this repo is the same as the process in the paper, Image-to-Markup Generation with Coarse-to-Fine Attention?
Yes, it's the same. You can also find the processed data at http://lstm.seas.harvard.edu/latex/data/
This still does not work in a Python 3.7 environment.

Before the change:

    with open(temp_file, 'w') as fout:
        prepre = open(output_file, 'r').read().replace('\r', ' ')  # delete \r
        # replace split, align with aligned
        prepre = re.sub(r'\\begin{(split|align|alignedat|alignat|eqnarray)\*?}(.+?)\\end{\1\*?}',
                        r'\\begin{aligned}\2\\end{aligned}', prepre, flags=re.S)
        prepre = re.sub(r'\\begin{(smallmatrix)\*?}(.+?)\\end{\1\*?}',
                        r'\\begin{matrix}\2\\end{matrix}', prepre, flags=re.S)
        fout.write(prepre)

After the change:

    with open(temp_file, 'w') as fout:
        # prepre = open(output_file, 'r').read().replace('\r', ' ')  # delete \r
        prepre = io.open(output_file, 'r', encoding='ascii').read().replace(
            '\r', ' ')  # delete \r
        # replace split, align with aligned
        prepre = re.sub(r'\\begin{(split|align|alignedat|alignat|eqnarray)\*?}(.+?)\\end{\1\*?}',
                        r'\\begin{aligned}\2\\end{aligned}', prepre, flags=re.S)
        prepre = re.sub(r'\\begin{(smallmatrix)\*?}(.+?)\\end{\1\*?}',
                        r'\\begin{matrix}\2\\end{matrix}', prepre, flags=re.S)
        fout.write(prepre)

It now shows this error:

    2022-04-23 16:52:56,976 root INFO Script being executed: preprocess_formulas.py
    2022-04-23 16:52:56,976 root INFO Script being executed: preprocess_formulas.py
    Traceback (most recent call last):
      File "preprocess_formulas.py", line 103, in <module>
        main(sys.argv[1:])
      File "preprocess_formulas.py", line 66, in main
        prepre = io.open(output_file, 'r', encoding='ascii').read().replace(
      File "/home/yhtao/anaconda3/envs/latex_ocr/lib/python3.7/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 854136: ordinal not in range(128)
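Both tracebacks in this thread point at the same byte, 0xe7, which is not ASCII and is not a valid UTF-8 sequence at that position. Under Python 2.7 `open(...).read()` returns raw bytes and never decodes, which is likely why switching interpreters makes the error disappear. A minimal sketch of one possible Python 3 workaround follows. It assumes the formula file can be treated as a single-byte encoding such as Latin-1 (the thread does not state the real encoding), and the helper name `read_formulas` is hypothetical, not part of the repo.

```python
import io
import os
import tempfile


def read_formulas(path, encoding="latin-1"):
    """Read a formula file without raising UnicodeDecodeError.

    latin-1 maps every byte 0x00-0xFF to a code point, so decoding
    cannot fail. Whether it is the *correct* encoding for the data
    is an assumption.
    """
    # newline="" keeps "\r" untranslated, so the replace below
    # behaves like the original script did under Python 2.
    with io.open(path, "r", encoding=encoding, newline="") as fin:
        return fin.read().replace("\r", " ")  # delete \r


if __name__ == "__main__":
    # Build a small sample file containing the problematic byte 0xe7.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"fa\xe7ade \\alpha\r\n")
    try:
        print(repr(read_formulas(tmp.name)))
    finally:
        os.remove(tmp.name)
```

If the non-ASCII characters should simply be dropped or marked rather than preserved, passing `errors="replace"` or `errors="ignore"` to `io.open` with `encoding="utf-8"` is another option, at the cost of losing the original bytes.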
@TITC this works for me
Hi, guys,
I am trying to use the scripts in this repo to preprocess the im2latex dataset, but I ran into the UnicodeDecodeError shown in the title.
So, how can I solve this?
Any answer or idea will be appreciated!
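For an error like this, a useful first step is to look at the raw bytes around the reported position. The sketch below does that on an in-memory byte string. The helper name `find_undecodable` is hypothetical and not part of the repo's scripts; for a real file, the bytes would come from reading it in binary mode (`open(path, "rb").read()`).

```python
def find_undecodable(data, encoding="utf-8", context=20):
    """Return (position, surrounding bytes) for the first byte sequence
    that fails to decode, or None if the data decodes cleanly."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start / err.end delimit the undecodable bytes.
        start = max(err.start - context, 0)
        return err.start, data[start:err.end + context]
    return None


if __name__ == "__main__":
    sample = b"x = \xe7ab"  # 0xe7 followed by a non-continuation byte
    print(find_undecodable(sample))
```

Seeing the text next to the bad byte usually makes it clear whether the file is in a legacy single-byte encoding or just contains a few stray characters.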