502 error printout breaks console interface #116

Open
newpro opened this issue Aug 10, 2018 · 9 comments

Comments

@newpro

newpro commented Aug 10, 2018

Hey @sybrenstuvel
Thanks so much for the repo! It really saves me a lot of time in computer vision research.

The Flickr server sometimes returns a 502 error, even though it is very rare. My current strategy is to catch the error and do an exponential backoff while waiting for the Flickr server to recover. The strategy works very well. However, in some cases the library prints out the 502 error message payload, which is the 502 web page, and this breaks the console, most likely because special characters cause the program to access memory space that should not be accessed. The program seems to be still running and collecting data, but it cannot print further messages to monitor the progress. I attached screenshots of the symptoms for reference.
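
For reference, a minimal sketch of the catch-and-back-off strategy described above. This is not code from the library or from the reporter's program: `call_with_backoff`, `is_retryable`, and the delay values are hypothetical names and numbers chosen for illustration.

```python
import random
import time


def call_with_backoff(func, is_retryable, max_tries=5, base_delay=1.0,
                      sleep=time.sleep):
    """Call func(); on a retryable error, wait and try again.

    The wait doubles after every failed attempt (base_delay * 2**attempt).
    All names and defaults here are illustrative assumptions.
    """
    for attempt in range(max_tries):
        try:
            return func()
        except Exception as exc:
            # Give up on the last attempt, or on errors we should not retry.
            if attempt == max_tries - 1 or not is_retryable(exc):
                raise
            # Exponential delay plus a little jitter to spread out retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
```

The caller decides what counts as retryable, for example by inspecting the exception for a 502 status, so the sketch does not depend on any particular exception class.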

If I may offer some recommendations to fix the issue, the easy way would be to disable printing the payload, or to specifically filter out the special characters that cause the console to break.
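
As an illustration of the second suggestion, here is a rough sketch of a `logging.Filter` that escapes non-printable characters before a record reaches a handler. `EscapeControlChars` is a hypothetical class written for this example, not part of the library, and it assumes that escaping (rather than dropping) such characters is acceptable.

```python
import logging


class EscapeControlChars(logging.Filter):
    """Replace non-printable characters in log messages with escapes.

    Hypothetical helper for illustration only.
    """

    def filter(self, record):
        message = record.getMessage()  # apply %-style arguments first
        record.msg = "".join(
            ch if ch.isprintable() or ch in "\n\t"
            else ch.encode("unicode_escape").decode("ascii")
            for ch in message
        )
        record.args = None  # the message is already fully formatted
        return True  # never drop the record, only rewrite it
```

It would be attached on the application side, for example with `handler.addFilter(EscapeControlChars())` on a console handler.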

Thanks again!

Head of the message:
screenshot from 2018-08-10 13-36-34

Tail of the message:

screenshot from 2018-08-10 13-22-20

@sybrenstuvel
Owner

> special characters cause the program to access memory space that should not be accessed

There is no such thing as "special characters". If you're dealing with text, your software should know the encoding it is in and handle that properly. Just assuming it's a single-byte encoding is a bad idea, especially since the Flickr API documentation pretty much screams that everything is UTF-8. Ignoring character encoding will always turn around to bite you.

> If I may offer some recommendations to fix the issue, the easy way would be to disable printing the payload, or to specifically filter out the special characters that cause the console to break.

AFAIK the library doesn't print() anything. All logging output goes via Python's logging module, which you can configure in your application. You can make it completely quiet, log to automatically rotated logfiles, and more.
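
A minimal sketch of what such application-side configuration could look like. It assumes the library's loggers are named under `"flickrapi"`, which is an assumption rather than something confirmed in this thread. `log_to_rotating_file` and its defaults are hypothetical.

```python
import logging
import logging.handlers

# Assumption: the library logs under the "flickrapi" logger name.
library_log = logging.getLogger("flickrapi")

# Option 1: make the library quiet, except for critical errors.
library_log.setLevel(logging.CRITICAL)


# Option 2: send its output to an automatically rotated log file.
def log_to_rotating_file(logger, path, max_bytes=1000000, backups=3):
    """Attach a UTF-8 rotating file handler (illustrative helper)."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    # Keep these records away from the root logger's console handlers.
    logger.propagate = False
    return handler
```

With `propagate` set to `False`, records from that logger only reach the file handler, so nothing from the library is written to the console.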

@newpro
Author

newpro commented Aug 13, 2018

Hey @sybrenstuvel
Thanks for the quick reply! I really appreciate it.

I did some further digging, and I still think there is a logging problem in the repo code, specifically in this line. First, let me say that I did not feed non-UTF text into the API interface. The issue is in the response payload from Flickr. With that in mind, I believe the possible error does not lie within logging, but in "urllib_parse.unquote". Let me explain with a fun experiment:

  • Can "urllib_parse.unquote" deal with non-UTF code? Answer: yes!
  • Can "logging" deal with non-UTF code? Answer: yes!
  • Can "logging" + "unquote" deal with non-UTF code? Answer: NO!!! It will break the console!

Here is the experiment:
screenshot from 2018-08-13 14-59-23

Cheers!

@sybrenstuvel
Owner

Please don't screenshot your code. Just use Markdown to format it properly. That will allow me to copy-paste whatever you did and try it myself, instead of having to type everything myself.

Your use of the urlparse module indicates you're indeed using Python 2. What is your reason to stick to that ancient version? It's horrible when it comes to character encoding, and as a result I see mistakes even in your latest experiment (you're talking about u'\xc3' and '\xc3' as the same thing; they aren't).
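
To illustrate the distinction: in Python 2, `'\xc3'` is a byte string and `u'\xc3'` is a Unicode string. The Python 3 sketch below uses `b'\xc3'` for the byte string, since a plain `'\xc3'` in Python 3 is already a Unicode string. The variable names are only for illustration.

```python
text = '\xc3'   # str: one code point, U+00C3 (LATIN CAPITAL LETTER A WITH TILDE)
raw = b'\xc3'   # bytes: the single byte 0xC3

# They are different types holding different things.
assert type(text) is not type(raw)

# Encoded as UTF-8, the character takes two bytes, not one.
assert text.encode('utf-8') == b'\xc3\x83'

# Only under a single-byte encoding such as Latin-1 do they line up.
assert text.encode('latin-1') == raw

# On its own, the byte 0xC3 is an incomplete UTF-8 sequence.
try:
    raw.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
assert decoded_ok is False
```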

@newpro
Author

newpro commented Aug 14, 2018

hey @sybrenstuvel

Yeah, you are right. This is an issue related to Python 2. However, my original screenshot was taken while running Python 3.5. I was doing a quick test on my laptop on my way out when I submitted the last post, so the issue is still there; I just did not get the right one.

I dug a bit further and tried to replicate the issue. The problem is about displaying this page. However, I tried to google the specific HTML code for this page so I could load the web page again, and failed to find any. Also, because the server issues are rare, I cannot replicate it by sending requests.

However, I looked into it, and I believe that it breaks when it is displaying Korean. So I downloaded the HTML source of the official Korea Tourism website to get some Korean byte strings. Now we can successfully locate the issue:

import logging
from urllib import parse as urllib_parse
# the following line should freeze your console, or python interface, if not let me know
logging.error(urllib_parse.unquote('\xeb\xac\xb4\xeb\x8b\xa8\xec\x88\x98\xec\xa7\x91\xea\xb1\xb0\xeb\xb6\x80</a></li>\n'))

@oPromessa

oPromessa commented Aug 14, 2018

Some info that I hope helps...

  1. On occasion I get the 'bad panda' 502 error... mostly under heavy load. I have logging enabled to file and console and have not noticed this console locking issue you mention. I use both python 2.7 and python 3.6 with unicode.
  2. I've quickly tried your sample code on Windows bash with python 3.4 (will try it on Linux later on) and console seems not to lock.
$ python3.4
Python 3.4.3 (default, Nov 17 2016, 01:08:31)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> from urllib import parse as urllib_parse
>>> logging.error(urllib_parse.unquote('\xeb\xac\xb4\xeb\x8b\xa8\xec\x88\x98\xec\xa7\x91\xea\xb1\xb0\xeb\xb6\x80</a></li>\n'))
ERROR:root:무���거�</a></li>

>>> print('still here')
still here
>>>

hope it helps

@newpro
Author

newpro commented Aug 14, 2018

@oPromessa

I am using Linux 16.04 LTS with Python 3.6. I guess the difference may come from the current program stack in memory, and from the OS's ability to stop a program from reading from, or streaming out to, invalid memory. The code breaks on my system; screenshot:
screenshot from 2018-08-14 17-42-26

The issue can be resolved on my system by decoding the bytes as UTF-8 before passing them to unquote, e.g.,

logging.error(urllib_parse.unquote(b'\xeb\xac\xb4\xeb\x8b\xa8\xec\x88\x98\xec\xa7\x91\xea\xb1\xb0\xeb\xb6\x80</a></li>\n'.decode("utf-8", "strict")))

Observe:
image
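
A small Python 3 sketch of what differs between the two snippets, shortened to the first six bytes of the string. It only shows that the two inputs are different strings and that `unquote` leaves both untouched; whether that explains the console behaviour reported here is not established.

```python
from urllib.parse import unquote

raw = b'\xeb\xac\xb4\xeb\x8b\xa8'       # UTF-8 bytes for two Hangul syllables

as_str_literal = '\xeb\xac\xb4\xeb\x8b\xa8'  # str literal: six code points U+00EB, U+00AC, ...
decoded = raw.decode('utf-8', 'strict')      # the same bytes decoded as UTF-8

assert len(as_str_literal) == 6
assert len(decoded) == 2

# Neither string contains a percent escape, so unquote changes nothing.
assert unquote(as_str_literal) == as_str_literal
assert unquote(decoded) == decoded

# unquote only acts on percent-encoded input, decoding it as UTF-8 by default.
assert unquote('%EB%AC%B4') == decoded[0]

# The str literal contains a code point in the U+0080..U+009F control range.
has_c1_control = any(0x80 <= ord(ch) <= 0x9f for ch in as_str_literal)
assert has_c1_control
```

So the visible difference comes from how the text is turned into a `str`, not from the call to `unquote`.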

@sybrenstuvel
Owner

Why are you unquoting a string that clearly isn't URL-encoded at all?

@newpro
Author

newpro commented Aug 15, 2018

hey @sybrenstuvel

I got confused about that part too. If the error is generated at this line, it is a mistake to use the unquote function there. The function should parse a URL string, not request text.

@oPromessa

oPromessa commented Aug 21, 2018

@newpro just trying to help out. Would you mind going back to the beginning? I have a wild guess that the console/shell might not have the appropriate locale settings and may be getting confused!

  1. Can you share the environment variables on the shell which launches your app? I'm guessing some LANG/Collation related settings may be the cause of the conflict.
  2. Could you link to your code where you set up the logging and where you run into this situation?
  • Side notes 1)...
    • I was forced on my app launch shell to set things like this to cover my bases.
# I've used this setting to allow support for international characters in
# folders and file names
export LC_ALL=en_US.utf8
export LANG=en_US.utf8
  • Side notes 2)
    • My train of thought is that with incorrect locale you get different outputs...
$ echo $LANG
en_US.UTF-8
$ find . -type d
.
./Test Photo Library/Várias Pics
$ LANG=en_US find . -type d
.
./Test Photo Library/V??rias Pics
$
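
The same check can be done from inside Python. The sketch below gathers the settings that influence how text is encoded on output; `describe_text_environment` is a hypothetical helper written for this example, not something from the library.

```python
import locale
import os
import sys


def describe_text_environment():
    """Collect settings that decide how text is encoded on output."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "preferred_encoding": locale.getpreferredencoding(False),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for key, value in describe_text_environment().items():
        print("{}: {}".format(key, value))
```

Running it in the same shell that launches the app would show whether Python sees a UTF-8 locale there.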
