Redirect issue requests not passing proxies through to redirect #3895

Closed
toasteez opened this issue Feb 28, 2017 · 24 comments

@toasteez

toasteez commented Feb 28, 2017

I'm trying to retrieve data via Quandl from behind a proxy.

I dropped down to requests because the Quandl library does not allow a proxy dictionary to be passed.

I can successfully retrieve datasets where there is no redirect. However, retrieving a complete database involves a redirect

from https://www.quandl.com/api/v3/databases/CLSM/data?api_key ... (cannot provide my private key)

to - https://quandl-bulk-download.s3.amazonaws.com/CLSM.zip? ..... (Amazon credentials are created here)

Code

response = requests.get(url, proxies=proxies, allow_redirects=False) # This lets me see the headers, without failing 

response = requests.get(url, proxies=proxies) # This fails OSError: Tunnel connection failed: 407 Proxy Authentication Required

The initial request returns a status 302.

{'X-Runtime': '0.079339', 'X-XSS-Protection': '1; mode=block', 'Connection': 'keep-alive', 'Content-Length': '1059', 'Cache-Control': 'no-cache', 'X-RateLimit-Remaining': '999901', 'Vary': 'Origin', 'X-Content-Type-Options': 'nosniff', 'Location': 'https://quandl-bulk-download.s3.amazonaws.com/CLSM.zip?...SignedHeaders=host&X-Amz-Signature=... , 'X-Frame-Options': 'SAMEORIGIN', 'X-Rack-CORS': 'preflight-hit; no-origin', 'CF-RAY': '...., 'Set-Cookie': '__cfduid=...; expires=Wed, 28-Feb-18 10:00:16 GMT; path=/; domain=.quandl.com; HttpOnly', 'Server': 'cloudflare-nginx', 'Content-Type': 'text/html; charset=utf-8', 'Date': 'Tue, 28 Feb 2017 10:00:17 GMT', 'X-RateLimit-Limit': '1000000', 'X-Request-Id': ...}

Is there still a bug in this area or should I be doing something differently?

Note that IE prompts for download of file and Chrome just works.

Version
Python 3.5.2
requests 2.13.0

@Lukasa
Member

Lukasa commented Feb 28, 2017

It seems like your proxy requires that you authorize to it. Do you have credentials for that proxy?

@toasteez
Author

toasteez commented Feb 28, 2017

The proxy auto-authorizes a simple dataset request via Quandl, but not the redirect. I tested this by removing proxies=proxies: the request then fails with a MaxRetryError: HTTPSConnectionPool(host='www.quandl.com', port=443). With the proxy dict it succeeds (this is without a redirect to Amazon).

I'm on Windows, so it uses either Kerberos or NTLM; I'm not sure which.

@Lukasa
Member

Lukasa commented Feb 28, 2017

Requests does not transparently support Kerberos or NTLM. You'd need to use something like requests-kerberos or requests-ntlm to support those auth challenges. However, I don't think either of them supports authenticating to CONNECT tunnelling proxies via those protocols.

@toasteez
Author

Why would the first connection to quandl.com authenticate via the proxy automatically when the redirect does not?

@Lukasa
Member

Lukasa commented Feb 28, 2017

@toasteez Can you show me the proxy dictionary?

@toasteez
Author

proxies = {'http': 'http://proxy.company.org:8080',
'https': 'http://proxy.company.org:8080',}

@Lukasa
Member

Lukasa commented Feb 28, 2017

So, my best guess is that the first request does not need auth because the proxy server has decided that requests to that hostname (www.quandl.com) do not need authorization, while requests to the second hostname (which belongs to Amazon) do. That is, that this is a proxy concern.
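
One way to check this guess is to compare the hostname of the original request with the hostname in the Location header of the 302 response. The following is a minimal sketch using only the standard library; the helper name redirect_hosts is hypothetical, and the URLs are the truncated ones from the report with the query strings left off.

```python
from urllib.parse import urljoin, urlsplit


def redirect_hosts(request_url, location_header):
    """Return (original host, redirect target host) for a redirect response.

    The Location value is resolved against the request URL first, so a
    relative redirect is reported as staying on the same host.
    """
    target = urljoin(request_url, location_header)
    return urlsplit(request_url).hostname, urlsplit(target).hostname


# Typical use with the response from the report (not executed here):
#   response = requests.get(url, proxies=proxies, allow_redirects=False)
#   print(redirect_hosts(url, response.headers["Location"]))
```

If the two hostnames differ, the proxy may well apply a different policy to each of them, which would match the behaviour described above.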

@toasteez
Author

OK. Thanks. How do I pass the auth and pw via requests to try this out? My preferred option is to make this seamless without hard coding my credentials anywhere though.

@Lukasa
Member

Lukasa commented Feb 28, 2017

If you can auth to the proxy using basic auth, you need to pass the credentials in the URL of the proxy, e.g. {'https': 'http://username:password@proxy.company.org:8080'}. If you need to auth using Kerberos or NTLM you'll need to consult the documentation for those modules.
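
For the basic-auth case, a minimal sketch of building such a proxy dictionary might look like the following. The helper names are hypothetical, and the host and port are the ones from the proxy dictionary posted earlier. The credentials are percent-encoded because characters such as '@' or ':' inside a password would otherwise break the URL.

```python
from urllib.parse import quote


def proxy_url_with_auth(host, port, username, password, scheme="http"):
    """Build a proxy URL with basic-auth credentials embedded in it."""
    # safe="" makes quote() encode every reserved character, including '/'.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return "{}://{}:{}@{}:{}".format(scheme, user, secret, host, port)


def proxies_with_auth(host, port, username, password):
    """Return a proxies dict that uses the same proxy for http and https."""
    url = proxy_url_with_auth(host, port, username, password)
    return {"http": url, "https": url}


# Typical use (not executed here):
#   proxies = proxies_with_auth("proxy.company.org", 8080, user, password)
#   response = requests.get(url, proxies=proxies)
```

The username and password could come from a prompt such as getpass.getpass() rather than from the script itself.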

@toasteez
Author

Or could this be wrapped up somehow to work?

import win32com.client

url = 'https://...'

h = win32com.client.Dispatch('WinHTTP.WinHTTPRequest.5.1')
h.SetAutoLogonPolicy(0)
h.Open('GET', url, False)
h.Send()
result = h.responseText
result

@Lukasa
Member

Lukasa commented Feb 28, 2017

It's not at all clear to me what you're trying to do with the above code.

@toasteez
Author

Ignoring the above, adding the user:password to the proxy URL results in:

SSLError: ("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",)

@Lukasa
Member

Lukasa commented Feb 28, 2017

Ok, so it looks very much like the proxy is attempting to place itself as a man-in-the-middle on your HTTPS connection by decrypting the traffic to Amazon. To allow that you'll need to get the certificate authority your proxy is using to build its TLS certificates and pass the path to that cert to verify.
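
A minimal sketch of passing such a CA file to verify, assuming the proxy's CA certificate has already been exported to a PEM file. The helper names and the file path are hypothetical; requests itself is imported lazily so that the argument-building helper can be used without it.

```python
import os


def build_request_kwargs(proxies, ca_bundle_path):
    """Return keyword arguments for requests.get using a custom CA bundle."""
    # Fail early with a clear error instead of an opaque TLS failure later.
    if not os.path.isfile(ca_bundle_path):
        raise FileNotFoundError(ca_bundle_path)
    return {"proxies": proxies, "verify": ca_bundle_path}


def fetch(url, proxies, ca_bundle_path):
    """GET a URL through the proxy, verifying TLS against the given bundle."""
    import requests  # third-party; imported here so the helper above stands alone

    return requests.get(url, **build_request_kwargs(proxies, ca_bundle_path))


# Typical use (not executed here; the path is a placeholder):
#   response = fetch(url, proxies, r"C:\certs\proxy-ca.pem")
```

This keeps certificate verification switched on, unlike verify=False, while still trusting the certificates the proxy generates.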

@toasteez
Author

OK thanks, is there a way to not have to have the user:pass in any script?

@Lukasa
Member

Lukasa commented Feb 28, 2017

Yes. Requests supports reading proxy data from the environment variables HTTP_PROXY and HTTPS_PROXY. You can set those up to have the same strings in them that you have as the values of the proxy dictionary. The proxy dictionary then becomes unnecessary. Otherwise, you should write code to read them from config files.
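
Requests performs this lookup itself by default, so no code is needed for it. The sketch below only makes the lookup explicit for anyone who wants to inspect or log what would be used. The function name is hypothetical, and preferring the upper-case variable over the lower-case one is a choice made for this sketch, not necessarily the precedence requests applies.

```python
import os


def proxies_from_env(environ=None):
    """Build a proxies dict from HTTP_PROXY / HTTPS_PROXY style variables."""
    environ = os.environ if environ is None else environ
    proxies = {}
    for scheme in ("http", "https"):
        # Check the upper-case name first, then the lower-case one.
        value = environ.get(scheme.upper() + "_PROXY") or environ.get(scheme + "_proxy")
        if value:
            proxies[scheme] = value
    return proxies


# Example with an explicit mapping instead of the real environment:
example = {"HTTPS_PROXY": "http://proxy.company.org:8080"}
print(proxies_from_env(example))
```

With the variables set in the environment, the call in the script reduces to requests.get(url) with no proxies argument.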

@toasteez
Author

It still requires user : password to be in plaintext somewhere which I don't like.

Do you know where the machine certificate store is located on Windows? (I can make it work with verify=False now.)

@Lukasa
Member

Lukasa commented Feb 28, 2017

The windows certificate store is in the registry. The wincertstore module can help you with that.

@toasteez
Author

using the example on the wincertstore site what would I pass to verify?

import wincertstore
for storename in ("CA", "ROOT"):
    with wincertstore.CertSystemStore(storename) as store:
        for cert in store.itercerts(usage=wincertstore.SERVER_AUTH):
            print(cert.get_pem())
            print(cert.get_name())
            print(cert.enhanced_keyusage_names())

@Lukasa
Member

Lukasa commented Feb 28, 2017

You'd need to iterate over the store and write out all of cert.get_pem() to a file, and then use that.
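
A minimal sketch of that export, built on the wincertstore calls from the example above. The function names and the output path are hypothetical, and the sketch assumes cert.get_pem() returns text. The wincertstore import sits inside the generator because the module only works on Windows.

```python
def write_pem_bundle(pem_certs, bundle_path):
    """Write an iterable of PEM strings to one bundle file; return the count."""
    count = 0
    with open(bundle_path, "w", encoding="ascii") as bundle:
        for pem in pem_certs:
            # Normalise trailing whitespace so certificates never run together.
            bundle.write(pem.strip() + "\n")
            count += 1
    return count


def iter_windows_pems(storenames=("CA", "ROOT")):
    """Yield PEM text for server-auth certificates in the Windows stores."""
    import wincertstore  # third-party, Windows only

    for storename in storenames:
        with wincertstore.CertSystemStore(storename) as store:
            for cert in store.itercerts(usage=wincertstore.SERVER_AUTH):
                yield cert.get_pem()


# Typical use on Windows (not executed here; the path is a placeholder):
#   write_pem_bundle(iter_windows_pems(), "windows-ca-bundle.pem")
#   response = requests.get(url, proxies=proxies, verify="windows-ca-bundle.pem")
```

The resulting file is what would be passed as the verify argument.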

@toasteez
Author

OK, one other thing: which is easier to implement, requests-kerberos or requests-ntlm? I think our proxy supports either, and I want to try the easiest first.

@Lukasa
Member

Lukasa commented Feb 28, 2017

I am honestly not sure. Might be worth trying either. 😄

@toasteez
Author

I've made good progress now, appreciate the input. I hope quandl add the proxy and the verify to their implementation.

@Lukasa
Member

Lukasa commented Feb 28, 2017

Indeed, good luck!

@Lukasa Lukasa closed this as completed Feb 28, 2017
@toasteez
Author

toasteez commented Feb 28, 2017

requests-kerberos gives a MutualAuthenticationError: Unable to authenticate <Response [302]>

I think this is now related to issue 64

requests-ntlm requires the user and password, so it doesn't add value.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021