HTTP client ignores "no_proxy" environment variable #163

Closed
karsten-wagner opened this issue Apr 4, 2023 · 4 comments · Fixed by #166

@karsten-wagner

Describe the bug

clickhouse-connect seems to ignore the no_proxy/NO_PROXY environment variable in a local Apache Superset setup on Windows 10 with Docker Desktop and WSL2.

A setup that requires a forward proxy (e.g. in a corporate network) needs the HTTP_PROXY/HTTPS_PROXY environment variables so that Apache Superset can download Python packages such as the clickhouse-connect driver, as well as sample data. It may also require additional SSL certificates and the corresponding environment variables for Python and Node.js (configuration details are below; they are not the issue here).
In the scenario described below, a local Fiddler forward proxy is used. The choice of proxy does not seem to matter, though; it only affects the text of the error message.

After installing the clickhouse-connect driver, connecting to ClickHouse with the connection string clickhousedb://default:@clickhouse:8123 should work without leaving the local Docker network, given that the ClickHouse container is named clickhouse. Instead, trying to connect to the database produces the following error message:

ERROR: :HTTPDriver for http://clickhouse:8123 returned response code 502)
[Fiddler] DNS Lookup for "clickhouse" failed. System.Net.Sockets.SocketException No such host is known

The error message indicates that the request to host clickhouse left the local Docker network and was sent to the forward proxy. This ignores the no_proxy environment variable, which lists host clickhouse as an exception.

Workaround:
Docker Compose creates a default network that bridges to the host. With the ClickHouse settings described below, the container is exposed on a port of the host OS, so connecting with clickhousedb://default:@localhost:8123 does work. This is sub-optimal, since a later production setup should isolate the database while still allowing the database driver to be installed. That requires proper handling of the proxy settings, so that connections to the database are not routed through the forward proxy.

Other considerations:
Connecting to clickhousedb://play:[email protected]:443, as suggested here, does work and serves as a sanity check of the proxy and certificate setup. The forward proxy can resolve the external URL, whereas it cannot resolve the container/service name clickhouse, which leads to the error.

Steps to reproduce

  1. Set up Apache Superset on Windows 10 with Docker Desktop with WSL2 support as described here

  2. Add clickhouse container to docker-compose-non-dev.yml (add the following in the respective sections):

    x-clickhouse-volumes:
      &clickhouse-volumes
      - clickhouse_home:/var/lib/clickhouse
    
    services:
      clickhouse:
        image: clickhouse/clickhouse-server:23
        container_name: clickhouse
        env_file: docker/.env-non-dev
        user: "root"
        restart: unless-stopped
        ports:
          - "8123:8123"
          - "9000:9000"
        volumes: *superset-volumes
    
    volumes:
      clickhouse_home:
        external: false
  3. Add clickhouse-connect>=0.4.1 to ./docker/requirements-local.txt as described here and here

  4. Add proxy setup to docker/.env-non-dev (add the following to the file):

    # Local Proxy Settings
    http_proxy="http://host.docker.internal:8888"
    https_proxy="http://host.docker.internal:8888"
    no_proxy="localhost,127.0.0.1,db,redis,superset,superset-init,superset-worker,superset-beat,clickhouse"
    HTTP_PROXY="http://host.docker.internal:8888"
    HTTPS_PROXY="http://host.docker.internal:8888"
    NO_PROXY="localhost,127.0.0.1,db,redis,superset,superset-init,superset-worker,superset-beat,clickhouse"
    # SSL certificate for the Python part
    REQUESTS_CA_BUNDLE=/app/docker/ca-bundle.crt
    # The superset_init container uses urllib (instead of urllib3) and ignores the REQUESTS_CA_BUNDLE variable
    SSL_CERT_FILE=/app/docker/ca-bundle.crt
    # Eventually used for the node.js based UI components
    NODE_EXTRA_CA_CERTS=/app/docker/ca-bundle.crt
  5. Start the containers via TAG=1.5.3 docker-compose -f docker-compose-non-dev.yml up

  6. Log in to http://localhost:8088/

  7. Add clickhouse database as described here
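
To confirm that the variables from step 4 are actually visible inside the superset_app container, a quick check along these lines may help. It is only a sketch built on Python's standard library (`urllib.request.getproxies_environment` and `urllib.request.proxy_bypass_environment`). The helper name `route_for` and the host `pypi.org` are made up for illustration, and the result shows how the standard library interprets the variables, not how clickhouse-connect does.

```python
import urllib.request
from typing import Dict, Optional


def route_for(host_port: str, scheme: str = "http",
              proxies: Optional[Dict[str, str]] = None) -> str:
    """Return 'direct' or the proxy URL the stdlib would pick for host_port."""
    if proxies is None:
        # Collects the *_proxy variables (either case) from os.environ;
        # the no_proxy list ends up under the key 'no'.
        proxies = urllib.request.getproxies_environment()
    if urllib.request.proxy_bypass_environment(host_port, proxies):
        return "direct"
    # No proxy configured for this scheme also means a direct connection.
    return proxies.get(scheme, "direct")


if __name__ == "__main__":
    print("clickhouse:8123 ->", route_for("clickhouse:8123"))
    print("pypi.org:443 ->", route_for("pypi.org:443", scheme="https"))
```

If the environment is set up as in step 4, the first line should report a direct connection and the second the forward proxy.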

Expected behaviour

Using the connection string clickhousedb://default:@clickhouse:8123 and clicking "Test Connection" should connect to the database successfully.

Code example

The code of clickhouse-connect indicates support for the proxy environment variables, but not for the no_proxy exception list:

:param http_proxy http proxy address. Equivalent to setting the HTTP_PROXY environment variable
:param https_proxy https proxy address. Equivalent to setting the HTTPS_PROXY environment variable
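
For illustration, honoring the exception list could look roughly like the sketch below. The function `bypasses_proxy` is hypothetical and is not taken from clickhouse-connect; it assumes the common convention that no_proxy is a comma-separated list of host names, where an entry also covers its subdomains and `*` disables the proxy entirely.

```python
from typing import Optional
from urllib.parse import urlsplit


def bypasses_proxy(url: str, no_proxy: Optional[str]) -> bool:
    """Return True if the URL's host is covered by a comma-separated no_proxy list."""
    host = (urlsplit(url).hostname or "").lower()
    if not host or not no_proxy:
        return False
    for entry in no_proxy.split(","):
        # Normalize: ignore whitespace, case, and a leading dot.
        entry = entry.strip().lower().lstrip(".")
        if not entry:
            continue
        if entry == "*":
            return True
        # Exact host match, or the host is a subdomain of the entry.
        if host == entry or host.endswith("." + entry):
            return True
    return False
```

With the no_proxy value from the steps above, `bypasses_proxy("http://clickhouse:8123", no_proxy)` would be true, so the request would be expected to stay inside the Docker network.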

clickhouse-connect and/or ClickHouse server logs

Configuration

Environment - from within the superset_app container:

pip list | grep clickhouse
clickhouse-connect     0.5.18

python --version
Python 3.8.13

uname -a
Linux 8275ab37bfa1 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 GNU/Linux

ClickHouse server version 23 (see the Docker Compose file).

@karsten-wagner
Author

I forgot to share the server logs:

2023-04-04 09:11:35,070:ERROR:clickhouse_connect.driver.httpclient:[Fiddler] DNS Lookup for "clickhouse" failed. System.Net.Sockets.SocketException No such host is known

[SupersetError(message=':HTTPDriver for http://clickhouse:8123 returned response code 502)\n [Fiddler] DNS Lookup for "clickhouse" failed. System.Net.Sockets.SocketException No such host is known ', error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'ClickHouse Connect', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})]

2023-04-04 09:11:35,081:WARNING:superset.views.base:[SupersetError(message=':HTTPDriver for http://clickhouse:8123 returned response code 502)\n [Fiddler] DNS Lookup for "clickhouse" failed. System.Net.Sockets.SocketException No such host is known ', error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'ClickHouse Connect', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})]

172.21.0.1 - - [04/Apr/2023:09:11:35 +0000] "POST /api/v1/database/test_connection HTTP/1.1" 422 550 "http://localhost:8088/databaseview/list/?pageIndex=0&sortColumn=changed_on_delta_humanized&sortOrder=desc" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 Edg/110.0.1587.78"

@genzgd genzgd added the bug Something isn't working label Apr 4, 2023
@genzgd genzgd self-assigned this Apr 4, 2023
@genzgd
Collaborator

genzgd commented Apr 4, 2023

Thanks for the detailed issue description! It's true there's no handling for a no_proxy environment variable. As a workaround, can you try sending a "default" pool_mgr to the main get_client method? You can create it by calling the http_util.get_pool_manager method with no arguments. This will bypass the httpclient constructor logic that checks environment variables for a proxy.

In the meantime I will look at some very basic logic for checking that list.
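
For anyone calling the driver directly, the workaround might look like the following sketch. It assumes that the module referred to above as http_util is importable as `clickhouse_connect.driver.httputil`, and that the host, port, and user match the setup from the issue. The wrapper name `get_client_without_env_proxy` is made up for illustration, and how to pass such a client through Superset's connection string is not covered here.

```python
def get_client_without_env_proxy(host: str = "clickhouse", port: int = 8123):
    """Create a client with an explicitly supplied default pool manager."""
    # Imported lazily so this sketch can be loaded where the driver is not installed.
    import clickhouse_connect
    from clickhouse_connect.driver import httputil  # assumed module path

    # A pool manager created with no arguments, as suggested in the comment above.
    pool_mgr = httputil.get_pool_manager()
    return clickhouse_connect.get_client(
        host=host, port=port, username="default", pool_mgr=pool_mgr
    )
```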

@genzgd genzgd linked a pull request Apr 5, 2023 that will close this issue
@karsten-wagner
Author

@genzgd, thank you very much for the fix. I can confirm it now works as expected.

I have updated my Python dependency to clickhouse-connect>=0.5.19 and tested again with Apache Superset 1.5.3. This combination properly handles the no_proxy environment variable.

@genzgd
Collaborator

genzgd commented Apr 25, 2023

Thanks for the update, much appreciated.
