add support for socks proxies #821

Open
matecsaj opened this issue Dec 16, 2024 · 0 comments
Labels
t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@matecsaj

SOCKS4 and SOCKS5 proxies are stealthier than HTTP and HTTPS proxies, so please add support for them.

However, be cautious: at one point Playwright rejected non-anonymous SOCKS proxies. I'm not sure about the current status.

Test code:

import asyncio
from crawlee.proxy_configuration import ProxyConfiguration

async def config_proxy() -> ProxyConfiguration:
    # Create and return a ProxyConfiguration object.
    proxy_configuration = ProxyConfiguration(
        tiered_proxy_urls=[
            # No proxy tier. (Not needed, but optional in case you do not want to use any proxy on lowest tier.)
            # [None],
            # lower tier, cheaper, preferred as long as they work
            ['https://example.com:8080', 'https://example.com:8080'],
            # higher tier, more expensive, used as a fallback
            ['socks4://example.com:8080', 'socks5://example.com:8080'],
        ]
    )

    return proxy_configuration

asyncio.run(config_proxy())

Terminal output:

/Users/matecsaj/PycharmProjects/wat-crawlee/venv/bin/python /Users/matecsaj/Library/Application Support/JetBrains/PyCharm2024.3/scratches/scratch_1.py 
Traceback (most recent call last):
  File "/Users/matecsaj/Library/Application Support/JetBrains/PyCharm2024.3/scratches/scratch_1.py", line 19, in <module>
    asyncio.run(config_proxy())
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ~~~~~~~~~~^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/asyncio/base_events.py", line 720, in run_until_complete
    return future.result()
           ~~~~~~~~~~~~~^^
  File "/Users/matecsaj/Library/Application Support/JetBrains/PyCharm2024.3/scratches/scratch_1.py", line 6, in config_proxy
    proxy_configuration = ProxyConfiguration(
        tiered_proxy_urls=[
    ...<6 lines>...
        ]
    )
  File "/Users/matecsaj/PycharmProjects/wat-crawlee/venv/lib/python3.13/site-packages/crawlee/proxy_configuration.py", line 103, in __init__
    [[URL(url) for url in tier if self._url_validator.validate_python(url)] for tier in tiered_proxy_urls]
                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^
  File "/Users/matecsaj/PycharmProjects/wat-crawlee/venv/lib/python3.13/site-packages/pydantic/type_adapter.py", line 412, in validate_python
    return self.validator.validate_python(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        object,
        ^^^^^^^
    ...<3 lines>...
        allow_partial=experimental_allow_partial,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
pydantic_core._pydantic_core.ValidationError: 1 validation error for function-wrap[wrap_val()]
  URL scheme should be 'http' or 'https' [type=url_scheme, input_value='socks4://example.com:8080', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/url_scheme

Process finished with exit code 1
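
Until SOCKS schemes are accepted, one possible stopgap is to separate the unsupported URLs before building the configuration, so the constructor does not raise. This is only a sketch: `partition_tiers` and `SUPPORTED_SCHEMES` are hypothetical names and not part of crawlee, and the supported set is taken from the error message above ('http' or 'https').

```python
from urllib.parse import urlsplit

# Assumption: the accepted schemes are the two named in the ValidationError above.
SUPPORTED_SCHEMES = frozenset({'http', 'https'})


def partition_tiers(tiered_proxy_urls):
    """Return (accepted_tiers, rejected_urls) for a list of proxy tiers.

    Hypothetical helper, not a crawlee API. A None entry (the "no proxy"
    option from the test code) is kept as-is. Tiers left empty after
    filtering are dropped.
    """
    accepted = []
    rejected = []
    for tier in tiered_proxy_urls:
        kept = []
        for url in tier:
            if url is None or urlsplit(url).scheme in SUPPORTED_SCHEMES:
                kept.append(url)
            else:
                rejected.append(url)
        if kept:
            accepted.append(kept)
    return accepted, rejected


tiers = [
    ['https://example.com:8080', 'https://example.com:8080'],
    ['socks4://example.com:8080', 'socks5://example.com:8080'],
]
accepted, rejected = partition_tiers(tiers)
print(accepted)  # tiers that should pass the current scheme check
print(rejected)  # SOCKS URLs that would trigger the error
```

The `accepted` list could then be passed as `tiered_proxy_urls`, with `rejected` logged so the missing SOCKS tier is visible rather than silently dropped.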
@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Dec 16, 2024