fetch: does not work with ssh even though scp/rsync/sftp do. #43

Closed as not planned
@Jalmenara

Bug Report

Description

The dvc fetch and dvc pull commands fail against an ssh remote, even though the same host and path work with scp, rsync and sftp (get).

Reproduce

I do not see a clear way to make this problem reproducible. In fact, I tried to post this as a Discord question, but my description was too long for a message. Hopefully it can be understood properly. The situation is as follows:

My team and I are facing a problem when using dvc with an ssh/sftp remote.
We are setting up a dvc remote shared by two companies working on a project: clientA & contractorB (I belong to the latter).
The remote is hosted by clientA at a location that we can reach through ssh.
There is an intermediate proxy server, which we handle with the ProxyJump option in ~/.ssh/config, like this:

Host proxy
    HostName proxy.clientA.com
    User pepito

Host destination
    HostName dest.clientA.com
    User pepito
    ProxyJump proxy

The dvc remote is stored at destination.
We have also configured the ssh keys on the servers of clientA, so that no password prompts are needed.
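To make the alias chain above explicit, here is a minimal sketch that reads the same ~/.ssh/config text and shows what the `destination` alias expands to. The `parse_ssh_config` helper is hypothetical and deliberately simplified (no wildcards, `Match`, `Include` or `key=value` syntax), so it is not a substitute for OpenSSH's own parsing.

```python
from typing import Dict, List

# The config from this report, inlined so the example is self-contained.
SSH_CONFIG = """\
Host proxy
    HostName proxy.clientA.com
    User pepito

Host destination
    HostName dest.clientA.com
    User pepito
    ProxyJump proxy
"""


def parse_ssh_config(text: str) -> Dict[str, Dict[str, str]]:
    """Map each literal Host alias to its options (keys lower-cased).

    Simplified on purpose: patterns, Match and Include are not handled.
    """
    hosts: Dict[str, Dict[str, str]] = {}
    current: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].lower(), parts[1].strip()
        if key == "host":
            current = value.split()
            for alias in current:
                hosts.setdefault(alias, {})
        else:
            for alias in current:
                # Keep the first value seen for a given option.
                hosts[alias].setdefault(key, value)
    return hosts


config = parse_ssh_config(SSH_CONFIG)
dest = config["destination"]
jump = config[dest["proxyjump"]]
print(f"destination -> {dest['hostname']} via {jump['hostname']}")
```

The point of the sketch is that `destination` and `proxy` are names that only exist in ~/.ssh/config; any tool that does not expand them the way OpenSSH does would have to look them up as ordinary hostnames.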

On the dvc side, we have configured the remote in our repo with the following .dvc/config:

[core]
    remote = bin-remote
['remote "bin-remote"']
    url = ssh://destination:/work/projects/models-bin/

However, the dvc fetch/pull commands fail with the following error:

ERROR: unexpected error - [Errno -2] Name or service not known

The first thing I did was to check that the paths were written correctly.
For instance, I removed the : between "destination" and "/work" in the url of the .dvc/config:

    url = ssh://destination/work/projects/models-bin/

This did not solve the problem.
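As a sanity check on the two URL spellings, the sketch below splits both with Python's standard `urllib.parse.urlsplit`. This is only an illustration with the standard library; it is an assumption that DVC's own URL handling behaves the same way.

```python
from urllib.parse import urlsplit

URLS = (
    "ssh://destination:/work/projects/models-bin/",  # original, with ':'
    "ssh://destination/work/projects/models-bin/",   # after removing ':'
)

# Split each URL into the pieces relevant for an ssh remote.
parsed = [urlsplit(url) for url in URLS]
for url, parts in zip(URLS, parsed):
    print(f"{url}\n  host={parts.hostname!r} port={parts.port!r} path={parts.path!r}")
```

Both spellings yield the host `destination`, no port, and the path `/work/projects/models-bin/`, which is consistent with the change making no difference.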

Next, to rule out issues with the ssh/sftp connections, I manually copied
the /work/projects/models-bin/ folder from clientA to my contractorB
computer using three different methods: scp, rsync and sftp.

# Method 1 (works fine)
scp -r pepito@destination:/work/projects/models-bin/ .

# Method 2 (works fine)
mkdir models-bin
rsync -avul pepito@destination:/work/projects/models-bin/ models-bin/

# Method 3 (works fine)
sftp destination
cd /work/projects/models-bin/
lmkdir models-bin/
lcd models-bin
get -r *

All three methods work: the DVC-style files appear on the contractorB side (e.g., models-bin/b0/26324c6904b2a9cb4b88d6d61c81d1).
This led me to think that the issue might be on the dvc side rather than in the connection or the paths.
Note that the path used in the three manual methods is exactly the same (copy-pasted).

Expected

dvc fetch and dvc pull should be able to locate the files on the remote, as scp, rsync and sftp do.

Environment information

Output of dvc doctor:

DVC version: 2.57.2 (pip)
-------------------------
Platform: Python 3.8.15 on Linux-5.10.0-0.bpo.7-amd64-x86_64-with-glibc2.10
Subprojects:
        dvc_data = 0.51.0
        dvc_objects = 0.22.0
        dvc_render = 0.4.0
        dvc_task = 0.2.1
        scmrepo = 1.0.3
Supports:
        http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
        ssh (sshfs = 2023.4.1),
        webdav (webdav4 = 0.9.8),
        webdavs (webdav4 = 0.9.8)
Config:
        Global: /home/pepito/.config/dvc
        System: /etc/xdg/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: ssh
Workspace directory: xfs on /dev/etherd/e1.2
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/f4ebe3b9398bbf3512d2cac9e029947f

Additional Information:

Output of dvc fetch -v:

(env_JAR) [pepito@contractorB repo]$ dvc fetch -v
2023-05-18 17:02:23,975 DEBUG: v2.57.2 (pip), CPython 3.8.15 on Linux-5.10.0-0.bpo.7-amd64-x86_64-with-glibc2.10
2023-05-18 17:02:23,975 DEBUG: command: /home/pepito/.conda/envs/env_JAR/bin/dvc fetch -v
2023-05-18 17:02:24,349 DEBUG: Preparing to transfer data from '/work/projects/models-bin/' to '/contractorB/pepito/repo/.dvc/cache'
2023-05-18 17:02:24,349 DEBUG: Preparing to collect status from '/contractorB/pepito/repo/.dvc/cache'
2023-05-18 17:02:24,349 DEBUG: Collecting status from '/contractorB/pepito/repo/.dvc/cache'
2023-05-18 17:02:24,350 DEBUG: Preparing to collect status from '/work/projects/models-bin/'
2023-05-18 17:02:24,350 DEBUG: Collecting status from '/work/projects/models-bin/'
2023-05-18 17:02:24,350 DEBUG: Querying 1 oids via object_exists
2023-05-18 17:02:28,418 DEBUG: Preparing to transfer data from '/work/projects/models-bin/' to '/contractorB/pepito/repo/.dvc/cache'
2023-05-18 17:02:28,418 DEBUG: Preparing to collect status from '/contractorB/pepito/repo/.dvc/cache'
2023-05-18 17:02:28,418 DEBUG: Collecting status from '/contractorB/pepito/repo/.dvc/cache'
2023-05-18 17:02:28,420 DEBUG: Preparing to collect status from '/work/projects/models-bin/'
2023-05-18 17:02:28,420 DEBUG: Collecting status from '/work/projects/models-bin/'
2023-05-18 17:02:32,027 ERROR: unexpected error - [Errno -2] Name or service not known
Traceback (most recent call last):
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc/cli/__init__.py", line 210, in main
    ret = cmd.do_run()
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc/cli/command.py", line 26, in do_run
    return self.run()
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc/commands/data_sync.py", line 84, in run
    processed_files_count = self.repo.fetch(
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc/repo/__init__.py", line 65, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc/repo/fetch.py", line 86, in fetch
    d, f = _fetch(
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc/repo/fetch.py", line 166, in _fetch
    d, f = repo.cloud.pull(
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc/data_cloud.py", line 181, in pull
    return self.transfer(
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc/data_cloud.py", line 135, in transfer
    return transfer(src_odb, dest_odb, objs, **kwargs)
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_data/hashfile/transfer.py", line 203, in transfer
    status = compare_status(
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_data/hashfile/status.py", line 189, in compare_status
    src_exists, src_missing = status(
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_data/hashfile/status.py", line 149, in status
    odb.oids_exist(hashes, jobs=jobs, progress=pbar.callback)
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_objects/db.py", line 411, in oids_exist
    remote_size, remote_oids = self._estimate_remote_size(
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_objects/db.py", line 293, in _estimate_remote_size
    remote_oids = set(iter_with_pbar(oids))
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_objects/db.py", line 283, in iter_with_pbar
    for oid in oids:
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_objects/db.py", line 249, in _oids_with_limit
    for oid in self._list_oids(prefixes=prefixes, jobs=jobs):
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_objects/db.py", line 236, in _list_oids
    for path in self._list_prefixes(prefixes=prefixes, jobs=jobs):
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_objects/db.py", line 216, in _list_prefixes
    yield from self.fs.find(paths, batch_size=jobs, prefix=prefix)
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_objects/fs/base.py", line 429, in find
    yield from self.fs.find(path)
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/funcy/objects.py", line 50, in __get__
    return prop.__get__(instance, type)
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/funcy/objects.py", line 28, in __get__
    res = instance.__dict__[self.fget.__name__] = self.fget(instance)
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/dvc_ssh/__init__.py", line 119, in fs
    return _SSHFileSystem(**self.fs_args)
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/fsspec/spec.py", line 76, in __call__
    obj = super().__call__(*args, **kwargs)
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/sshfs/spec.py", line 66, in __init__
    self._client, self._pool = self.connect(
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/fsspec/asyn.py", line 113, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/fsspec/asyn.py", line 98, in sync
    raise return_result
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/fsspec/asyn.py", line 53, in _runner
    result[0] = await coro
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/asyncio/tasks.py", line 494, in wait_for
    return fut.result()
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/sshfs/spec.py", line 83, in _connect
    client = await self._stack.enter_async_context(_raw_client)
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/contextlib.py", line 568, in enter_async_context
    result = await _cm_type.__aenter__(cm)
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/asyncssh/misc.py", line 274, in __aenter__
    self._coro_result = await self._coro
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/asyncssh/connection.py", line 8042, in connect
    return await asyncio.wait_for(
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/asyncio/tasks.py", line 455, in wait_for
    return await fut
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/site-packages/asyncssh/connection.py", line 430, in _connect
    _, session = await loop.create_connection(
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/asyncio/base_events.py", line 986, in create_connection
    infos = await self._ensure_resolved(
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/asyncio/base_events.py", line 1365, in _ensure_resolved
    return await loop.getaddrinfo(host, port, family=family, type=type,
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/asyncio/base_events.py", line 825, in getaddrinfo
    return await self.run_in_executor(
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/pepito/.conda/envs/env_JAR/lib/python3.8/socket.py", line 918, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

2023-05-18 17:02:32,075 DEBUG: Version info for developers:
DVC version: 2.57.2 (pip)
-------------------------
Platform: Python 3.8.15 on Linux-5.10.0-0.bpo.7-amd64-x86_64-with-glibc2.10
Subprojects:
        dvc_data = 0.51.0
        dvc_objects = 0.22.0
        dvc_render = 0.4.0
        dvc_task = 0.2.1
        scmrepo = 1.0.3
Supports:
        http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
        ssh (sshfs = 2023.4.1),
        webdav (webdav4 = 0.9.8),
        webdavs (webdav4 = 0.9.8)
Config:
        Global: /home/pepito/.config/dvc
        System: /etc/xdg/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: ssh
Workspace directory: xfs on /dev/etherd/e1.2
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/f4ebe3b9398bbf3512d2cac9e029947f

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2023-05-18 17:02:32,076 DEBUG: Analytics is enabled.
2023-05-18 17:02:32,111 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpg1yozo9a']'
2023-05-18 17:02:32,112 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpg1yozo9a']'
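
The traceback above ends in socket.getaddrinfo, so the failure is a name lookup rather than an authentication or path error. The log does not print which name was being looked up, so the following is only a diagnostic sketch under the assumption that it is one of the names from ~/.ssh/config. The `resolves` helper is hypothetical; it performs the same kind of lookup that appears at the bottom of the traceback.

```python
import socket


def resolves(host: str, port: int = 22) -> bool:
    """Return True if the system resolver can look up `host`.

    This does not read ~/.ssh/config, so an alias defined only there
    is treated like any other hostname.
    """
    try:
        socket.getaddrinfo(host, port)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Aliases and real hostnames taken from the config in this report.
    for name in ("destination", "proxy", "dest.clientA.com", "proxy.clientA.com"):
        status = "resolves" if resolves(name) else "does not resolve"
        print(f"{name}: {status}")
```

If the aliases do not resolve here while ssh, scp and rsync still work, that would point to a name being passed to the resolver without the ~/.ssh/config expansion that OpenSSH applies.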
