Re-enable Datalad SSH tests on macOS #55

Draft
wants to merge 7 commits into
base: master

Conversation

jwodder
Member

@jwodder jwodder commented Feb 11, 2021

@yarikoptic
Member

Everything has been merged and released on the datalad end since then. Could you please re-trigger the CI run here, @jwodder?

@jwodder
Member Author

jwodder commented Mar 12, 2021

@yarikoptic CI run triggered.

@yarikoptic
Member

Mac still isn't happy:

(default) Waiting for an IP...
Error creating machine: Error in driver during machine creation: Too many retries waiting for SSH to be available.  Last error: Maximum number of retries (60) exceeded
Error: Process completed with exit code 1.

@yarikoptic
Member

ha -- some pass and some fail with e.g.

2021-03-16T21:55:20.5835210Z datalad.support.exceptions.CommandError: CommandError: 'ssh -o ControlPath=/Users/runner/Library/Caches/datalad/sockets/0ead11a7 datalad-test 'export "PATH=/usr/lib/git-annex.linux:$PATH"; mkdir -p /private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/datalad_temp_check_target_ssh_recursivefimncw8k-False'' failed with exitcode 1 [err: 'mkdir: cannot create directory ‘/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/datalad_temp_check_target_ssh_recursivefimncw8k-False’: Permission denied']

is it the same "too long of a path" issue?

@jwodder
Member Author

jwodder commented Mar 17, 2021

@yarikoptic What "too long of a path" issue? The only such issue I recall on macOS affected Conda's decisions about filling in shebangs.

@yarikoptic
Member

Argh, I failed to find the related discussion at the moment. Meanwhile, you could try setting TMPDIR=~/DLTMP and see if it goes away. That is also what is done in https://github.com/datalad/datalad/blob/master/.appveyor.yml#L228, I believe for that reason.

@yarikoptic
Member

Underlying issue: https://unix.stackexchange.com/questions/367008/why-is-socket-path-length-limited-to-a-hundred-chars#:~:text=Mac%20OS%20X%2010.9%3A%20104%20characters (the maximum socket path length on macOS is 104 characters). In DataLad we use HOME in TMPDIR while testing.
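
A socket path can be checked against that limit before a test run. This is a minimal sketch, assuming the 104-byte figure from the linked answer and that one byte must be left for the trailing NUL. The helper name `socket_path_fits` is hypothetical and is not part of DataLad.

```python
import os

# Assumption: 104 bytes is the size of the socket path buffer on macOS,
# as stated in the linked Stack Exchange answer.
MACOS_SOCKET_PATH_MAX = 104


def socket_path_fits(path: str, limit: int = MACOS_SOCKET_PATH_MAX) -> bool:
    """Return True if `path` (plus a trailing NUL byte) fits into `limit` bytes."""
    # Compare the encoded byte length, not the character count, because
    # the limit applies to bytes on the filesystem level.
    return len(os.fsencode(path)) < limit


if __name__ == "__main__":
    # ControlPath taken from one of the error messages in this thread.
    control_path = "/Users/runner/Library/Caches/datalad/sockets/0ead11a7"
    print(socket_path_fits(control_path))
```

Placing the sockets directory under a short TMPDIR keeps the byte count low, which is the motivation for the short `DLTMP` paths.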

@yarikoptic
Member

From the error messages it seems that /Users/runner/DLTMP is not bind-mounted inside the Docker container, which leads to various issues? If there is a /tmp on those Macs, it might be worth trying to export TMPDIR as e.g. /tmp/DLTMP, since /tmp should exist in the container and is more likely to work?

@yarikoptic
Member

#58 supersedes this one, right, @jwodder? If so, please close this one.

@jwodder
Member Author

jwodder commented Mar 25, 2021

@yarikoptic #58 uses a third-party action for setting up Docker on macOS as an alternative to the Docker Machine approach on this branch. I'm not entirely certain how reliable the action in question is, and so I want to leave both PRs open for now.

@yarikoptic
Member

The blocker was resolved and master of datalad should be green again. Time to resolve this issue one way or another to gain better testing on OSX.

@jwodder
Member Author

jwodder commented Apr 5, 2022

@yarikoptic This PR seems to work now, aside from some datalad test failures.

@yarikoptic
Member

> @yarikoptic This PR seems to work now, aside from some datalad test failures.

Well, it doesn't work in the sense that SSH-related tests fail on macOS:

(git)smaug:/mnt/datasets/datalad/ci/git-annex/builds/2022/04[master]pr-55
$> git grep datalad.support.tests.test_annexrepo.test_annex_ssh
build-macos.yaml-645-32886238-failed/1_test-datalad (master).txt:2022-04-05T17:36:03.2718890Z datalad.support.tests.test_annexrepo.test_annex_ssh ... ERROR
build-macos.yaml-645-32886238-failed/1_test-datalad (master).txt:2022-04-05T17:54:00.1381870Z ERROR: datalad.support.tests.test_annexrepo.test_annex_ssh
build-macos.yaml-645-32886238-failed/2_test-datalad (maint).txt:2022-04-05T17:43:28.9826100Z datalad.support.tests.test_annexrepo.test_annex_ssh ... ERROR
build-macos.yaml-645-32886238-failed/3_test-datalad (release).txt:2022-04-05T17:47:29.4295650Z datalad.support.tests.test_annexrepo.test_annex_ssh ... ERROR
build-macos.yaml-645-32886238-failed/test-datalad (maint)/12_Run datalad tests.txt:2022-04-05T17:43:28.9826060Z datalad.support.tests.test_annexrepo.test_annex_ssh ... ERROR
build-macos.yaml-645-32886238-failed/test-datalad (master)/12_Run datalad tests.txt:2022-04-05T17:36:03.2718850Z datalad.support.tests.test_annexrepo.test_annex_ssh ... ERROR
build-macos.yaml-645-32886238-failed/test-datalad (master)/12_Run datalad tests.txt:2022-04-05T17:54:00.1381870Z ERROR: datalad.support.tests.test_annexrepo.test_annex_ssh
build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T17:47:29.4295600Z datalad.support.tests.test_annexrepo.test_annex_ssh ... ERROR
and a sample ERROR:
2022-04-05T17:54:00.1381720Z ======================================================================
2022-04-05T17:54:00.1381870Z ERROR: datalad.support.tests.test_annexrepo.test_annex_ssh
2022-04-05T17:54:00.1382130Z ----------------------------------------------------------------------
2022-04-05T17:54:00.1382240Z Traceback (most recent call last):
2022-04-05T17:54:00.1382640Z   File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/nose/case.py", line 198, in runTest
2022-04-05T17:54:00.1382740Z     self.test(*self.arg)
2022-04-05T17:54:00.1383170Z   File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/datalad/tests/utils.py", line 288, in _wrap_skip_ssh
2022-04-05T17:54:00.1383270Z     return func(*args, **kwargs)
2022-04-05T17:54:00.1383740Z   File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/datalad/tests/utils.py", line 307, in _wrap_skip_nomultiplex_ssh
2022-04-05T17:54:00.1383850Z     return func(*args, **kwargs)
2022-04-05T17:54:00.1384290Z   File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/datalad/tests/utils.py", line 874, in _wrap_with_tempfile
2022-04-05T17:54:00.1384410Z     return t(*(arg + (filename,)), **kw)
2022-04-05T17:54:00.1385030Z   File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/datalad/support/tests/test_annexrepo.py", line 1223, in test_annex_ssh
2022-04-05T17:54:00.1385270Z     ar.copy_to(["foo"], remote="ssh-remote-1")
2022-04-05T17:54:00.1385730Z   File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/datalad/support/gitrepo.py", line 325, in _wrap_normalize_paths
2022-04-05T17:54:00.1385860Z     result = func(self, files_new, *args, **kwargs)
2022-04-05T17:54:00.1386300Z   File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/datalad/support/annexrepo.py", line 2902, in copy_to
2022-04-05T17:54:00.1386520Z     files, ['--in', '.', '--not', '--in', remote])
2022-04-05T17:54:00.1386980Z   File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/datalad/support/annexrepo.py", line 1514, in _get_expected_files
2022-04-05T17:54:00.1387100Z     merge_annex_branches=merge_annex_branches
2022-04-05T17:54:00.1387930Z   File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/datalad/support/annexrepo.py", line 1078, in _call_annex_records
2022-04-05T17:54:00.1388040Z     raise e
2022-04-05T17:54:00.1388570Z   File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/datalad/support/annexrepo.py", line 1050, in _call_annex_records
2022-04-05T17:54:00.1388650Z     **kwargs,
2022-04-05T17:54:00.1389360Z   File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/datalad/support/annexrepo.py", line 943, in _call_annex
2022-04-05T17:54:00.1389450Z     **kwargs)
2022-04-05T17:54:00.1390700Z   File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/datalad/runner/gitrunner.py", line 227, in run_on_filelist_chunks
2022-04-05T17:54:00.1390830Z     **kwargs):
2022-04-05T17:54:00.1391330Z   File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/datalad/runner/gitrunner.py", line 161, in _get_chunked_results
2022-04-05T17:54:00.1391410Z     **kwargs)
2022-04-05T17:54:00.1391820Z   File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/datalad/runner/runner.py", line 205, in run
2022-04-05T17:54:00.1391910Z     **results,
2022-04-05T17:54:00.1392750Z datalad.runner.exception.CommandError: CommandError: 'git -c diff.ignoreSubmodules=none annex find --in . --not --in ssh-remote-1 --json --json-error-messages -c annex.dotfiles=true -- foo' failed with exitcode 1 under /private/tmp/DLTMP/datalad_temp_test_annex_ssh1x6hktex/main [info keys: stdout_json] [err: 'Unable to parse git config from ssh-remote-1
2022-04-05T17:54:00.1393350Z fatal: '/private/tmp/DLTMP/datalad_temp_test_annex_ssh1x6hktex/remote1' does not appear to be a git repository
2022-04-05T17:54:00.1393930Z CommandError: 'ssh -o ControlPath=/Users/runner/Library/Caches/datalad/sockets/89d769bb -o SendEnv=GIT_PROTOCOL datalad-test 'git-upload-pack '"'"'/private/tmp/DLTMP/datalad_temp_test_annex_ssh1x6hktex/remote1'"'"''' failed with exitcode 128
2022-04-05T17:54:00.1394060Z fatal: Could not read from remote repository.
2022-04-05T17:54:00.1394070Z 
2022-04-05T17:54:00.1394210Z Please make sure you have the correct access rights
2022-04-05T17:54:00.1394310Z and the repository exists.
2022-04-05T17:54:00.1394610Z git-annex: cannot determine uuid for ssh-remote-1 (perhaps you need to run "git annex sync"?)']

@yarikoptic
Member

having said that:

  • There seem to be plenty of SSH-related tests which pass just fine. Not sure exactly which ones are marked for SSH, but here is the tail:
$> git grep 'ssh.*ok\s*$' | grep macos | nl | tail
    91	build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T18:02:31.9527840Z datalad.support.tests.test_sshconnector.test_ssh_custom_identity_file ... ok
    92	build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T18:02:32.0505800Z datalad.support.tests.test_sshconnector.test_ssh_git_props ... ok
    93	build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T18:02:32.1017030Z datalad.support.tests.test_sshconnector.test_ssh_get_connection ... ok
    94	build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T18:02:32.2621720Z datalad.support.tests.test_sshconnector.test_ssh_manager_close_no_throw ... ok
    95	build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T18:02:33.3163930Z datalad.support.tests.test_sshrun.test_no_stdin_swallow ... ok
    96	build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T18:02:33.6052730Z datalad.support.tests.test_sshrun.test_ssh_ipv4_6 ... ok
    97	build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T18:02:33.6402710Z datalad.support.tests.test_sshrun.test_ssh_ipv4_6_incompatible ... ok
    98	build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T18:02:33.8369630Z datalad.tests.test_api.test_consistent_order_of_args(<class 'datalad.distribution.create_sibling.CreateSibling'>, {'sshurl'}) ... ok
    99	build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T18:02:33.8470960Z datalad.tests.test_api.test_consistent_order_of_args(<class 'datalad.support.sshrun.SSHRun'>, {'login', 'cmd'}) ... ok
   100	build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T18:05:02.3054330Z datalad.tests.test_tests_utils.test_skip_ssh ... ok

And it seems we are running into a TMPDIR bind-mount issue similar to one we had encountered before? E.g.

2022-04-05T18:05:52.7696010Z datalad.runner.exception.CommandError: CommandError: 'ssh -o ControlPath=/private/tmp/DLTMP/datalad_temp_1ei94vxf/Library/Caches/datalad/sockets/a658d1c0 datalad-test 'export "PATH=/usr/lib/git-annex.linux:$PATH"; mkdir -p /private/tmp/DLTMP/datalad_temp_check_exists_interactivenpqq6gao/sibling'' failed with exitcode 1 [err: 'mkdir: cannot create directory ‘/private/tmp’: Permission denied']

@jwodder
Member Author

jwodder commented Apr 8, 2022

@yarikoptic Regarding the TMPDIR issue, the problem seems to be that Datalad is trying to run an SSH command that runs mkdir -p /private/tmp/DLTMP/datalad_temp_check_exists_interactivenpqq6gao/sibling on the remote host, where /private/tmp is a macOS-specific path, but the SSH container is running Ubuntu.
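
The mismatch can be illustrated with a minimal sketch. This is not DataLad's actual code: the function name `build_remote_mkdir` is hypothetical, and the only point is that a path derived from the local temp directory ends up inside the command executed on the remote host.

```python
import posixpath
import shlex
from typing import List


def build_remote_mkdir(host: str, local_tmpdir: str, name: str) -> List[str]:
    """Build an ssh command that creates a directory on the remote host.

    The target path is derived from the *local* temp directory. That only
    works if the same path is also valid (and writable) on the remote.
    """
    target = posixpath.join(local_tmpdir, name)
    # Quote the path so the remote shell treats it as a single argument.
    return ["ssh", host, "mkdir -p " + shlex.quote(target)]


if __name__ == "__main__":
    # /private/tmp exists on macOS but not on a stock Ubuntu container.
    print(build_remote_mkdir("datalad-test", "/private/tmp/DLTMP", "sibling"))
```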

@yarikoptic
Member

> @yarikoptic Regarding the TMPDIR issue, the problem seems to be that Datalad is trying to run an SSH command that runs mkdir -p /private/tmp/DLTMP/datalad_temp_check_exists_interactivenpqq6gao/sibling on the remote host, where /private/tmp is a macOS-specific path, but the SSH container is running Ubuntu.

Hm, I wondered how it works e.g. in the Mac tests on AppVeyor for stock datalad. Oh well, https://github.com/datalad/datalad/blob/master/.appveyor.yml#L274 is how:

  # we place the "unix" one into the user's HOME to avoid git-annex issues on MacOSX
  # gh-5291
  - sh: mkdir ~/DLTMP
  # and use that scratch space to get short paths in test repos
  # (avoiding length-limits as much as possible)
  - cmd: "set TMP=C:\\DLTMP"
  - cmd: "set TEMP=C:\\DLTMP"
  - sh: export TMPDIR=~/DLTMP

So maybe do the same here for OSX?

@jwodder
Member Author

jwodder commented Apr 8, 2022

@yarikoptic This PR already sets TMPDIR=/private/tmp/DLTMP. The problem is that DataLad is expecting the TMPDIR in its environment to be a valid TMPDIR in the environment that it's SSHing into.

@yarikoptic
Member

> @yarikoptic This PR already sets TMPDIR=/private/tmp/DLTMP. The problem is that DataLad is expecting the TMPDIR in its environment to be a valid TMPDIR in the environment that it's SSHing into.

Right, that is why, as a workaround, the AppVeyor setup sets it to a path which should be present in both environments, i.e. ~/DLTMP. In the long(er) run, I guess it should first determine the path to use on the remote via a remote mktemp execution. Filed a dedicated datalad/datalad#6622 for that. But since it is unlikely to get into the imminent 0.16.0, let's do a workaround for now?
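
The "remote mktemp first" idea could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names are hypothetical, it is not what datalad/datalad#6622 specifies, and it assumes that `mktemp -d` is available on the remote host and prints an absolute path.

```python
import subprocess
from typing import Callable, Sequence


def _run_capture(argv: Sequence[str]) -> str:
    """Run a command and return its stdout; raise if it exits non-zero."""
    done = subprocess.run(list(argv), check=True, capture_output=True, text=True)
    return done.stdout


def remote_mktemp_dir(
    host: str, run: Callable[[Sequence[str]], str] = _run_capture
) -> str:
    """Ask the remote host for a fresh temp directory instead of reusing a local path."""
    path = run(["ssh", host, "mktemp", "-d"]).strip()
    if not path.startswith("/"):
        raise RuntimeError(f"unexpected mktemp output: {path!r}")
    return path
```

The `run` parameter is injectable so the helper can be exercised without a live SSH host, e.g. `remote_mktemp_dir("datalad-test", run=lambda argv: "/tmp/tmp.abc123\n")`.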

@jwodder
Member Author

jwodder commented Apr 8, 2022

@yarikoptic If the workaround you mean is to set TMPDIR to ~/DLTMP, that was tried previously; I suspect I had to change it because the path to the local $HOME does not exist inside the SSH container.

@yarikoptic
Member

Try exactly ~/DLTMP instead of using the env var $HOME, which possibly expands into the original path on the host machine!? Maybe magic exists and it would work somehow? ;)

@jwodder
Member Author

jwodder commented Apr 11, 2022

@yarikoptic It appears that magic does not exist.

@yarikoptic
Member

But it is interesting how it fails right in the fixture here:

  File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/datalad/__init__.py", line 265, in setup_package
    _, cfg_file = prep_tmphome()
  File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/datalad/__init__.py", line 242, in prep_tmphome
    with make_tempfile(mkdir=True) as new_home:
  File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/datalad/utils.py", line 1874, in make_tempfile
    True: tempfile.mkdtemp}[mkdir](**tkwargs_)
  File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/tempfile.py", line 366, in mkdtemp
    _os.mkdir(file, 0o700)
FileNotFoundError: [Errno 2] No such file or directory: '~/DLTMP/datalad_temp_8nl769bw'

and doesn't fail similarly in stock datalad somehow...
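
The path in the FileNotFoundError contains a literal `~`. Python's tempfile module uses the directory it is given verbatim, while tilde expansion is normally done by the shell, as in `export TMPDIR=~/DLTMP`. Whether the CI configuration here passes TMPDIR unexpanded is an assumption, not something confirmed in this thread. A minimal sketch of expanding the value explicitly (the helper name `make_tempdir_under` is hypothetical and this is not DataLad's code):

```python
import os
import tempfile


def make_tempdir_under(tmpdir_value: str) -> str:
    """Create a temp directory under `tmpdir_value`, expanding a leading '~' first."""
    # expanduser() replaces a leading '~' with the current user's home
    # directory; tempfile.mkdtemp() itself does no such expansion.
    base = os.path.expanduser(tmpdir_value)
    os.makedirs(base, exist_ok=True)
    return tempfile.mkdtemp(prefix="datalad_temp_", dir=base)


if __name__ == "__main__":
    print(make_tempdir_under("~/DLTMP"))
```

Note that this only resolves the path locally; the expanded home directory may still not exist on the host that is reached over SSH.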

@jwodder
Member Author

jwodder commented May 3, 2022

Blocked by datalad/datalad#6622
