
sr3 3.0.53 mkdir crash when configurations have statehost set. #1080

Closed
petersilva opened this issue May 30, 2024 · 6 comments
Labels
bug Something isn't working crasher Crashes entire app. likely-fixed likely fix is in the repository, success not confirmed yet.

Comments

@petersilva
Contributor


[me@host ~]$ sr3 status
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
missing state for winnow/winnow-ss5-04
Traceback (most recent call last):
  File "/home/pas037/.local/bin/sr3", line 11, in <module>
    load_entry_point('metpx-sr3==3.0.53', 'console_scripts', 'sr3')()
  File "/usr/lib/python3.6/site-packages/sarracenia/sr.py", line 2963, in main
    gs = sr_GlobalState(cfg, cfg.configurations)
  File "/usr/lib/python3.6/site-packages/sarracenia/sr.py", line 1212, in __init__
    self._resolve()
  File "/usr/lib/python3.6/site-packages/sarracenia/sr.py", line 909, in _resolve
    os.mkdir(self.user_cache_dir + os.sep + c + os.sep + cfg)
FileExistsError: [Errno 17] File exists: '/home/pas037/.cache/sr3/winnow/winnow-ss5-04'
[me@host ~]$
@petersilva petersilva added bug Something isn't working crasher Crashes entire app. labels May 30, 2024
@petersilva
Contributor Author

The crash is addressed by #1076, but the root cause is not.
The root cause seems to be that this part of the code does not take statehost into account. More work is needed.

@petersilva
Contributor Author

diagnosis:

  • The actual state directory (the one that includes the statehost hostname) is missing and needs to be created.
  • The code wrongly tries to create the directory without the hostname, and that directory already exists.
  • This works once (by creating the wrong directory), but the second invocation crashes with EEXIST.

Actual fix is to create the correct state directory, rather than the wrong one.
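
A minimal sketch of that idea, not the code in sr.py. The function names `state_dir` and `ensure_state_dir` and their parameters are hypothetical, and the layout `<cache>/<host>/<component>/<config>` is inferred from the mkdir work-around in this issue:

```python
import os
import socket


def state_dir(cache_dir, component, config, statehost=False, hostname=None):
    """Build the state directory path (hypothetical helper).

    When statehost is set, the host name is inserted between the cache
    root and the component, which is the layout assumed from this issue.
    """
    parts = [cache_dir]
    if statehost:
        parts.append(hostname or socket.gethostname())
    parts += [component, config]
    return os.path.join(*parts)


def ensure_state_dir(path):
    # exist_ok=True avoids FileExistsError when the directory is already
    # present; makedirs also creates any missing parent (the host level).
    os.makedirs(path, exist_ok=True)
    return path
```

Building the path in one place means the existence check and the mkdir always refer to the same directory, which is the mismatch described in the diagnosis.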

Work-around: create the correct state directory manually, outside sr3. Once it exists, sr3 no longer detects missing state and does not attempt the mkdir.


[me@host sr3]$ sr3 features
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
statehost
missing state for winnow/winnow-ss5-03
Traceback (most recent call last):
  File "/home/pas037/.local/bin/sr3", line 11, in <module>
    load_entry_point('metpx-sr3', 'console_scripts', 'sr3')()
  File "/fs/homeu2/ssc/di/pas037/Sarracenia/sr3/sarracenia/sr.py", line 2963, in main
    gs = sr_GlobalState(cfg, cfg.configurations)
  File "/fs/homeu2/ssc/di/pas037/Sarracenia/sr3/sarracenia/sr.py", line 1212, in __init__
    self._resolve()
  File "/fs/homeu2/ssc/di/pas037/Sarracenia/sr3/sarracenia/sr.py", line 909, in _resolve
    os.mkdir(self.user_cache_dir + os.sep + c + os.sep + cfg)
FileExistsError: [Errno 17] File exists: '/home/pas037/.cache/sr3/winnow/winnow-ss5-03'
[me@host sr3]$

repair by building the missing state directory:


[me@host sr3]$ mkdir ~/.cache/sr3/host/winnow/winnow-ss5-03

Repeat this on each host where the command is to be invoked. This may be quite tedious on a large cluster with hundreds of hosts.
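
To reduce the tedium, the manual mkdir could be scripted per host. The sketch below is only an illustration: `create_missing_state_dirs` is a hypothetical helper, it assumes configurations live as `<config_root>/<component>/<name>.conf`, and it assumes every configuration uses statehost, so it would also create host-level directories for configurations that do not need them.

```python
import os
import socket


def create_missing_state_dirs(config_root, cache_root, hostname=None):
    """Create <cache_root>/<hostname>/<component>/<config> for every
    <config_root>/<component>/<config>.conf (assumed layout).

    Returns the list of directories that were actually created.
    """
    hostname = hostname or socket.gethostname()
    created = []
    for component in sorted(os.listdir(config_root)):
        comp_dir = os.path.join(config_root, component)
        if not os.path.isdir(comp_dir):
            continue
        for fname in sorted(os.listdir(comp_dir)):
            name, ext = os.path.splitext(fname)
            if ext != '.conf':
                continue  # skip anything that is not a configuration file
            target = os.path.join(cache_root, hostname, component, name)
            if not os.path.isdir(target):
                os.makedirs(target, exist_ok=True)
                created.append(target)
    return created
```

It would be called with the expanded config and cache roots, for example `create_missing_state_dirs(os.path.expanduser('~/.config/sr3'), os.path.expanduser('~/.cache/sr3'))`. The config root here is an assumption; only the cache path appears in this issue.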

@petersilva
Contributor Author

A second work-around for missing directories is to start the affected configurations on the machine itself.
This creates the missing directories for the local host.

@petersilva
Contributor Author

There is an additional oddity: two host directories are created, one named with the FQDN and the other with just the short hostname. The FQDN one gets only an empty log file, while the other gets the rest of the state files (including a functional log file).

fix is here: dfe67d8
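
One way two directories can appear is when different code paths derive the host name differently (for example `socket.getfqdn()` in one place and `socket.gethostname()` in another). The sketch below is not taken from commit dfe67d8; `short_hostname` is a hypothetical helper that shows normalizing the name in a single place so every path uses the same value:

```python
import socket


def short_hostname(name=None):
    """Return the host name without its domain part (hypothetical helper)."""
    if name is None:
        name = socket.gethostname()
    # Keep only the first label, so 'host.example.com' and 'host' agree.
    return name.split('.')[0]
```

Whether the short or the fully qualified name is preferred is a design choice; the point is only that log and state paths are built from the same one.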

@petersilva
Contributor Author

An alternative approach to fixing this problem: why are there mkdir calls in that part of the code at all? They mean that running "sr3 status" creates a whole bunch of state directories, which feels wrong. It should just read the state that is there. Another branch takes that approach.
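
A rough sketch of what a read-only lookup could look like, assuming the same directory layout as above. `read_state` is a hypothetical name and this is not the code from the other branch:

```python
import os


def read_state(cache_dir, component, config, hostname=None):
    """Return the sorted file names in a state directory, or None when
    the directory does not exist. Nothing is ever created."""
    parts = [cache_dir]
    if hostname:
        parts.append(hostname)
    parts += [component, config]
    path = os.path.join(*parts)
    if not os.path.isdir(path):
        return None  # report "missing state" instead of calling mkdir
    return sorted(os.listdir(path))
```

With this shape, a status command can report missing state to the user without side effects on the file system.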

@petersilva
Contributor Author

remove unneded statehost print 4ac239a

@petersilva petersilva mentioned this issue Jun 1, 2024
@petersilva petersilva added the likely-fixed likely fix is in the repository, success not confirmed yet. label Jun 6, 2024