Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

directory removal race condition #1395

Open
reidsunderland opened this issue Mar 10, 2025 · 1 comment
Open

directory removal race condition #1395

reidsunderland opened this issue Mar 10, 2025 · 1 comment
Assignees
Labels
bug Something isn't working crasher Crashes entire app. HPC related to hich performance computing mirroing use case likely-fixed likely fix is in the repository, success not confirmed yet. Priority 5 - Defect Missing and/or broken functionality - do not forget

Comments

@reidsunderland
Copy link
Member

reidsunderland commented Mar 10, 2025

When a full directory is removal is being mirrored, it can cause crashes crashes and creation of empty directories when there are multiple subscriber instances.

Log example:

subscribe_mirror_10.log.2025-03-05:2025-03-05 08:10:59,586 [INFO] sarracenia.flowcb.log after_accept accepted: (lag: 30.30 )  exchange: xs_WINNOW subtopic: source.some.dir a remove with baseUrl: file:/ relPath: source/some/dir/file_2025-03-05_1m_20250401_20250430_.nc
subscribe_mirror_10.log.2025-03-05:2025-03-05 08:10:59,587 [INFO] sarracenia.flowcb.log after_accept accepted: (lag: 30.38 )  exchange: xs_WINNOW subtopic: source.some.dir a remove with baseUrl: file:/ relPath: source/some/dir/file_2025-03-05_1m_20250401_20250430_.nc


subscribe_mirror_10.log.2025-03-05:2025-03-05 08:10:59,589 [INFO] sarracenia.flowcb.log after_accept accepted: (lag: 30.27 )  exchange: xs_WINNOW subtopic:  source.some a remove with baseUrl: file:/ relPath: source/some/dir

# the directory is successfully removed


# but we are still processing messages for file removals inside the removed directory

 
# now these two errors occur because they likely already had the os.path.isdir check return True, but the directory gets deleted before the os.chdir call


subscribe_mirror_17.log.2025-03-05-2025-03-05 08:10:59,595 [INFO] sarracenia.flowcb.log after_accept accepted: (lag: 30.31 )  exchange: xs_WINNOW subtopic: source.some.dir a remove with baseUrl: file:/ relPath: source/some/dir/file_2025-03-05_1m_20250401_20250430_.nc
subscribe_mirror_17.log.2025-03-05-Traceback (most recent call last):
subscribe_mirror_17.log.2025-03-05-  File "/usr/lib/python3.6/site-packages/sarracenia/instance.py", line 276, in <module>
subscribe_mirror_17.log.2025-03-05-    i.start()
subscribe_mirror_17.log.2025-03-05-  File "/usr/lib/python3.6/site-packages/sarracenia/instance.py", line 267, in start
subscribe_mirror_17.log.2025-03-05-    self.running_instance.run()
subscribe_mirror_17.log.2025-03-05-  File "/usr/lib/python3.6/site-packages/sarracenia/flow/__init__.py", line 608, in run
subscribe_mirror_17.log.2025-03-05-    self.work()
subscribe_mirror_17.log.2025-03-05-  File "/usr/lib/python3.6/site-packages/sarracenia/flow/__init__.py", line 1207, in work
subscribe_mirror_17.log.2025-03-05-    self.do()
subscribe_mirror_17.log.2025-03-05-  File "/usr/lib/python3.6/site-packages/sarracenia/flow/__init__.py", line 1197, in do
subscribe_mirror_17.log.2025-03-05-    self.do_download()
subscribe_mirror_17.log.2025-03-05-  File "/usr/lib/python3.6/site-packages/sarracenia/flow/__init__.py", line 1783, in do_download
subscribe_mirror_17.log.2025-03-05-    os.chdir(msg['new_dir'])
subscribe_mirror_17.log.2025-03-05:FileNotFoundError: [Errno 2] No such file or directory: '/destination/some/dir'

subscribe_mirror_20.log.2025-03-05-2025-03-05 08:10:59,600 [INFO] sarracenia.flowcb.log after_accept accepted: (lag: 30.30 )  exchange: xs_WINNOW subtopic: source.some.dir a remove with baseUrl: file:/ relPath: source/some/dir/file_2025-03-05_1m_20250401_20250430_.nc
subscribe_mirror_20.log.2025-03-05-Traceback (most recent call last):
subscribe_mirror_20.log.2025-03-05-  File "/usr/lib/python3.6/site-packages/sarracenia/instance.py", line 276, in <module>
subscribe_mirror_20.log.2025-03-05-    i.start()
subscribe_mirror_20.log.2025-03-05-  File "/usr/lib/python3.6/site-packages/sarracenia/instance.py", line 267, in start
subscribe_mirror_20.log.2025-03-05-    self.running_instance.run()
subscribe_mirror_20.log.2025-03-05-  File "/usr/lib/python3.6/site-packages/sarracenia/flow/__init__.py", line 608, in run
subscribe_mirror_20.log.2025-03-05-    self.work()
subscribe_mirror_20.log.2025-03-05-  File "/usr/lib/python3.6/site-packages/sarracenia/flow/__init__.py", line 1207, in work
subscribe_mirror_20.log.2025-03-05-    self.do()
subscribe_mirror_20.log.2025-03-05-  File "/usr/lib/python3.6/site-packages/sarracenia/flow/__init__.py", line 1197, in do
subscribe_mirror_20.log.2025-03-05-    self.do_download()
subscribe_mirror_20.log.2025-03-05-  File "/usr/lib/python3.6/site-packages/sarracenia/flow/__init__.py", line 1783, in do_download
subscribe_mirror_20.log.2025-03-05-    os.chdir(msg['new_dir'])
subscribe_mirror_20.log.2025-03-05:FileNotFoundError: [Errno 2] No such file or directory: '/destination/some/dir'


# this case, where there's no error, causes /destination/some/dir to be recreated
# and all the removals below fail silently because the file already doesn't exist

subscribe_mirror_19.log.2025-03-05:2025-03-05 08:10:59,609 [INFO] sarracenia.flowcb.log after_accept accepted: (lag: 30.30 )  exchange: xs_WINNOW subtopic: source.some.dir a remove with baseUrl: file:/ relPath: source/some/dir/file_2025-03-05_1m_20250401_20250430_.nc

subscribe_mirror_16.log.2025-03-05:2025-03-05 08:10:59,636 [INFO] sarracenia.flowcb.log after_accept accepted: (lag: 30.36 )  exchange: xs_WINNOW subtopic: source.some.dir a remove with baseUrl: file:/ relPath: source/some/dir/file_2025-03-05_1m_20250401_20250430_.nc
subscribe_mirror_16.log.2025-03-05:2025-03-05 08:10:59,636 [INFO] sarracenia.flowcb.log after_accept accepted: (lag: 30.36 )  exchange: xs_WINNOW subtopic: source.some.dir a remove with baseUrl: file:/ relPath: source/some/dir/file_2025-03-05_1m_20250401_20250430_.nc

if not os.path.isdir(msg['new_dir']):
try:
logger.debug( f"missing destination directories, makedirs: {msg['new_dir']} " )
self.worklist.directories_ok.append(msg['new_dir'])
os.makedirs(msg['new_dir'], 0o775, True)
except Exception as ex:
logger.warning("making %s: %s" % (msg['new_dir'], ex))
logger.debug('Exception details:', exc_info=True)
self.reject(msg, 422, f"cannot create directory {msg['new_dir']} to put file in it." )
continue
os.chdir(msg['new_dir'])
logger.debug( f"chdir {msg['new_dir']}")

I will add another check inside this if statement. If the directory does not exist, and if this message is for a remove event, just reject the message and move on.

@reidsunderland reidsunderland added bug Something isn't working crasher Crashes entire app. HPC related to hich performance computing mirroing use case Priority 5 - Defect Missing and/or broken functionality - do not forget labels Mar 10, 2025
@reidsunderland reidsunderland self-assigned this Mar 10, 2025
@reidsunderland
Copy link
Member Author

I'm not sure if this is a consequence of MetPX/sarrac#201 but it's probably worth fixing in the Python code regardless

@reidsunderland reidsunderland added the likely-fixed likely fix is in the repository, success not confirmed yet. label Mar 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working crasher Crashes entire app. HPC related to hich performance computing mirroing use case likely-fixed likely fix is in the repository, success not confirmed yet. Priority 5 - Defect Missing and/or broken functionality - do not forget
Projects
None yet
Development

No branches or pull requests

1 participant