Skip to content

Conversation

@danielhollas
Copy link
Collaborator

@danielhollas danielhollas commented Sep 25, 2025

This should improve performance of async_ssh plugin for lots of small files since we're no longer sleeping 500ms when waiting for a lock.

https://docs.python.org/3/library/asyncio-sync.html#asyncio.Semaphore

Also makes the code more safe since one doesn't need to manually lock and unlock, and fixes a bug since the number of file IO connections was not decreased when an OsError exception was thrown.

@codecov
Copy link

codecov bot commented Sep 25, 2025

Codecov Report

❌ Patch coverage is 82.60870% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.60%. Comparing base (3999e84) to head (45deedd).

Files with missing lines Patch % Lines
src/aiida/transports/plugins/ssh_async.py 82.61% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7018      +/-   ##
==========================================
- Coverage   77.61%   77.60%   -0.00%     
==========================================
  Files         566      566              
  Lines       43501    43491      -10     
==========================================
- Hits        33758    33746      -12     
- Misses       9743     9745       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment on lines +524 to +534
async with self._semaphore:
try:
await self.async_backend.put(
localpath=localpath,
remotepath=remotepath,
dereference=dereference,
preserve=preserve,
recursive=False,
)
except OSError as exc:
raise OSError(f'Error while uploading file {localpath}: {exc}')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just look at the code, the equivalent change should be put async with _semaphore inside the try block right before the put().
But I think it is a correct change.

pinning @khsrali for review since you implement it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I overlooked what @danielhollas already wrote: "nd fixes a bug since the number of file IO connections was not decreased when an OsError exception was thrown.".

Copy link
Member

@unkcpz unkcpz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes looks all good to me.

Out of curiosity, is a transport object tight (its lifetime) with a calcjob or it is tight with a worker?

@khsrali khsrali self-assigned this Sep 26, 2025
@khsrali khsrali self-requested a review September 26, 2025 09:06
Copy link
Contributor

@khsrali khsrali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@danielhollas
Copy link
Collaborator Author

@unkcpz @khsrali thank you for the review!

@khsrali do you think you could benchmark this PR in the same way you benchmarked the original async_ssh implemetation to see if this fixes the regression you saw for small files?

I am planning to write a regression test here to make sure we're releasing the resources in case of exceptions.

@GeigerJ2 GeigerJ2 moved this to In progress in aiida-core v2.7.2 Oct 7, 2025
@khsrali
Copy link
Contributor

khsrali commented Nov 4, 2025

@danielhollas, is there a particular reason you're holding off on merging this?
It looks like a solid improvement, and I don’t see a reason to delay.

If you're waiting for me to benchmark, unfortunately, I'm not getting any time to unearth the old tests and do it again.
But like I said in aiida meeting, the principle of tests is very simple, just use two blocking add and multiply workgraph, attach extra dummy files to it, so it would spend some time uploading and retrieving them.

In any case, you can always run your tests after merging, hooking the commit number.

@khsrali khsrali marked this pull request as ready for review November 4, 2025 10:37
@danielhollas
Copy link
Collaborator Author

Thanks for the ping, I plan to merge this today, but want to write a test first that is failing in main due to the lock release issue that this PR is fixing.

I still want to do the benchmarking as well, but agree that can be done after merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

3 participants