How to improve performance of TransformationManager #6044
Replies: 8 comments 12 replies
-
In LHCb we regularly run transformations with more than 200,000 input files without issues. Your issues might not come from these transformations specifically, but rather from the machine running out of sockets. The CPU load might not be high, but the network load might be. You can try a few things; the simplest are:
-
Hi @zhangxiaomei, we also have similar issues. Are you using the TSCatalog interface, as described here: https://dirac.readthedocs.io/en/latest/AdministratorGuide/Systems/Transformation/index.html#id12, or the InputDataQuery Agent? With the TSCatalog interface, unfortunately, the TS does not support running multiple services, but enabling this feature is on my to-do list.
-
@fstagni about your suggestion:
Do you suggest increasing net.core.somaxconn? Thanks.
-
@fstagni thanks a lot for this information. On our server, only 2 of the variables you listed are set:
The default values of these are:
I've just tried increasing:
But given your values, I guess I can increase it even more if needed. Would you suggest also setting the other variables?
I've also read that the file descriptor limits should be increased. The values I had originally were:
and in my last test, with net.core.somaxconn = 1024, I increased them by a factor of 4. Could you please tell me whether that seems reasonable to you?
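To compare settings like these between servers, it can help to print the values actually in effect. The following is a minimal sketch, assuming a Linux host where sysctl keys are exposed under `/proc/sys`. The helper names `read_sysctl` and `fd_limits` are made up for this illustration, and only `net.core.somaxconn` is listed because it is the only key named in this thread.

```python
import resource
from pathlib import Path

# Only net.core.somaxconn is named in this thread; extend the list as needed.
SYSCTL_KEYS = ["net.core.somaxconn"]


def read_sysctl(key, proc_root="/proc/sys"):
    """Return the current value of a sysctl key as a string, or None if it cannot be read."""
    path = Path(proc_root) / key.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        return None


def fd_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    for key in SYSCTL_KEYS:
        print(f"{key} = {read_sysctl(key)}")
    soft, hard = fd_limits()
    print(f"open files: soft={soft} hard={hard}")
```

Note that `fd_limits()` reports the limits of the process running the script, which may differ from those of the running DIRAC services, so it is most informative when run as the same user and from the same environment that starts them.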
-
OK thank you. I guess we should do the same and tune our settings.
I have no idea why only the Master instance gives timeouts. Anyway, do you think it could be a similar network issue, and that we should proceed with a similar tuning of the SYSCTL settings? Thanks.
-
Can we consider this thread answered? What is the status now in your installations?
-
I also tried all the network settings you suggested, but unfortunately I don't see any improvement.
-
I am closing this discussion as "outdated", as more than 2 years have passed without comments.
-
When transformation tasks with more than 200,000 input data files are submitted, the TransformationManager seems to respond very slowly, although it is still working:
(1) The transformation monitoring page gets stuck with the error "timed out:SSLTimeoutError('timed out',)"
(2) The related agents, such as TransformationAgent and TransformationCleaningAgent, report timeout errors from time to time:
"Issue getting socket: <DIRAC.Core.DISET.private.Transports.M2SSLTransport.SSLTransport object at 0x7f2478419c10> : ('dips', 'prod-dirac.ihep.ac.cn', 9131, 'Transformation/TransformationManager') : timed out:SSLTimeoutError('timed out',)"
How can I improve this situation? All the services are on one server. The load on the DIRAC server is not high (2 to 5), and the memory used by TransformationManager is 1.1 GB.
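One way to narrow down timeouts like the ones quoted above is to time a bare TCP connection to the service port, which helps separate connection-level delays from slowness inside the service or the SSL handshake. The following is a minimal sketch using only the Python standard library. The helper name `tcp_connect_time` is made up for this illustration, the host and port are taken from the error message, and the probe does not speak the DIPS protocol or perform any SSL handshake.

```python
import socket
import time


def tcp_connect_time(host, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure or timeout."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, DNS failures and socket timeouts.
        return None


if __name__ == "__main__":
    # Host and port as they appear in the error message above.
    for _ in range(5):
        elapsed = tcp_connect_time("prod-dirac.ihep.ac.cn", 9131)
        print("failed" if elapsed is None else f"{elapsed:.3f}s")
        time.sleep(1)
```

If the plain TCP connect is consistently fast while the agents still time out, the delay is more likely inside the service or the handshake than in accepting connections. If the connect itself is slow or fails, the socket and backlog settings of the server are a more plausible place to look.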