subscription api timing out on large dataset #1079

Open
vlimant opened this issue Feb 27, 2017 · 7 comments

@vlimant

vlimant commented Feb 27, 2017

I cannot get https://cmsweb.cern.ch/phedex/datasvc/json/prod/subscriptions?block=/Neutrino_E-10_gun/RunIISpring15PrePremix-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v2-v2/GEN-SIM-DIGI-RAW%23*&node=T2_DE_DESY&collapse=n to complete without timing out, and therefore Unified cannot programmatically identify the location of the pileup.
Is there a way to break the request down further so that it completes?

FYI @areinsvo @sidnarayanan
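
For reference, a minimal sketch of what the programmatic call looks like, assuming the Python requests library (this is not Unified's actual code, just an illustration of the failing wildcard query):

```python
# Hypothetical illustration of the failing call; not Unified's actual code.
import requests

URL = "https://cmsweb.cern.ch/phedex/datasvc/json/prod/subscriptions"
PARAMS = {
    # wildcard over all blocks of the premix pileup dataset
    "block": "/Neutrino_E-10_gun/RunIISpring15PrePremix-PUMoriond17"
             "_80X_mcRun2_asymptotic_2016_TrancheIV_v2-v2/GEN-SIM-DIGI-RAW#*",
    "node": "T2_DE_DESY",
    "collapse": "n",
}

try:
    # The server side gives up after several minutes, so any reasonable
    # client timeout is reached instead of a reply.
    reply = requests.get(URL, params=PARAMS, timeout=300)
    reply.raise_for_status()
    print(reply.json())
except (requests.exceptions.Timeout, requests.exceptions.HTTPError) as err:
    print("subscriptions query failed:", err)
```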

@nataliaratnikova
Contributor

The Oracle DBAs reported a query plan instability issue for the subscription query, which may degrade its performance in unpredictable ways.

@nataliaratnikova
Contributor

Hi Jean-Roch,
I can reproduce the 502 error after a ~5 min wait in the case of the wildcard query.
It works fine if the full block name is specified. It also returns promptly when block is replaced by dataset in the query. If the pileup samples are subscribed at the dataset level, then perhaps a dataset-based query like the one below [1] would be good for your check. You can get all blocks in the dataset from the data API [2], or all blocks of the dataset at a given node from the blockreplicas API [3].

Meanwhile I will follow up with Kate on the performance issue. At yesterday's CompOps meeting she mentioned seeing about 100 concurrent sessions for the subscriptions query. If these are initiated by the Unified scripts, could you point me to the corresponding code? I'd like to see whether there is a way it could be optimized.
Thanks,
Natalia.
[1] https://cmsweb.cern.ch/phedex/datasvc/json/prod/subscriptions?dataset=/Neutrino_E-10_gun/RunIISpring15PrePremix-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v2-v2/GEN-SIM-DIGI-RAW&node=T2_DE_DESY&collapse=n
[2] https://cmsweb.cern.ch/phedex/datasvc/doc/data
[3] https://cmsweb.cern.ch/phedex/datasvc/doc/blockreplicas
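
For illustration, a minimal sketch of the dataset-based query [1] from Python, assuming the requests library; the exact shape of the JSON reply is not spelled out here:

```python
# Sketch of the dataset-level subscriptions query [1]; illustrative only.
import requests

BASE = "https://cmsweb.cern.ch/phedex/datasvc/json/prod"
DATASET = ("/Neutrino_E-10_gun/RunIISpring15PrePremix-PUMoriond17"
           "_80X_mcRun2_asymptotic_2016_TrancheIV_v2-v2/GEN-SIM-DIGI-RAW")

reply = requests.get(BASE + "/subscriptions",
                     params={"dataset": DATASET,
                             "node": "T2_DE_DESY",
                             "collapse": "n"},
                     timeout=60)
reply.raise_for_status()
subscriptions = reply.json()  # returns promptly, unlike the wildcard block query
```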

@vlimant
Author

vlimant commented Mar 9, 2017

Hi @nataliaratnikova, are you suggesting that instead of the wildcard search I first list the blocks and then make one PhEDEx call per block (roughly as in the sketch below)? I can do that of course, no problem; I am just unsure how much load this will put on datasvc.

Unified does not make concurrent calls to the subscriptions API. @sidnarayanan might be able to say more about the transfer team, and @yiiyama about Dynamo. Is there a way you can trace the IP the numerous concurrent calls are coming from?
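
A rough sketch of that two-stage pattern, using blockreplicas [3] to enumerate the blocks and then one subscriptions call per block; the loop is deliberately sequential to keep the load on datasvc low, and the JSON field names are assumptions based on the usual datasvc layout rather than verified here:

```python
# Two-stage sketch: enumerate blocks, then query subscriptions per block.
# Field names under the top-level "phedex" key are assumed, not verified.
import requests

BASE = "https://cmsweb.cern.ch/phedex/datasvc/json/prod"
DATASET = ("/Neutrino_E-10_gun/RunIISpring15PrePremix-PUMoriond17"
           "_80X_mcRun2_asymptotic_2016_TrancheIV_v2-v2/GEN-SIM-DIGI-RAW")
NODE = "T2_DE_DESY"

# Stage 1: list the blocks of the dataset present at the node (blockreplicas [3]).
reps = requests.get(BASE + "/blockreplicas",
                    params={"dataset": DATASET, "node": NODE},
                    timeout=60).json()
blocks = [b["name"] for b in reps["phedex"]["block"]]

# Stage 2: one subscriptions query per fully specified block name.
for block in blocks:
    sub = requests.get(BASE + "/subscriptions",
                       params={"block": block, "node": NODE, "collapse": "n"},
                       timeout=60).json()
    # inspect `sub` here to locate the pileup block
```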

@sidnarayanan

AFAIK the transfer team should not be making 100 concurrent calls to the subscriptions (or any) API.

@yiiyama

yiiyama commented Mar 10, 2017

Dynamo can issue up to 64 concurrent blockreplicas queries, but it shouldn't be using subscriptions. I will double-check, but it would indeed be great if the IP could be identified.
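
Purely as an illustration of such a cap (this is not Dynamo's actual code), a client-side limit of 64 in-flight blockreplicas calls could look like this with a thread pool:

```python
# Illustration only, not Dynamo's implementation: cap concurrent
# blockreplicas queries at 64 with a thread pool.
from concurrent.futures import ThreadPoolExecutor
import requests

URL = "https://cmsweb.cern.ch/phedex/datasvc/json/prod/blockreplicas"

def fetch(block):
    # One blockreplicas query for a fully specified block name.
    return requests.get(URL, params={"block": block}, timeout=60).json()

def fetch_all(blocks, max_workers=64):
    # At most `max_workers` requests are in flight at any one time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, blocks))
```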

@nataliaratnikova
Contributor

I found ~40K hits coming from the MIT server, which constitute the majority of all calls to the subscriptions API. However, this is not necessarily the cause of the problem with large datasets that Jean-Roch reported here. I will investigate further.

@vlimant
Author

vlimant commented Mar 10, 2017

The MIT server is Dynamo indeed, @yiiyama.
@nataliaratnikova, should I switch to making two-stage queries (data => subscriptions)?
