subscription api timing out on large dataset #1079

Open
vlimant opened this issue Feb 27, 2017 · 7 comments

@vlimant

vlimant commented Feb 27, 2017

I cannot get https://cmsweb.cern.ch/phedex/datasvc/json/prod/subscriptions?block=/Neutrino_E-10_gun/RunIISpring15PrePremix-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v2-v2/GEN-SIM-DIGI-RAW%23*&node=T2_DE_DESY&collapse=n to complete without timing out, and therefore Unified cannot programmatically identify the location of the pileup.
Is there a way to break the request down further so that it completes?

FYI @areinsvo @sidnarayanan
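
For reference, a minimal sketch of what the programmatic call looks like, assuming the Python requests library (this is not Unified's actual code, just an illustration of the failing wildcard query):

```python
# Hypothetical illustration of the failing call; not Unified's actual code.
import requests

URL = "https://cmsweb.cern.ch/phedex/datasvc/json/prod/subscriptions"
PARAMS = {
    # wildcard over all blocks of the premix pileup dataset
    "block": "/Neutrino_E-10_gun/RunIISpring15PrePremix-PUMoriond17"
             "_80X_mcRun2_asymptotic_2016_TrancheIV_v2-v2/GEN-SIM-DIGI-RAW#*",
    "node": "T2_DE_DESY",
    "collapse": "n",
}

try:
    # The server side gives up after several minutes, so any reasonable
    # client timeout is reached instead of a reply.
    reply = requests.get(URL, params=PARAMS, timeout=300)
    reply.raise_for_status()
    print(reply.json())
except (requests.exceptions.Timeout, requests.exceptions.HTTPError) as err:
    print("subscriptions query failed:", err)
```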

@nataliaratnikova
Contributor

The Oracle DBAs reported a query plan instability issue for the subscription query, which may degrade its performance in unpredictable ways.

@nataliaratnikova
Contributor

Hi Jean-Roch,
I can reproduce the 502 error after a ~5 min wait in the case of the wildcard query.
It works fine if the full block name is specified. It also returns promptly when block is replaced by dataset in the query. If the pileup samples are subscribed at the dataset level, then perhaps a dataset-based query like the one below [1] would be good for your check. You can get all blocks in the dataset from the data API [2], or all blocks of the dataset at a given node from the blockreplicas API [3].

Meanwhile I will follow up with Kate on the performance issue. At yesterday's CompOps meeting she mentioned seeing about 100 concurrent sessions for the subscriptions query. If these are initiated by the Unified scripts, could you point me to the corresponding code? I'd like to see whether there is a way it could be optimized.
Thanks,
Natalia.
[1] https://cmsweb.cern.ch/phedex/datasvc/json/prod/subscriptions?dataset=/Neutrino_E-10_gun/RunIISpring15PrePremix-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v2-v2/GEN-SIM-DIGI-RAW&node=T2_DE_DESY&collapse=n
[2] https://cmsweb.cern.ch/phedex/datasvc/doc/data
[3] https://cmsweb.cern.ch/phedex/datasvc/doc/blockreplicas
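
For illustration, a minimal sketch of the dataset-based query [1] from Python, assuming the requests library; the exact shape of the JSON reply is not spelled out here:

```python
# Sketch of the dataset-level subscriptions query [1]; illustrative only.
import requests

BASE = "https://cmsweb.cern.ch/phedex/datasvc/json/prod"
DATASET = ("/Neutrino_E-10_gun/RunIISpring15PrePremix-PUMoriond17"
           "_80X_mcRun2_asymptotic_2016_TrancheIV_v2-v2/GEN-SIM-DIGI-RAW")

reply = requests.get(BASE + "/subscriptions",
                     params={"dataset": DATASET,
                             "node": "T2_DE_DESY",
                             "collapse": "n"},
                     timeout=60)
reply.raise_for_status()
subscriptions = reply.json()  # returns promptly, unlike the wildcard block query
```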

@vlimant
Author

vlimant commented Mar 9, 2017

Hi @nataliaratnikova, are you suggesting that instead of the wildcard search I first list the blocks and then make one PhEDEx call per block (roughly as in the sketch below)? I can do that of course, no problem; I am just unsure how much load this will put on datasvc.

Unified does not make concurrent calls to the subscriptions API. @sidnarayanan might be able to say more about the transfer team, and @yiiyama about Dynamo. Is there a way you can trace the IP the numerous concurrent calls are coming from?
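
A rough sketch of that two-stage pattern, using blockreplicas [3] to enumerate the blocks and then one subscriptions call per block; the loop is deliberately sequential to keep the load on datasvc low, and the JSON field names are assumptions based on the usual datasvc layout rather than verified here:

```python
# Two-stage sketch: enumerate blocks, then query subscriptions per block.
# Field names under the top-level "phedex" key are assumed, not verified.
import requests

BASE = "https://cmsweb.cern.ch/phedex/datasvc/json/prod"
DATASET = ("/Neutrino_E-10_gun/RunIISpring15PrePremix-PUMoriond17"
           "_80X_mcRun2_asymptotic_2016_TrancheIV_v2-v2/GEN-SIM-DIGI-RAW")
NODE = "T2_DE_DESY"

# Stage 1: list the blocks of the dataset present at the node (blockreplicas [3]).
reps = requests.get(BASE + "/blockreplicas",
                    params={"dataset": DATASET, "node": NODE},
                    timeout=60).json()
blocks = [b["name"] for b in reps["phedex"]["block"]]

# Stage 2: one subscriptions query per fully specified block name.
for block in blocks:
    sub = requests.get(BASE + "/subscriptions",
                       params={"block": block, "node": NODE, "collapse": "n"},
                       timeout=60).json()
    # inspect `sub` here to locate the pileup block
```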

@sidnarayanan

AFAIK the transfer team should not be making 100 concurrent calls to the subscriptions (or any) API.

@yiiyama

yiiyama commented Mar 10, 2017

Dynamo can issue up to 64 concurrent blockreplicas queries, but it shouldn't be using subscriptions. I will double-check, but it would indeed be great if the IP could be identified.
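
Purely as an illustration of such a cap (this is not Dynamo's actual code), a client-side limit of 64 in-flight blockreplicas calls could look like this with a thread pool:

```python
# Illustration only, not Dynamo's implementation: cap concurrent
# blockreplicas queries at 64 with a thread pool.
from concurrent.futures import ThreadPoolExecutor
import requests

URL = "https://cmsweb.cern.ch/phedex/datasvc/json/prod/blockreplicas"

def fetch(block):
    # One blockreplicas query for a fully specified block name.
    return requests.get(URL, params={"block": block}, timeout=60).json()

def fetch_all(blocks, max_workers=64):
    # At most `max_workers` requests are in flight at any one time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, blocks))
```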

@nataliaratnikova
Contributor

I found ~40K hits coming from the MIT server, which constitute the majority of all calls to the subscriptions API. However, this is not necessarily the cause of the problem with large datasets that Jean-Roch reported here. I will investigate further.

@vlimant
Author

vlimant commented Mar 10, 2017

The MIT server is Dynamo indeed, @yiiyama.
@nataliaratnikova, should I switch to making two-stage queries (data => subscriptions)?
