
Paola's usecase #251

Open
yuyiguo opened this issue Nov 16, 2016 · 11 comments

Comments

@yuyiguo
Member

yuyiguo commented Nov 16, 2016

Paola requested the following data-search task:
Given a PrepID and a time frame (a week), she needs to find out:

  1. the workflow names/task names associated with the PrepID;
  2. the type of the failed job;
  3. the name of the site where the job failed;
  4. the exitCode of the failed job.

A job is considered failed only if it still fails after all retries; agents usually retry three times. If a job succeeds on the third try, we will not collect the information for the previous failed tries, and the job is counted as successful.

We collect the above information only for the final failed try; the previous tries are not collected.
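The retry rule above can be sketched as follows. This is a hypothetical illustration only: `attempts` is an assumed list-of-dicts structure, not the actual FWJR schema.

```python
# Hypothetical sketch of the rule above: a job counts as failed only if
# its FINAL retry failed; earlier failed retries are ignored.
def final_failure(attempts):
    """Return the last attempt if the job ultimately failed, else None."""
    if not attempts:
        return None
    last = attempts[-1]
    return last if last.get("exitCode", 0) != 0 else None

# A job that succeeds on the 3rd try is a successful job: nothing is collected.
retries = [{"exitCode": 99109}, {"exitCode": 99109}, {"exitCode": 0}]
assert final_failure(retries) is None

# A job that fails all tries: only the final try is collected.
retries = [{"exitCode": 99109}, {"exitCode": 50664}]
assert final_failure(retries) == {"exitCode": 50664}
```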

Currently, PrepID info is at the file level. Seangchan and his group are working on getting PrepID and Campaign info into the top level of FWJR. We will need to update the WMArchive schema after they put these in production.

Valentin is working on scripts that use the PrepID at the file level. He will provide Paola with a working script for her initial use.

@vkuznet
Contributor

vkuznet commented Dec 4, 2016

Hi, the code which implements Paola's use case has been committed here: #273

Here is a recipe to run the job:

  • acquire an account on the CERN analytix cluster
  • login to the CERN analytix cluster
  • cd to your working area
  • download the WMArchive code: git clone [email protected]:dmwm/WMArchive.git
  • setup the environment:
    cd WMArchive
    export PYTHONPATH=$PWD/WMArchive/src/python/
    export PATH=$PWD/bin:$PATH
  • write a spec file with your desired prep_id, e.g. cat prepid.spec gives
    {"spec":{"prep_id":"SUS-RunIISummer16DR80Premix-00169", "timerange":[20161127,20161129]}, "fields":[]}

Here you put your desired prep_id and an approximate time range to scan in WMArchive. The dates in the timerange list are the lower/upper bounds in YYYYMMDD format.

  • run myspark:
    myspark --spec=prepid.spec --yarn --records-output=prepid.records --script=RecordFinderFailures

Once the job is done, the output will be in prepid.records. It is a JSON file with the following structure:

[
{"site": "T2_ES_CIEMAT", "exitCode": "99109", "exitStep": "stageOut1", "jobtype": "Processing", "workflow": "pdmvserv_task_SUS-RunIISummer15GS-00196__v1_T_161125_233700_7116"},
{"site": "T2_UK_SGrid_RALPP", "exitCode": "99109", "exitStep": "stageOut1", "jobtype": "Processing", "workflow": "pdmvserv_task_SUS-RunIISummer15GS-00196__v1_T_161125_233700_7116"}
]
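The records file can be post-processed with a few lines of Python, for example to count failures per (site, exitCode). This is a sketch using the two sample entries shown above; in practice you would json.load() the file written by --records-output.

```python
import json
from collections import Counter

# Sample entries copied from the prepid.records output shown above;
# normally: records = json.load(open("prepid.records"))
records = [
    {"site": "T2_ES_CIEMAT", "exitCode": "99109", "exitStep": "stageOut1",
     "jobtype": "Processing",
     "workflow": "pdmvserv_task_SUS-RunIISummer15GS-00196__v1_T_161125_233700_7116"},
    {"site": "T2_UK_SGrid_RALPP", "exitCode": "99109", "exitStep": "stageOut1",
     "jobtype": "Processing",
     "workflow": "pdmvserv_task_SUS-RunIISummer15GS-00196__v1_T_161125_233700_7116"},
]

# Tally failures by (site, exitCode) and print the most frequent first.
counts = Counter((r["site"], r["exitCode"]) for r in records)
for (site, code), n in counts.most_common():
    print(f"{site} exitCode={code} failures={n}")
```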

@paorozo

paorozo commented Dec 14, 2016

Hi, I am having a problem running myspark: it seems no records are retrieved, even though I know the task had failures in that time range. I don't know if I am doing something wrong.
I am attaching the .spec file I am using and the output of myspark.
Thanks.
archive.zip

@vkuznet
Contributor

vkuznet commented Dec 14, 2016 via email

@paorozo

paorozo commented Dec 14, 2016

I guess I misunderstood the instructions. I updated the GitHub code, and it worked for the specific task.
But of course, I need the records by prep_id. So, in this case the .spec file will contain this line:
{"spec":{"prep_id":"B2G-RunIISummer16MiniAODv2-0012","timerange":[20161203,20161205]}, "fields":[]}
I used commit bd12469, but it did not work.
What else do I need to do?
I'm sorry for the inconvenience.

@vkuznet
Contributor

vkuznet commented Dec 15, 2016 via email

@paorozo

paorozo commented Dec 16, 2016

Thanks Valentin.
I am doing a couple of tests with the prep_id 'ReReco-Run2016H-v1-09Nov2016-0009':

{"spec":{"prep_id":"ReReco-Run2016H-v1-09Nov2016-0009","timerange":[20161109,20161121]}, "fields":[]}

I executed:
myspark --spec=rereco.spec --yarn --records-output=rereco.records --script=RecordFinderFailures

The result was:

[
{"site": "T1_US_FNAL_Disk", "exitCode": 99109, "exitStep": "stageOut1", "jobtype": "Processing", "workflow": "fabozzi_Run2016H-v1-ZeroBiasIsolatedBunch4-09Nov2016_8023_161109_181753_3208"},
{"site": "T2_DE_RWTH", "exitCode": 99109, "exitStep": "stageOut1", "jobtype": "Processing", "workflow": "fabozzi_Run2016H-v1-ZeroBiasIsolatedBunch4-09Nov2016_8023_161109_181753_3208"},
{"site": "T2_FR_GRIF_IRFU", "exitCode": 50664, "exitStep": "PerformanceError", "jobtype": "Processing", "workflow": "fabozzi_Run2016H-v1-ZeroBiasIsolatedBunch4-09Nov2016_8023_161109_181753_3208"},
{"site": "T2_CH_CERN_HLT", "exitCode": 50664, "exitStep": "PerformanceError", "jobtype": "Processing", "workflow": "fabozzi_Run2016H-v1-ZeroBiasIsolatedBunch4-09Nov2016_8023_161109_181753_3208"},
{"site": "T2_US_MIT", "exitCode": 99109, "exitStep": "stageOut1", "jobtype": "Merge", "workflow": "fabozzi_Run2016H-v1-ZeroBiasIsolatedBunch4-09Nov2016_8023_161109_181753_3208"},
{"site": "T2_US_MIT", "exitCode": 99109, "exitStep": "stageOut1", "jobtype": "Merge", "workflow": "fabozzi_Run2016H-v1-ZeroBiasIsolatedBunch4-09Nov2016_8023_161109_181753_3208"},
{"site": "T2_US_MIT", "exitCode": 99109, "exitStep": "stageOut1", "jobtype": "Merge", "workflow": "fabozzi_Run2016H-v1-ZeroBiasIsolatedBunch4-09Nov2016_8023_161109_181753_3208"},
{"site": "T2_US_MIT", "exitCode": 99109, "exitStep": "stageOut1", "jobtype": "Merge", "workflow": "fabozzi_Run2016H-v1-ZeroBiasIsolatedBunch4-09Nov2016_8023_161109_181753_3208"},
{"site": "T2_US_MIT", "exitCode": 99109, "exitStep": "stageOut1", "jobtype": "Merge", "workflow": "fabozzi_Run2016H-v1-ZeroBiasIsolatedBunch4-09Nov2016_8023_161109_181753_3208"},
{"site": "T2_US_MIT", "exitCode": 99109, "exitStep": "stageOut1", "jobtype": "Merge", "workflow": "fabozzi_Run2016H-v1-ZeroBiasIsolatedBunch4-09Nov2016_8023_161109_181753_3208"},
{"site": "T2_US_MIT", "exitCode": 99109, "exitStep": "stageOut1", "jobtype": "Merge", "workflow": "fabozzi_Run2016H-v1-ZeroBiasIsolatedBunch4-09Nov2016_8023_161109_181753_3208"},
{"site": "T2_US_MIT", "exitCode": 99109, "exitStep": "stageOut1", "jobtype": "Merge", "workflow": "fabozzi_Run2016H-v1-ZeroBiasIsolatedBunch4-09Nov2016_8023_161109_181753_3208"},
{"site": "T2_US_MIT", "exitCode": 99109, "exitStep": "stageOut1", "jobtype": "Merge", "workflow": "fabozzi_Run2016H-v1-ZeroBiasIsolatedBunch4-09Nov2016_8023_161109_181753_3208"},
{"site": "T2_US_MIT", "exitCode": 99109, "exitStep": "stageOut1", "jobtype": "Merge", "workflow": "fabozzi_Run2016H-v1-ZeroBiasIsolatedBunch4-09Nov2016_8023_161109_181753_3208"},
{"site": "T2_US_MIT", "exitCode": 99109, "exitStep": "stageOut1", "jobtype": "Merge", "workflow": "fabozzi_Run2016H-v1-ZeroBiasIsolatedBunch4-09Nov2016_8023_161109_181753_3208"},
{"site": "T2_US_MIT", "exitCode": 99109, "exitStep": "stageOut1", "jobtype": "Merge", "workflow": "fabozzi_Run2016H-v1-ZeroBiasIsolatedBunch4-09Nov2016_8023_161109_181753_3208"},
{"site": "T2_CH_CERN_HLT", "exitCode": 50664, "exitStep": "PerformanceError", "jobtype": "Processing", "workflow": "fabozzi_Run2016H-v1-ZeroBiasIsolatedBunch4-09Nov2016_8023_161109_181753_3208"}]

According to WMStats, the only workflow with this prep_id in that period of time had these failures (please check "jobfailed": { ):
https://cmsweb.cern.ch/wmstatsserver/data/jobdetail/fabozzi_Run2016H-v1-ZeroBiasIsolatedBunch4-09Nov2016_8023_161109_181753_3208

The results are not consistent. What do you think the problem is?
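One thing to check when comparing with WMStats: the rereco.records output above contains many identical T2_US_MIT Merge entries. A quick sanity check is to collapse duplicates and compare the distinct combinations against the WMStats counts. This sketch uses a short stand-in list; normally records would come from json.load() on rereco.records.

```python
import json

# Stand-in for json.load(open("rereco.records")); the real file holds the
# sixteen entries shown above, many of which are identical.
records = [
    {"site": "T2_US_MIT", "exitCode": 99109, "jobtype": "Merge"},
    {"site": "T2_US_MIT", "exitCode": 99109, "jobtype": "Merge"},
    {"site": "T2_CH_CERN_HLT", "exitCode": 50664, "jobtype": "Processing"},
]

# Distinct (site, jobtype, exitCode) combinations vs. raw record count.
unique = {(r["site"], r["jobtype"], r["exitCode"]) for r in records}
print(f"{len(records)} records, {len(unique)} distinct combinations")
```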

@vkuznet
Contributor

vkuznet commented Dec 16, 2016 via email

@jenimal

jenimal commented Jan 31, 2017

Do you need an account on an analytix cluster machine? If so, how do I get one? Can I do testing on the agent machines?

Jen

@jenimal

jenimal commented Jan 31, 2017

See also
#305

@vkuznet
Contributor

vkuznet commented Jan 31, 2017 via email

@jenimal

jenimal commented Jan 31, 2017

Thanks, I just sent the request.
