Add test-pr.py and associated docs

Infinoid · Infinoid · commit 9a2d5739004b · 2021-05-27T15:05:59.000-04:00
diff --git a/ci/test-pr.md b/ci/test-pr.md
@@ -0,0 +1,117 @@
+# CUDA testing
+
+`test-pr.py` is a quick and easy way to run the CUDA tests on a pull request (PR).
+
+This should be done after code review; see [issue #457](https://github.com/tensor-compiler/taco/issues/457) for a discussion of the overall process.
+
+## System requirements
+
+You will need write access to the `tensor-compiler/taco` GitHub repo, and permission to run GitHub Actions workflows there.
+
+You will need the `git` and `gh` command-line tools installed, configured, and authenticated with GitHub.
+
+`git` must be able to push to the `tensor-compiler/taco` GitHub repo without a password prompt, using either SSH or HTTPS authentication.  If it can't, please see [SSH setup in the GitHub docs](https://docs.github.com/en/github/authenticating-to-github/connecting-to-github-with-ssh) or [setting up an HTTPS token](https://docs.github.com/en/github/authenticating-to-github/keeping-your-account-and-data-secure/creating-a-personal-access-token).
+
+`gh` is a command-line interface to the GitHub REST API.  If you don't have `gh`, please see [its installation guide](https://github.com/cli/cli#installation).
+
+`gh` needs to be set up to hit REST API endpoints without a password prompt.  If you don't have that, the [gh auth login](https://cli.github.com/manual/gh_auth_login) command should help.
+
+The script itself is `ci/test-pr.py`, next to this document.
+
+The script requires Python 3 and only standard-library modules (`os`, `subprocess`, `sys`, `tempfile`, and `time`), so nothing extra needs to be installed.
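+
+A quick preflight check can confirm most of these requirements before you run the script.  The sketch below is not part of `test-pr.py`; the helper names `missing_tools` and `preflight` are hypothetical, and it assumes that `gh auth status` exits with a non-zero status when `gh` is not logged in.
+
+```python
+import shutil
+import subprocess
+
+def missing_tools(tools=("git", "gh")):
+    """Return the names of the given command-line tools that are not on PATH."""
+    return [tool for tool in tools if shutil.which(tool) is None]
+
+def preflight():
+    """Print each unmet requirement; return True only if every check passes."""
+    missing = missing_tools()
+    for tool in missing:
+        print("missing command-line tool:", tool)
+    if "gh" in missing:
+        return False
+    # Assumption: `gh auth status` exits non-zero when gh is not authenticated.
+    status = subprocess.run(["gh", "auth", "status"],
+                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
+    if status.returncode != 0:
+        print("gh is not authenticated; try `gh auth login`")
+        return False
+    # git may still be missing even though gh is fine
+    return not missing
+```
+
+This does not check push access to `tensor-compiler/taco` or permission to run actions; problems there only show up when the script tries to push the branch or trigger the workflow.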
+
+## What it does
+
+This script does the following:
+
+* clones taco into a temporary directory
+* creates a temporary test branch from the PR's head commit
+* pushes that test branch to the taco GitHub repo
+* kicks off the CUDA test workflow (with its default parameters) on that test branch
+* waits for the test to complete
+* removes the temporary test branch
+
+The script will give you a link to the test run on the github website, so you can watch and inspect the results.
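+
+In terms of commands, those steps come down to the `git` and `gh` invocations below, which mirror what `test-pr.py` runs.  This is a dry-run sketch: the helper name `test_commands` is hypothetical, and it only builds and prints the commands without executing anything.
+
+```python
+import shlex
+
+def test_commands(repo, repo_url, pr, branch):
+    """Return, in order, the commands test-pr.py runs for one PR (nothing is executed)."""
+    return [
+        # clone the repo into a scratch subdirectory named "git"
+        ["git", "clone", "-q", repo_url, "git"],
+        # GitHub exposes each PR's head commit as refs/pull/<N>/head
+        ["git", "fetch", "-q", "origin", "pull/{}/head:{}".format(pr, branch)],
+        ["git", "checkout", "-q", branch],
+        # push the branch so the workflow can be dispatched against it
+        ["git", "push", "-q", "--set-upstream", "origin", branch],
+        # trigger the manual CUDA workflow on the test branch
+        ["gh", "workflow", "run", "-R", repo, "CUDA build and test (manual)", "-r", branch],
+        # (the script then finds the new run's ID and follows it with `gh run watch`)
+        # delete the temporary branch from GitHub
+        ["git", "push", "-q", "origin", "--delete", branch],
+    ]
+
+for cmd in test_commands("tensor-compiler/taco", "https://github.com/tensor-compiler/taco", 3, "test-pr3"):
+    print(" ".join(shlex.quote(arg) for arg in cmd))
+```
+
+The clone runs in the temporary directory; every later `git` command runs inside the cloned `git` subdirectory.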
+
+## Using it
+
+`python3 test-pr.py [--protocol=(ssh|https)] <PRNUMBER> [<TRYNUMBER>]`
+
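+`PRNUMBER` is the number of the PR you want to test, without the `#` prefix.  `--protocol` selects whether the script clones and pushes over `https` (the default) or `ssh`.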
+
+If specified, `TRYNUMBER` is appended to the temporary test branch name (`test-pr<PRNUMBER>-try<TRYNUMBER>` instead of `test-pr<PRNUMBER>`), so you can have several test branches for the same PR at once.  Omit it unless you need that.
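+
+The script parses its arguments by hand, but the same command line can be sketched with Python's `argparse` module, which makes the defaults explicit.  The names `parse_args`, `branch_name`, and `URLS` are hypothetical; the repository URLs and the branch naming scheme are the ones `test-pr.py` uses.
+
+```python
+import argparse
+
+REPO = "tensor-compiler/taco"
+URLS = {
+    "https": "https://github.com/" + REPO,
+    "ssh": "ssh://git@github.com/" + REPO,
+}
+
+def parse_args(argv):
+    """Parse [--protocol=(ssh|https)] PRNUMBER [TRYNUMBER] the way test-pr.py does."""
+    parser = argparse.ArgumentParser(prog="test-pr.py")
+    parser.add_argument("--protocol", choices=sorted(URLS), default="https")
+    parser.add_argument("pr", type=int, metavar="PRNUMBER")
+    parser.add_argument("attempt", type=int, nargs="?", metavar="TRYNUMBER")
+    return parser.parse_args(argv)
+
+def branch_name(pr, attempt=None):
+    """test-pr<PR>, plus a -try<N> suffix when a try number is given."""
+    name = "test-pr{}".format(pr)
+    if attempt is not None:
+        name += "-try{}".format(attempt)
+    return name
+
+args = parse_args(["--protocol=ssh", "3", "2"])
+print(URLS[args.protocol], branch_name(args.pr, args.attempt))
+# prints: ssh://git@github.com/tensor-compiler/taco test-pr3-try2
+```
+
+One difference from the script: `argparse` exits with status 2 on bad arguments, while `test-pr.py` exits with status 1.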
+
+## What it looks like
+
+This output comes from my own fork of taco, from when I was testing the script, so the URLs point to `Infinoid/taco` rather than `tensor-compiler/taco`:
+
+```
+% python3 test-pr.py 3
+
+=== looking up ID and params of test action
+
+=== creating test branch test-pr3
+remote:
+remote: Create a pull request for 'test-pr3' on GitHub by visiting:
+remote:      https://github.com/Infinoid/taco/pull/new/test-pr3
+remote:
+
+=== triggering test action
+✓ Created workflow_dispatch event for cuda-test-manual.yml at test-pr3
+
+To see runs for this workflow, try: gh run list --workflow=cuda-test-manual.yml
+Test action is at: https://github.com/Infinoid/taco/actions/runs/882846018?check_suite_focus=true
+
+=== waiting for action to complete
+
+Refreshing run status every 3 seconds. Press Ctrl+C to quit.
+
+X test-pr3 CUDA build and test (manual) · 882846018
+Triggered via workflow_dispatch about 11 minutes ago
+
+JOBS
+X tests CUDA in 10m54s (ID 2686889157)
+  ✓ Set up job
+  ✓ Run actions/checkout@v2
+  ✓ create_build
+  ✓ cmake
+  ✓ make
+  X test
+  ✓ Post Run actions/checkout@v2
+  ✓ Complete job
+
+ANNOTATIONS
+X Process completed with exit code 2.
+tests CUDA: .github#1
+
+
+X Run CUDA build and test (manual) (882846018) completed with 'failure'
+Test results are at: https://github.com/Infinoid/taco/actions/runs/882846018?check_suite_focus=true
+
+=== cleaning up test branch test-pr3
+
+=== cleaning up temp dir
+
+```
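+
+As the output shows, `gh workflow run` confirms the dispatch but does not report a run ID, so the script has to find the new run itself.  It records the newest existing run ID before dispatching, then polls the tab-separated output of `gh run list` for a run on the test branch with a larger ID.  The sketch below isolates that selection step; `pick_new_run` and `run_url` are hypothetical names, the eight-column row layout is the one `test-pr.py` assumes (other `gh` versions may print different columns), and the sample rows are invented apart from the run ID taken from the output above.
+
+```python
+def pick_new_run(run_list_output, branch, later_than=None):
+    """Return the first run ID on `branch` newer than `later_than`, or None.
+
+    `run_list_output` is the tab-separated text that `gh run list` prints when
+    its output is piped; each row is assumed to have eight columns.
+    """
+    for line in run_list_output.splitlines():
+        fields = line.split("\t")
+        if len(fields) != 8:
+            continue  # blank line or unexpected format
+        status, result, title, workflow, ref, event, elapsed, runid = fields
+        # compare numerically: as strings, "999" >= "1000" would be true
+        if later_than is not None and int(runid) <= int(later_than):
+            continue
+        if ref == branch:
+            return int(runid)
+    return None
+
+def run_url(repo, runid):
+    """Build the web link that test-pr.py prints for a run."""
+    return "https://github.com/{}/actions/runs/{}?check_suite_focus=true".format(repo, runid)
+
+# invented sample rows; only the run ID 882846018 comes from the output above
+sample = "\n".join([
+    "\t".join(["in_progress", "", "test-pr3", "CUDA build and test (manual)",
+               "test-pr3", "workflow_dispatch", "5s", "882846018"]),
+    "\t".join(["completed", "success", "Fix typo", "CUDA build and test (manual)",
+               "master", "workflow_dispatch", "11m2s", "882800000"]),
+])
+print(run_url("Infinoid/taco", pick_new_run(sample, "test-pr3", later_than="882800000")))
+```
+
+This prints the same link as the `Test action is at:` line in the output above.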
+
+# Troubleshooting
+
+## No access to the taco repo, or no permission to run its actions
+
+If you are a taco developer, ask Fred for access.
+
+## Test workflow does not exist
+
+If you see output that looks like this:
+
+```
+=== triggering test action
+could not create workflow dispatch event: HTTP 422: Workflow does not have 'workflow_dispatch' trigger (https://api.github.com/repos/tensor-compiler/taco/actions/workflows/someIDnumber/dispatches)
+```
+
+This means the test branch does not contain the CUDA test workflow file.  In other words, `.github/workflows/cuda-test-manual.yml` does not yet exist in the version of taco that the PR is based on.
+
+To fix this, rebase the PR onto (or merge in) the current taco master branch, then rerun the test.
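+
+You can also check for the workflow file before triggering anything.  The sketch below is a hypothetical helper, not part of `test-pr.py`: `git cat-file -e <ref>:<path>` exits with status 0 only if that path exists in the given commit, so running it in a clone where the PR head has been fetched (as the script does into the test branch) tells you whether the PR already contains the workflow file.
+
+```python
+import subprocess
+
+WORKFLOW_PATH = ".github/workflows/cuda-test-manual.yml"
+
+def has_workflow_file(gitdir, ref, path=WORKFLOW_PATH):
+    """Return True if `path` exists in commit `ref` of the clone at `gitdir`."""
+    # `git cat-file -e <ref>:<path>` exits with status 0 only if the object exists.
+    result = subprocess.run(
+        ["git", "cat-file", "-e", "{}:{}".format(ref, path)],
+        cwd=gitdir, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
+    return result.returncode == 0
+```
+
+For example, calling `has_workflow_file(gitdir, branchname)` right after the script's `git fetch` step could let it stop early with a hint to rebase, rather than failing later with the HTTP 422 error.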
+
+## Other problems
+
+If you run into some other problem, @-mention me on GitHub (@infinoid) or email me, and I'll help you figure it out.
diff --git a/ci/test-pr.py b/ci/test-pr.py
@@ -0,0 +1,123 @@
+#!/usr/bin/env python3
+
+# This script runs the TACO CUDA tests on a pull request.
+# Usage: python3 test-pr.py [--protocol=(https|ssh)] <PRnumber> [<Trynumber>]
+
+# see test-pr.md for more details.
+
+repo = "tensor-compiler/taco"
+action_name = "CUDA build and test (manual)"
+
+import os
+import subprocess
+import sys
+import tempfile
+import time
+
+def main():
+    args = sys.argv[1:]
+    repo_url = "https://github.com/" + repo
+    try:
+        if len(args) > 0 and args[0].startswith('--protocol='):
+            protocol = args[0][len('--protocol='):]
+            args = args[1:]
+            if protocol == 'ssh':
+                repo_url = "ssh://git@github.com/" + repo
+            elif protocol == 'https':
+                repo_url = "https://github.com/" + repo
+            else:
+                raise ValueError("unknown protocol " + protocol)
+        pr = int(args[0])
+        attempt = None
+        if len(args) > 1:
+            attempt = int(args[1])
+    except (IndexError, ValueError):
+        # missing PR number, non-numeric argument, or unknown protocol
+        print("Usage: {} [--protocol=(ssh|https)] <prnumber> [<trynumber>]".format(sys.argv[0]))
+        sys.exit(1)
+
+    print("\n=== looking up ID and params of test action")
+    workflowid = find_workflow_id()
+
+    with tempfile.TemporaryDirectory() as tmpdir:
+        #print("tmpdir is", tmpdir)
+        branchname = "test-pr{}".format(pr)
+        if attempt is not None:
+            branchname += "-try{}".format(attempt)
+
+        print("\n=== creating test branch", branchname)
+        subprocess.run(["git", "clone", "-q", repo_url, "git"], stdout=subprocess.DEVNULL, cwd=tmpdir, check=True)
+        gitdir=os.path.join(tmpdir, "git")
+        #print("gitdir is", gitdir)
+
+        subprocess.run(["git", "fetch", "-q", "origin", "pull/{}/head:{}".format(pr, branchname)], stdout=subprocess.DEVNULL, cwd=gitdir, check=True)
+
+        subprocess.run(["git", "checkout", "-q", branchname], stdout=subprocess.DEVNULL, cwd=gitdir, check=True)
+
+        subprocess.run(["git", "push", "-q", "--set-upstream", "origin", branchname], stdout=subprocess.DEVNULL, cwd=gitdir, check=True)
+
+        print("\n=== triggering test action")
+        old_job_id = find_latest_workflow_run(workflowid)
+        subprocess.run(["gh", "workflow", "run", "-R", repo, action_name, "-r", branchname], check=True)
+        job_api_id = None
+        while job_api_id is None:
+            # it takes a moment for new run requests to show up in the API.
+            try:
+                job_api_id, human_url = find_workflow_run(workflowid, branchname, later_than=old_job_id)
+            except Exception:
+                # the run is not listed yet (or gh failed); wait and poll again
+                time.sleep(1)
+
+        print("\nTest action is at:", human_url)
+
+        print("\n=== waiting for action to complete")
+        time.sleep(10)
+        #subprocess.run(["gh", "run", "watch", "-R", repo, str(job_api_id)], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
+        subprocess.run(["gh", "run", "watch", "-R", repo, str(job_api_id)])
+
+        print("\nTest results are at:", human_url)
+
+        print("\n=== cleaning up test branch", branchname)
+        subprocess.run(["git", "push", "-q", "origin", "--delete", branchname], stdout=subprocess.DEVNULL, cwd=gitdir, check=True)
+
+        print("\n=== cleaning up temp dir")
+        return
+
+
+def find_workflow_id():
+    result = subprocess.run(["gh", "workflow", "view", "-R", repo, action_name], stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, check=True)
+    output = str(result.stdout, "utf-8")
+    lines = output.split("\n")
+    if lines[1].startswith("ID: "):
+        workflowid = lines[1][4:]
+        return int(workflowid)
+    raise Exception("cannot find test workflow with name {}".format(action_name))
+
+def find_latest_workflow_run(workflowid):
+    # Return the ID of the newest existing run of this workflow, or None if there are none.
+    output = subprocess.run(["gh", "run", "list", "-R", repo, "-w", str(workflowid)], stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, check=True)
+    output = str(output.stdout, "utf-8")
+    lines = output.split("\n")
+    line = lines[0]
+    try:
+        status, result, title, workflow, ref, origin, elapsed, runid = line.split("\t")
+        return int(runid)
+    except ValueError:
+        # no runs yet, or unexpected output format
+        return None
+
+def find_workflow_run(workflowid, branch, later_than=None):
+    # Return (run ID, web URL) of the first run on `branch` newer than `later_than`.
+    output = subprocess.run(["gh", "run", "list", "-R", repo, "-w", str(workflowid)], stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, check=True)
+    output = str(output.stdout, "utf-8")
+    lines = output.split("\n")
+    for line in lines:
+        try:
+            status, result, title, workflow, ref, origin, elapsed, runid = line.split("\t")
+            # compare run IDs as integers; as strings, "999" >= "1000" would be true
+            if later_than is not None and int(later_than) >= int(runid):
+                continue
+        except ValueError:
+            continue  # blank or unexpected line
+        if ref == branch:
+            return runid, "https://github.com/{}/actions/runs/{}?check_suite_focus=true".format(repo, runid)
+    raise Exception("could not find workflow run for branch {}".format(branch))
+
+if __name__ == "__main__":
+    main()