Add support for CDM v8 #144

Draft: wants to merge 14 commits into `main`
Conversation

dbutenhof
Collaborator

Type of change

  • Refactor
  • New feature
  • Bug fix
  • Optimization
  • Documentation Update

Description

Principally this changes the names of the various document "id" fields from "id" to "<doctype>-uuid" (e.g., "run-uuid"). This code determines whether the Crucible instance is using v7 or v8, and generates the proper ID field names.

There are some other minor cleanups here, in handling broken documents.
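The version-dependent field naming described above can be sketched as follows. This is an illustrative Python sketch only; the class and method names are hypothetical and not taken from the actual `crucible_svc` code.

```python
# Hypothetical sketch of CDM version-aware ID field naming.
# The "<doctype>-uuid" pattern comes from the PR description; the
# class and method names here are illustrative, not the real code.

class CDMFieldMapper:
    def __init__(self, cdm_version: int):
        self.cdm_version = cdm_version

    def id_field(self, doctype: str) -> str:
        """Return the ID field name for a document type.

        CDM v7 uses a plain "id" field; v8 renames it to
        "<doctype>-uuid" (e.g., "run-uuid" for "run" documents).
        """
        if self.cdm_version >= 8:
            return f"{doctype}-uuid"
        return "id"
```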

Related Tickets & Documents

PANDA-693 CDM v8

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.

Testing

Manually tested against a CDM v8 Crucible instance provided by Karl Rister as well as against the existing RHEL AI CPT dashboard v7 instance.

dbutenhof and others added 14 commits October 18, 2024 12:51
This encapsulates substantial logic to interpret the Crucible Common Data Model
OpenSearch schema for use by CPT dashboard API components. By itself, it does
nothing.
This builds on the `crucible_svc` layer in cloud-bulldozer#122 to add a backend API.
This relies on the ilab API in cloud-bulldozer#123, which in turn builds on the crucible
service in cloud-bulldozer#122.
When graphing metrics from two runs, the timestamps rarely align; so we add a
`relative` option to convert the absolute metric timestamps into relative
delta seconds from each run's start.
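The `relative` conversion described here amounts to shifting each run's samples by that run's own start time. A minimal sketch, assuming millisecond timestamps (function and parameter names are illustrative, not the actual API):

```python
def to_relative_seconds(timestamps_ms: list[int], run_start_ms: int) -> list[float]:
    """Convert absolute millisecond timestamps into delta seconds
    from a run's start, so two runs can share one X axis.
    Illustrative sketch of the described `relative` option.
    """
    return [(ts - run_start_ms) / 1000.0 for ts in timestamps_ms]
```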
This adds the basic UI to support comparison of the metrics of two InstructLab
runs. This compares only the primary metrics of the two runs, in a relative
timeline graph.

This is backed by cloud-bulldozer#125, which is backed by cloud-bulldozer#124, which is backed by cloud-bulldozer#123,
which is backed by cloud-bulldozer#122. These represent a series of steps towards a complete
InstructLab UI and API, and will be reviewed and merged from cloud-bulldozer#122 forward.
This PR is primarily CPT dashboard backend API (and Crucible service) changes
to support pulling and displaying multiple Crucible metric statistics. Only
minor UI changes are included to support API changes. The remaining UI changes
to pull and display statistics will be pushed separately.
Add statistics charts for selected metric in row expansion and comparison
views.
Extract the "Metadata" into a separate component, which allows it to be reused
as an info flyover on the comparison page to help in identifying target runs
to be compared.
Modify the metrics pulldown to allow multiple selection. The statistical
summary chart and graph will show all selected metrics in addition to the
benchmark's inherent primary metric (for the primary period).
Support selection of multiple metrics using the pulldown in the comparison
page. The update occurs when the pulldown closes.

To simplify the management of "available metrics" across multiple selected
runs, which might have entirely different metrics, the reducer no longer
tries to store separate metric selection lists for each run. This also means
that the "default" metrics selection remains when adding another comparison
run, or expanding another row.
The Plotly graphing package doesn't directly support a "delta time" type, and
in the comparison view we want to use delta time to compare two runs that will
generally have different absolute timestamps. (It turns out that the native
PatternFly graphing package, Victory, has the same limitation.)

Initially, this just reported numeric delta seconds, but that's unnatural for
a reader. This PR adds support for an `absolute_relative` option which reports
the delta times as small absolute timestamps, like `1970-01-01 00:01:00` for
60 seconds, formatting ticks using `"%H:%M:%S"` ("00:01:00") for readability.
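The trick of rendering delta seconds as small near-epoch timestamps can be sketched in a few lines. This is an illustrative Python version (the real formatting happens in the graphing layer; the function name is hypothetical):

```python
from datetime import datetime, timezone

def delta_to_clock(delta_seconds: float) -> str:
    """Render delta seconds as a small absolute timestamp near the
    Unix epoch, formatted "%H:%M:%S" (60 seconds -> "00:01:00").
    Sketch of the described `absolute_relative` behavior.
    """
    return datetime.fromtimestamp(delta_seconds, tz=timezone.utc).strftime("%H:%M:%S")
```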

I also made the X axis title appear, which necessitated some refactoring of
the layout to avoid overlaying the legend on the axis label; and in the
process I moved the "presentation specific" width parameter into the UI and
the others into the API so they don't have to be duplicated in the two action
calls.
The UI is currently hardcoded to support two environments: a development
deployment from `localhost:3000` to `localhost:8000`, where the API explicitly
allows `http://localhost:3000` as a cross-origin source; and an OpenShift
deployment where cross-origin requests are unnecessary because the cluster API
reverse proxy hides the port numbers.

Partly for more general testing and deployment, but specifically because the
RHEL AI InstructLab project requires CPT dashboard access now before our code
has been integrated into the production OpenShift deployment, it's convenient
to support a third "bare metal" mode where the containerized UI and backend
are hosted at ports 3000 and 8000 on some host (e.g., in the RDU3 Performance
Lab).

For this, the UI needs to recognize that a non-`localhost` `window.location`
with a `3000` port needs to call the API at port `8000` on the same host (for
our "bare metal" deployment) while an empty port indicates we're using the
OpenShift API reverse proxy routing.
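The port-based routing rule described above can be sketched as follows. The real logic runs in the frontend off `window.location`; this is an illustrative Python port, and the scheme choices are assumptions:

```python
def api_base_url(hostname: str, port: str) -> str:
    """Derive the API base URL from the UI's location, per the
    deployment modes described above (illustrative sketch; the real
    code runs in the UI, and the URL schemes here are assumed).

    - port "3000" (localhost dev or "bare metal"): the API lives at
      port 8000 on the same host
    - empty port: the OpenShift reverse proxy routes requests, so no
      explicit port is needed
    """
    if port == "3000":
        return f"http://{hostname}:8000"
    return f"https://{hostname}"
```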

Similarly, the backend's cross-origin (CORS) protection needs to allow
port 3000 on the same host as a valid origin.
The API makes an effort to create unique metric labels when comparing data
across multiple runs, but without human direction the constructed names aren't
necessarily ideal to help a human focus on the comparisons they want to make.

This PR adds a UI template mechanism to allow the human to format names that
are more useful -- for example, to focus on the differences between software
releases or hardware configurations expressed by Crucible tags or params.

For example, a template of `<metric> <iteration>:<period> <tag:accelerator>`
identifies metrics between two runs based on the hardware accelerator type
used for each.
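A template expansion like the example above can be sketched with a simple placeholder substitution. This is a hypothetical sketch of the described mechanism; the actual UI template syntax and context keys may differ:

```python
import re

def expand_template(template: str, context: dict[str, str]) -> str:
    """Expand a metric-label template such as
    "<metric> <iteration>:<period> <tag:accelerator>" by replacing
    each <key> placeholder with its value from the context; unknown
    placeholders are left as-is. Illustrative sketch only.
    """
    def repl(match: re.Match) -> str:
        return context.get(match.group(1), match.group(0))
    return re.sub(r"<([^<>]+)>", repl, template)
```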

This introduces a new action to fetch and store the available filters, which
exposes the param and tag values currently in use. Usability of the template
mechanism will depend on consistent application of specific tags, which we
expect to be increasingly supplied automatically by Crucible discovery rather
than relying on the (current) ad hoc user definition in endpoint files.
Principally this changes the names of the various document "id" fields from
"id" to "<doctype>-uuid" (e.g., "run-uuid"). This code determines whether the
Crucible instance is using v7 or v8, and generates the proper ID field names.

There are some other minor cleanups here, in handling broken documents.