
New way of comparing current run uuid with matching data runs #652

Closed

Conversation

paigerube14
Collaborator

@paigerube14 paigerube14 commented Nov 15, 2023

Type of change

  • Refactor
  • New feature
  • Bug fix
  • Optimization
  • Documentation Update

Description

With an environment variable set to the UUID of the current run, we want to be able to see if any metrics have regressed enough to fail a job. The current implementation takes the metadata of that UUID and collects other matching runs. Using those similar runs, it aggregates the data of specific fields and gives a pass/fail result based on a tolerance value.

This is still a work in progress and I would love any feedback. I tried to document the steps and configurations in utils/compare/README.md.
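For reference, a minimal sketch of the pass/fail idea, assuming a `difference` column holding the percent change of the current run versus the aggregated comparison runs and a caller-supplied tolerance; the names here are illustrative, not the exact ones used in the PR:

```python
import numpy as np

def pass_fail(difference: float, tolerancy: float = 5.0) -> str:
    """Illustrative tolerance check; positive differences mean the new run did better."""
    if np.isinf(difference):
        return "Pass"
    # Fail only when the regression is more negative than the allowed tolerance
    return "Fail" if difference < -tolerancy else "Pass"
```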

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.

Testing

  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Example ingress comparison

uuid - value_y: 833d28b4-23ad-481a-bbce-7d0735f14fef
comparison ids - value_x: ['perfscale-cpt-84173e88-bbe6-4e97-9a88-cde0b12b693b', 'f5544cff-b022-43b8-8918-965d987b6c44', 'CPT-d2581a63-b46b-48bc-a298-53cfee346bf7', 'e9cb2f03-4bdd-468d-9940-6dd7150f2cf3', '6297dffe-0987-4614-8858-2806926571a4', 'fda3af38-7181-4d03-ac38-e32a7d7e1dfb', '80ec50dd-6ef8-4b76-a614-3525cae5ebac', '96c6b369-cf90-409f-ba51-5f852665d89a']
,config.termination,sample,config.serverReplicas,config.connections,config.concurrency,total_avg_rps_x,total_avg_rps_y,difference,result
0,edge,1,45,200,18,91824.4242857143,93677.84999999999,2.0184452325219038,Pass
1,edge,2,45,200,18,92859.28428571428,97639.56999999998,5.147881281937794,Pass
2,http,1,45,200,18,137532.44142857144,143860.77000000002,4.6013351509616385,Pass
3,http,2,45,200,18,138334.88571428572,144228.68000000002,4.260526370685147,Pass
4,passthrough,1,45,200,18,212393.5,232025.31000000003,9.243131263433213,Pass
5,passthrough,2,45,200,18,215646.17142857146,261904.77000000002,21.45115689510435,Fail
6,reencrypt,1,45,200,18,77213.55428571429,82293.16,6.578645111309855,Pass
7,reencrypt,2,45,200,18,76440.35285714285,82400.04000000001,7.79651966546393,Pass
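For the first edge row above, the difference column appears to be the percent change of the current run's average (value_y) over the aggregated comparison average (value_x): (93677.85 − 91824.42) / 91824.42 × 100 ≈ 2.018, matching the reported 2.0184.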

@vishnuchalla
Collaborator

Can we gitignore the __pycache__ please?
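For reference, this would just mean adding a `__pycache__/` entry to the repository's `.gitignore`, assuming it is not already listed there.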

## Defining new config

Each config is a json file consisting of
* metrics: A list of dictionaries, each containing a list of the key/values you want to compare on (regex can be used on some values, depending on the Elasticsearch setup). Multiple items in this list will still use the same below fields with every search, in addition to the key/values set in the dictionary
Collaborator
@vishnuchalla vishnuchalla Dec 10, 2023

Can we specify here which of those same below fields are used in addition to the key/values set in the dictionary, please?


Each config is a json file consisting of
* metrics: A list of dictionaries, each containing a list of the key/values you want to compare on (regex can be used on some values, depending on the Elasticsearch setup). Multiple items in this list will still use the same below fields with every search, in addition to the key/values set in the dictionary
* additionalColumns: A list of keys that you want to have in your results table; this can be used to combine only certain metrics. For example, if I set "samples" in additionalColumns, the data won't combine/aggregate different samples; it'll give me a separate row for each different sample type
Collaborator

Can we specify which metrics are combined versus not combined, for easier understanding?
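For illustration, a hypothetical config along the lines described above; the metric keys are assumptions based on the README excerpt and the example output later in this thread, and the exact fields supported are documented in utils/compare/README.md:

```python
import json

# Hypothetical config sketch: one metric dictionary plus extra grouping columns.
example_config = {
    "metrics": [
        {"metricName": "podLatencyQuantilesMeasurement", "quantileName": "Ready"}
    ],
    "additionalColumns": ["quantileName", "sample"],
}

print(json.dumps(example_config, indent=2))
```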

version = float(version) - .01

# https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
if not workerCount:
Collaborator

Suggested change
if not workerCount:
query_params = {
    "benchmark": meta["benchmark"],
    "workerNodesType": meta["workerNodesType"],
    "masterNodesType": meta["masterNodesType"],
    "networkType": meta["networkType"],
    "platform": meta["platform"],
    "ocpVersion": f"{version}*",
    "jobStatus": "success",
}
if workerCount:
    query_params.update({
        "masterNodesCount": meta["masterNodesCount"],
        "workerNodesCount": meta["workerNodesCount"],
    })
query = {
    "_source": ["uuid"],
    "query": {
        "query_string": {
            "query": " AND ".join([f'{key}: "{value}"' for key, value in query_params.items()]),
        }
    }
}
es = ElasticService(index=index)
response = es.post(query)
es.close()
return [item['_source']['uuid'] for item in response["hits"]["hits"]]

This will fetch only the uuid portion of the docs instead of keeping the entire docs in memory.

value = "{}".format(row['throughput'])
x.append(value)
y.append(test)
new = {'y' : x,
Collaborator

These x and y do not necessarily need to be interchanged. This implementation can be written in a more readable and simpler way. More details here.
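As a rough sketch of what that could look like (the variable names and sample data here are stand-ins, not the ones in the PR):

```python
# Illustrative only: name the series by what they hold instead of generic x/y lists.
rows = [{"throughput": 91824.4}, {"throughput": 92859.3}]  # stand-in for the real rows
test = "ingress-edge"                                      # stand-in for the test label

throughputs = [str(row["throughput"]) for row in rows]
labels = [test] * len(throughputs)
new = {"x": labels, "y": throughputs}
```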

if np.isinf(row['difference']):
return "Pass"
# Fail if value is a more negative than tolerancy
# High positive valus mean new uuid is better
Collaborator

[typo]: High positive values


if np.isinf(row['difference']):
return "Pass"
# Fail if value is a more negative than tolerancy
Collaborator

README.md line #16 says the negative case means better performance, but here we are marking it as Fail 🤔

Collaborator Author

I had just switched the way the dataframe was set up, which switched the percent values; I will update to be sure they match.

print("ndf " + str(ndf))
return ndf

def getResults(uuid: str, uuids: list, index_str: str, metrics: dict ):
Collaborator

The idea behind this function looks cool. I think we can reuse it to fetch other metrics like jobSummary, etc.

#print("p tile" + str(ptile))
return ptile

def jobFilter(pdata: dict, data: dict):
Collaborator
@vishnuchalla vishnuchalla Dec 10, 2023

There are no invocations of this function either. Let's remove dead code, as it creates unnecessary confusion.

Collaborator Author

It was an old function I rewrote to work for all workload types; it should be taken out now.

return all_result_tolerancy
return []

def jobSummary(uuids: list):
Collaborator

Just now realized that this function can also be removed.

Member

I think a lot of these are from the graph-api in the dashboard work...

Preferably, we should create a Python package to share these functions and reuse them across our projects.

Collaborator

Except for the cpt-dashboard, what is the alternative place where we are planning to visualize these graphs?

Member

Honestly, nowhere to my knowledge. The idea was to share how we build the dataframes for the trending graphs and for the comparisons.

Collaborator
@vishnuchalla vishnuchalla Dec 12, 2023

IMHO, let's wait for at least one more tool to have the same use case of displaying these graphs. Until then it's just dead code and can be removed.

return ids_df['uuid'].to_list()

def burnerFilter(data: dict, columns: list[str]) :
#
Collaborator

If we are interested in keeping docstrings, we can keep the format consistent throughout: https://google.github.io/styleguide/pyguide.html
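For reference, a docstring in that Google style would look roughly like this; the signature is taken from the snippet above, and the description is only a placeholder, not the actual behavior:

```python
def burnerFilter(data: dict, columns: list[str]):
    """Filters the burner data down to the requested columns (placeholder summary).

    Args:
        data: Raw metric documents returned from the search.
        columns: Column names to keep in the output table.

    Returns:
        The filtered data; see the actual implementation for details.
    """
    ...
```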

kubelet_json = read_config_file("configs/kubelet.json")
etcd_json = read_config_file("configs/etcd.json")
master_json = read_config_file("configs/master_node.json")
worker_agg_json = read_config_file("configs/worker_node.json")
Collaborator

Did we miss adding/removing this from the metrics list?

Collaborator Author

Yes! I think I updated it now.

@vishnuchalla
Collaborator

Do we want to remove or name the first column? Currently it seems empty.

,metricName,quantileName,P99_x,P99_y,difference,result
0,podLatencyQuantilesMeasurement,ContainersReady,13.0,13.0,0.0,Pass
1,podLatencyQuantilesMeasurement,Initialized,1.0,1.0,0.0,Pass

Collaborator
@vishnuchalla vishnuchalla left a comment

Overall I like this idea. After going through the discussion on the attached JIRA, regarding "I'm wondering where a good spot for more helper type functions should go", here are my two cents:

  1. If we want visualizations on top of this data, cpt-dashboard would be a better place to go.
  2. Or, if we just want to use it as a tool at the end of each CI run to look at the results in CSV/tabular/JSON format, I think the better place would be the benchmark-comparison tool that we have as of today.

def process_all_data(metric_of_interest, find_metrics,uuid, ids, data_func, index, divider, additional_columns):
# Process all

columns = []
Collaborator

Can we make it a set, please, so that we can avoid duplicate columns?

Collaborator Author

Because I use this variable as the columns for the dataframe, I don't think it allows me to specify it as a set.

File "/Users/prubenda/PycharmProjects/e2e-benchmarking/utils/compare/venv3/lib/python3.10/site-packages/pandas/core/frame.py", line 702, in __init__
  raise ValueError("columns cannot be a set")
ValueError: columns cannot be a set

Collaborator

In that case, we can use a set (to handle any duplicates in additionalColumns) to collect all the columns and then cast it to a list, which is supported by dataframes.
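A small sketch of that suggestion; `dict.fromkeys` is shown here instead of a plain set so that column order is preserved while duplicates are dropped, and the variable names are illustrative:

```python
import pandas as pd

base_columns = ["metricName", "quantileName"]
additional_columns = ["sample", "quantileName"]  # may contain duplicates

# Deduplicate while preserving order, then hand a plain list to the DataFrame.
columns = list(dict.fromkeys(base_columns + additional_columns))
df = pd.DataFrame(columns=columns)
```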

@paigerube14
Collaborator Author

Do we want to remove or name the first column? Currently it seems empty.

,metricName,quantileName,P99_x,P99_y,difference,result
0,podLatencyQuantilesMeasurement,ContainersReady,13.0,13.0,0.0,Pass
1,podLatencyQuantilesMeasurement,Initialized,1.0,1.0,0.0,Pass

I think this is just the row index from the dataframe; I will look into how I can easily remove that.
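If the table is written out with pandas, something like this should drop that index column (a minimal example, not the PR's actual code):

```python
import pandas as pd

df = pd.DataFrame(
    {"metricName": ["podLatencyQuantilesMeasurement"], "quantileName": ["Initialized"],
     "P99_x": [1.0], "P99_y": [1.0]}
)

# Drop the row index when writing the CSV...
df.to_csv("results.csv", index=False)
# ...or hide it when printing the table.
print(df.to_string(index=False))
```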

@jtaleric
Member

Overall I like this idea. After going through the discussion on the attached JIRA, regarding "I'm wondering where a good spot for more helper type functions should go", here are my two cents:

  1. If we want visualizations on top of this data, cpt-dashboard would be a better place to go.

IMHO the shared functions should go into a shared library -- I am thinking something similar to go-commons -- python-commons?

  2. Or, if we just want to use it as a tool at the end of each CI run to look at the results in CSV/tabular/JSON format, I think the better place would be the benchmark-comparison tool that we have as of today.

The implementation belongs in benchmark-comparison -- agreed. Or we create a tool more specific to this use case, e.g. benchmark-regression.

@jtaleric
Member

jtaleric commented Jan 9, 2024

Overall I like this idea. After going through the discussion on the attached JIRA, regarding "I'm wondering where a good spot for more helper type functions should go", here are my two cents:

  1. If we want visualizations on top of this data, cpt-dashboard would be a better place to go.

IMHO the shared functions should go into a shared library -- I am thinking something similar to go-commons -- python-commons?

https://github.com/cloud-bulldozer/py-commons

We have created this repo ;)

  2. Or, if we just want to use it as a tool at the end of each CI run to look at the results in CSV/tabular/JSON format, I think the better place would be the benchmark-comparison tool that we have as of today.

The implementation belongs in benchmark-comparison -- agreed. Or we create a tool more specific to this use case, e.g. benchmark-regression.

@paigerube14
Collaborator Author

Moving this work over to py-commons and orion.

A start of moving this functionality is in these PRs:

cloud-bulldozer/orion#9
cloud-bulldozer/py-commons#16
