
New way of comparing current run uuid with matching data runs #652

Closed

Conversation

paigerube14
Collaborator

@paigerube14 paigerube14 commented Nov 15, 2023

Type of change

  • Refactor
  • New feature
  • Bug fix
  • Optimization
  • Documentation Update

Description

With an environment variable set to the UUID of the current run, we want to be able to see if any metrics have regressed enough to fail a job. The current implementation takes the metadata of that UUID and collects other matching runs. Using those similar runs, it aggregates the data of specific fields and gives a pass/fail result based on a tolerance value.

This is still a work in progress and I would love any feedback. I tried to document the steps and configurations in utils/compare/README.md.
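For reference, a minimal sketch of the pass/fail idea, assuming a `difference` column holding the percent change of the current run versus the aggregated comparison runs and a caller-supplied tolerance; the names here are illustrative, not the exact ones used in the PR:

```python
import numpy as np

def pass_fail(difference: float, tolerancy: float = 5.0) -> str:
    """Illustrative tolerance check; positive differences mean the new run did better."""
    if np.isinf(difference):
        return "Pass"
    # Fail only when the regression is more negative than the allowed tolerance
    return "Fail" if difference < -tolerancy else "Pass"
```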

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.

Testing

  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Example ingress comparison

uuid - value_y: 833d28b4-23ad-481a-bbce-7d0735f14fef
comparison ids - value_x: ['perfscale-cpt-84173e88-bbe6-4e97-9a88-cde0b12b693b', 'f5544cff-b022-43b8-8918-965d987b6c44', 'CPT-d2581a63-b46b-48bc-a298-53cfee346bf7', 'e9cb2f03-4bdd-468d-9940-6dd7150f2cf3', '6297dffe-0987-4614-8858-2806926571a4', 'fda3af38-7181-4d03-ac38-e32a7d7e1dfb', '80ec50dd-6ef8-4b76-a614-3525cae5ebac', '96c6b369-cf90-409f-ba51-5f852665d89a']
,config.termination,sample,config.serverReplicas,config.connections,config.concurrency,total_avg_rps_x,total_avg_rps_y,difference,result
0,edge,1,45,200,18,91824.4242857143,93677.84999999999,2.0184452325219038,Pass
1,edge,2,45,200,18,92859.28428571428,97639.56999999998,5.147881281937794,Pass
2,http,1,45,200,18,137532.44142857144,143860.77000000002,4.6013351509616385,Pass
3,http,2,45,200,18,138334.88571428572,144228.68000000002,4.260526370685147,Pass
4,passthrough,1,45,200,18,212393.5,232025.31000000003,9.243131263433213,Pass
5,passthrough,2,45,200,18,215646.17142857146,261904.77000000002,21.45115689510435,Fail
6,reencrypt,1,45,200,18,77213.55428571429,82293.16,6.578645111309855,Pass
7,reencrypt,2,45,200,18,76440.35285714285,82400.04000000001,7.79651966546393,Pass
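For the first edge row above, the difference column appears to be the percent change of the current run's average (value_y) over the aggregated comparison average (value_x): (93677.85 − 91824.42) / 91824.42 × 100 ≈ 2.018, matching the reported 2.0184.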

@vishnuchalla
Collaborator

Can we gitignore the __pycache__ please?
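For reference, this would just mean adding a `__pycache__/` entry to the repository's `.gitignore`, assuming it is not already listed there.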

## Defining new config

Each config is a json file consisting of
* metrics: A list of dictionaries, each containing a list of the key/values you want to compare on (regex can be used on some values, depending on the Elasticsearch setup). Multiple items in this list will still use the same below fields with every search, in addition to the key/values set in the dictionary
Collaborator
@vishnuchalla vishnuchalla Dec 10, 2023

Can we specify here which of those same below fields are used in addition to the key/values set in the dictionary, please?


Each config is a json file consisting of
* metrics: A list of dictionaries, each containing a list of the key/values you want to compare on (regex can be used on some values, depending on the Elasticsearch setup). Multiple items in this list will still use the same below fields with every search, in addition to the key/values set in the dictionary
* additionalColumns: A list of keys that you want to have in your results table; this can be used to combine only certain metrics. For example, if I set "samples" in additionalColumns, the data won't combine/aggregate different samples; it'll give me a separate row for each different sample type
Collaborator

Can we specify which metrics are combined versus not combined, for easier understanding?
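For illustration, a hypothetical config along the lines described above; the metric keys are assumptions based on the README excerpt and the example output later in this thread, and the exact fields supported are documented in utils/compare/README.md:

```python
import json

# Hypothetical config sketch: one metric dictionary plus extra grouping columns.
example_config = {
    "metrics": [
        {"metricName": "podLatencyQuantilesMeasurement", "quantileName": "Ready"}
    ],
    "additionalColumns": ["quantileName", "sample"],
}

print(json.dumps(example_config, indent=2))
```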

version = float(version) - .01

# https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
if not workerCount:
Collaborator

Suggested change
if not workerCount:
query_params = {
    "benchmark": meta["benchmark"],
    "workerNodesType": meta["workerNodesType"],
    "masterNodesType": meta["masterNodesType"],
    "networkType": meta["networkType"],
    "platform": meta["platform"],
    "ocpVersion": f"{version}*",
    "jobStatus": "success",
}
if workerCount:
    query_params.update({
        "masterNodesCount": meta["masterNodesCount"],
        "workerNodesCount": meta["workerNodesCount"],
    })
query = {
    "_source": ["uuid"],
    "query": {
        "query_string": {
            "query": " AND ".join([f'{key}: "{value}"' for key, value in query_params.items()]),
        }
    }
}
es = ElasticService(index=index)
response = es.post(query)
es.close()
return [item['_source']['uuid'] for item in response["hits"]["hits"]]

This will fetch only the uuid portion of the docs instead of keeping the entire docs in memory.

value = "{}".format(row['throughput'])
x.append(value)
y.append(test)
new = {'y' : x,
Collaborator

These x and y do not necessarily need to be interchanged. This implementation can be written in a more readable and simpler way. More details here.
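As a rough sketch of what that could look like (the variable names and sample data here are stand-ins, not the ones in the PR):

```python
# Illustrative only: name the series by what they hold instead of generic x/y lists.
rows = [{"throughput": 91824.4}, {"throughput": 92859.3}]  # stand-in for the real rows
test = "ingress-edge"                                      # stand-in for the test label

throughputs = [str(row["throughput"]) for row in rows]
labels = [test] * len(throughputs)
new = {"x": labels, "y": throughputs}
```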

if np.isinf(row['difference']):
return "Pass"
# Fail if value is a more negative than tolerancy
# High positive valus mean new uuid is better
Collaborator

[typo]: High positive values


if np.isinf(row['difference']):
return "Pass"
# Fail if value is a more negative than tolerancy
Collaborator

README.md line #16 says the negative case means better performance, but here we are marking it as Fail 🤔

Collaborator Author

I had just switched the way the dataframe was set up, which switched the percent values; I will update to be sure they match.

print("ndf " + str(ndf))
return ndf

def getResults(uuid: str, uuids: list, index_str: str, metrics: dict ):
Collaborator

The idea behind this function looks cool. I think we can reuse it to fetch other metrics like jobSummary, etc.

#print("p tile" + str(ptile))
return ptile

def jobFilter(pdata: dict, data: dict):
Collaborator
@vishnuchalla vishnuchalla Dec 10, 2023

There are no invocations of this function either. Let's remove dead code, as it creates unnecessary confusion.

Collaborator Author

It was an old function I rewrote to work for all workload types; it should be taken out now.

return all_result_tolerancy
return []

def jobSummary(uuids: list):
Collaborator

Just now realized that this function can also be removed.

Member

I think a lot of these are from the graph-api in the dashboard work...

Preferably, we should create a Python package to share these functions and reuse them across our projects.

Collaborator

Except for the cpt-dashboard, what is the alternative place where we are planning to visualize these graphs?

Member

Honestly, nowhere to my knowledge. The idea was to share how we build the dataframes for the trending graphs and for the comparisons.

Collaborator
@vishnuchalla vishnuchalla Dec 12, 2023

IMHO, let's wait for at least one more tool to have the same use case of displaying these graphs. Until then it's just dead code and can be removed.

return ids_df['uuid'].to_list()

def burnerFilter(data: dict, columns: list[str]) :
#
Collaborator

If we are interested in keeping docstrings, we can keep the format consistent throughout: https://google.github.io/styleguide/pyguide.html
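For reference, a docstring in that Google style would look roughly like this; the signature is taken from the snippet above, and the description is only a placeholder, not the actual behavior:

```python
def burnerFilter(data: dict, columns: list[str]):
    """Filters the burner data down to the requested columns (placeholder summary).

    Args:
        data: Raw metric documents returned from the search.
        columns: Column names to keep in the output table.

    Returns:
        The filtered data; see the actual implementation for details.
    """
    ...
```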

kubelet_json = read_config_file("configs/kubelet.json")
etcd_json = read_config_file("configs/etcd.json")
master_json = read_config_file("configs/master_node.json")
worker_agg_json = read_config_file("configs/worker_node.json")
Collaborator

Did we miss adding/removing this from the metrics list?

Collaborator Author

Yes! I think I updated it now.

@vishnuchalla
Collaborator

Do we want to remove or name the first column? Currently it seems empty.

,metricName,quantileName,P99_x,P99_y,difference,result
0,podLatencyQuantilesMeasurement,ContainersReady,13.0,13.0,0.0,Pass
1,podLatencyQuantilesMeasurement,Initialized,1.0,1.0,0.0,Pass

Collaborator
@vishnuchalla vishnuchalla left a comment

Overall I like this idea. After going through the discussion on the attached JIRA, regarding "I'm wondering where a good spot for more helper type functions should go", here are my two cents:

  1. If we want visualizations on top of this data, cpt-dashboard would be a better place to go.
  2. Or, if we just want to use it as a tool at the end of each CI run to look at the results in CSV/tabular/JSON format, I think the better place would be the benchmark-comparison tool that we have as of today.

def process_all_data(metric_of_interest, find_metrics,uuid, ids, data_func, index, divider, additional_columns):
# Process all

columns = []
Collaborator

Can we make it a set, please, so that we can avoid duplicate columns?

Collaborator Author

Because I use this variable as the columns for the dataframe, I don't think it allows me to specify it as a set.

File "/Users/prubenda/PycharmProjects/e2e-benchmarking/utils/compare/venv3/lib/python3.10/site-packages/pandas/core/frame.py", line 702, in __init__
  raise ValueError("columns cannot be a set")
ValueError: columns cannot be a set

Collaborator

In that case, we can use a set (to handle any duplicates in additionalColumns) to collect all the columns and then cast it to a list, which is supported by dataframes.
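A small sketch of that suggestion; `dict.fromkeys` is shown here instead of a plain set so that column order is preserved while duplicates are dropped, and the variable names are illustrative:

```python
import pandas as pd

base_columns = ["metricName", "quantileName"]
additional_columns = ["sample", "quantileName"]  # may contain duplicates

# Deduplicate while preserving order, then hand a plain list to the DataFrame.
columns = list(dict.fromkeys(base_columns + additional_columns))
df = pd.DataFrame(columns=columns)
```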

@paigerube14
Collaborator Author

Do we want to remove or name the first column? Currently it seems empty.

,metricName,quantileName,P99_x,P99_y,difference,result
0,podLatencyQuantilesMeasurement,ContainersReady,13.0,13.0,0.0,Pass
1,podLatencyQuantilesMeasurement,Initialized,1.0,1.0,0.0,Pass

I think this is just the row index from the dataframe; I will look into how I can easily remove that.
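If the table is written out with pandas, something like this should drop that index column (a minimal example, not the PR's actual code):

```python
import pandas as pd

df = pd.DataFrame(
    {"metricName": ["podLatencyQuantilesMeasurement"], "quantileName": ["Initialized"],
     "P99_x": [1.0], "P99_y": [1.0]}
)

# Drop the row index when writing the CSV...
df.to_csv("results.csv", index=False)
# ...or hide it when printing the table.
print(df.to_string(index=False))
```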

@jtaleric
Member

Overall I like this idea. After going through the discussion on the attached JIRA, regarding "I'm wondering where a good spot for more helper type functions should go", here are my two cents:

  1. If we want visualizations on top of this data, cpt-dashboard would be a better place to go.

IMHO the shared functions should go into a shared library -- I am thinking something similar to go-commons -- python-commons?

  2. Or, if we just want to use it as a tool at the end of each CI run to look at the results in CSV/tabular/JSON format, I think the better place would be the benchmark-comparison tool that we have as of today.

The implementation belongs in benchmark-comparison -- agreed. Or we create a tool more specific to this use case, e.g. benchmark-regression.

@jtaleric
Member

jtaleric commented Jan 9, 2024

Overall I like this idea. After going through the discussion on the attached JIRA, regarding "I'm wondering where a good spot for more helper type functions should go", here are my two cents:

  1. If we want visualizations on top of this data, cpt-dashboard would be a better place to go.

IMHO the shared functions should go into a shared library -- I am thinking something similar to go-commons -- python-commons?

https://github.com/cloud-bulldozer/py-commons

We have created this repo ;)

  2. Or, if we just want to use it as a tool at the end of each CI run to look at the results in CSV/tabular/JSON format, I think the better place would be the benchmark-comparison tool that we have as of today.

The implementation belongs in benchmark-comparison -- agreed. Or we create a tool more specific to this use case, e.g. benchmark-regression.

@paigerube14
Collaborator Author

Moving this work over to py-commons and orion.

A start of moving this functionality is in these PRs:

cloud-bulldozer/orion#9
cloud-bulldozer/py-commons#16
