Adding cost script to create a costs summary #26

Open. Wants to merge 4 commits into base: main
Binary file added .DS_Store
Binary file not shown.
1 change: 1 addition & 0 deletions Dockerfile
@@ -16,6 +16,7 @@ ADD scripts/enable_api.sh /opt/scripts/enable_api.sh
ADD scripts/estimate_billing.py /opt/scripts/estimate_billing.py
ADD scripts/persist_artifacts.py /opt/scripts/persist_artifacts.py
ADD scripts/costs_json_to_csv.py /opt/scripts/costs_json_to_csv.py
ADD scripts/cost_script.py /opt/scripts/cost_script.py

# GMS setup/run
ADD gms/resources.sh /opt/gms/resources.sh
8 changes: 8 additions & 0 deletions scripts/README.md
@@ -110,6 +110,14 @@ This functionality is also wrapped into estimate\_billing.py under the
I'd still run these separately just to have both, but if you're only
after the CSV this may be more convenient.

# cost\_script.py

Takes the output of costs\_json\_to\_csv.py and collapses tasks that were split into shards, so that each task is reported with a single summed cost.
It writes a CSV named costs\_report\_final.csv to the current working directory, sorted by total cost in descending order.

Use as follows:

python3 /opt/scripts/cost_script.py costs.tsv
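
As a rough illustration of what "collapsing shards" means here, the sketch below applies the same `_shard-<number>` pattern that cost\_script.py uses to a few call names. The call names and the helper `collapse_shard_name` are hypothetical and exist only for this example; real names come from the first column of costs.tsv.

```python
import re

# Same pattern the script substitutes away: "_shard-" followed by digits.
SHARD_SUFFIX = re.compile(r"_shard-\d+")


def collapse_shard_name(call_name: str) -> str:
    """Remove every '_shard-<number>' fragment from a call name."""
    return SHARD_SUFFIX.sub("", call_name)


# Hypothetical call names, for illustration only.
names = [
    "wf.align_shard-0",
    "wf.align_shard-1",
    "wf.align_shard-12_retry",
    "wf.merge",
]
collapsed = [collapse_shard_name(n) for n in names]
print(collapsed)
```

Only the shard fragment is removed, so any other part of the name (such as a retry marker, if one is present) is left in place and that row is grouped under its own name.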

# Troubleshooting scripts

50 changes: 50 additions & 0 deletions scripts/cost_script.py
@@ -0,0 +1,50 @@
#!/usr/bin/env python3

"""
Converts a costs TSV file to a summary CSV file (costs_report_final.csv)

Usage: cost_script.py [costs tsv file]
"""
#Import modules
import sys
import pandas as pd
import regex as re

file=sys.argv[1]

#initialize list called table where we'll store all the values from tsv
table = []
with open(file) as f:
    for line in f:
        # drop the trailing newline so the last column name and value stay clean
        L = line.rstrip('\n').split('\t') #split by tab
        table.append(L)

#delete anything that resembles 'shard' followed by a number.
for i in table:
    if "shard" in i[0]:
        # strip the '_shard-<number>' fragment; the rest of the name is kept
        i[0] = re.sub(r'_shard-\d+', '', i[0])


#convert list of lists to pandas dataframe using first list item as header.
#Grab specific columns we want.
#Drop the first row because it's just the list of column names
table_df = pd.DataFrame(table, columns=table[0])
table_df = table_df[["callName","totalCost","cpuCost","memoryCost","diskCost"]]
table_df = table_df.drop([0])

#convert all numerical values from strings to floats
table_df = table_df.astype({'totalCost':'float','cpuCost':'float','memoryCost':'float','diskCost':'float'})

#sum all rows with same callname
table_df_sum = table_df.groupby("callName").sum()

#sort by descending order of total cost
table_df_sum=table_df_sum.sort_values(by=['totalCost'], ascending=False)

#save to csv
table_df_sum.to_csv('costs_report_final.csv', index=True)
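
For comparison, the aggregation step can be sketched with the standard library alone. This is not the script's method (the script uses pandas); it is a minimal illustration of "sum the cost columns per callName, then sort by totalCost descending". The function name `summarize_costs` and the sample rows are made up for the example; the column names are the ones the script selects.

```python
import csv
import io
from collections import defaultdict

# Cost columns the script keeps alongside callName.
COST_COLUMNS = ["totalCost", "cpuCost", "memoryCost", "diskCost"]


def summarize_costs(tsv_text):
    """Sum each cost column per callName; return (name, costs) pairs
    sorted by totalCost in descending order."""
    totals = defaultdict(lambda: dict.fromkeys(COST_COLUMNS, 0.0))
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        for col in COST_COLUMNS:
            totals[row["callName"]][col] += float(row[col])
    return sorted(totals.items(), key=lambda kv: kv[1]["totalCost"], reverse=True)


# Made-up sample data; call names are assumed to have had shard suffixes removed.
sample = (
    "callName\ttotalCost\tcpuCost\tmemoryCost\tdiskCost\n"
    "wf.merge\t0.5\t0.25\t0.125\t0.125\n"
    "wf.align\t1.0\t0.5\t0.25\t0.25\n"
    "wf.align\t2.0\t1.0\t0.5\t0.5\n"
)
summary = summarize_costs(sample)
for name, costs in summary:
    print(name, costs["totalCost"])
```

Reading the header through `csv.DictReader` also avoids carrying a trailing newline into the last column name, which matters when that column is one of the selected cost columns.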
3 changes: 3 additions & 0 deletions scripts/requirements.txt
@@ -1,3 +1,6 @@
numpy
pandas
regex
cwl_utils
miniwdl == 1.2.1
