Health Impact of Realistic Consumable Availability Scenarios #1367

Draft: wants to merge 103 commits into base: master

Commits (103)
b632850
add script to prepare HHFA data for regression analysis
May 22, 2024
a1646a5
add scripts to run regression analysis
May 22, 2024
eedb2e8
move file paths to main.R
May 23, 2024
1f2c5b7
Add script to generation regression-based predicted changes to availa…
May 23, 2024
64b2160
add predict.R to workflow
May 23, 2024
33c9039
draft script to generate consumable availability scenarios
May 23, 2024
3f1fdff
update consumable RF to include TLO-HHFA consumable mapping
May 28, 2024
1e46315
updates to imputation
May 29, 2024
6175495
check that the dataframe contains all districts
May 30, 2024
e9a2f51
temporary changes to revert
May 30, 2024
56dc371
update 'regression_application' column
May 31, 2024
e3fc740
update interpolation methodology
Jun 3, 2024
3e287a8
add scenarios 6-8
Jun 4, 2024
ed78828
update test for consumables availability ResourceFile
Jun 4, 2024
31f6ead
update test for consumables availability ResourceFile
Jun 5, 2024
0089527
add script to generate barplots to visualise change in available_prop…
Jun 5, 2024
e3211fd
update the method of generating scenarios 6-8
Jun 6, 2024
19aa36e
minor figure edit
Jun 10, 2024
a85db77
Merge remote-tracking branch 'origin/master' into sakshi/impact_of_co…
Jun 10, 2024
0135a23
allow simulation to import availability estimates based on new scenarios
Jun 10, 2024
f9c7df1
update health system parameter
Jun 10, 2024
2e5ed3f
revert to consumable RFs before master was merged in
Jun 10, 2024
84fdb73
update RF to include availability iunder the 8 realistic improved ava…
Jun 10, 2024
3539397
add the full list of scenario availability columns to consumable and …
Jun 10, 2024
7bf2b07
update assertion to ensure that only consumable availabilility at lev…
Jun 10, 2024
6cbe0ee
update helper function to load consuamable data
Jun 10, 2024
553d0cf
Revert "update helper function to load consuamable data"
Jun 10, 2024
d4fd581
update helper function to load consuamable data
Jun 10, 2024
242503a
update _process_consumables_data
Jun 10, 2024
82cb1b0
update _process_consumables_data
Jun 10, 2024
df28a99
correct _process_consumables_data
Jun 10, 2024
ee9616e
update duration of simulation for local run
Jun 10, 2024
8d2a79f
update _process_consumables_data
Jun 11, 2024
b78d5a1
update scenario for large run
Jun 11, 2024
be63ffc
add scenario analysis script
Jun 13, 2024
37d9b5f
add figures showing mechanisms of impact
Jun 14, 2024
30548ab
update figure legend to show cleaner scneario names
Jun 15, 2024
4d7ae42
minor script cleaning
Jun 15, 2024
db710c7
add heatmap aummarising consumable availability in the RF
Jun 15, 2024
1c68e6b
remove line plots for mechanisms of impact
Jun 22, 2024
998c5ae
update total DALYs averted and total DALYs accrued figures
Jun 22, 2024
c47d771
plot DALYs accrued by cause
Jun 22, 2024
847d4ae
plot DALYs averted by cause
Jun 23, 2024
fc41b9a
plot DALYs averted per person year
Jun 23, 2024
230f1a3
plot DALYs averted per person year by cause
Jun 23, 2024
52d54ca
update scenario names to be more clear
Jun 23, 2024
bd75d63
fix error in calculation of DALYs per person-year
Jun 23, 2024
30682a2
change plot labels from person-year to person
Jun 23, 2024
a96d78e
add bar plots of capacity utilised by cadre and level
Jun 23, 2024
1009734
update names of scenarios
Jun 24, 2024
030083a
update heatmap to only include two levels of care
Jun 24, 2024
0b671c9
fix average to only include 1a and 1b in the heatmap
Jun 25, 2024
2558e5f
update 'status quo' to 'actual'
Jun 28, 2024
18046f4
get heatmaps for all scenarios
Jul 10, 2024
0a3ac87
Merge remote-tracking branch 'origin/master' into sakshi/impact_of_co…
Sep 4, 2024
8dad43a
update paths from dropbox to sharepoint
Sep 4, 2024
d62e086
Clean code for assigning consumable category
Sep 4, 2024
fd106ae
Rename item category column and extract into consumable availability RF
Sep 5, 2024
4c2644c
Add 'item_category' to the function to check format of consumable ava…
Sep 5, 2024
b01904f
remove reindex() line because this is performed earlier in the script
Sep 5, 2024
4ab53e3
removed these scripts which are not being used in the TLO model
Sep 5, 2024
09d8bb2
Drop columns not needed from RF_items_and_packages
Sep 5, 2024
35d6ebe
Merge branch 'sakshi/sharepoint_update_for_consumables_RF' into saksh…
Sep 5, 2024
15aeb98
load 'item_category' directly from the small consumables RF
Sep 5, 2024
e51384a
update df name for readability
Sep 5, 2024
21fd8a8
add HHFA proxies for two more TLO model consumables
Sep 6, 2024
1b63b30
add scenario 9 (level 2 increased to 99th percentile)
Sep 6, 2024
6488940
add parallel supply chain scenarios
Sep 10, 2024
b4dedea
add heatmaps to represent scenarios
Sep 10, 2024
a733a89
add scenarios to simulation file
Sep 10, 2024
2eda2d4
add new scenarios to consumables.py
Sep 10, 2024
5f9b9c4
add new scenarios to healthsystem.py
Sep 10, 2024
106e322
[TEMPORARY] remove item_category
Sep 10, 2024
c8b1431
Revert "Drop columns not needed from RF_items_and_packages"
Sep 10, 2024
c76624a
update scenario for local run
Sep 11, 2024
208fc4e
Remove item_category from RF_small and move to RF_designations
Sep 12, 2024
d1b7d78
add item_category from RF_designations
Sep 12, 2024
7aae87e
update scenario file for local run
Sep 12, 2024
3989c44
add description of scenarios
Sep 12, 2024
fedc2fc
minor update to correct direction of mapping
Sep 12, 2024
82cc7f2
update scenario to submit to Azure
Sep 12, 2024
c9bffa5
minor changes to figures + allow new results to be loaded
Sep 17, 2024
ecaf310
minor changes to figures
Sep 17, 2024
c262452
Explicitly specify colours for each scenario
Sep 17, 2024
3cd31b6
drop the DHO scenario
Sep 17, 2024
f22d845
rename columns for df_for_plot and streamline the choice of scenarios…
Sep 17, 2024
ec338b8
update figures for think tank presentation
Sep 19, 2024
05156d2
Update scenarios 10-12 to apply max conditions as necessary
Sep 26, 2024
52f12bf
update bar plot
Sep 26, 2024
280c753
update bar plot
Sep 26, 2024
a018c3a
update scenarios 6-9 so that availability is max(original availabilit…
Oct 20, 2024
4edcdcf
Merge remote-tracking branch 'origin/master' into sakshi/impact_of_co…
Oct 20, 2024
0f55c9c
add scenarios 9-12 to test_consumables.py
Oct 20, 2024
693518d
correct num_dalys plots to account for median rather than mean values
Nov 15, 2024
9c4b4b4
update DALYs averted plot to add heatmap of average consumable availa…
Nov 15, 2024
96ae730
correct resourcefilepath
Nov 15, 2024
cdaa1b2
add items 1237, 1239, 2678, and 1124 to consumables RFs (include Fans…
Jan 15, 2025
8eae35a
added item code 75 (Gauze)
Jan 15, 2025
a281527
add detailed heatmaps for actual and 75th percentile scenarios for co…
Jan 16, 2025
b90dd13
Assume that 50% of the expenditure reported under 'Vehicles - Fuel an…
Jan 17, 2025
08acc9b
Revert "Assume that 50% of the expenditure reported under 'Vehicles -…
Jan 17, 2025
aeee9ee
edit axis titles
Jan 21, 2025
c1cebe8
update heatmap figures so that average availability across rows is ba…
Jan 24, 2025
temporary changes to revert
sm2511 committed May 30, 2024
commit e9a2f51027afed6c908cf22c554d4bacfecb9082
@@ -148,7 +148,7 @@
'cancer': 'ncds',
}
# TODO Check if the above mapping is correct
# TODO check how infection_prev should be mapped
# TODO collapse infection_prev and general in the HHFA-based predicted dataframe

scenario_availability_facid_merge['category_tlo'] = scenario_availability_facid_merge['program_plot'].replace(map_model_programs_to_hhfa)

@@ -162,7 +162,7 @@
# TODO add new consumables Rifapentine to this?

# Now merge in TLO item codes
scenario_availability_facid_merge = scenario_availability_facid_merge.reset_index().drop(['index'], axis=1)
scenario_availability_facid_merge = scenario_availability_facid_merge.reset_index(drop = True)
scenario_availability_facid_itemcode_merge = scenario_availability_facid_merge.merge(consumable_crosswalk_df[['item_code', 'item_hhfa', 'regression_application', 'module_name']],
on = ['item_hhfa'], how='right', indicator=True, validate = "m:m")
scenario_availability_facid_itemcode_merge = scenario_availability_facid_itemcode_merge.drop_duplicates(['Facility_ID', 'item_code'])
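A note on the `reset_index` change in this hunk: for a frame whose index is unnamed, `reset_index().drop(['index'], axis=1)` and `reset_index(drop=True)` should give the same result, so the new line looks like a simplification rather than a behaviour change. A minimal check on toy data (the frame below is hypothetical, not the real `scenario_availability_facid_merge`):

```python
import pandas as pd

# Hypothetical toy frame with a non-contiguous, unnamed index,
# such as the one left behind after filtering rows.
df = pd.DataFrame({"item_hhfa": ["a", "b", "c"]}, index=[4, 7, 9])

# Old form: move the index into a column called 'index', then drop that column.
old_style = df.reset_index().drop(["index"], axis=1)

# New form: discard the old index directly.
new_style = df.reset_index(drop=True)

assert old_style.equals(new_style)
```

If the index had a name, the old form would create a column with that name rather than `index`, and the `drop` would raise a `KeyError`; `drop=True` avoids that case.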
@@ -184,10 +184,15 @@
#------------------------------------------------------
# 1.2.6.1 Facility IDs not matched
#------------------------------------------------------
# Before merging the scenario dataframe with tlo_availability_df, generate rows with all 57 relevant facility IDs for item_codes
# Before merging the scenario dataframe with tlo_availability_df, generate rows with all 59 relevant facility IDs for item_codes
# which are not matched
df = scenario_availability_facid_itemcode_merge
df_missing_facids = df.loc[df['Facility_ID'].isna()].reset_index()
df = scenario_availability_facid_itemcode_merge[['District', 'Facility_Level', 'Facility_ID',
'category_tlo', 'item_code',
'available', 'available_prob_predicted', 'change_proportion_scenario1',
'change_proportion_scenario2', 'change_proportion_scenario3',
'change_proportion_scenario4', 'change_proportion_scenario5', 'regression_application',
'merge_itemcode']]
df_missing_facids = df.loc[df['Facility_ID'].isna()].reset_index(drop = True)
df_missing_facids = df_missing_facids.drop_duplicates('item_code') # These item_codes don't have separate rows by
# Facility_ID because they were not found in the HHFA regression analysis

@@ -310,6 +315,39 @@
scenario_final_df = pd.concat([scenario_final_df, result_df], ignore_index=True)

'''
# 1.2.6.3 For level 1b for the districts where this level was not present in the regression analysis/HHFA dataset, assume
# that the change is equal to the product of the (ratio of average change across districts for level 1b to
# average change across districts for level 1a) and change for each item_code for level 1a for that district
#------------------------------------------------------------------------------------------------------------
average_change_across_districts = scenario_final_df.groupby(['Facility_Level','item_code'])[list_of_scenario_variables].mean().reset_index()

# Generate the ratio of the proportional changes to availability of level 1b to 1a in the districts for which level 1b data is available
new_colnames_1a = {col: col + '_1a' if col in list_of_scenario_variables else col for col in average_change_across_districts.columns}
new_colnames_1b = {col: col + '_1b' if col in list_of_scenario_variables else col for col in average_change_across_districts.columns}
average_change_across_districts_for_1a = average_change_across_districts[average_change_across_districts.Facility_Level == "1a"].rename(new_colnames_1a, axis = 1).drop('Facility_Level', axis = 1)
average_change_across_districts_for_1b = average_change_across_districts[average_change_across_districts.Facility_Level == "1b"].rename(new_colnames_1b, axis = 1).drop('Facility_Level', axis = 1)
ratio_of_change_across_districts_1b_to_1a = average_change_across_districts_for_1a.merge(average_change_across_districts_for_1b,
how = "left", on = ['item_code'])
for var in list_of_scenario_variables:
    var_ratio = 'ratio_' + var
    var_1a = var + '_1a'
    var_1b = var + '_1b'
    ratio_of_change_across_districts_1b_to_1a[var_ratio] = (ratio_of_change_across_districts_1b_to_1a[var_1b])/(ratio_of_change_across_districts_1b_to_1a[var_1a])
ratio_of_change_across_districts_1b_to_1a.reset_index(drop = True)
# TODO check if this ratio should be of the proportions minus 1

# Use the above for those districts with no level 1b facilities recorded in the HHFA data
cond_1b_missing_district = scenario_final_df.District.isin(districts_with_no_scenario_data_for_1b_only)
cond_1b = scenario_final_df.Facility_Level == '1b'
cond_1a = scenario_final_df.Facility_Level == '1a'
df_missing_1b = scenario_final_df[cond_1b_missing_district & cond_1b]
df_1a = scenario_final_df[cond_1b_missing_district & cond_1a]


scenario_final_df



# TODO There are still some items missing for some facility IDs

# 2. Merge TLO model availability data with scenario data using crosswalk
@@ -344,7 +382,7 @@
var_1a = var + '_1a'
var_1b = var + '_1b'
ratio_of_change_across_districts_1b_to_1a[var_ratio] = (ratio_of_change_across_districts_1b_to_1a[var_1b]-1)/(ratio_of_change_across_districts_1b_to_1a[var_1a] - 1)
ratio_of_change_across_districts_1b_to_1a.reset_index()
ratio_of_change_across_districts_1b_to_1a.reset_index(drop = True)

# Use the above for those districts with no level 1b facilities recorded in the HHFA data
cond_1b_missing_district = new_availability_df.District.isin(districts_with_no_scenario_data_for_1b_only)
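Two things may be worth checking in this hunk. First, `reset_index(drop = True)` returns a new DataFrame, so calling it without assigning the result leaves `ratio_of_change_across_districts_1b_to_1a` unchanged. Second, the ratio `(1b - 1)/(1a - 1)` has a zero denominator whenever the level 1a change is exactly 1. A small illustration on hypothetical values (the frame and its numbers are made up for this sketch):

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratios table with a non-default index.
ratios = pd.DataFrame({
    "item_code": [10, 20],
    "change_proportion_scenario1_1a": [1.2, 1.0],
    "change_proportion_scenario1_1b": [1.1, 1.3],
}, index=[5, 8])

var = "change_proportion_scenario1"
ratios["ratio_" + var] = (ratios[var + "_1b"] - 1) / (ratios[var + "_1a"] - 1)

# A level 1a change of exactly 1 gives a zero denominator, so the ratio is infinite.
assert np.isinf(ratios.loc[8, "ratio_" + var])

# reset_index returns a new object; without assignment nothing changes.
ratios.reset_index(drop=True)
assert list(ratios.index) == [5, 8]

# Assigning the result is what actually resets the index.
ratios = ratios.reset_index(drop=True)
assert list(ratios.index) == [0, 1]
```

Whether infinite ratios should be capped, replaced, or left for a later step is a modelling decision that this sketch does not try to settle.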
@@ -373,19 +411,117 @@
new_availability_df_imputed = pd.concat([new_availability_df[~(cond_1b_missing_district & cond_1b)], df_missing_1b_imputed], ignore_index = True)

# 2.2.2 For all levels other than 1a and 1b, there will be no change in consumable availability
#------------------------------------------------------
#------------------------------------------------------------------------------------------------------------
fac_levels_not_relevant_to_regression = new_availability_df_imputed.Facility_Level.isin(['0', '2', '3', '4'])
new_availability_df_imputed.loc[fac_levels_not_relevant_to_regression, 'availability_change_prop'] = 1

for var in list_of_scenario_variables:
    new_availability_df_imputed.loc[fac_levels_not_relevant_to_regression, var] = 1

# 2.3 Final checks
#------------------------------------------------------
# 2.3.1 Check that the merged dataframe has the same number of unique items, facility IDs, and total
# number of rows as the original small availability resource file
#------------------------------------------------------
#---------------------------------------------------------------------------------------------------------
assert(new_availability_df_imputed.item_code.nunique() == tlo_availability_df.item_code.nunique())
assert(new_availability_df_imputed.Facility_ID.nunique() == tlo_availability_df.Facility_ID.nunique())
assert(len(new_availability_df_imputed) == len(tlo_availability_df))

# 2.3.2 Construct dataset that conforms to the principles expected by the simulation: i.e. that there is an entry for every
# facility_id and for every month for every item_code.
#-----------------------------------------------------------------------------------------------------------------------
# Generate the dataframe that has the desired size and shape
fac_ids = set(mfl.loc[mfl.Facility_Level != '5'].Facility_ID)
item_codes = set(tlo_availability_df.item_code.unique())
months = range(1, 13)
all_availability_columns = ['available_prop', 'change_proportion_scenario1', 'change_proportion_scenario2',
'change_proportion_scenario3', 'change_proportion_scenario4',
'change_proportion_scenario5']

# Create a MultiIndex from the product of fac_ids, months, and item_codes
index = pd.MultiIndex.from_product([fac_ids, months, item_codes], names=['Facility_ID', 'month', 'item_code'])

# Initialize a DataFrame with the MultiIndex and columns, filled with NaN
full_set = pd.DataFrame(index=index, columns=all_availability_columns)
full_set = full_set.astype(float) # Ensure all columns are float type and filled with NaN

# Insert the data, where it is available.
full_set = full_set.combine_first(new_availability_df_imputed.set_index(['Facility_ID', 'month', 'item_code'])[all_availability_columns])

# Fill in the blanks with rules for interpolation.

facilities_by_level = defaultdict(set)
for ix, row in mfl.iterrows():
    facilities_by_level[row['Facility_Level']].add(row['Facility_ID'])


def get_other_facilities_of_same_level(_fac_id):
    """Return a set of facility_id for other facilities that are of the same level as that provided."""
    for v in facilities_by_level.values():
        if _fac_id in v:
            return v - {_fac_id}


def interpolate_missing_with_mean(_ser):
    """Return a series in which any values that are null are replaced with the mean of the non-missing."""
    if pd.isnull(_ser).all():
        raise ValueError
    return _ser.fillna(_ser.mean())


# Create a new dataset that includes the interpolations (the operation is not done "in place", because the logic
# depends on which results are missing before the interpolations in other facilities).
full_set_interpolated = full_set * np.nan

for fac in fac_ids:
    for item in item_codes:

        print(f"Now doing: fac={fac}, item={item}")

        # Get records of the availability of this item in this facility.
        _monthly_records = full_set.loc[(fac, slice(None), item)].copy()

        if pd.notnull(_monthly_records).any():
            # If there is at least one record of this item at this facility, then interpolate the missing months from
            # the months for which there are data on this item in this facility. (If none are missing, this has no effect).
            _monthly_records = interpolate_missing_with_mean(_monthly_records)

        else:
            # If there is no record of this item at this facility, check to see if it's available at other facilities
            # of the same level
            facilities = list(get_other_facilities_of_same_level(fac))
            recorded_at_other_facilities_of_same_level = pd.notnull(
                full_set.loc[(facilities, slice(None), item)]
            ).any()

            if recorded_at_other_facilities_of_same_level:
                # If it is recorded at other facilities of the same level, find the average availability of the item at
                # other facilities of the same level.
                facilities = list(get_other_facilities_of_same_level(fac))
                _monthly_records = interpolate_missing_with_mean(
                    full_set.loc[(facilities, slice(None), item)].groupby(level=1).mean()
                )

            else:
                # If it is not recorded at other facilities of same level, then assume it is never available at the
                # facility.
                _monthly_records = _monthly_records.fillna(0.0)

        # Insert values (including corrections) into the resulting dataset.
        full_set_interpolated.loc[(fac, slice(None), item)] = _monthly_records.values

# Check that there are not missing values
assert not pd.isnull(full_set_interpolated).any().any()

# --- Check that the exported file has the properties required of it by the model code. --- #
check_format_of_consumables_file(df=full_set_interpolated.reset_index(), fac_ids=fac_ids)

# %%
# Save
full_set_interpolated.reset_index().to_csv(
    path_for_new_resourcefiles / "ResourceFile_Consumables_availability_small.csv",
    index=False
)
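A possible alternative to the nested loop above, offered as a sketch rather than a drop-in replacement: the same three fallback rules (own mean, then same-level mean, then zero) can be expressed with `groupby(...).transform("mean")` and chained `fillna` calls, handling each availability column independently. This may also matter because `full_set` now has several columns, and `if pd.notnull(_monthly_records).any():` on a multi-column frame yields a Series, whose truth value pandas refuses to evaluate. The `level_of` lookup and the toy values are assumptions; in the script the level would come from `mfl`.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup Facility_ID -> Facility_Level (the script would derive this from mfl).
level_of = {1: "1a", 2: "1a", 3: "2"}

# Toy full_set: three facilities, three months, one item, one availability column.
index = pd.MultiIndex.from_product(
    [[1, 2, 3], [1, 2, 3], [10]], names=["Facility_ID", "month", "item_code"]
)
values = [0.4, np.nan, 0.8] + [np.nan] * 6
full_set = pd.DataFrame({"available_prop": values}, index=index)
cols = list(full_set.columns)

df = full_set.reset_index()
df["Facility_Level"] = df["Facility_ID"].map(level_of)

# Rule 1: the facility's own mean for the item, across the months it has data for.
own_mean = df.groupby(["Facility_ID", "item_code"])[cols].transform("mean")

# Rule 2: monthly mean across facilities of the same level. A facility with no
# records contributes nothing, so including it does not change the mean.
level_month_mean = df.groupby(["Facility_Level", "month", "item_code"])[cols].transform("mean")

# Months missing at every facility of the level fall back to a pooled level-wide
# mean. This is a simplification: the loop uses the mean of the monthly means.
level_mean = df.groupby(["Facility_Level", "item_code"])[cols].transform("mean")

# Rule 3: anything still missing is assumed never available.
df[cols] = (
    df[cols]
    .fillna(own_mean)
    .fillna(level_month_mean)
    .fillna(level_mean)
    .fillna(0.0)
)
interpolated = df.set_index(["Facility_ID", "month", "item_code"])[cols]

assert not interpolated.isnull().any().any()
```

All group means are computed from the un-interpolated data, which preserves the loop's property that one facility's filled values never feed into another facility's estimate.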

# 2.3.2. Browse missingness in the availability_change_prop variable
#------------------------------------------------------
pivot_table = pd.pivot_table(new_availability_df_imputed,