This repository contains the materials for Day 4 of the short course EPID731: Analysis Of Electronic Health Record (EHR) Data, offered at the University of Michigan. The focus of Day 4, taught by Lars Fritsche, is on using GPTs to harmonize medication data.
This short course offers an overview of modern analytical methods and research applications using EHR data, with a specific focus on epidemiologic inferences. For Day 4, participants will learn about using GPTs to harmonize medication data.
Contains configuration files for different models and temperature settings.
config_example_4models.ini
config_example_4models_cs.ini
config_example_high_temperature.ini
config_example_low_temperature.ini
Contains input files used for the analysis.
medication_example_1med.txt
medication_example_1med_x.txt
medication_example_5meds.txt
medication_example_aha.txt
unique_drug_concept_names.txt
Contains system and user prompt files used for processing. These are consecutive steps to develop a clear and efficient system prompt for a GPT API.
system_prompt_1.txt
system_prompt_2.txt
system_prompt_3.txt
system_prompt_4.txt
system_prompt_5.txt
system_prompt_6.txt
system_prompt_7.txt
system_prompt_8.txt
system_prompt_9.txt
system_prompt_10.txt
user_prompt.txt
Contains Python scripts used for processing and analysis.
gpt_line_processor.py
: Processes each line of input using the GPT model.gpt_process_batches.py
: Processes input data in batches using the GPT model.
In this workshop, participants will learn how to:
- Set up the environment for using the OpenAI API.
- Develop a powerful prompt to classify medications.
- Explore various parameters of the API that influence the model's performance.
Here is the revised "Getting Started" section that includes a note about needing an API key to access the OpenAI API:
- Python 3.x
- An OpenAI API key
- Required Python packages:
openai
pandas
configparser
tiktoken
csv
re
asyncio
Clone the repository to your local machine:
git clone https://github.com/statgen/EPID731_2024.git
Navigate to the repository directory:
cd EPID731_2024
Install the required packages:
pip install openai pandas configparser
pip install tiktoken --only-binary :all:
To use the OpenAI API, you need to have an API key. You can get your API key by signing up on the OpenAI website.
Once you have your API key, set it as an environment variable:
export OPENAI_API_KEY='your-api-key-here'
To run the batch processing script, use the following example:
# Import the external script containing batch processing functions
exec(open("Day4/scripts/gpt_process_batches.py").read())
# Define the asynchronous function to handle batch processing
async def run():
await process_batches(
config_file='Day4/configs/config_example_low_temperature.ini',
system_prompt_file='Day4/prompts/system_prompt_9.txt',
user_prompt_file='Day4/prompts/user_prompt.txt',
input_file='Day4/inputs/medication_example_aha.txt',
output_location='GPT_Outputs',
file_prefix='Example4_prompt9_aha_meds',
chunk_size=100 # Adjust based on API rate limits and performance needs
)
# Execute the asynchronous batch processing
await run()
Alternatively, you can run the gpt_line_processor.py
script with the necessary parameters set within the script:
- Open
scripts/gpt_line_processor.py
. - Set the parameters for
process_batches
function. - Run the script:
python scripts/gpt_line_processor.py