EPID731_2024 - Day 4

This repository contains the materials for Day 4 of the short course EPID731: Analysis Of Electronic Health Record (EHR) Data, offered at the University of Michigan. The focus of Day 4, taught by Lars Fritsche, is on using GPTs to harmonize medication data.

Description

This short course offers an overview of modern analytical methods and research applications using EHR data, with a specific focus on epidemiologic inferences. For Day 4, participants will learn about using GPTs to harmonize medication data.

Repository Structure

configs

Contains configuration files for different models and temperature settings.

config_example_4models.ini
config_example_4models_cs.ini
config_example_high_temperature.ini
config_example_low_temperature.ini

inputs

Contains input files used for the analysis.

medication_example_1med.txt
medication_example_1med_x.txt
medication_example_5meds.txt
medication_example_aha.txt
unique_drug_concept_names.txt

prompts

Contains system and user prompt files used for processing. These are consecutive steps to develop a clear and efficient system prompt for a GPT API.

system_prompt_1.txt
system_prompt_2.txt
system_prompt_3.txt
system_prompt_4.txt
system_prompt_5.txt
system_prompt_6.txt
system_prompt_7.txt
system_prompt_8.txt
system_prompt_9.txt
system_prompt_10.txt
user_prompt.txt

scripts

Contains Python scripts used for processing and analysis.

gpt_line_processor.py: Processes each line of input using the GPT model.
gpt_process_batches.py: Processes input data in batches using the GPT model.

Script Details

In this workshop, participants will learn how to:

Set up the environment for using the OpenAI API.
Develop a powerful prompt to classify medications.
Explore various parameters of the API that influence the model's performance.

Here is the revised "Getting Started" section that includes a note about needing an API key to access the OpenAI API:

Getting Started

Prerequisites

Python 3.x
An OpenAI API key
Required Python packages:

openai
pandas
configparser
tiktoken
csv
re
asyncio

Installation

Clone the repository to your local machine:

git clone https://github.com/statgen/EPID731_2024.git

Navigate to the repository directory:

cd EPID731_2024

Install the required packages:

pip install openai pandas configparser
pip install tiktoken --only-binary :all:

OpenAI API Key

To use the OpenAI API, you need to have an API key. You can get your API key by signing up on the OpenAI website.

Once you have your API key, set it as an environment variable:

export OPENAI_API_KEY='your-api-key-here'

Usage

To run the batch processing script, use the following example:

# Import the external script containing batch processing functions
exec(open("Day4/scripts/gpt_process_batches.py").read())

# Define the asynchronous function to handle batch processing
async def run():
    await process_batches(
        config_file='Day4/configs/config_example_low_temperature.ini',
        system_prompt_file='Day4/prompts/system_prompt_9.txt',
        user_prompt_file='Day4/prompts/user_prompt.txt',
        input_file='Day4/inputs/medication_example_aha.txt',
        output_location='GPT_Outputs',
        file_prefix='Example4_prompt9_aha_meds',
        chunk_size=100  # Adjust based on API rate limits and performance needs
    )

# Execute the asynchronous batch processing
await run()

Alternatively, you can run the gpt_line_processor.py script with the necessary parameters set within the script:

Open scripts/gpt_line_processor.py.
Set the parameters for process_batches function.
Run the script:

python scripts/gpt_line_processor.py

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
Day4		Day4
EPID731_Medication_Classification_with_OpenAI_GPT_API.ipynb		EPID731_Medication_Classification_with_OpenAI_GPT_API.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

EPID731_2024 - Day 4

Description

Repository Structure

configs

inputs

prompts

scripts

Script Details

Getting Started

Prerequisites

Installation

OpenAI API Key

Usage

About

Releases

Packages

Languages

statgen/EPID731_2024

Folders and files

Latest commit

History

Repository files navigation

EPID731_2024 - Day 4

Description

Repository Structure

configs

inputs

prompts

scripts

Script Details

Getting Started

Prerequisites

Installation

OpenAI API Key

Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages