Analysis of public comments received on proposed rule on Supporting the Head Start Workforce and Consistent Quality Programming
The purpose of open-sourcing this repository is to be transparent about how AI was used to help analyze public comments efficiently, and to provide a starting point for others who would like to explore using commercial Large Language Models to aid in the public comment analysis process.
How to Run: Outlines how to replicate the project and run the files in this repo.
Technical Documentation: Full technical documentation for this project, including technical considerations for future project iterations and the rationale behind some of our choices.
Cloud Architecture: A detailed outline of how we structured our cloud infrastructure.
Lessons Learned: A collection of lessons learned from the Policy Team and the Data Surge Team.
inputs/: Should hold the pickle file and the file used for bill tagging.
json_outputs/: Holds one JSON output for each chunk of text sent to ChatGPT with a prompt.
logs/: Log files are created when you run data_processing.py and gpt_parallel.py. Logs are timestamped and record any issues with particular comments sent to ChatGPT, as well as the runtime of each script.
outputs/: Holds an "intermediate" and a "final" folder. The "intermediate" folder holds the chunked pickle file created partway through the pipeline. The "final" folder holds the final CSVs exported in long and wide formats, as well as failed_jsons_files.csv and the summary documents.
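To make the pipeline shape concrete, here is a minimal sketch of the chunk-processing step that gpt_parallel.py performs: sending each text chunk to a model in parallel, writing one JSON file per chunk to json_outputs/, and logging failures to a timestamped log. The `send_to_gpt` function below is a hypothetical stand-in for the real API call (the actual script presumably uses the OpenAI API); everything else mirrors the directory layout described above.

```python
import json
import logging
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

def send_to_gpt(chunk_id: int, chunk_text: str, prompt: str) -> dict:
    # Hypothetical stand-in for the real ChatGPT API call; it returns a
    # dict structured like a parsed model response for one chunk.
    return {"chunk_id": chunk_id, "prompt": prompt,
            "analysis": f"summary of: {chunk_text[:30]}"}

def process_chunks(chunks, prompt, out_dir="json_outputs", max_workers=4):
    """Send each chunk to the model in parallel; save one JSON per chunk."""
    Path(out_dir).mkdir(exist_ok=True)
    # Timestamped log file, mirroring the logs/ convention described above.
    logging.basicConfig(
        filename=f"log_{time.strftime('%Y%m%d_%H%M%S')}.log",
        level=logging.INFO,
    )
    results, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(send_to_gpt, i, text, prompt): i
                   for i, text in enumerate(chunks)}
        for fut in as_completed(futures):
            i = futures[fut]
            try:
                result = fut.result()
                # One output file per chunk, as in json_outputs/.
                with open(Path(out_dir) / f"chunk_{i}.json", "w") as f:
                    json.dump(result, f)
                results.append(result)
            except Exception as exc:
                # Failed chunks are logged and tracked rather than crashing
                # the run (they can later feed failed_jsons_files.csv).
                logging.error("chunk %d failed: %s", i, exc)
                failed.append(i)
    return results, failed
```

In the real pipeline the stub would be replaced by an API call with retry handling; the parallel structure and per-chunk JSON output are what matter here.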
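The "long" and "wide" CSV exports in outputs/final/ can be illustrated with a small pandas sketch. The record structure (a `comment_id`/`topic`/`flag` triple) is an assumption made for illustration, not the repo's actual schema: long format holds one row per comment-topic pair, while wide format pivots to one row per comment with one column per topic.

```python
from pathlib import Path

import pandas as pd

# Hypothetical tag records, as might be collected from the per-chunk JSONs.
records = [
    {"comment_id": 1, "topic": "wages", "flag": 1},
    {"comment_id": 1, "topic": "staffing", "flag": 1},
    {"comment_id": 2, "topic": "wages", "flag": 1},
]

# Long format: one row per (comment, topic) pair.
long_df = pd.DataFrame(records)

# Wide format: one row per comment, one 0/1 column per topic.
wide_df = (long_df.pivot_table(index="comment_id", columns="topic",
                               values="flag", fill_value=0)
           .reset_index())

Path("outputs/final").mkdir(parents=True, exist_ok=True)
long_df.to_csv("outputs/final/comments_long.csv", index=False)
wide_df.to_csv("outputs/final/comments_wide.csv", index=False)
```

The long format is convenient for filtering and aggregation; the wide format gives reviewers a one-row-per-comment spreadsheet view.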