Data extraction and data mining Python code for datasets of 100 GB and larger

This pipeline was used to extract electronic health records stored in CSV format. It runs efficiently in a CPU-only environment.
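Files of this size generally do not fit in memory, so a CPU-only extraction usually streams the CSV rather than loading it whole. The following is a minimal sketch of that idea using only the standard library. It is not the code from this repository, and the function name `extract_rows` and its parameters are hypothetical.

```python
import csv


def extract_rows(src_path, dst_path, column, wanted_values):
    """Copy rows whose `column` value is in `wanted_values` from src to dst.

    Hypothetical helper for illustration only. The file is read one row at a
    time, so memory use stays roughly constant regardless of file size.
    Returns the number of rows written.
    """
    wanted = set(wanted_values)  # set lookup is O(1) per row
    kept = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        if reader.fieldnames is None:
            raise ValueError("source CSV has no header row")
        if column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in header")
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[column] in wanted:
                writer.writerow(row)
                kept += 1
    return kept
```

A call might look like `extract_rows("claims.csv", "subset.csv", "diagnosis_code", {"I10"})`, where the file and column names are placeholders for whatever the actual dataset uses.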
Run the Python scripts in their numeric order to perform the data extraction.
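One way to automate that order is sketched below. It assumes each script's file name starts with its step number (for example `1_...py`, `2_...py`), which is an assumption about the naming and not something confirmed here. The helper names `numeric_prefix`, `ordered_scripts`, and `run_all` are hypothetical. Sorting on the integer value of the prefix matters because plain alphabetical sorting would place step 10 before step 2.

```python
import re
import subprocess
import sys
from pathlib import Path


def numeric_prefix(path):
    """Return the leading integer of a file name, or None if there is none."""
    match = re.match(r"(\d+)", path.name)
    return int(match.group(1)) if match else None


def ordered_scripts(directory):
    """List .py files with a numeric prefix, sorted by that number."""
    scripts = [p for p in Path(directory).glob("*.py")
               if numeric_prefix(p) is not None]
    # Tie-break on the file name so the order is deterministic.
    return sorted(scripts, key=lambda p: (numeric_prefix(p), p.name))


def run_all(directory):
    """Run each numbered script in order, stopping at the first failure."""
    for script in ordered_scripts(directory):
        print(f"Running {script.name}")
        subprocess.run([sys.executable, str(script)], check=True)
```

Calling `run_all(".")` from the directory that holds the scripts would execute them one after another with the current Python interpreter.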
Related papers published using this pipeline include:
- Association between risk of Alzheimer’s disease and related dementias and angiotensin receptor Ⅱ blockers treatment for individuals with hypertension in high-volume claims data. Lundin, Sori Kim et al. eBioMedicine, Volume 109, 105378.