Skip to content

Latest commit

 

History

History
136 lines (91 loc) · 5.32 KB

README.md

File metadata and controls

136 lines (91 loc) · 5.32 KB

APAC - OHDSI - OMOP - PASAR DATA

OMOP

Analysis

https://ohdsiorg.sharepoint.com/:x:/s/OHDSIAPAC-2024APACETLProject-Dataanalysis/EZWK6NCPUFtJpEOU54YmemcBp-B4JIk1Er_T7cDZApXVJQ?e=N32f6C&wdOrigin=TEAMS-MAGLEV.teams_ns.rwc&wdExp=TEAMS-TREATMENT&wdhostclicktime=1722821798159&web=1

Vocabulary Mapping

ETL

Pre-requisites

  • bash
  • Python >= v3.10

For Windows users please adapt the following steps accordingly! Recommend to install linux on windows: https://learn.microsoft.com/en-us/windows/wsl/install

Setup Python environment

  • Navigate to under etl folder

  • Run the following commands:

    1. Create and activate virtual environment
      • python3 -m venv pypasarenv
      • source pypasarenv/bin/activate
    2. Install python packages pip install -r requirements.txt
    3. Clone .env file from example cp .env.example .env
    4. Update .env file with credentials

OMOP Database Setup

Postgres sql scripts are now stored at this point in time etl/pypasar/db/sql and will likely to change based on usage later

Setup Postgres via Docker

  • Ensure docker is installed
  • Export envs source .env / Copy the value POSTGRES_PORT from .env and replace in the next line
  • Run Postgres as docker container docker run -v pg-pasar-data:/var/lib/postgresql/data --env-file .env -d --name pasar-postgres -p ${POSTGRES_PORT}:5432 postgres:16-alpine

Existing R Setup

If you have an existing R Setup and familiar with OHDSI Packages then setup the OMOP using https://github.com/OHDSI/CommonDataModel/blob/main/README.md

Using Python

  • Ensure

    1. Environment variables are setup accordingly in .env
    2. Current working directory is under etl folder
  • Create omop schema and tables

    1. Run python . db create_omop_schema.
    2. Schema defined as POSTGRES_OMOP_SCHEMA in .env will be created and OMOP tables populated.
    3. Verify through PGAdmin / psql client
  • Drop omop schema and tables

    1. python . db drop_omop_schema.
    2. Schema defined as POSTGRES_OMOP_SCHEMA in .env will be dropped

Execute ETL

  • To begin contributing transformation to the various OMOP tables, go to etl/pypasar/omop and choose the appropriate python file
  • SQL can be used as well in the python class
  • Example is available for cdm_source table at etl/pypasar/omop/cdm_source.py

ETL Development & Testing

  • Please feel free to implement in whichever way you choose. The only mandatory requirement is that the execute must be the entrypoint to the respective omop class. Because execute method will be called for each class from __main__.py file

  • Current working directory is under etl folder

  • Run python . etl <omop_table_name>.

    • Example python . etl cdm_source
    • Multiple tables for cdm_source and concept python . etl cdm_source,concept. NO SPACES BETWEEN COMMA SEPARTED OMOP Tables

Cleanup

Remove Python environment

  • Run deactivate
  • Under etl, Run rm -rf pypasarenv

Remove Docker Container & Volume

  • Remove container, Run docker rm -f pasar-postgres
  • Remove volume (CAUTION - ALL DATA WILL BE LOST!!), Run docker volume rm pg-pasar-data

GCP Development

Setup VM

In the below snippet at Step 3

  1. Replace <IP> Based on group in the below snippet
  2. Replace <username> (filename of the private key / mentioned in ETL Development sheet)
  3. Copy the snippet to ~/.ssh/config and save the file
Host pypasar
    HostName <IP>
    User <username>
    IdentityFile ~/.ssh/<username>
    ControlMaster     auto
    ControlPath       ~/.ssh/control-%C
    ControlPersist    yes
  1. Copy public and private key files to ~/.ssh folder

  2. Test on terminal ssh pypasar. Should be able to login to home folder, run pwd.

  3. For remote development https://code.visualstudio.com/docs/remote/ssh

Setup Git repo in the local folder

  1. Once you are in your home folder, refer to this document https://ohdsiorg.sharepoint.com/:p:/s/OHDSIAPAC/ESUGOh6Lza9FvxH1TyaoO7oBlMv_9Iq57tLQ-41V2HUFtA?e=QdjNOP and fork this repo to your org account.

  2. Once the repo clone is done, navigate to the <username>-pasar repo

  3. Enter the following configuration (No GLOBAL!!)

git config user.name "<username>"
git config user.email <email>
  1. Follow the document in step 1 to create a new branch and push to repo.

Setup Postgres GUI on Vscode GCP VM

  1. How to browse and Install VS Code extensions: https://code.visualstudio.com/docs/editor/extension-marketplace#_browse-for-extensions
  2. Recommended Postgres GUI extension: https://marketplace.visualstudio.com/items?itemName=ckolkman.vscode-postgres