Commit `b01e86d` by abuendia, Jun 17, 2023: "Update to docs" (1 changed file, `docs/index.md`)

## Introductory Tutorial

`omop-learn` allows [OMOP-standard (CDM v5 and v6)](https://github.com/OHDSI/CommonDataModel/wiki) medical data like claims and EHR information to be processed efficiently for predictive tasks. The library allows users to precisely define cohorts of interest, patient-level time series features, and target variables of interest. Relevant data is automatically extracted and surfaced in formats suitable for most machine learning algorithms, and the (often extreme) sparsity of patient-level data is fully taken into account to provide maximum performance.

The library provides several benefits for modeling, both in terms of ease of use and performance:

* All that needs to be specified are cohort and outcome definitions, which can often be done using simple SQL queries.
* Our fast data ingestion and transformation pipelines allow for easy and efficient tuning of algorithms: we have seen significant improvements in out-of-sample performance from hyperparameter sweeps that would take days with hand-written SQL but complete in minutes with `omop-learn`.
* We modularize the data extraction and modeling processes, allowing users to adopt new models as they become available with very little modification to the code. Tools ranging from simple regressions to deep neural networks can be substituted in and out in a plug-and-play manner.

`omop-learn` serves as a modern Python alternative to the [PatientLevelPrediction R library](https://github.com/OHDSI/PatientLevelPrediction). We allow seamless integration of many Python-based machine learning and data science libraries by supporting generic `sklearn`-style classifiers. Our new data storage paradigm also allows for more on-the-fly feature engineering as compared to previous libraries.

In this tutorial, we walk through the process of using `omop-learn` for an end-of-life prediction task for Medicare patients with clear applications to improving palliative care. The code used can also be found in the [example notebook](https://github.com/clinicalml/omop-learn/blob/master/examples/eol/sard_eol.ipynb), and can be run on your own data as you explore `omop-learn`. The control flow diagram below also links to relevant sections of the library documentation.
<center>
<div class="mxgraph" style="max-width:100%;border:1px solid transparent;" data-mxgraph="{&quot;highlight&quot;:&quot;#006633&quot;,&quot;lightbox&quot;:false,&quot;nav&quot;:false,&quot;resize&quot;:true,&quot;xml&quot;:&quot;&lt;mxfile host=\&quot;www.draw.io\&quot; modified=\&quot;2020-01-27T20:09:03.888Z\&quot; agent=\&quot;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36\&quot; etag=\&quot;8otv9sNdF-oivO5T6t3e\&quot; version=\&quot;12.5.8\&quot; type=\&quot;device\&quot;&gt;&lt;diagram id=\&quot;C5RBs43oDa-KdzZeNtuy\&quot; name=\&quot;Page-1\&quot;&gt;7Zpbc5s4FIB/jR/DgABfHuM47nY33XinbdI+dWRQQImMqJAbe3/9HoG44xindnrZeDItOhIS0vnOhWMP7IvV5o3AcfiO+4QNkOlvBvZsgJDlIDRQf6a/zSQjx80EgaC+HlQK3tN/iRaaWrqmPklqAyXnTNK4LvR4FBFP1mRYCP5YH3bHWX3VGAekJXjvYdaW3lJfhpl0jEal/A9CgzBf2RpOsp4VzgfrnSQh9vljRWRfDuwLwbnMrlabC8LU4eXncvt2e8uuHoZv/vwn+Yo/Tv/68PfNWTbZ/JBbii0IEslnT42+boPxTXI7f7uerS7DL7d8yM9y7X7DbK0PTG9WbvMThGlAWdCYPoZUkvcx9lTPI/ACslCuGLQsuMRJnGnwjm4IrDoNGE4S3enxFfX0dSIFfyAXnHGRLmGbpumY46InV5Sr5hDYp7Dvyui79AN9dzySc7yiTLF5Q4SPI6zFGkTLgbbeIhGSbBpQ7DlRq1Az2AfhKyLFFu7Ts7hDbQraNIY5KY8laJarZWEVslyINdxBMXepQLjQOkybHxMirpf36oCRyfCSsFxdQwbTTpdwEagLy4ArvFK6iZZJnO4/GwIzFqNm5I5GBKa64CEXak4c+fDv9VqCpkj2eIxGD9kqoZTKXs/V06G5Bx0UTGzFjIDKcL00KAfxQhCfepLy6IouBRbbdDPI9tO1vnh6JbSPd7ty6hriCo6CryNf4ZWitAfJJgpVIuusqd7Cws0GsDtAO4SiDgB3gmUNzRpYyOkAy+kAa3Q4V9CsoHWA6zBbnuP63fUCJDMsccuJqM0rZK4UuQueUIUJdEmuNIYZDVTLg4MkYOXTFPAp9h6CVN1dxp/PeK7vXXIp+arThzzhb6oa93ESpmCZWU+snn21CVRgBMKTkUEhSiWGrza4EwrW2GGxpyaLFUyz7XLhE5E/ZcQjssf9ncqv2aiO36RNn+O04XNtw3YP5q8fbJbVDks+xHndBL8S8oBHmF2W0mndU1TtuRx/xRWAqfCeSLnVCsJryesqggMV208ajrTxWTUMN2/ONtXO2bbkqCSMRP65ymkUrox7D5loTlm+Shep8GlRbR8a/dRpPYMROHG+Fh55QjW2TuqwCIh8YtyomzlBGJb0W/3hThEXUa+4eBFynpDCi0GumsZIxtIFjh8U7wiWawEJci0s7j7oXycsnsAvjX+BsOjsT6jrqqqopcMDdXqqqvqsPm4FWnp8O8m2mxreEXt1QNrvdnf7sCo7ZENl6k6NievqduZRJ/ZIt0uXqhrbSmNBBAUVqrha+txPdYf8ueaPu73zqZ2j+ULOUd+64DSSFYOZ1A3GQg1DyJ5f31V9d2xMNHQalmc2Jso22JooNapiP88
P/+6rUR1kVFbNoNynzemZic0RTWf3W2AP07FPYzojZJiobj3OxJiMxxPX1B/3ecZkN43JflljspwfnksXvr/q+a2emB7m4X/7/HvU006s7zWU73qBG7U8eJFUdxcM8vfvkGxwoF6kp3El5mtpkQag/TnvMYqRP7rg2EyA8/ftagKMOhLg8eEJcE9f0qN+nGuSrtJSfVUnzRpOVg3aVSsqKjw9qkPpYud5QVoZcKs6rZ9n1niN8yOU1nng9cwnwgBUQJqWfNA8rf+og6KJB6eDI8LXSXoIc+R0yOFvbMRR8Jvg1yhLWuNRCz9n3KYvlx2fvslrJPuNIpnVN+Wz3NOEst61JNsYKDfbLh8tBIkF9wiYejWuHalmFFdm31susv739SJ3VPdX9uhF60W9aXJ20VQUI7Pv5j4ITCP4P/+SfkFjwtJv8Z5iTIOVxbESuF2YgTdZqr3hRGVaMOgKpC4sThKZXswEWJgwaLyNln0wdF8xrGPomj9h2RK1v87bm8gdI0/rzgC7c7VaOlfN3grWVX52xjKYNfHpOJWzRWQtMDuLiHzk4uFMDe2WniHbuI+DFyRiiDry+C4iCuHRUyn0E33B9oulRc/JAU+eSrk9U6nKL5mOWT4bug2f13wD7V15bk5kHatYBs3yt1vZ8PIXcPblfw==&lt;/diagram&gt;&lt;/mxfile&gt;&quot;}"></div>
<script type="text/javascript" src="https://www.draw.io/js/viewer.min.js"></script>
We define our end-of-life task as follows:

> For each patient who is over the age of 70 at prediction time, and is enrolled in an insurance plan for which we have claims data available for 95% of the days of calendar year 2016, and is alive as of March 31, 2017: predict if the patient will die during the interval of time between April 1, 2017 and September 30, 2017 using data including the drugs prescribed, procedures performed, conditions diagnosed and the medical specialties of the clinicians who cared for the patient during 2016.
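The windows in this definition can be lined up with a little standard-library date arithmetic. This is a sketch: the 3-month gap and 6-month outcome window below correspond to the `gap` and `outcome_window` parameters used later in the tutorial.

```python
from datetime import date

def add_months(d, months):
    """Shift a date forward by whole months (safe for month-start dates)."""
    y, m = divmod(d.month - 1 + months, 12)
    return date(d.year + y, m + 1, d.day)

# Training features are drawn from calendar year 2016.
training_end = date(2017, 1, 1)

# A 3-month gap separates the feature window from the outcome window,
# so prediction happens as of April 1, 2017 ...
prediction_time = add_months(training_end, 3)

# ... and the 6-month outcome window runs through the end of September 2017
# (this bound is exclusive).
outcome_window_end = add_months(prediction_time, 6)
```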
`omop-learn` splits the conversion of this natural language specification of a task to code into two natural steps. First, we define a **cohort** of patients, each of which has an outcome. Second, we generate **features** for each of these patients. These two steps are kept independent of each other, allowing different cohorts or feature sets to very quickly be tested and evaluated. We explain how cohorts and features are initialized through the example of the end-of-life problem.

#### 1.1 Data Backend Initialization

`omop-learn` first needs to know where the OMOP-formatted data lives. A backend object wraps this database connection; for example, with a BigQuery configuration:

```python
backend = BigQueryBackend(config)
```

#### 1.2 <a name="define_cohort"></a> Cohort Initialization
OMOP's [`PERSON`](https://github.com/OHDSI/CommonDataModel/wiki/PERSON) table is the starting point for cohort creation, and is filtered via SQL query. Note that these SQL queries can be written with variable parameters which can be adjusted for different analyses. These parameters are implemented as [Python templates](https://www.python.org/dev/peps/pep-3101/). In this example, we leave dates as parameters to show how cohort creation can be flexible.
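To make the templating concrete, here is a miniature parameterized query filled with plain `str.format` fields. This is a sketch, not the library's own substitution code; the real cohort scripts ship with `omop-learn` and write their parameters in the brace-delimited style shown in the queries below.

```python
# A miniature version of a parameterized cohort query; the
# training_start_date parameter matches the one used in the
# queries that follow.
sql_template = (
    "select person_id\n"
    "from cdm.observation_period\n"
    "where observation_period_start_date >= date '{training_start_date}'"
)

filled = sql_template.format(training_start_date="2016-01-01")
```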

We first want to establish when patients were enrolled in insurance plans which we have access to. We do so using OMOP's `OBSERVATION_PERIOD` table. Our SQL logic finds the number of days within our data collection period (all of 2016, in this case) that a patient was enrolled in a particular plan:
```sql
death_training_elig_counts as (
    select
        person_id,
        -- days of this observation period that fall inside the
        -- collection window, clipped to zero when there is no overlap
        greatest(
            least(observation_period_end_date, date '{ training_end_date }')
              - greatest(observation_period_start_date, date '{ training_start_date }'),
            0
        ) as num_days
    from cdm.observation_period
)
```
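The day-counting logic described above amounts to intersecting each observation period with the data collection window. A standard-library sketch, with hypothetical dates:

```python
from datetime import date

def enrolled_days(period_start, period_end, window_start, window_end):
    """Days of an observation period falling inside the collection window,
    clipped to zero when there is no overlap."""
    overlap = (min(period_end, window_end) - max(period_start, window_start)).days
    return max(overlap, 0)

# A plan running mid-2015 through mid-2016 contributes only its 2016 days.
days = enrolled_days(date(2015, 6, 1), date(2016, 7, 1),
                     date(2016, 1, 1), date(2017, 1, 1))
```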
Note that the dates are left as template strings that can be filled later on. Next, we want to filter for patients who are enrolled for 95% of the days in our data collection period. Note that we must be careful to include patients who used multiple different insurance plans over the course of the year by aggregating the intermediate table `death_training_elig_counts` specified above. Thus, we first aggregate and then collect the `person_id` field for patients with sufficient coverage over the data collection period:
```sql
death_trainingwindow_elig_perc as (
    select
        person_id
    from death_training_elig_counts
    group by person_id
    having sum(num_days) >= 0.95 * (date '{ training_end_date }' - date '{ training_start_date }')
)
```

Finally, we can create the cohort, keeping only patients who are still alive at prediction time, i.e. with no recorded death before the end of the training window plus the gap:

```sql
-- ... (cohort selection logic elided in the diff) ...
    or d.death_datetime >= (date '{ training_end_date }' + interval '{ gap }')
)
```
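The aggregate-then-filter step can be sketched in plain Python with toy data (2016 is a leap year with 366 days):

```python
from collections import defaultdict

# (person_id, num_days) rows from death_training_elig_counts; toy values.
rows = [(1, 200), (1, 160),   # one patient enrolled in two plans
        (2, 100)]

# Aggregate enrollment days across all of a patient's plans ...
totals = defaultdict(int)
for person_id, num_days in rows:
    totals[person_id] += num_days

# ... then keep patients covered for at least 95% of the window.
window_days = 366
eligible = sorted(p for p, d in totals.items() if d >= 0.95 * window_days)
# patient 1 is covered 360/366 days across both plans; patient 2 is not
```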
The full cohort creation SQL query can be found [here](https://github.com/clinicalml/omop-learn/blob/master/examples/eol/postgres_sql/gen_EOL_cohort.sql).

Note the following key fields in the resulting table:

Field | Meaning
------------ | -------------
`example_id` | A unique identifier for each example in the dataset. While in the case of end-of-life each patient will occur as a positive example at most once, this is not the case for all possible prediction tasks, and thus this field offers more flexibility than using the patient ID alone.
`y` | A column indicating the outcome of interest. Currently, `omop-learn` supports binary outcomes.
`person_id` | A column indicating the ID of the patient.
`start_date` and `end_date` | Columns indicating the beginning and end of the time periods to be used for data collection for this patient. This will be used downstream for feature generation.

We are now ready to build the cohort. We pass the path to the cohort SQL script, the data backend, and the template parameters that fill in the dates and windows from the task definition:

```python
sql_file = 'examples/eol/postgres_sql/gen_EOL_cohort.sql'
cohort_params = {
    'training_start_date': '2016-01-01',
    'training_end_date': '2017-01-01',
    'gap': '3 months',
    'outcome_window': '6 months',
}

cohort = Cohort.from_sql_file(sql_file, backend, params=cohort_params)
```

#### <a name="define_features"></a> 1.3 Feature Initialization
With a cohort now fully in place, we are ready to associate features with each patient in the cohort. These features will be used downstream to predict outcomes.

The OMOP Standardized Clinical Data tables offer several natural features for a patient, including histories of condition occurrences, procedures, and more. `omop-learn` includes SQL scripts to collect time series of these common features automatically for any cohort, allowing a user to very quickly set up a feature set. To do so, we first initialize a `FeatureGenerator` object with a database indicating where feature data is to be found. Similar to cohort creation, this does not actually create a feature set -- that is only done once all parameters are specified. We next select the pre-defined features of choice, and finally select a cohort for which data is to be collected.