293 make it possible to read evaluation directly from s3 (#337)
* wip

* adds S3Path class and a couple helper funcs.

* update for s3path and make the sql method table registration optional.

* remove validate and insert func, add some additional validation, remove unused code

* update tables for s3 reading, refactor to have intermediate classes, add validation, load and check_load methods.

* update models, remove non-domain pydantic models, add pandas and pyspark pandera models

* update tests

* update user guide examples.

* cleanup playground

* update init

* set location_attributes and location_crosswalks validation to `strict="filter"`

* delete one big table module...

* make add_attrs optional

* add validation to write parquet

* update example

* update filter formatting

* minor import refactor

* update read from s3 test

* update examples

* make read-only when path is s3; also clean up commented code

* fix broken test

* add name to class

* make clone from s3 use new read/write table methods

* set table name in table class

* add list s3 evaluations

* clean up

* update to v0.4.4

* delete commented code
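
The read-only behavior mentioned in the commit message can be pictured with a small sketch. This is not TEEHR's implementation: `is_s3_path` and `EvaluationSketch` are hypothetical names invented for illustration, and the sketch assumes that a dataset counts as "on S3" when its path uses the `s3://` or `s3a://` URI scheme.

```python
from urllib.parse import urlparse


def is_s3_path(path: str) -> bool:
    """Return True when the path uses the s3:// or s3a:// URI scheme."""
    return urlparse(str(path)).scheme in ("s3", "s3a")


class EvaluationSketch:
    """Hypothetical stand-in for an evaluation object; not TEEHR's class."""

    def __init__(self, dir_path: str):
        self.dir_path = str(dir_path)
        # Assumption: remote (S3) datasets are treated as read-only.
        self.read_only = is_s3_path(self.dir_path)

    def write(self, table_name: str) -> str:
        """Pretend to write a table; refuse when the dataset is read-only."""
        if self.read_only:
            raise PermissionError(
                f"{self.dir_path} is on S3 and is read-only; "
                "clone it locally to write."
            )
        # A real implementation would write data here; the sketch only
        # returns the location the table would be written to.
        return f"{self.dir_path}/{table_name}"
```

Deciding read-only status once, at construction time, keeps every write path behind a single check. A local path passes through `write` normally, while an `s3://` path raises `PermissionError`.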
mgdenno authored Dec 2, 2024
1 parent 9d71ee0 commit 3484847
Showing 75 changed files with 4,246 additions and 3,540 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -37,8 +37,8 @@ python -m teehr.utils.install_spark_jars
```
Use Docker
```bash
$ docker build -t teehr:v0.4.3 .
$ docker run -it --rm --volume $HOME:$HOME -p 8888:8888 teehr:v0.4.3 jupyter lab --ip 0.0.0.0 $HOME
$ docker build -t teehr:v0.4.4 .
$ docker run -it --rm --volume $HOME:$HOME -p 8888:8888 teehr:v0.4.4 jupyter lab --ip 0.0.0.0 $HOME
```

## Examples
15 changes: 15 additions & 0 deletions docs/sphinx/changelog/index.rst
@@ -1,6 +1,21 @@
Release Notes
=============

0.4.4 - 2024-12-02
--------------------

Added
^^^^^
* Added ability to read an Evaluation dataset directly from an S3 bucket.
* When path to an Evaluation dataset is an S3 bucket, the Evaluation is read-only.

Changed
^^^^^^^
* Significantly refactored the Table classes to make them more flexible and easier to use.
* Added more robust Pandera validation to the Table classes.
* Updated docs to reflect changes and added `read_from_s3` example.


0.4.3 - 2024-10-19
--------------------

4 changes: 2 additions & 2 deletions docs/sphinx/getting_started/index.rst
@@ -37,8 +37,8 @@ Or, if you do not want to install TEEHR in your own virtual environment, you can

.. code-block:: bash
docker build -t teehr:v0.4.3 .
docker run -it --rm --volume $HOME:$HOME -p 8888:8888 teehr:v0.4.3 jupyter lab --ip 0.0.0.0 $HOME
docker build -t teehr:v0.4.4 .
docker run -it --rm --volume $HOME:$HOME -p 8888:8888 teehr:v0.4.4 jupyter lab --ip 0.0.0.0 $HOME
Project Objectives
------------------
2 changes: 2 additions & 0 deletions docs/sphinx/user_guide/index.rst
@@ -26,6 +26,7 @@ Before starting, make sure you have installed TEEHR and its dependencies as desc

:doc:`Grouping and Filtering </user_guide/notebooks/06_grouping_and_filtering>` :download:`(download notebook) </user_guide/notebooks/06_grouping_and_filtering.ipynb>`

:doc:`Read Evaluation from S3 </user_guide/notebooks/07_read_from_s3>` :download:`(download notebook) </user_guide/notebooks/07_read_from_s3.ipynb>`

.. toctree::
:maxdepth: 2
@@ -36,6 +37,7 @@ Before starting, make sure you have installed TEEHR and its dependencies as desc
Introduction to the Evaluation Class </user_guide/notebooks/03_introduction_class>
Setting-up a Simple Example </user_guide/notebooks/04_setup_simple_example>
Clone an Evaluation from S3 </user_guide/notebooks/05_clone_from_s3>
Read Evaluation from S3 </user_guide/notebooks/07_read_from_s3>
Joining Timeseries </user_guide/tutorials/joining_timeseries>
Grouping and Filtering </user_guide/notebooks/06_grouping_and_filtering>
Metrics </user_guide/metrics/metrics>
@@ -193,7 +193,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.10.15"
}
},
"nbformat": 4,
4 changes: 2 additions & 2 deletions docs/sphinx/user_guide/notebooks/05_clone_from_s3.ipynb
@@ -49,7 +49,7 @@
"outputs": [],
"source": [
"# Define the directory where the Evaluation will be created\n",
"test_eval_dir = Path(Path().home(), \"temp\", \"04_setup_real_example\")\n",
"test_eval_dir = Path(Path().home(), \"temp\", \"05_clone_from_s3\")\n",
"shutil.rmtree(test_eval_dir, ignore_errors=True)\n",
"\n",
"# Create an Evaluation object and create the directory\n",
@@ -396,7 +396,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.10.15"
}
},
"nbformat": 4,
@@ -333,7 +333,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.10.15"
}
},
"nbformat": 4,