From 4d075e9cb07030470d960d885c9523d771e0cabc Mon Sep 17 00:00:00 2001
From: Philippe THOMY
Date: Thu, 30 May 2024 09:46:17 +0200
Subject: [PATCH] User-guide - pandas : Add alternative to xarray.Dataset.from_dataframe (#9020)

* Update pandas.rst

* Update pandas.rst

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update pandas.rst

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update ecosystem.rst

* Update doc/user-guide/pandas.rst

Co-authored-by: Mathias Hauser

* Update doc/user-guide/pandas.rst

Co-authored-by: Mathias Hauser

* Update doc/user-guide/pandas.rst

Co-authored-by: Mathias Hauser

* review comments

* Update doc.yml

* Update doc.yml

* Update doc.yml

* Update doc.yml

* Update doc.yml

* Update doc.yml

* remove code

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update doc/user-guide/pandas.rst

Co-authored-by: Mathias Hauser

* Update doc/user-guide/pandas.rst

Co-authored-by: Mathias Hauser

* Update ci/requirements/doc.yml

Co-authored-by: Mathias Hauser

* Update doc/user-guide/pandas.rst

Co-authored-by: Mathias Hauser

* Update doc/user-guide/pandas.rst

Co-authored-by: Mathias Hauser

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mathias Hauser
---
 doc/ecosystem.rst         |  1 +
 doc/user-guide/pandas.rst | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/doc/ecosystem.rst b/doc/ecosystem.rst
index 076874d82f3..63f60cd0090 100644
--- a/doc/ecosystem.rst
+++ b/doc/ecosystem.rst
@@ -74,6 +74,7 @@ Extend xarray capabilities
 - `Collocate `_: Collocate xarray trajectories in arbitrary physical dimensions
 - `eofs `_: EOF analysis in Python.
 - `hypothesis-gufunc `_: Extension to hypothesis. Makes it easy to write unit tests with xarray objects as input.
+- `ntv-pandas `_: A tabular analyzer and a semantic, compact and reversible converter for multidimensional and tabular data
 - `nxarray `_: NeXus input/output capability for xarray.
 - `xarray-compare `_: xarray extension for data comparison.
 - `xarray-dataclasses `_: xarray extension for typed DataArray and Dataset creation.
diff --git a/doc/user-guide/pandas.rst b/doc/user-guide/pandas.rst
index 76349fcd371..26fa7ea5c0c 100644
--- a/doc/user-guide/pandas.rst
+++ b/doc/user-guide/pandas.rst
@@ -110,6 +110,26 @@ work even if not the hierarchical index is not a full tensor product:
     s[::2]
     s[::2].to_xarray()
 
+Lossless and reversible conversion
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The previous ``Dataset`` example shows that the conversion is not reversible (lossy roundtrip) and
+that the size of the ``Dataset`` increases.
+
+In particular, the following deviations appear after a roundtrip:
+
+- a non-dimension Dataset ``coordinate`` is converted into a ``variable``
+- a non-dimension DataArray ``coordinate`` is not converted
+- ``dtype`` is not always the same (e.g. "str" is converted to "object")
+- ``attrs`` metadata is not preserved
+
+To avoid these problems, the third-party `ntv-pandas `__ library offers lossless and reversible conversions between
+``Dataset``/``DataArray`` and pandas ``DataFrame`` objects.
+
+This solution is particularly useful for converting any ``DataFrame`` into a ``Dataset`` (the converter finds the multidimensional structure hidden by the tabular structure).
+
+The `ntv-pandas examples `__ show how to improve the conversion for the previous ``Dataset`` example and for more complex examples.
+
 Multi-dimensional data
 ~~~~~~~~~~~~~~~~~~~~~~