
filter on the time dimension with a large dataset #13

@miniufo

I am interested in the lntime accessor, which can filter along the time dimension. A long-standing problem (at least for me) is how to filter along the time dimension of a large dataset when that dimension is chunked.

For example, I have several years of the daily AVISO sea-surface height dataset, chunked along time (each day is a single file holding a 2D lat-lon field):

```
<xarray.Dataset> Size: 364GB
Dimensions:         (time: 1096, lat: 1440, nv: 2, lon: 2880)
Coordinates:
  * time            (time) datetime64[ns] 9kB 2015-01-01 ... 2017-12-31
  * lat             (lat) float32 6kB -89.94 -89.81 -89.69 ... 89.69 89.81 89.94
  * lon             (lon) float32 12kB -179.9 -179.8 -179.7 ... 179.8 179.9
  * nv              (nv) int32 8B 0 1
Data variables: (12/14)
    crs             (time) int32 4kB -2147483647 -2147483647 ... -2147483647
    lat_bnds        (time, lat, nv) float32 13MB dask.array<chunksize=(1, 1440, 2), meta=np.ndarray>
    lon_bnds        (time, lon, nv) float32 25MB dask.array<chunksize=(1, 2880, 2), meta=np.ndarray>
    sla             (time, lat, lon) float64 36GB dask.array<chunksize=(1, 1440, 2880), meta=np.ndarray>
    err_sla         (time, lat, lon) float64 36GB dask.array<chunksize=(1, 1440, 2880), meta=np.ndarray>
    ugosa           (time, lat, lon) float64 36GB dask.array<chunksize=(1, 1440, 2880), meta=np.ndarray>
    ...              ...
    err_vgosa       (time, lat, lon) float64 36GB dask.array<chunksize=(1, 1440, 2880), meta=np.ndarray>
    adt             (time, lat, lon) float64 36GB dask.array<chunksize=(1, 1440, 2880), meta=np.ndarray>
    ugos            (time, lat, lon) float64 36GB dask.array<chunksize=(1, 1440, 2880), meta=np.ndarray>
    vgos            (time, lat, lon) float64 36GB dask.array<chunksize=(1, 1440, 2880), meta=np.ndarray>
    flag_ice        (time, lat, lon) float64 36GB dask.array<chunksize=(1, 1440, 2880), meta=np.ndarray>
    tpa_correction  (time) float64 9kB dask.array<chunksize=(1,), meta=np.ndarray>
Attributes: (12/42)
    Conventions:                     CF-1.6
    Metadata_Conventions:            Unidata Dataset Discovery v1.0
    cdm_data_type:                   Grid
    comment:                         Sea Surface Height measured by Altimetry...
    contact:                         servicedesk.cmems@mercator-ocean.eu
    creator_email:                   servicedesk.cmems@mercator-ocean.eu
    ...                              ...
    geospatial_vertical_units:       m
    time_coverage_duration:          P1D
    time_coverage_resolution:        P1D
    time_coverage_end:               2015-01-01T12:00:00Z
    time_coverage_start:             2014-12-31T12:00:00Z
    platform:                        Cryosat-2, OSTM/Jason-2, Haiyang-2A, Altika
```

then how can I filter along time (for example, to extract the signal within a certain frequency band) on a machine that cannot load the whole dataset into memory?
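
For context, the best generic recipe I know of is sketched below: rechunk so that time is contiguous and the spatial tiles are small, then apply a SciPy band-pass filter lazily through xr.apply_ufunc. This is only a sketch, not lenapy's API; the file pattern, chunk sizes, and band edges (30-180 days) are made-up assumptions.

```python
# A sketch of the generic recipe, not lenapy's API. The file pattern,
# chunk sizes, and band edges (30-180 days) are made-up assumptions.
import scipy.signal as signal
import xarray as xr

ds = xr.open_mfdataset("aviso_*.nc", parallel=True)

# One chunk along time, small tiles in space: each dask task then sees
# the full 1096-day series for one lat/lon tile (~250 MB of float64)
# instead of a single day, so the filter can run tile by tile.
sla = ds["sla"].chunk({"time": -1, "lat": 120, "lon": 240})

fs = 1.0                     # daily data: 1 sample per day
low, high = 1 / 180, 1 / 30  # keep periods between 30 and 180 days
sos = signal.butter(4, [low, high], btype="bandpass", fs=fs, output="sos")

def bandpass(arr, sos):
    # apply_ufunc moves the core dim ("time") to the last axis
    return signal.sosfiltfilt(sos, arr, axis=-1)

filtered = xr.apply_ufunc(
    bandpass,
    sla,
    kwargs={"sos": sos},
    input_core_dims=[["time"]],
    output_core_dims=[["time"]],
    dask="parallelized",
    output_dtypes=[sla.dtype],
)

# Writing to zarr streams one tile at a time; peak memory stays near
# the chunk size rather than the 36 GB of the full variable.
filtered.to_dataset(name="sla_bp").to_zarr("sla_30_180d.zarr", mode="w")
```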

I just want to know whether lenapy has a better way to do this than the workarounds sketched here.
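
The expensive step in the sketch above is the rechunk itself, which is a full shuffle of every file. The only mitigation I know of is to rewrite the variable on disk first with the rechunker package, again an assumption on my side rather than anything lenapy provides; the store paths and chunk sizes are hypothetical.

```python
# Rewrite `sla` so that `time` is contiguous on disk, with memory per
# worker bounded by max_mem; paths and chunk sizes are hypothetical.
import xarray as xr
from rechunker import rechunk

ds = xr.open_mfdataset("aviso_*.nc", parallel=True)

plan = rechunk(
    ds[["sla"]],                 # one variable at a time keeps the copy small
    target_chunks={"sla": {"time": 1096, "lat": 120, "lon": 240}},
    max_mem="2GB",
    target_store="sla_time_contiguous.zarr",
    temp_store="rechunk_tmp.zarr",
)
plan.execute()

# The rewritten store is now laid out for time filtering and can feed
# the apply_ufunc recipe above without the expensive in-memory shuffle.
sla = xr.open_zarr("sla_time_contiguous.zarr")["sla"]
```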
