
Commit 0c8f862

Add rescale and standardize normalization utilities (#1028)
* Add rescale and standardize normalization utilities (#1027)

  New module xrspatial/normalize.py with two functions:
  - rescale: min-max normalization to a target range
  - standardize: z-score normalization (mean 0, std 1)

  Both support all four backends via ArrayTypeFunctionMapping.

* Add tests for rescale and standardize (#1027)

  29 tests covering NumPy, Dask, and CuPy backends. Tests include known-value checks, NaN/inf passthrough, constant rasters, single cells, coordinate preservation, and cross-backend agreement.

* Add rescale/standardize to API reference docs (#1027)

* Add user guide notebook for rescale and standardize (#1027)

* Add rescale/standardize to README feature matrix (#1027)

* Rename normalization notebook to 34 (#1027)
1 parent c6c4f97 commit 0c8f862
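
The commit message describes rescale as min-max normalization to a target range and standardize as z-score normalization, with NaN/inf passthrough. The NumPy-only sketch below illustrates those two formulas. It is not the code in xrspatial/normalize.py, which is not shown on this page. The names `rescale_sketch` and `standardize_sketch` are hypothetical, and the handling of constant rasters (map to `new_min`, or to 0 for z-scores) is an assumption, since nothing here reveals what the library does in that case.

```python
import numpy as np


def rescale_sketch(values, new_min=0.0, new_max=1.0):
    """Min-max rescale finite cells to [new_min, new_max]; NaN/inf pass through."""
    values = np.asarray(values, dtype=np.float64)
    finite = np.isfinite(values)
    out = values.copy()
    if not finite.any():
        return out
    lo = values[finite].min()
    hi = values[finite].max()
    if hi == lo:
        # Assumption: a constant raster maps to new_min (avoids dividing by zero).
        out[finite] = new_min
        return out
    out[finite] = (values[finite] - lo) / (hi - lo) * (new_max - new_min) + new_min
    return out


def standardize_sketch(values, ddof=0):
    """Z-score finite cells (subtract mean, divide by std); NaN/inf pass through."""
    values = np.asarray(values, dtype=np.float64)
    finite = np.isfinite(values)
    out = values.copy()
    n_finite = int(finite.sum())
    if n_finite == 0:
        return out
    if n_finite <= ddof:
        # Assumption: too few cells to estimate a spread, so return zeros.
        out[finite] = 0.0
        return out
    mean = values[finite].mean()
    std = values[finite].std(ddof=ddof)
    if std == 0:
        # Assumption: a constant raster standardizes to zeros.
        out[finite] = 0.0
        return out
    out[finite] = (values[finite] - mean) / std
    return out


raster = np.array([[0.0, 5.0], [10.0, np.nan]])
scaled = rescale_sketch(raster)                    # finite cells land in [0, 1]
scaled_byte = rescale_sketch(raster, 0.0, 255.0)   # finite cells land in [0, 255]
zscored = standardize_sketch(raster)               # finite cells: mean ~0, std ~1
```

Statistics are computed over finite cells only, so a single NaN or inf cell does not poison the minimum, maximum, mean, or standard deviation.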

7 files changed: +770 -0 lines changed

README.md

Lines changed: 2 additions & 0 deletions
@@ -355,6 +355,8 @@ In the GIS world, rasters are used for representing continuous phenomena (e.g. e
 | Name | Description | Source | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
 |:----------:|:------------|:------:|:----------------------:|:--------------------:|:-------------------:|:------:|
 | [Preview](xrspatial/preview.py) | Downsamples a raster to target pixel dimensions for visualization | Custom | ✅️ | ✅️ | ✅️ | 🔄 |
+| [Rescale](xrspatial/normalize.py) | Min-max normalization to a target range (default [0, 1]) | Standard | ✅️ | ✅️ | ✅️ | ✅️ |
+| [Standardize](xrspatial/normalize.py) | Z-score normalization (subtract mean, divide by std) | Standard | ✅️ | ✅️ | ✅️ | ✅️ |

 #### Usage

docs/source/reference/utilities.rst

Lines changed: 8 additions & 0 deletions
@@ -46,6 +46,14 @@ Preview

    xrspatial.preview.preview

+Normalization
+=============
+.. autosummary::
+   :toctree: _autosummary
+
+   xrspatial.normalize.rescale
+   xrspatial.normalize.standardize
+
 Diagnostics
 ===========
 .. autosummary::
Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "x2762y1yuan",
+   "source": "# Normalization: rescale and standardize\n\nTwo common preprocessing steps before combining rasters or feeding them into models:\n\n- **rescale** maps values to a target range (default [0, 1]) using min-max normalization.\n- **standardize** centers values at zero with unit variance (z-score normalization).\n\nBoth functions handle NaN and infinite values (they pass through unchanged) and work on all four xarray-spatial backends: NumPy, CuPy, Dask+NumPy, and Dask+CuPy.",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "wzy5yzkbde",
+   "source": "%matplotlib inline\nimport numpy as np\nimport xarray as xr\nimport matplotlib.pyplot as plt\n\nfrom xrspatial.normalize import rescale, standardize\nfrom xrspatial import generate_terrain",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5zey0oy2cji",
+   "source": "## Synthetic terrain\n\nGenerate a 500x500 elevation raster with values roughly in the 0-1200 range. We'll sprinkle in a few NaN cells to show how they're preserved.",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "1n68h492hdu",
+   "source": "terrain = generate_terrain(canvas=xr.DataArray(np.zeros((500, 500)), dims=['y', 'x']))\n\n# Add some NaN holes\nrng = np.random.default_rng(42)\nmask = rng.random(terrain.shape) < 0.005\nterrain.values[mask] = np.nan\n\nprint(f\"Shape: {terrain.shape}\")\nprint(f\"Range: {float(np.nanmin(terrain)):.1f} to {float(np.nanmax(terrain)):.1f}\")\nprint(f\"NaN cells: {int(np.isnan(terrain.values).sum())}\")\n\nfig, ax = plt.subplots(figsize=(7, 6))\nterrain.plot.imshow(ax=ax, cmap='terrain', add_colorbar=True,\n cbar_kwargs={'label': 'Elevation'})\nax.set_title('Raw elevation')\nax.set_axis_off()\nplt.tight_layout()",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "h7fv68zd0dq",
+   "source": "## rescale: min-max normalization\n\nBy default, `rescale()` maps finite values to [0, 1]. You can supply a custom range with `new_min` and `new_max`.",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "z9j191ufbs",
+   "source": "scaled_01 = rescale(terrain)\nscaled_byte = rescale(terrain, new_min=0, new_max=255)\n\nfig, axes = plt.subplots(1, 3, figsize=(18, 5))\n\nterrain.plot.imshow(ax=axes[0], cmap='terrain', add_colorbar=True)\naxes[0].set_title('Original')\naxes[0].set_axis_off()\n\nscaled_01.plot.imshow(ax=axes[1], cmap='terrain', add_colorbar=True)\naxes[1].set_title('rescale() -> [0, 1]')\naxes[1].set_axis_off()\n\nscaled_byte.plot.imshow(ax=axes[2], cmap='terrain', add_colorbar=True)\naxes[2].set_title('rescale(0, 255)')\naxes[2].set_axis_off()\n\nplt.tight_layout()\n\nprint(f\"[0,1] range: {float(np.nanmin(scaled_01)):.4f} to {float(np.nanmax(scaled_01)):.4f}\")\nprint(f\"[0,255] range: {float(np.nanmin(scaled_byte)):.1f} to {float(np.nanmax(scaled_byte)):.1f}\")\nprint(f\"NaN preserved: {int(np.isnan(scaled_01.values).sum())} cells\")",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "u8clrxl4r2i",
+   "source": "## standardize: z-score normalization\n\n`standardize()` subtracts the mean and divides by the standard deviation of finite values. The result has mean ~0 and std ~1. Use `ddof=1` for sample standard deviation.",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "e64naydwlya",
+   "source": "zscored = standardize(terrain)\n\nfig, axes = plt.subplots(1, 2, figsize=(14, 5))\n\nterrain.plot.imshow(ax=axes[0], cmap='terrain', add_colorbar=True)\naxes[0].set_title('Original')\naxes[0].set_axis_off()\n\nzscored.plot.imshow(ax=axes[1], cmap='RdBu_r', add_colorbar=True,\n cbar_kwargs={'label': 'Z-score'})\naxes[1].set_title('standardize()')\naxes[1].set_axis_off()\n\nplt.tight_layout()\n\nfinite = zscored.values[np.isfinite(zscored.values)]\nprint(f\"Mean: {finite.mean():.2e}\")\nprint(f\"Std: {finite.std():.6f}\")\nprint(f\"Range: {finite.min():.3f} to {finite.max():.3f}\")",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0ql3g8hvphj",
+   "source": "## Practical use case: combining layers with different scales\n\nWhen combining elevation and slope into a composite index, the raw values live on different scales. Rescaling both to [0, 1] puts them on equal footing.",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "9hhldlalk7f",
+   "source": "from xrspatial import slope\n\nslp = slope(terrain)\n\n# Raw values are on very different scales\nprint(f\"Elevation range: {float(np.nanmin(terrain)):.0f} to {float(np.nanmax(terrain)):.0f}\")\nprint(f\"Slope range: {float(np.nanmin(slp)):.1f} to {float(np.nanmax(slp)):.1f}\")\n\n# Rescale both to [0, 1] and combine\nelev_norm = rescale(terrain)\nslope_norm = rescale(slp)\n\n# Simple composite: high elevation + steep slope = high risk\ncomposite = 0.6 * elev_norm + 0.4 * slope_norm\n\nfig, axes = plt.subplots(1, 3, figsize=(18, 5))\n\nelev_norm.plot.imshow(ax=axes[0], cmap='terrain', add_colorbar=True)\naxes[0].set_title('Elevation [0, 1]')\naxes[0].set_axis_off()\n\nslope_norm.plot.imshow(ax=axes[1], cmap='YlOrRd', add_colorbar=True)\naxes[1].set_title('Slope [0, 1]')\naxes[1].set_axis_off()\n\ncomposite.plot.imshow(ax=axes[2], cmap='inferno', add_colorbar=True)\naxes[2].set_title('Weighted composite (0.6 elev + 0.4 slope)')\naxes[2].set_axis_off()\n\nplt.tight_layout()",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c6w0w124c4c",
+   "source": "## Accessor syntax\n\nBoth functions are available through the `.xrs` accessor on DataArrays.\n\n```python\nimport xrspatial\n\nterrain.xrs.rescale()\nterrain.xrs.standardize(ddof=1)\n```",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "gcc8pi1a7u",
+   "source": "import xrspatial # registers .xrs accessor\n\naccessor_result = terrain.xrs.rescale()\nnp.testing.assert_array_equal(accessor_result.values, scaled_01.values)\nprint(\"Accessor output matches function output.\")",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.11.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

xrspatial/__init__.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -70,6 +70,8 @@
 from xrspatial.multispectral import sipi  # noqa
 from xrspatial.pathfinding import a_star_search  # noqa
 from xrspatial.pathfinding import multi_stop_search  # noqa
+from xrspatial.normalize import rescale  # noqa
+from xrspatial.normalize import standardize  # noqa
 from xrspatial.perlin import perlin  # noqa
 from xrspatial.preview import preview  # noqa
 from xrspatial.proximity import allocation  # noqa

xrspatial/accessor.py

Lines changed: 20 additions & 0 deletions
@@ -321,6 +321,16 @@ def preview(self, **kwargs):
         from .preview import preview
         return preview(self._obj, **kwargs)

+    # ---- Normalization ----
+
+    def rescale(self, **kwargs):
+        from .normalize import rescale
+        return rescale(self._obj, **kwargs)
+
+    def standardize(self, **kwargs):
+        from .normalize import standardize
+        return standardize(self._obj, **kwargs)
+
     # ---- Raster to vector ----

     def polygonize(self, **kwargs):
@@ -637,6 +647,16 @@ def preview(self, **kwargs):
         from .preview import preview
         return preview(self._obj, **kwargs)

+    # ---- Normalization ----
+
+    def rescale(self, **kwargs):
+        from .normalize import rescale
+        return rescale(self._obj, **kwargs)
+
+    def standardize(self, **kwargs):
+        from .normalize import standardize
+        return standardize(self._obj, **kwargs)
+
     # ---- Fire ----

     def burn_severity_class(self, **kwargs):
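
The accessor hunks above add thin methods that forward `self._obj` and any keyword arguments to the module-level function. A minimal sketch of that delegation pattern, using xarray's public `register_dataarray_accessor` decorator, might look like the following. The accessor name `demo_norm`, the class `DemoNormAccessor`, and the helper `rescale_stub` are hypothetical stand-ins chosen so the example does not collide with the real `.xrs` accessor. The stub is a simplified min-max formula, not the library's implementation, and it does not handle constant rasters.

```python
import numpy as np
import xarray as xr


def rescale_stub(agg, new_min=0.0, new_max=1.0):
    """Hypothetical stand-in for a module-level rescale function."""
    finite = agg.where(np.isfinite(agg))  # mask inf so min/max see finite cells only
    lo = float(finite.min())
    hi = float(finite.max())
    # NaN cells stay NaN through the arithmetic; coordinates and dims are preserved.
    return (agg - lo) / (hi - lo) * (new_max - new_min) + new_min


@xr.register_dataarray_accessor("demo_norm")  # hypothetical accessor name
class DemoNormAccessor:
    def __init__(self, obj):
        self._obj = obj

    def rescale(self, **kwargs):
        # Forward the wrapped DataArray plus keyword arguments, as in the diff.
        return rescale_stub(self._obj, **kwargs)


da = xr.DataArray(np.array([[0.0, 5.0], [10.0, np.nan]]), dims=["y", "x"])
via_accessor = da.demo_norm.rescale(new_max=255)
via_function = rescale_stub(da, new_max=255)
```

Because the method only forwards its arguments, the accessor call and the direct function call return the same values, which is the property the notebook's final cell asserts for `.xrs.rescale()`.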
