Skip to content

Commit

Permalink
Merge pull request #360 from astronomy-commons/sean/change-hipscat-index
Browse files Browse the repository at this point in the history
Replace hipscat_id with spatial index
  • Loading branch information
smcguire-cmu authored Oct 1, 2024
2 parents d87b494 + 76c5565 commit 67a5964
Show file tree
Hide file tree
Showing 8 changed files with 221 additions and 363 deletions.
76 changes: 2 additions & 74 deletions docs/guide/pixel_math.md
Original file line number Diff line number Diff line change
Expand Up @@ -193,81 +193,9 @@ We take a chunk of our points and generate a set matrix of points with a shape (

To speed up our calculations, the inner loops of calculations is compiled with the `numba` JIT compiler.

## HiPSCat ID
## Spatial Index

This index is defined as a 64-bit integer which has two parts:

* healpix pixel (at order 19)
* incrementing counter (within same healpix, for uniqueness)

Visually, in bits:

```
|------------------------------------------|-------------------|
|<----- healpixel at order 19 ------>|<-- counter -->|
```
This index is defined as a 64-bit integer with the healpix pixel of the row (at order 29)

This provides us with an increasing index, that will not overlap
between spatially partitioned data files.

### compute_hipscat_id

For a given list of coordinates, compute the HiPSCat ID s.t. coordinates in the
same order 19 pixel are appended with a counter to make a unique hats_id.

For the example, we'll work with the following simplified hex numbers to help
illustrate: `[0xbeee, 0xbeef, 0xbeee, 0xfeed, 0xbeef]`

#### Counter construction

To construct our counters we sort the pixel numbers, call
[numpy.unique](https://numpy.org/doc/stable/reference/generated/numpy.unique.html),
then do some silly arithmetic with the results.

The sorted representation of the above is `[beee, beee, beef, beef, feed]`. What
we're looking for at this point is a counter that indicates if the value is being
repeated and how many times we've seen this value so far. e.g. `[0, 1, 0, 1, 0]`.

The `np.unique` call will yield three outputs:

* `unique_values` (ignored) `[beee, beef, feed]`
* `unique_indices` - the index of the *first* occurrence of each unique value.
`[0, 2, 4]`
* `unique_inverse` - the indices of *all* occurrences of the unique values
(using indexes from that `unique_values`), used to reconstruct the
original array. `[0, 0, 1, 1, 2]`

By indexing into `unique_indexes` by the `unique_inverse`, we get an array with the
index of the first time that healpix pixel was seen, which provides an step-like
offset. e.g. `[0, 0, 2, 2, 4]`. This jumps up to the current index each time the
pixel changes. We subtract this from the actual index value (e.g. `[0, 1, 2, 3, 4]`)
to get the desired counter. e.g.

```
[0, 1, 2, 3, 4]
-
[0, 0, 2, 2, 4]
=
[0, 1, 0, 1, 0]
```

#### Putting it together

After mapping our coordinates into order 19 healpix pixels, we bit-shift them to make room for our counters. Then we add the counters to the end.

e.g.

```
[ 0xbeee, 0xbeee, 0xbeef, 0xbeef, 0xfeed]
<< shifted <<
[0x5F7700000, 0x5F7700000, 0x5F7780000, 0x5F7780000, 0x7F7680000]
+
[ 0, 1, 0, 1, 0]
=
[0x5F7700000, 0x5F7700001, 0x5F7780000, 0x5F7780001, 0x7F7680000]
```

And finally, we unsort the array to get back the hats ids in the order the
coordinates were provided.

`[0x5F7700000, 0x5F7780000, 0x5F7700001, 0x7F7680000, 0x5F7780001]`
2 changes: 1 addition & 1 deletion src/hats/pixel_math/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

from .healpix_pixel import HealpixPixel
from .healpix_pixel_convertor import HealpixInputTypes, get_healpix_pixel
from .hipscat_id import compute_hipscat_id, hipscat_id_to_healpix
from .margin_bounding import check_margin_bounds
from .partition_stats import empty_histogram, generate_alignment, generate_histogram
from .pixel_margins import get_margin
from .spatial_index import compute_spatial_index, spatial_index_to_healpix
2 changes: 1 addition & 1 deletion src/hats/pixel_math/healpix_pixel.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@

import numpy as np

from hats.pixel_math.hipscat_id import SPATIAL_INDEX_ORDER
from hats.pixel_math.spatial_index import SPATIAL_INDEX_ORDER


@dataclass(eq=True, frozen=True, order=True)
Expand Down
111 changes: 0 additions & 111 deletions src/hats/pixel_math/hipscat_id.py
Original file line number Diff line number Diff line change
@@ -1,111 +0,0 @@
"""
Compute the hipscat index field
This index is defined as a 64-bit integer which has two parts:
- healpix pixel (at order 19)
- incrementing counter (within same healpix, for uniqueness)
::
|------------------------------------------|-------------------|
|<----- healpixel at order 19 ------>|<-- counter -->|
This provides us with an increasing index, that will not overlap
between spatially partitioned data files.
See the README in this directory for more information on the fiddly pixel math.
"""

from __future__ import annotations

from typing import List

import numpy as np

import hats.pixel_math.healpix_shim as hp

SPATIAL_INDEX_COLUMN = "_healpix_29"
SPATIAL_INDEX_ORDER = 19
HIPSCAT_ID_MAX = 2**64 - 1


def compute_hipscat_id(ra_values: List[float], dec_values: List[float]) -> np.ndarray:
"""Compute the hipscat ID field.
Args:
ra_values (List[float]): celestial coordinates, right ascension
dec_values (List[float]): celestial coordinates, declination
Returns:
one-dimensional numpy array of int64s with hipscat IDs for the sky positions.
Raises:
ValueError: if the length of the input lists don't match.
"""
if len(ra_values) != len(dec_values):
raise ValueError("ra and dec arrays should have the same length")

## Construct the bit-shifted healpix segment
value_count = len(ra_values)
mapped_pixels = hp.ang2pix(2**SPATIAL_INDEX_ORDER, ra_values, dec_values, nest=True, lonlat=True)

## We sort to put pixels next to each other that will need to be counted.
## This simplifies the counter logic, as we can subtract the index where
## we first see the pixel value from the current index to get the offset counter.
sort_index = np.argsort(mapped_pixels, kind="stable")
mapped_pixels = mapped_pixels[sort_index]

## Construct the counter.
_, unique_indices, unique_inverse = np.unique(mapped_pixels, return_inverse=True, return_index=True)
unique_indices = unique_indices.astype(np.uint64)
boring_number_index = np.arange(value_count, dtype=np.uint64)
offset_counter = boring_number_index - unique_indices[unique_inverse]

## Add counter to shifted pixel, and map back to the original, unsorted, values
shifted_pixels = _compute_hipscat_id_from_mapped_pixels(mapped_pixels, offset_counter)

unsort_index = np.argsort(sort_index, kind="stable")
return shifted_pixels[unsort_index]


def _compute_hipscat_id_from_mapped_pixels(mapped_pixels, offset_counter):
shifted_pixels = mapped_pixels.astype(np.uint64) << np.uint64(64 - (4 + 2 * SPATIAL_INDEX_ORDER))
shifted_pixels = shifted_pixels + offset_counter
return shifted_pixels


def hipscat_id_to_healpix(ids: List[int], target_order: int = SPATIAL_INDEX_ORDER) -> np.ndarray:
"""Convert some hipscat ids to the healpix pixel at the specified order
This is just bit-shifting the counter away.
Args:
ids (List[int64]): list of well-formatted hipscat ids
target_order (int64): Defaults to `SPATIAL_INDEX_ORDER`.
The order of the pixel to get from the hipscat ids.
Returns:
numpy array of target_order pixels from the hipscat id
"""
return np.asarray(ids, dtype=np.uint64) >> (64 - (4 + 2 * target_order))


def healpix_to_hipscat_id(
order: int | List[int], pixel: int | List[int], counter: int | List[int] = 0
) -> np.uint64 | np.ndarray:
"""Convert a healpix pixel to a hipscat_id
This maps the healpix pixel to the lowest pixel number within that pixel at order 19,
then shifts and adds the given counter to get a hipscat_id.
Useful for operations such as filtering by hipscat_id.
Args:
order (int64 | List[int64]): order of pixel to convert
pixel (int64 | List[int64]): pixel number in nested ordering of pixel to convert
counter (int64 | List[int64]) (Default: 0): counter value in converted hipscat id
Returns:
hipscat id or numpy array of hipscat ids
"""
order = np.uint64(order)
pixel = np.uint64(pixel)
counter = np.uint64(counter)
pixel_higher_order = pixel * (4 ** (SPATIAL_INDEX_ORDER - order))
hipscat_id = _compute_hipscat_id_from_mapped_pixels(pixel_higher_order, counter)
return hipscat_id
62 changes: 62 additions & 0 deletions src/hats/pixel_math/spatial_index.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
from __future__ import annotations

from typing import List

import numpy as np

import hats.pixel_math.healpix_shim as hp

SPATIAL_INDEX_COLUMN = "_healpix_29"
SPATIAL_INDEX_ORDER = 29


def compute_spatial_index(ra_values: List[float], dec_values: List[float]) -> np.ndarray:
"""Compute the healpix index field.
Args:
ra_values (List[float]): celestial coordinates, right ascension in degrees
dec_values (List[float]): celestial coordinates, declination in degrees
Returns:
one-dimensional numpy array of int64s with healpix NESTED pixel numbers at order 29
Raises:
ValueError: if the length of the input lists don't match.
"""
if len(ra_values) != len(dec_values):
raise ValueError("ra and dec arrays should have the same length")

mapped_pixels = hp.ang2pix(2**SPATIAL_INDEX_ORDER, ra_values, dec_values, nest=True, lonlat=True)

return mapped_pixels


def spatial_index_to_healpix(ids: List[int], target_order: int = SPATIAL_INDEX_ORDER) -> np.ndarray:
"""Convert _healpix_29 values to the healpix pixel at the specified order
Args:
ids (List[int64]): list of well-formatted _healpix_29 values
target_order (int64): Defaults to `SPATIAL_INDEX_ORDER`.
The order of the pixel to get from the healpix index.
Returns:
numpy array of target_order pixels from the healpix index
"""
delta_order = SPATIAL_INDEX_ORDER - target_order
return np.array(ids) >> (2 * delta_order)


def healpix_to_spatial_index(order: int | List[int], pixel: int | List[int]) -> np.uint64 | np.ndarray:
"""Convert a healpix pixel to the healpix index
This maps the healpix pixel to the lowest pixel number within that pixel at order 29.
Useful for operations such as filtering by _healpix_29.
Args:
order (int64 | List[int64]): order of pixel to convert
pixel (int64 | List[int64]): pixel number in nested ordering of pixel to convert
Returns:
healpix index or numpy array of healpix indices
"""
order = np.uint64(order)
pixel = np.uint64(pixel)
pixel_higher_order = pixel * (4 ** (SPATIAL_INDEX_ORDER - order))
return pixel_higher_order
4 changes: 2 additions & 2 deletions tests/hats/pixel_math/test_healpix_pixel.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
import pytest

from hats.pixel_math.healpix_pixel import HealpixPixel
from hats.pixel_math.hipscat_id import SPATIAL_INDEX_ORDER
from hats.pixel_math.spatial_index import SPATIAL_INDEX_ORDER


def test_pixels_equal():
Expand All @@ -16,7 +16,7 @@ def test_pixels_equal():

def test_order_greater_than_max_order_fails():
with pytest.raises(ValueError):
HealpixPixel(order=20, pixel=0)
HealpixPixel(order=30, pixel=0)


def test_equal_pixel_hash_equal():
Expand Down
Loading

0 comments on commit 67a5964

Please sign in to comment.