diff --git a/CHANGELOG.md b/CHANGELOG.md index a32ee55..c40d920 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,13 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added -### Changed - -### Deprecated - -### Removed - -### Fixed +- Added `verify` command to the CLI, with an accompanying script, to check that the `merkle:root` values in the Merkle tree JSON produced by the `compute` command match the recalculated values [#2](https://github.com/stacchain/stac-merkle-tree-cli/pull/2) ## [v0.3.0] - 2024-11-20 diff --git a/README.md b/README.md index c262d9a..1cbde31 100644 --- a/README.md +++ b/README.md @@ -123,21 +123,27 @@ catalog/ ### Basic Usage -After installing the package, you can use the `stac-merkle-tree-cli` command to compute and add Merkle information to your STAC catalog. +After installing the package, you can use the `stac-merkle-tree-cli` command to compute or verify Merkle information in your STAC catalog. + +### Commands: + +### 1. `compute` + +The `compute` command computes and adds Merkle information (`merkle:object_hash`, `merkle:root`, `merkle:hash_method`) to your STAC catalog. ```bash -stac-merkle-tree-cli path/to/catalog_directory [OPTIONS] +stac-merkle-tree-cli compute path/to/catalog_directory [OPTIONS] ``` #### Parameters: -- path/to/catalog_directory: (Required) Path to the root directory containing catalog.json. +- `path/to/catalog_directory`: (Required) Path to the root directory containing `catalog.json`. #### Options: -- --merkle-tree-file TEXT: (Optional) Path to the output Merkle tree structure file. Defaults to merkle_tree.json within the provided catalog_directory. +- `--merkle-tree-file TEXT`: (Optional) Path to the output Merkle tree structure file. Defaults to `merkle_tree.json` within the provided `catalog_directory`. 
-### Example +#### Example Assuming your directory structure is as follows: @@ -160,7 +166,7 @@ my_stac_catalog/ Run the tool: ```bash -stac-merkle-tree-cli my_stac_catalog/ +stac-merkle-tree-cli compute my_stac_catalog/ ``` Expected Output: @@ -176,6 +182,43 @@ Processed Catalog: /path/to/my_stac_catalog/catalog.json Merkle tree structure saved to /path/to/my_stac_catalog/merkle_tree.json ``` +### 2. `verify` + +The `verify` command validates the integrity of a Merkle tree JSON file by recalculating `merkle:root` values and comparing them to the expected values. + +```bash +stac-merkle-tree-cli verify path/to/merkle_tree.json +``` + +#### Parameters: + +- `path/to/merkle_tree.json`: (Required) Path to the Merkle tree JSON file to verify. + +#### Example: + +Run the command: + +```bash +stac-merkle-tree-cli verify my_stac_catalog/merkle_tree.json +``` + +Example Output (Success): + +```bash +Verification Successful: The merkle:root matches. +``` + +Example Output (Failure): + +```bash +Verification Failed: + - Expected merkle:root: 5808b480d9bed10e7663d52c218571d053c7b5df42a5aefc11e216c66c711f77 + - Calculated merkle:root: f0ed08b316b917a98c085e699c090af1cea964b697dd0bc44491ebced4d0006c +Discrepancies found in the following nodes: + - Collection 'COP-DEM' has mismatched merkle:root. + - Catalog 'Catalogue' has mismatched merkle:root. +``` + ## Merkle Tree Extension Specification This tool complies with the [Merkle Tree Extension Specification](https://github.com/stacchain/merkle-tree), which outlines how to encode STAC objects in a Merkle tree to ensure metadata integrity. @@ -357,15 +400,19 @@ Contributions are welcome! If you encounter issues or have suggestions for impro ## Verification Steps -### 1. Run the CLI Tool: +### 1. Compute Merkle Tree + +Use the `compute` command to process your STAC catalog and generate a Merkle tree structure. ```bash -stac-merkle-tree-cli path/to/catalog_directory +stac-merkle-tree-cli compute path/to/catalog_directory ``` -### 2. 
Check the Output: +#### Options: + +- `--merkle-tree-file`: Specify the output file name for the Merkle tree JSON (default is `merkle_tree.json`). -- **Console Output**: You should see logs indicating the processing of Items, Collections, and the Catalog. +#### Example Output: ```ruby Processed Item: /path/to/catalog_directory/collections/collection1/item1.json @@ -378,17 +425,64 @@ Processed Catalog: /path/to/catalog_directory/catalog.json Merkle tree structure saved to /path/to/catalog_directory/merkle_tree.json ``` -- **Merkle Tree JSON**: Verify that the `merkle_tree.json` (or your specified output file) accurately represents the hierarchical structure of your STAC catalog with correct `merkle:object_hash` and `merkle:root` values. +- The tool will generate a `merkle_tree.json` file (or the specified output file), which represents the hierarchical structure of your STAC catalog, including `merkle:object_hash` and `merkle:root` values. + +### 2. Verify Merkle Tree + +Use the `verify` command to validate the integrity of the generated Merkle tree JSON file. + +```bash +stac-merkle-tree-cli verify path/to/catalog_directory/merkle_tree.json +``` + +#### Example Output (Success): + +```bash +Verification Successful: The merkle:root matches. +``` + +#### Example Output (Failure): + +```bash +Verification Failed: + - Expected merkle:root: 5808b480d9bed10e7663d52c218571d053c7b5df42a5aefc11e216c66c711f77 + - Calculated merkle:root: f0ed08b316b917a98c085e699c090af1cea964b697dd0bc44491ebced4d0006c +Discrepancies found in the following nodes: + - Collection 'COP-DEM' has mismatched merkle:root. + - Catalog 'Catalogue' has mismatched merkle:root. +``` + +### 3. 
Validate STAC Structure Updates + +Ensure that the STAC files (e.g., `catalog.json`, `collection.json`, item files) have been updated correctly: + +#### Catalog: + +- `catalog.json` should include: + + - `merkle:object_hash` + - `merkle:root` + - `merkle:hash_method` + +#### Collections: + +- Each `collection.json` should include: + - `merkle:object_hash` + - `merkle:root` + - `merkle:hash_method` + +#### Items: -### 3. Verify Integrity: +- Each Item JSON should have `merkle:object_hash` within its `properties` field. -- **Catalog**: Ensure that the `catalog.json` now includes `merkle:object_hash`, `merkle:root`, and `merkle:hash_method`. +### 4. Verify Output File -- **Collections**: Each `collection.json` should include `merkle:object_hash`, `merkle:root`, and `merkle:hash_method`. +Review the generated `merkle_tree.json` file to confirm: -- **Items**: Each Item's JSON should have `merkle:object_hash` within the properties field. +- Proper hierarchical representation of the catalog. +- Correct `merkle:object_hash` and `merkle:root` values for each node. -### 4. Run Tests: +### 5. 
Run Tests: Ensure that all tests pass by executing: diff --git a/example_catalog/catalog.json b/example_catalog/catalog.json index f6a3283..db3e10f 100644 --- a/example_catalog/catalog.json +++ b/example_catalog/catalog.json @@ -56,7 +56,7 @@ "https://stacchain.github.io/merkle-tree/v1.0.0/schema.json" ], "merkle:object_hash": "b14fd102417c1d673f481bc053d19946aefdc27d84c584989b23c676c897bd5a", - "merkle:root": "2c637f0bae066e89de80839f3468f73e396e9d1498faefc469f0fd1039e19e0c", + "merkle:root": "797bfc76d971db745b9b476d8c44e9fa021af889d5761476b74e982d69e7a3c2", "merkle:hash_method": { "function": "sha256", "fields": [ diff --git a/example_catalog/collections/COP-DEM/collection.json b/example_catalog/collections/COP-DEM/collection.json index 2e0e283..c959801 100644 --- a/example_catalog/collections/COP-DEM/collection.json +++ b/example_catalog/collections/COP-DEM/collection.json @@ -50,7 +50,7 @@ "href": "https://catalogue.dataspace.copernicus.eu/stac/collections/COP-DEM/items" } ], - "merkle:root": "fd2700dd6b39d254530a1a996d667620c4c7aacb69c86e9ea4ec71a7269f6aac", + "merkle:root": "aa7f89b29cb339032ec86d81d4090bdbd52199152fb657f50b08eec1b3234ee2", "merkle:hash_method": { "function": "sha256", "fields": [ diff --git a/example_catalog/merkle_tree.json b/example_catalog/merkle_tree.json index e4bf94f..2a4be8d 100644 --- a/example_catalog/merkle_tree.json +++ b/example_catalog/merkle_tree.json @@ -2,13 +2,13 @@ "node_id": "Catalogue", "type": "Catalog", "merkle:object_hash": "b14fd102417c1d673f481bc053d19946aefdc27d84c584989b23c676c897bd5a", - "merkle:root": "2c637f0bae066e89de80839f3468f73e396e9d1498faefc469f0fd1039e19e0c", + "merkle:root": "797bfc76d971db745b9b476d8c44e9fa021af889d5761476b74e982d69e7a3c2", "children": [ { "node_id": "COP-DEM", "type": "Collection", "merkle:object_hash": "17789b31f8ae304de8dbe2350a15263dbf5e31adfc0d17a997e7e55f4cfc2f53", - "merkle:root": "fd2700dd6b39d254530a1a996d667620c4c7aacb69c86e9ea4ec71a7269f6aac", + "merkle:root": 
"aa7f89b29cb339032ec86d81d4090bdbd52199152fb657f50b08eec1b3234ee2", "children": [ { "node_id": "DEM1_SAR_DGE_30_20101215T103647_20130405T103047_ADS_000000_oCX9", diff --git a/example_catalog_nested_items/catalog.json b/example_catalog_nested_items/catalog.json index 247fbe7..d8e5c61 100644 --- a/example_catalog_nested_items/catalog.json +++ b/example_catalog_nested_items/catalog.json @@ -56,7 +56,7 @@ "https://stacchain.github.io/merkle-tree/v1.0.0/schema.json" ], "merkle:object_hash": "b14fd102417c1d673f481bc053d19946aefdc27d84c584989b23c676c897bd5a", - "merkle:root": "2c637f0bae066e89de80839f3468f73e396e9d1498faefc469f0fd1039e19e0c", + "merkle:root": "614dbf76bf77d22fa824867beb0980c6c2c44aa12252555a8208783923a7569d", "merkle:hash_method": { "function": "sha256", "fields": [ diff --git a/example_catalog_nested_items/collections/COP-DEM/collection.json b/example_catalog_nested_items/collections/COP-DEM/collection.json index 1896065..a74f364 100644 --- a/example_catalog_nested_items/collections/COP-DEM/collection.json +++ b/example_catalog_nested_items/collections/COP-DEM/collection.json @@ -50,7 +50,7 @@ "href": "https://catalogue.dataspace.copernicus.eu/stac/collections/COP-DEM/items" } ], - "merkle:root": "2f4aa32184fbe70bd385d5b6b6e6d4ec5eb8b2e43611b441febcdf407c4e0030", + "merkle:root": "7e990e84a86dc146547b6e3a3e35196f53e0123b6f216f871abd5d1f55d2ccc5", "merkle:hash_method": { "function": "sha256", "fields": [ diff --git a/example_catalog_nested_items/merkle_tree.json b/example_catalog_nested_items/merkle_tree.json index e41adda..8836bcb 100644 --- a/example_catalog_nested_items/merkle_tree.json +++ b/example_catalog_nested_items/merkle_tree.json @@ -2,13 +2,13 @@ "node_id": "Catalogue", "type": "Catalog", "merkle:object_hash": "b14fd102417c1d673f481bc053d19946aefdc27d84c584989b23c676c897bd5a", - "merkle:root": "2c637f0bae066e89de80839f3468f73e396e9d1498faefc469f0fd1039e19e0c", + "merkle:root": "614dbf76bf77d22fa824867beb0980c6c2c44aa12252555a8208783923a7569d", 
"children": [ { "node_id": "COP-DEM", "type": "Collection", "merkle:object_hash": "17789b31f8ae304de8dbe2350a15263dbf5e31adfc0d17a997e7e55f4cfc2f53", - "merkle:root": "2f4aa32184fbe70bd385d5b6b6e6d4ec5eb8b2e43611b441febcdf407c4e0030", + "merkle:root": "7e990e84a86dc146547b6e3a3e35196f53e0123b6f216f871abd5d1f55d2ccc5", "children": [ { "node_id": "DEM1_SAR_DGE_30_20101212T230244_20140325T230302_ADS_000000_1jTi", diff --git a/example_catalog_simple/catalog.json b/example_catalog_simple/catalog.json new file mode 100644 index 0000000..04a048e --- /dev/null +++ b/example_catalog_simple/catalog.json @@ -0,0 +1,68 @@ +{ + "stac_version": "1.0.0", + "type": "Catalog", + "id": "Catalogue", + "description": "Catalogue of data available in the platform", + "conformsTo": [ + "https://api.stacspec.org/v1.0.0/core", + "https://api.stacspec.org/v1.0.0/collections", + "https://api.stacspec.org/v1.0.0/ogcapi-features", + "https://api.stacspec.org/v1.0.0/item-search", + "https://www.opengis.net/spec/ogcapi-features-1/1.0/conf/core", + "https://www.opengis.net/spec/ogcapi-features-1/1.0/conf/geojson" + ], + "links": [ + { + "href": "https://catalogue.dataspace.copernicus.eu/stac", + "rel": "root", + "type": "application/json" + }, + { + "href": "https://catalogue.dataspace.copernicus.eu/stac", + "rel": "self", + "type": "application/json" + }, + { + "href": "https://catalogue.dataspace.copernicus.eu/stac/api", + "rel": "service-desc", + "type": "application/vnd.oai.openapi" + }, + { + "href": "https://catalogue.dataspace.copernicus.eu/stac/conformance", + "rel": "conformance", + "type": "application/json" + }, + { + "href": "https://catalogue.dataspace.copernicus.eu/stac/collections", + "rel": "data", + "type": "application/json" + }, + { + "href": "https://catalogue.dataspace.copernicus.eu/stac/search", + "title": "STAC search", + "rel": "search", + "type": "application/json", + "method": "GET" + }, + { + "href": "https://catalogue.dataspace.copernicus.eu/stac/search", + "title": 
"STAC search", + "rel": "search", + "type": "application/json", + "method": "POST" + } + ], + "stac_extensions": [ + "https://stacchain.github.io/merkle-tree/v1.0.0/schema.json" + ], + "merkle:object_hash": "b14fd102417c1d673f481bc053d19946aefdc27d84c584989b23c676c897bd5a", + "merkle:root": "f0ed08b316b917a98c085e699c090af1cea964b697dd0bc44491ebced4d0006c", + "merkle:hash_method": { + "function": "sha256", + "fields": [ + "*" + ], + "ordering": "ascending", + "description": "Computed by including the merkle:root of collections and the catalogs own merkle:object_hash." + } +} diff --git a/example_catalog_simple/collections/COP-DEM/DEM1_SAR_DGE_30_20101212T230244_20140325T230302_ADS_000000_1jTi.json b/example_catalog_simple/collections/COP-DEM/DEM1_SAR_DGE_30_20101212T230244_20140325T230302_ADS_000000_1jTi.json new file mode 100644 index 0000000..392cb0c --- /dev/null +++ b/example_catalog_simple/collections/COP-DEM/DEM1_SAR_DGE_30_20101212T230244_20140325T230302_ADS_000000_1jTi.json @@ -0,0 +1,92 @@ +{ + "type": "Feature", + "stac_version": "1.0.0", + "stac_extensions": [ + "https://stac-extensions.github.io/alternate-assets/v1.1.0/schema.json", + "https://stac-extensions.github.io/storage/v1.0.0/schema.json", + "https://stacchain.github.io/merkle-tree/v1.0.0/schema.json" + ], + "id": "DEM1_SAR_DGE_30_20101212T230244_20140325T230302_ADS_000000_1jTi", + "collection": "COP-DEM", + "geometry": { + "type": "Polygon", + "coordinates": [ + [ + [ + 99, + -1 + ], + [ + 100, + -1 + ], + [ + 100, + 0 + ], + [ + 99, + 0 + ], + [ + 99, + -1 + ] + ] + ] + }, + "properties": { + "authority": "ESA", + "productType": "DGE_30", + "spatialResolution": 30, + "polarisationChannels": "HH", + "datetime": "2010-12-12T23:02:44.000000Z", + "end_datetime": "2014-03-25T23:03:02.000000Z", + "start_datetime": "2010-12-12T23:02:44.000000Z", + "merkle:object_hash": "ce9f56e695ab1751b8f0c8d9ef1f1ecedaf04574ec3077e70e7426ec9fc61ea4" + }, + "bbox": [ + 99, + -1, + 100, + 0 + ], + "links": [ + { + 
"rel": "root", + "type": "application/json", + "href": "https://catalogue.dataspace.copernicus.eu/stac" + }, + { + "rel": "self", + "type": "application/json", + "href": "https://catalogue.dataspace.copernicus.eu/stac/collections/COP-DEM/items/DEM1_SAR_DGE_30_20101212T230244_20140325T230302_ADS_000000_1jTi" + }, + { + "rel": "collection", + "type": "application/json", + "href": "https://catalogue.dataspace.copernicus.eu/stac/collections/COP-DEM" + } + ], + "assets": { + "QUICKLOOK": { + "href": "https://catalogue.dataspace.copernicus.eu/odata/v1/Assets(a75813a4-8a45-4227-9234-5dfab628d907)/$value", + "title": "QUICKLOOK", + "type": "image/jpeg" + }, + "PRODUCT": { + "href": "https://catalogue.dataspace.copernicus.eu/odata/v1/Products(6ef3485f-2eb0-5463-9d3c-a67264c47eb8)/$value", + "title": "Product", + "type": "application/octet-stream", + "alternate": { + "s3": { + "href": "/eodata/auxdata/CopDEM/COP-DEM_GLO-30-DGED/DEM1_SAR_DGE_30_20101212T230244_20140325T230302_ADS_000000_1jTi.DEM", + "storage:platform": "CLOUDFERRO", + "storage:region": "waw", + "storage:requester_pays": false, + "storage:tier": "Online" + } + } + } + } +} diff --git a/example_catalog_simple/collections/COP-DEM/collection.json b/example_catalog_simple/collections/COP-DEM/collection.json new file mode 100644 index 0000000..a74f364 --- /dev/null +++ b/example_catalog_simple/collections/COP-DEM/collection.json @@ -0,0 +1,63 @@ +{ + "stac_version": "1.0.0", + "stac_extensions": [ + "https://stacchain.github.io/merkle-tree/v1.0.0/schema.json" + ], + "id": "COP-DEM", + "title": "COP-DEM", + "description": "The Copernicus DEM is a Digital Surface Model (DSM) that represents the surface of the Earth including buildings, infrastructure and vegetation. 
Data were acquired through the TanDEM-X mission between 2011 and 2015 [https://spacedata.copernicus.eu/collections/copernicus-digital-elevation-model].", + "type": "Collection", + "license": "proprietary", + "extent": { + "spatial": { + "bbox": [ + [ + -180, + -90, + 180, + 90 + ] + ] + }, + "temporal": { + "interval": [ + [ + "2010-12-12T00:00:00Z", + null + ] + ] + } + }, + "links": [ + { + "rel": "root", + "type": "application/json", + "href": "https://catalogue.dataspace.copernicus.eu/stac" + }, + { + "rel": "parent", + "type": "application/json", + "href": "https://catalogue.dataspace.copernicus.eu/stac" + }, + { + "rel": "self", + "type": "application/json", + "href": "https://catalogue.dataspace.copernicus.eu/stac/collections/COP-DEM" + }, + { + "rel": "items", + "type": "application/json", + "href": "https://catalogue.dataspace.copernicus.eu/stac/collections/COP-DEM/items" + } + ], + "merkle:root": "7e990e84a86dc146547b6e3a3e35196f53e0123b6f216f871abd5d1f55d2ccc5", + "merkle:hash_method": { + "function": "sha256", + "fields": [ + "*" + ], + "ordering": "ascending", + "description": "Computed by including merkle:object_hash values in ascending order and building the Merkle tree." 
+ }, + "merkle:object_hash": "17789b31f8ae304de8dbe2350a15263dbf5e31adfc0d17a997e7e55f4cfc2f53" +} diff --git a/example_catalog_simple/merkle_tree.json b/example_catalog_simple/merkle_tree.json new file mode 100644 index 0000000..0164f53 --- /dev/null +++ b/example_catalog_simple/merkle_tree.json @@ -0,0 +1,21 @@ +{ + "node_id": "Catalogue", + "type": "Catalog", + "merkle:object_hash": "b14fd102417c1d673f481bc053d19946aefdc27d84c584989b23c676c897bd5a", + "merkle:root": "f0ed08b316b917a98c085e699c090af1cea964b697dd0bc44491ebced4d0006c", + "children": [ + { + "node_id": "COP-DEM", + "type": "Collection", + "merkle:object_hash": "17789b31f8ae304de8dbe2350a15263dbf5e31adfc0d17a997e7e55f4cfc2f53", + "merkle:root": "7e990e84a86dc146547b6e3a3e35196f53e0123b6f216f871abd5d1f55d2ccc5", + "children": [ + { + "node_id": "DEM1_SAR_DGE_30_20101212T230244_20140325T230302_ADS_000000_1jTi", + "type": "Item", + "merkle:object_hash": "ce9f56e695ab1751b8f0c8d9ef1f1ecedaf04574ec3077e70e7426ec9fc61ea4" + } + ] + } + ] +} \ No newline at end of file diff --git a/setup.py b/setup.py index 1a8a9c7..c123f5d 100644 --- a/setup.py +++ b/setup.py @@ -1,7 +1,7 @@ from setuptools import setup, find_packages setup( - name='stac-merkle-tree-cli', + name='stac_merkle_tree_cli', version='0.3.0', author='Jonathan Healy', author_email='jonathan.d.healy@gmail.com', @@ -16,7 +16,7 @@ ], entry_points={ 'console_scripts': [ - 'stac-merkle-tree-cli=stac_merkle_tree_cli.cli:main', + 'stac-merkle-tree-cli=stac_merkle_tree_cli.cli:cli', ], }, classifiers=[ diff --git a/stac_merkle_tree_cli/cli.py b/stac_merkle_tree_cli/cli.py index 683d65b..1a87dfe 100644 --- a/stac_merkle_tree_cli/cli.py +++ b/stac_merkle_tree_cli/cli.py @@ -4,21 +4,28 @@ import json from pathlib import Path from .compute_merkle_info import process_catalog +from .verify_merkle_tree_json import verify_merkle_tree +@click.group() +def cli(): + """ + STAC Merkle Tree CLI Tool. + + Commands: + compute Compute Merkle hashes for a STAC catalog. 
+ verify Verify the integrity of a Merkle tree JSON file. + """ + pass -@click.command() +@cli.command() @click.argument('catalog_path', type=click.Path(exists=True, file_okay=False), required=True) @click.option('--merkle-tree-file', type=click.Path(), default='merkle_tree.json', help='Path to the output Merkle tree structure file.') -def main(catalog_path: str, merkle_tree_file: str): +def compute(catalog_path: str, merkle_tree_file: str): """ - CLI tool to compute Merkle hashes for STAC catalogs, handling nested catalogs and collections. - - Parameters: - - CATALOG_PATH: Path to the root directory containing 'catalog.json'. + Compute Merkle hashes for STAC catalogs, handling nested catalogs and collections. - Options: - --merkle-tree-file TEXT Path to the output Merkle tree structure file. Defaults to 'merkle_tree.json'. + CATALOG_PATH: Path to the root directory containing 'catalog.json'. """ catalog_dir = Path(catalog_path) catalog_json_path = catalog_dir / 'catalog.json' @@ -43,7 +50,7 @@ def main(catalog_path: str, merkle_tree_file: str): exit(1) # Save the merkle_tree.json - output_path = Path(f"{catalog_path}/{merkle_tree_file}") + output_path = Path(catalog_path) / merkle_tree_file try: with output_path.open('w', encoding='utf-8') as f: json.dump(merkle_tree, f, indent=2) @@ -52,6 +59,22 @@ def main(catalog_path: str, merkle_tree_file: str): click.echo(f"Error writing to {output_path}: {e}", err=True) exit(1) +@cli.command() +@click.argument('merkle_tree_file', type=click.Path(exists=True, dir_okay=False), required=True) +def verify(merkle_tree_file: str): + """ + Verify that the merkle:root in the Merkle tree JSON matches the recalculated root. + + MERKLE_TREE_FILE: Path to the Merkle tree JSON file. 
+ """ + merkle_tree_path = Path(merkle_tree_file) + verification_result = verify_merkle_tree(merkle_tree_path) + if verification_result: + click.echo("Verification Successful: The merkle:root matches.") + exit(0) + else: + click.echo("Verification Failed: The merkle:root does not match.", err=True) + exit(1) if __name__ == '__main__': - main() + cli() diff --git a/stac_merkle_tree_cli/compute_merkle_info.py b/stac_merkle_tree_cli/compute_merkle_info.py index cb8af8c..f3c3b06 100644 --- a/stac_merkle_tree_cli/compute_merkle_info.py +++ b/stac_merkle_tree_cli/compute_merkle_info.py @@ -50,47 +50,44 @@ def compute_merkle_object_hash(stac_object: Dict[str, Any], hash_method: Dict[st def compute_merkle_root(hashes: List[str], hash_method: Dict[str, Any]) -> str: - """ - Computes the merkle:root by building a Merkle tree from a list of hashes. - - Parameters: - - hashes (List[str]): List of hexadecimal hash strings. - - hash_method (Dict[str, Any]): The hash method details from merkle:hash_method. - - Returns: - - str: The computed Merkle root as a hexadecimal string. 
- """ if not hashes: return '' - + + # Enforce ordering + ordering = hash_method.get('ordering', 'ascending') + if ordering == 'ascending': + hashes.sort() + elif ordering == 'descending': + hashes.sort(reverse=True) + elif ordering != 'unsorted': + raise ValueError(f"Unsupported ordering: {ordering}") + # Get the hash function hash_function_name = hash_method.get('function', 'sha256').replace('-', '').lower() hash_func = getattr(hashlib, hash_function_name, None) if not hash_func: raise ValueError(f"Unsupported hash function: {hash_function_name}") - + current_level = hashes.copy() + print(f"Initial hashes for merkle:root computation: {current_level}") - # Continue until we have only one hash (the root) while len(current_level) > 1: next_level = [] - # Process pairs for i in range(0, len(current_level), 2): left = current_level[i] - if i + 1 < len(current_level): - right = current_level[i + 1] - else: - # If odd number, duplicate the last hash - right = left - # Combine and hash + right = current_level[i + 1] if i + 1 < len(current_level) else left combined = bytes.fromhex(left) + bytes.fromhex(right) new_hash = hash_func(combined).hexdigest() next_level.append(new_hash) + print(f"Combined '{left}' + '{right}' => '{new_hash}'") current_level = next_level + print(f"Next level hashes: {current_level}") + print(f"Final merkle:root: {current_level[0]}") return current_level[0] + def process_item(item_path: Path, hash_method: Dict[str, Any]) -> Dict[str, Any]: """ Processes a STAC Item to compute and return its object hash. 
@@ -212,25 +209,31 @@ def process_collection(collection_path: Path, parent_hash_method: Dict[str, Any] own_object_hash = compute_merkle_object_hash(collection_json, hash_method) collection_json['merkle:object_hash'] = own_object_hash - # Compute merkle:root from own hash and child object hashes # Collect all hashes: own_object_hash + child hashes - child_hashes = [child.get('merkle:object_hash') for child in children if 'merkle:object_hash' in child] + child_hashes = [] + for child in children: + if child['type'] in {'Collection', 'Catalog'}: + child_hashes.append(child.get('merkle:root')) + else: + child_hashes.append(child.get('merkle:object_hash')) + # Exclude None values child_hashes = [h for h in child_hashes if h] + # Include own_object_hash all_hashes = child_hashes + [own_object_hash] + # Compute merkle:root merkle_root = compute_merkle_root(all_hashes, hash_method) collection_json['merkle:root'] = merkle_root collection_json['merkle:hash_method'] = hash_method - # Ensure the Merkle extension is listed - collection_json.setdefault('stac_extensions', []) + # Ensure the Merkle extension is listed and sorted extension_url = 'https://stacchain.github.io/merkle-tree/v1.0.0/schema.json' + collection_json.setdefault('stac_extensions', []) if extension_url not in collection_json['stac_extensions']: collection_json['stac_extensions'].append(extension_url) - # Sort stac_extensions for consistent ordering collection_json['stac_extensions'].sort() # Save the updated Collection JSON @@ -305,25 +308,31 @@ def process_catalog(catalog_path: Path, parent_hash_method: Dict[str, Any] = Non own_object_hash = compute_merkle_object_hash(catalog_json, hash_method) catalog_json['merkle:object_hash'] = own_object_hash - # Compute merkle:root from own hash and child object hashes # Collect all hashes: own_object_hash + child hashes - child_hashes = [child.get('merkle:object_hash') for child in children if 'merkle:object_hash' in child] + child_hashes = [] + for child in children: + 
if child['type'] in {'Collection', 'Catalog'}: + child_hashes.append(child.get('merkle:root')) + else: + child_hashes.append(child.get('merkle:object_hash')) + # Exclude None values child_hashes = [h for h in child_hashes if h] + # Include own_object_hash all_hashes = child_hashes + [own_object_hash] + # Compute merkle:root merkle_root = compute_merkle_root(all_hashes, hash_method) catalog_json['merkle:root'] = merkle_root catalog_json['merkle:hash_method'] = hash_method - # Ensure the Merkle extension is listed - catalog_json.setdefault('stac_extensions', []) + # Ensure the Merkle extension is listed and sorted extension_url = 'https://stacchain.github.io/merkle-tree/v1.0.0/schema.json' + catalog_json.setdefault('stac_extensions', []) if extension_url not in catalog_json['stac_extensions']: catalog_json['stac_extensions'].append(extension_url) - # Sort stac_extensions for consistent ordering catalog_json['stac_extensions'].sort() # Save the updated Catalog JSON diff --git a/stac_merkle_tree_cli/verify_merkle_tree_json.py b/stac_merkle_tree_cli/verify_merkle_tree_json.py new file mode 100644 index 0000000..2313083 --- /dev/null +++ b/stac_merkle_tree_cli/verify_merkle_tree_json.py @@ -0,0 +1,118 @@ +import hashlib +import logging +import json +from pathlib import Path +from typing import List, Dict, Any + +def compute_merkle_root(hashes: List[str], hash_method: Dict[str, Any]) -> str: + """ + Computes the Merkle root from a list of hashes based on the provided hash method. 
+ """ + if not hashes: + return '' + + # Determine ordering + ordering = hash_method.get('ordering', 'ascending') + if ordering == 'ascending': + hashes.sort() + elif ordering == 'descending': + hashes.sort(reverse=True) + elif ordering == 'unsorted': + pass # Keep the original order + else: + raise ValueError(f"Unsupported ordering method: {ordering}") + + # Get the hash function + hash_function_name = hash_method.get('function', 'sha256').replace('-', '').lower() + hash_func = getattr(hashlib, hash_function_name, None) + if not hash_func: + raise ValueError(f"Unsupported hash function: {hash_function_name}") + + current_level = hashes.copy() + + while len(current_level) > 1: + next_level = [] + for i in range(0, len(current_level), 2): + left = current_level[i] + if i + 1 < len(current_level): + right = current_level[i + 1] + else: + right = left # Duplicate the last hash if odd number + + combined = bytes.fromhex(left) + bytes.fromhex(right) + new_hash = hash_func(combined).hexdigest() + next_level.append(new_hash) + current_level = next_level + + return current_level[0] + +def verify_merkle_tree(merkle_tree_path: Path) -> bool: + """ + Verifies that the merkle:root in the Merkle tree JSON matches the recalculated root. 
+ """ + try: + with merkle_tree_path.open('r', encoding='utf-8') as f: + merkle_tree = json.load(f) + + discrepancies = [] + calculated_root = calculate_merkle_root_with_discrepancies(merkle_tree, discrepancies) + + original_root = merkle_tree.get('merkle:root') + + if not original_root: + print("Error: 'merkle:root' not found in the JSON.") + return False + + if calculated_root == original_root: + print(f"Verification Successful: The merkle:root matches ({calculated_root}).") + return True + else: + print(f"Verification Failed:") + print(f" - Expected merkle:root: {original_root}") + print(f" - Calculated merkle:root: {calculated_root}") + if discrepancies: + print("Discrepancies found in the following nodes:") + for discrepancy in discrepancies: + print(f" - {discrepancy}") + return False + + except Exception as e: + print(f"Error during verification: {e}") + return False + +def calculate_merkle_root_with_discrepancies(node: Dict[str, Any], discrepancies: List[str]) -> str: + """ + Recursively calculates the Merkle root and records discrepancies. + """ + hash_method = node.get('merkle:hash_method', { + 'function': 'sha256', + 'fields': ['*'], + 'ordering': 'ascending', + 'description': 'Default hash method.' 
+ }) + + # If the node is an Item, its merkle:root is its own merkle:object_hash + if node['type'] == 'Item': + return node['merkle:object_hash'] + + # For Catalogs and Collections, collect child hashes + child_hashes = [] + for child in node.get('children', []): + child_root = calculate_merkle_root_with_discrepancies(child, discrepancies) + if child_root: + child_hashes.append(child_root) + + # Include own merkle:object_hash + own_hash = node.get('merkle:object_hash') + if own_hash: + child_hashes.append(own_hash) + + # Compute the Merkle root from child hashes + calculated_root = compute_merkle_root(child_hashes, hash_method) + + # Compare with the node's merkle:root + original_root = node.get('merkle:root') + if original_root != calculated_root: + discrepancies.append(f"{node['type']} '{node['node_id']}' has mismatched merkle:root.") + + return calculated_root \ No newline at end of file
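
Reviewer note: the pairing scheme used by both `compute_merkle_root` implementations in this diff can be sketched in isolation, which may help when checking the updated example hashes by hand. This is a minimal sketch and not the project's code: the name `merkle_root_sketch` is hypothetical, it assumes SHA-256 and hex-encoded input hashes, and unlike the functions in the diff it sorts a copy so the caller's list is left untouched.

```python
import hashlib
from typing import List


def merkle_root_sketch(hashes: List[str], ordering: str = "ascending") -> str:
    """Hypothetical standalone version of the pairwise Merkle root computation."""
    if not hashes:
        return ""

    # Order the leaves on a copy, so the caller's list is not reordered.
    if ordering == "ascending":
        level = sorted(hashes)
    elif ordering == "descending":
        level = sorted(hashes, reverse=True)
    elif ordering == "unsorted":
        level = list(hashes)
    else:
        raise ValueError(f"Unsupported ordering: {ordering}")

    # Hash adjacent pairs level by level until a single hash remains.
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            left = level[i]
            # An unpaired last hash is combined with itself.
            right = level[i + 1] if i + 1 < len(level) else left
            combined = bytes.fromhex(left) + bytes.fromhex(right)
            next_level.append(hashlib.sha256(combined).hexdigest())
        level = next_level

    return level[0]


if __name__ == "__main__":
    # Illustrative leaves only; real inputs would be merkle:object_hash
    # and merkle:root values taken from merkle_tree.json.
    leaves = [hashlib.sha256(s.encode()).hexdigest() for s in ("a", "b", "c")]
    print(merkle_root_sketch(leaves))
```

For a Collection or Catalog node, the list passed in would be the child values (`merkle:root` for nested Collections and Catalogs, `merkle:object_hash` for Items) plus the node's own `merkle:object_hash`, as in `process_collection` and `process_catalog` above.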