JSON/TOML backend: introduce abbreviated IO modes #1493

franzpoeschel · 2023-08-04T13:00:56Z

Factored out of #1277, based on #1436

Introduces two options:

json.dataset.mode = "template" (default: "dataset"): write only the dataset's datatype and extent, not its actual content.
json.attribute.mode = "short" (default: "long"): write an attribute as {"software": "openPMD-api"} instead of {"software": {"value": "openPMD-api", "type": "STRING"}}.
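
To make the difference between the two attribute modes concrete, here is a minimal standard-library sketch that renders the same string attribute both ways. The names `AttributeMode` and `renderStringAttribute` are hypothetical and exist only for this illustration; the backend itself builds these objects with nlohmann::json, and the sketch assumes inputs that need no JSON escaping.

```cpp
#include <string>

// Hypothetical names for illustration only; not part of the openPMD-api API.
enum class AttributeMode { Long, Short };

// Render one string-valued attribute as a JSON object.
// Assumption: name and value contain no characters that need JSON escaping.
std::string renderStringAttribute(
    std::string const &name,
    std::string const &value,
    AttributeMode mode)
{
    std::string const quoted = "\"" + value + "\"";
    if (mode == AttributeMode::Short)
    {
        // short: the value sits directly under the attribute name
        return "{\"" + name + "\": " + quoted + "}";
    }
    // long: the value is stored next to an explicit type annotation
    return "{\"" + name + "\": {\"value\": " + quoted +
        ", \"type\": \"STRING\"}}";
}
```

The short form drops the type annotation, so a reader has to recover the datatype from the JSON value itself.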

TODO:

Documentation
More testing: openpmd-pipe, erroring IO workflows – fixed, but add test
Merge TOML Backend #1436 first
In TOML: Enable abbreviated modes by default: Done, but document this
Maybe add an option to skip the 'Trying to write data to a template dataset. Will skip.' warning

Diff: https://github.com/franzpoeschel/openPMD-api/compare/parallel-json..topic-json-short-modes

src/IO/JSON/JSONIOHandlerImpl.cpp

+        switch (j.type())
+        {
+        case nlohmann::json::value_t::null:
+            throw std::runtime_error(
+                "[JSON backend] Attribute must not be null: '" +
+                nameForErrorMessages + "'.");
+        case nlohmann::json::value_t::object:
+            throw std::runtime_error(
+                "[JSON backend] Shorthand-style attribute must not be an "
+                "object: '" +
+                nameForErrorMessages + "'.");
+        case nlohmann::json::value_t::array:
+            if (j.empty())
+            {
+                std::cerr << "Cannot recover datatype of empty vector without "
+                             "explicit type annotation for attribute '"
+                          << nameForErrorMessages
+                          << "'. Will continue with VEC_INT datatype."
+                          << std::endl;
+                return std::vector<int>{};
+            }
+            else
+            {
+                auto valueType = j[0].type();
+                /*
+                 * If the vector is of numeric type, it might happen that the
+                 * first entry is an integer, but a later entry is a float.
+                 * We need to pick the most generic datatype in that case.
+                 */
+                if (valueType == nlohmann::json::value_t::number_float ||
+                    valueType == nlohmann::json::value_t::number_unsigned ||
+                    valueType == nlohmann::json::value_t::number_integer)
+                {
+                    valueType = unifyNumericType(j);
+                }
+                switch (valueType)
+                {
+                case nlohmann::json::value_t::null:
+                    throw std::runtime_error(
+                        "[JSON backend] Attribute must not be null: '" +
+                        nameForErrorMessages + "'.");
+                case nlohmann::json::value_t::object:
+                    throw std::runtime_error(
+                        "[JSON backend] Invalid contained datatype (object) "
+                        "inside vector-type attribute: '" +
+                        nameForErrorMessages + "'.");
+                case nlohmann::json::value_t::array:
+                    throw std::runtime_error(
+                        "[JSON backend] Invalid contained datatype (array) "
+                        "inside vector-type attribute: '" +
+                        nameForErrorMessages + "'.");
+                case nlohmann::json::value_t::string:
+                    return recoverVectorAttributeFromJson<std::string>(j);
+                case nlohmann::json::value_t::boolean:
+                    throw std::runtime_error(
+                        "[JSON backend] Attribute must not be vector of bool: "
+                        "'" +
+                        nameForErrorMessages + "'.");
+                case nlohmann::json::value_t::number_integer:
+                    return recoverVectorAttributeFromJson<
+                        nlohmann::json::number_integer_t>(j);
+                case nlohmann::json::value_t::number_unsigned:
+                    return recoverVectorAttributeFromJson<
+                        nlohmann::json::number_unsigned_t>(j);
+                case nlohmann::json::value_t::number_float:
+                    return recoverVectorAttributeFromJson<
+                        nlohmann::json::number_float_t>(j);
+                case nlohmann::json::value_t::binary:
+                    throw std::runtime_error(
+                        "[JSON backend] Attribute must not have binary type: "
+                        "'" +
+                        nameForErrorMessages + "'.");
+                case nlohmann::json::value_t::discarded:
+                    throw std::runtime_error(
+                        "Internal JSON parser datatype leaked into JSON "
+                        "value.");
+                }
+                throw std::runtime_error("Unreachable!");
+            }
+        case nlohmann::json::value_t::string:
+            return j.get<std::string>();
+        case nlohmann::json::value_t::boolean:
+            return j.get<bool>();
+        case nlohmann::json::value_t::number_integer:
+            return j.get<nlohmann::json::number_integer_t>();
+        case nlohmann::json::value_t::number_unsigned:
+            return j.get<nlohmann::json::number_unsigned_t>();
+        case nlohmann::json::value_t::number_float:
+            return j.get<nlohmann::json::number_float_t>();
+        case nlohmann::json::value_t::binary:
+            throw std::runtime_error(
+                "[JSON backend] Attribute must not have binary type: '" +
+                nameForErrorMessages + "'.");
+        case nlohmann::json::value_t::discarded:
+            throw std::runtime_error(
+                "Internal JSON parser datatype leaked into JSON value.");
+        }
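
The body of `unifyNumericType` is not part of this excerpt. As a rough sketch of the idea described in the comment above (pick the most generic numeric type found in the array), the following standard-library code may help. `NumericKind` and `unifyNumericKind` are hypothetical stand-ins for the three numeric `nlohmann::json::value_t` tags, and the ordering unsigned < signed integer < float is an assumption, not something confirmed by the diff.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the numeric nlohmann::json::value_t tags,
// ordered from least to most generic (assumed ordering).
enum class NumericKind { Unsigned = 0, Integer = 1, Float = 2 };

// Return the most generic kind among the element kinds of a numeric array.
// An empty list yields Unsigned here; the diff above handles empty arrays
// separately, before any unification takes place.
NumericKind unifyNumericKind(std::vector<NumericKind> const &kinds)
{
    NumericKind result = NumericKind::Unsigned;
    for (NumericKind k : kinds)
    {
        // scoped enums compare by their underlying value
        result = std::max(result, k);
    }
    return result;
}
```

With this ordering, an array such as [0, 0.5, 0.5] would be read back as a vector of floats even though its first entry parses as an unsigned integer.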


test/SerialIOTest.cpp

src/RecordComponent.cpp

examples/14_toml_template.cpp

+    // std::string config = R"(
+    // {
+    //   "iteration_encoding": "variable_based",
+    //   "toml": {
+    //     "dataset": {"mode": "template"},
+    //     "attribute": {"mode": "short"}
+    //   }
+    // }
+    // )";


src/IO/JSON/JSONIOHandlerImpl.cpp

franzpoeschel · 2023-10-11T08:06:07Z

With this, the JSON backend is now sensibly usable from codes such as PIConGPU:

{
  "__openPMD_internal": {
    "attribute_mode": "short",
    "dataset_mode": "template"
  },
  "attributes": {
    "basePath": "/data/%T/",
    "date": "2023-10-11 09:57:53 +0200",
    "iterationEncoding": "fileBased",
    "iterationFormat": "simData_%06T",
    "meshesPath": "fields/",
    "openPMD": "1.1.0",
    "openPMDextension": 0,
    "particlesPath": "particles/",
    "picongpuIOVersionMajor": 2,
    "picongpuIOVersionMinor": 0,
    "software": "PIConGPU",
    "softwareVersion": "0.7.0-dev"
  },
  "data": {
    "100": {
      "attributes": {
        "cell_depth": 4.252342224121094,
        "cell_height": 1.0630855560302734,
        "cell_width": 4.252342224121094,
        "dt": 1,
        "eps0": 169.19711303710938,
        "iteration": 100,
        "mue0": 0.005910266190767288,
        "particleBoundary": [
          "absorbing",
          "absorbing",
          "absorbing",
          "absorbing",
          "absorbing",
          "absorbing"
        ],
        "particleBoundaryParameters": [
          "without field correction",
          "without field correction",
          "without field correction",
          "without field correction",
          "without field correction",
          "without field correction"
        ],
        "sim_slides": 0,
        "time": 100,
        "timeUnitSI": 1.39e-16,
        "unit_bfield": 40903.82224060171,
        "unit_charge": 1.1143237516482563e-15,
        "unit_efield": 12262657411105.05,
        "unit_energy": 5.694183860145225e-10,
        "unit_length": 4.1671151662e-08,
        "unit_mass": 6.335633991170193e-27,
        "unit_speed": 299792458,
        "unit_time": 1.39e-16
      },
      "fields": {
        "B": {
          "attributes": {
            "axisLabels": [
              "z",
              "y",
              "x"
            ],
            "dataOrder": "C",
            "fieldSmoothing": "none",
            "geometry": "cartesian",
            "gridGlobalOffset": [
              0,
              0,
              0
            ],
            "gridSpacing": [
              4.252342224121094,
              1.0630855560302734,
              4.252342224121094
            ],
            "gridUnitSI": 4.1671151662e-08,
            "timeOffset": 0,
            "unitDimension": [
              0,
              1,
              -2,
              -1,
              0,
              0,
              0
            ]
          },
          "x": {
            "attributes": {
              "position": [
                0,
                0.5,
                0.5
              ],
              "unitSI": 40903.82224060171
            },
            "datatype": "FLOAT",
            "extent": [
              32,
              32,
              32
            ]
          },
          "y": {
            "attributes": {
              "position": [
                0.5,
                0,
                0.5
              ],
              "unitSI": 40903.82224060171
            },
            "datatype": "FLOAT",
            "extent": [
              32,
              32,
              32
            ]
          },
          "z": {
            "attributes": {
              "position": [
                0.5,
                0.5,
                0
              ],
              "unitSI": 40903.82224060171
            },
            "datatype": "FLOAT",
            "extent": [
              32,
              32,
              32
            ]
          }
        },
        "E": {
          "attributes": {
            "axisLabels": [
              "z",
              "y",
              "x"
            ],
            "dataOrder": "C",
            "fieldSmoothing": "none",
            "geometry": "cartesian",
            "gridGlobalOffset": [
              0,
              0,
              0
            ],
            "gridSpacing": [
              4.252342224121094,
              1.0630855560302734,
              4.252342224121094
            ],
            "gridUnitSI": 4.1671151662e-08,
            "timeOffset": 0,
            "unitDimension": [
              1,
              1,
              -3,
              -1,
              0,
              0,
              0
            ]
          },
          "x": {
            "attributes": {
              "position": [
                0.5,
                0,
                0
              ],
              "unitSI": 12262657411105.05
            },
            "datatype": "FLOAT",
            "extent": [
              32,
              32,
              32
            ]
          },
          "y": {
            "attributes": {
              "position": [
                0,
                0.5,
                0
              ],
              "unitSI": 12262657411105.05
            },
            "datatype": "FLOAT",
            "extent": [
              32,
              32,
              32
            ]
          },
          "z": {
            "attributes": {
              "position": [
                0,
                0,
                0.5
              ],
              "unitSI": 12262657411105.05
            },
            "datatype": "FLOAT",
            "extent": [
              32,
              32,
              32
            ]
          }
        },
        "attributes": {
          "chargeCorrection": "none",
          "currentSmoothing": "none",
          "fieldBoundary": [
            "open",
            "open",
            "open",
            "open",
            "open",
            "open"
          ],
          "fieldBoundaryParameters": [
            "convolutional PML over 12 cells",
            "convolutional PML over 12 cells",
            "convolutional PML over 12 cells",
            "convolutional PML over 12 cells",
            "convolutional PML over 12 cells",
            "convolutional PML over 12 cells"
          ],
          "fieldSolver": "Yee"
        },
        "e_all_chargeDensity": {
          "attributes": {
            "axisLabels": [
              "z",
              "y",
              "x"
            ],
            "dataOrder": "C",
            "fieldSmoothing": "none",
            "geometry": "cartesian",
            "gridGlobalOffset": [
              0,
              0,
              0
            ],
            "gridSpacing": [
              4.252342224121094,
              1.0630855560302734,
              4.252342224121094
            ],
            "gridUnitSI": 4.1671151662e-08,
            "position": [
              0,
              0,
              0
            ],
            "timeOffset": 0,
            "unitDimension": [
              -3,
              0,
              1,
              1,
              0,
              0,
              0
            ],
            "unitSI": 15399438.226078901
          },
          "datatype": "FLOAT",
          "extent": [
            32,
            32,
            32
          ]
        },
        "e_all_energyDensity": {
          "attributes": {
            "axisLabels": [
              "z",
              "y",
              "x"
            ],
            "dataOrder": "C",
            "fieldSmoothing": "none",
            "geometry": "cartesian",
            "gridGlobalOffset": [
              0,
              0,
              0
            ],
            "gridSpacing": [
              4.252342224121094,
              1.0630855560302734,
              4.252342224121094
            ],
            "gridUnitSI": 4.1671151662e-08,
            "position": [
              0,
              0,
              0
            ],
            "timeOffset": 0,
            "unitDimension": [
              -1,
              1,
              -2,
              0,
              0,
              0,
              0
            ],
            "unitSI": 7869098408118.734
          },
          "datatype": "FLOAT",
          "extent": [
            32,
            32,
            32
          ]
        },
        "picongpu_idProvider": {
          "attributes": {
            "axisLabels": [
              "x"
            ],
            "dataOrder": "C",
            "geometry": "cartesian",
            "gridGlobalOffset": [
              0
            ],
            "gridSpacing": [
              1
            ],
            "gridUnitSI": 1,
            "timeOffset": 0,
            "unitDimension": [
              0,
              0,
              0,
              0,
              0,
              0,
              0
            ]
          },
          "nextId": {
            "attributes": {
              "position": [
                0
              ],
              "unitSI": 1
            },
            "datatype": "ULONG",
            "extent": [
              1,
              1,
              1
            ]
          },
          "startId": {
            "attributes": {
              "maxNumProc": 1,
              "position": [
                0
              ],
              "unitSI": 1
            },
            "datatype": "ULONG",
            "extent": [
              1,
              1,
              1
            ]
          }
        }
      },
      "particles": {
        "e": {
          "attributes": {
            "currentDeposition": "Esirkepov",
            "particleInterpolation": "uniform",
            "particlePush": "Boris",
            "particleShape": 2,
            "particleSmoothing": "none"
          },
          "charge": {
            "attributes": {
              "macroWeighted": 0,
              "shape": [
                0
              ],
              "timeOffset": 0,
              "unitDimension": [
                0,
                0,
                1,
                1,
                0,
                0,
                0
              ],
              "unitSI": 1.1143237516482563e-15,
              "value": -0.0001437801111023873,
              "weightingPower": 1
            }
          },
          "mass": {
            "attributes": {
              "macroWeighted": 0,
              "shape": [
                0
              ],
              "timeOffset": 0,
              "unitDimension": [
                0,
                1,
                0,
                0,
                0,
                0,
                0
              ],
              "unitSI": 6.335633991170193e-27,
              "value": 0.0001437801111023873,
              "weightingPower": 1
            }
          },
          "momentum": {
            "attributes": {
              "macroWeighted": 1,
              "timeOffset": 0,
              "unitDimension": [
                1,
                1,
                -1,
                0,
                0,
                0,
                0
              ],
              "weightingPower": 1
            },
            "x": {
              "attributes": {
                "shape": [
                  0
                ],
                "unitSI": 1.8993752872012626e-18,
                "value": 0
              }
            },
            "y": {
              "attributes": {
                "shape": [
                  0
                ],
                "unitSI": 1.8993752872012626e-18,
                "value": 0
              }
            },
            "z": {
              "attributes": {
                "shape": [
                  0
                ],
                "unitSI": 1.8993752872012626e-18,
                "value": 0
              }
            }
          },
          "particlePatches": {
            "extent": {
              "attributes": {
                "unitDimension": [
                  0,
                  0,
                  0,
                  0,
                  0,
                  0,
                  0
                ]
              },
              "x": {
                "attributes": {
                  "unitSI": 1
                },
                "datatype": "ULONG",
                "extent": [
                  1
                ]
              },
              "y": {
                "attributes": {
                  "unitSI": 1
                },
                "datatype": "ULONG",
                "extent": [
                  1
                ]
              },
              "z": {
                "attributes": {
                  "unitSI": 1
                },
                "datatype": "ULONG",
                "extent": [
                  1
                ]
              }
            },
            "numParticles": {
              "attributes": {
                "unitSI": 1
              },
              "datatype": "ULONG",
              "extent": [
                1
              ]
            },
            "numParticlesOffset": {
              "attributes": {
                "unitSI": 1
              },
              "datatype": "ULONG",
              "extent": [
                1
              ]
            },
            "offset": {
              "attributes": {
                "unitDimension": [
                  0,
                  0,
                  0,
                  0,
                  0,
                  0,
                  0
                ]
              },
              "x": {
                "attributes": {
                  "unitSI": 1
                },
                "datatype": "ULONG",
                "extent": [
                  1
                ]
              },
              "y": {
                "attributes": {
                  "unitSI": 1
                },
                "datatype": "ULONG",
                "extent": [
                  1
                ]
              },
              "z": {
                "attributes": {
                  "unitSI": 1
                },
                "datatype": "ULONG",
                "extent": [
                  1
                ]
              }
            }
          },
          "position": {
            "attributes": {
              "macroWeighted": 0,
              "timeOffset": 0,
              "unitDimension": [
                1,
                0,
                0,
                0,
                0,
                0,
                0
              ],
              "weightingPower": 0
            },
            "x": {
              "attributes": {
                "shape": [
                  0
                ],
                "unitSI": 1.7719999774007647e-07,
                "value": 0
              }
            },
            "y": {
              "attributes": {
                "shape": [
                  0
                ],
                "unitSI": 4.429999943501912e-08,
                "value": 0
              }
            },
            "z": {
              "attributes": {
                "shape": [
                  0
                ],
                "unitSI": 1.7719999774007647e-07,
                "value": 0
              }
            }
          },
          "positionOffset": {
            "attributes": {
              "macroWeighted": 0,
              "timeOffset": 0,
              "unitDimension": [
                1,
                0,
                0,
                0,
                0,
                0,
                0
              ],
              "weightingPower": 0
            },
            "x": {
              "attributes": {
                "shape": [
                  0
                ],
                "unitSI": 1.7719999774007647e-07,
                "value": 0
              }
            },
            "y": {
              "attributes": {
                "shape": [
                  0
                ],
                "unitSI": 4.429999943501912e-08,
                "value": 0
              }
            },
            "z": {
              "attributes": {
                "shape": [
                  0
                ],
                "unitSI": 1.7719999774007647e-07,
                "value": 0
              }
            }
          },
          "weighting": {
            "attributes": {
              "macroWeighted": 1,
              "shape": [
                0
              ],
              "timeOffset": 0,
              "unitDimension": [
                0,
                0,
                0,
                0,
                0,
                0,
                0
              ],
              "unitSI": 1,
              "value": 0,
              "weightingPower": 1
            }
          }
        }
      }
    }
  }
}
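
In the output above, each dataset written in template mode is reduced to a "datatype" and an "extent" entry. A minimal sketch of how such an entry could be assembled, using only the standard library; `renderDatasetTemplate` is a hypothetical helper for illustration and not the backend's actual code path:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Render a template-mode dataset entry: datatype and extent only, no data.
// Hypothetical helper; assumes the datatype name needs no JSON escaping.
std::string renderDatasetTemplate(
    std::string const &datatype, std::vector<std::uint64_t> const &extent)
{
    std::string out = "{\"datatype\": \"" + datatype + "\", \"extent\": [";
    for (std::size_t i = 0; i < extent.size(); ++i)
    {
        if (i > 0)
        {
            out += ", ";
        }
        out += std::to_string(extent[i]);
    }
    out += "]}";
    return out;
}
```

For example, `renderDatasetTemplate("FLOAT", {32, 32, 32})` produces the same information as the field component entries shown above, in compact form.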

src/IO/JSON/JSONIOHandlerImpl.cpp

@@ -1269,40 +2009,165 @@
    return (*obtainJsonContents(file))[filePosition->id];
 }

-void JSONIOHandlerImpl::putJsonContents(
+auto JSONIOHandlerImpl::putJsonContents(


test/SerialIOTest.cpp

@@ -1560,12 +1603,13 @@
    }
 };

-inline void write_test(const std::string &backend)
+inline void write_test(


franzpoeschel added the backend: JSON label Aug 4, 2023

franzpoeschel mentioned this pull request Aug 4, 2023: [WIP] Use JSON/TOML template for defining openPMD metadata in a config file #1277 (Open, 15 tasks)

franzpoeschel force-pushed the topic-json-short-modes branch from 737da87 to be7d165 Compare August 4, 2023 13:55

github-advanced-security bot found potential problems Aug 4, 2023

franzpoeschel force-pushed the topic-json-short-modes branch 3 times, most recently from 4a969c5 to 7969864 Compare August 10, 2023 09:38

franzpoeschel mentioned this pull request Aug 17, 2023: TOML Backend #1436 (Merged, 4 tasks)

franzpoeschel force-pushed the topic-json-short-modes branch from 7969864 to ef8236f Compare August 18, 2023 13:01

franzpoeschel force-pushed the topic-json-short-modes branch from ef8236f to 92cf227 Compare September 5, 2023 11:51

franzpoeschel force-pushed the topic-json-short-modes branch from 92cf227 to c63d56f Compare September 22, 2023 14:34

github-advanced-security bot found potential problems Sep 22, 2023

franzpoeschel force-pushed the topic-json-short-modes branch from 8ee5ae8 to 09a70bc Compare October 11, 2023 07:44

franzpoeschel force-pushed the topic-json-short-modes branch 3 times, most recently from c511364 to 544923b Compare October 12, 2023 09:39

franzpoeschel mentioned this pull request Oct 18, 2023: Non-spatial meshes #1534 (Merged, 4 tasks)

franzpoeschel force-pushed the topic-json-short-modes branch 2 times, most recently from 8ca52da to 6464af8 Compare November 20, 2023 16:36

github-advanced-security bot found potential problems Nov 20, 2023

franzpoeschel force-pushed the topic-json-short-modes branch 3 times, most recently from baf807e to 914aaaa Compare November 27, 2023 09:54

franzpoeschel force-pushed the topic-json-short-modes branch 3 times, most recently from 9269933 to 047e571 Compare December 22, 2023 18:35

franzpoeschel force-pushed the topic-json-short-modes branch from 047e571 to 7cc629e Compare January 4, 2024 10:57

franzpoeschel force-pushed the topic-json-short-modes branch from 7cc629e to a2cf97a Compare January 23, 2024 16:20

franzpoeschel force-pushed the topic-json-short-modes branch from a2cf97a to 4d68f6d Compare February 5, 2024 10:32

github-advanced-security bot found potential problems Jun 7, 2024

test/SerialIOTest.cpp, write_test (hunk @@ -1560,12 +1603,13 @@)

Check warning, Code scanning / CodeQL: Poorly documented large function (test). Fewer than 2% comments for a function of 134 lines.

franzpoeschel force-pushed the topic-json-short-modes branch from 2914e8c to 3061d0a Compare June 26, 2024 11:45

franzpoeschel force-pushed the topic-json-short-modes branch from 3061d0a to db18e5e Compare July 16, 2024 14:03

franzpoeschel force-pushed the topic-json-short-modes branch from db18e5e to a4a0771 Compare August 5, 2024 09:38

franzpoeschel force-pushed the topic-json-short-modes branch 2 times, most recently from c1871d5 to 1728894 Compare October 29, 2024 12:05

franzpoeschel and others added 22 commits November 15, 2024 16:11

Introduce dataset template mode to JSON backend

553b2fb

Write used mode to JSON file

5684558

Use Attribute::getOptional for snapshot attribute

bf4f177

Introduce attribute mode

1ad11d5

Add example 14_toml_template.cpp

81aa19a

Use Datatype::UNDEFINED to indicate no dataset definition in template

0943e1b

Extend example

249a9b9

Test short attribute mode

df94349

Copy datatypeToString to JSON implementation

6044ff3

Fix after rebase: Init JSON config in parallel mode

49ac875

Fix after rebase: Don't erase JSON datasets when writing

90cbfa2

openpmd-pipe: use short modes for test

1348e89

Less intrusive warnings, allow disabling them

81479c2

TOML: Use short modes by default

53849d9

[pre-commit.ci] auto fixes from pre-commit.com hooks

1f92eec

for more information, see https://pre-commit.ci

Documentation

d7096a9

Short mode in default in openPMD >= 2.

aeb6154

Short value by default in TOML

b6b27fd

Store the openPMD version information in the IOHandler

e2d7b8e

Fixes

c696be4

Adapt test to recent rebase

bdb29e3

Reading the chunk table requires NOT using template mode, otherwise the string just consists of '\0' bytes.

toml11 4.0 compatibility

1ea3b60

franzpoeschel force-pushed the topic-json-short-modes branch from 1728894 to 1ea3b60 Compare November 15, 2024 15:31

[pre-commit.ci] auto fixes from pre-commit.com hooks

f071271

for more information, see https://pre-commit.ci
