Feature/better hdf5 #83

Open
wants to merge 92 commits into base: develop
Conversation

staleyLANL (Contributor)
No description provided.

staleyLANL and others added 30 commits November 1, 2021 22:19
Some files aren't finished yet, and will be uploaded in a few days.
So, this particular update won't build!!

Added broad infrastructure for HDF5 handling...
...Wrapper class
...Assignment operators
...convert() functions
...Constructors
...read() functions
...write() functions
Etc.

Completed many (not all) HDF5 capabilities.
Very basic HDF5 write needs to be smarter (not string-only).
HDF5 reading still needs work.
Touchups are needed here and there throughout the new HDF5 code.
In particular, we need smart, non-string handling.

Small tweaks were made to XML and JSON material, for consistency.
Some comment changes and other odds and ends.
To be used for non-GNDS products, e.g. NDI3.
When ProjectDir is given, namespaces and various other constructs are done differently.
Made miscellaneous small improvements in various other files.
For example: direct read() and write() in Component, aligning with Node's.
So: autogenerated classes have those read()s and write()s; there's no need for a user to explicitly go through Node.
Won't compile right now, but wanted to get this in.
For [optional] vector child-node fields, there's a setter that takes an element to push_back. If the optional has no value, it's given one first. Experience showed that something like this would be helpful.

Updated code generator accordingly.

Also, combined the default and "from fields" constructors in generated classes. This makes the generated classes shorter and, based on our experience with a prototype data format, makes object construction simpler. A generated class might have, for instance, a few metadata, followed by one or more vector or optional-vector child nodes. It's convenient to initialize the object with metadata, and then to add vector elements individually later. This change makes that process simpler and shorter.

I also implemented another feature that experience with a prototype autogenerated set of classes suggested would be helpful. Before, getters with index or label looked for *metadata* named "index" or "label". Now, if no metadatum called "index" exists in the type of the vector's element (or in any of its alternatives, if it's a variant), then index is interpreted as a regular vector index, as with the "[]" operator.

Note that someone who uses the code generator must be aware that the interpretation of (index) getters therefore depends fundamentally on whether or not the relevant vector element type has an "index" metadatum. This change essentially gives us an additional feature for free, as (index) previously would just fail unless an "index" metadatum were found.

Removed a "GNDS"-centric comment in Component prettyprinted output. The comment probably wasn't particularly informative anyway. Updated test codes to reflect this change.
Regenerated the prototype codes. The changes to these also reflect other work I'd done recently on the code generator, as they hadn't been regenerated for a while.
That is to say, classes that have no metadata or child nodes, but still have data, e.g. with <foo>1.2 3.4 5.6</foo>.

OCD'd a cmake-related file.

Some more work regarding HDF5 format. Still not entirely complete.

I think ".hdf"/".HDF" aren't used as HDF5 file extensions. They need a '5'.

Made some diagnostic-message terminology more consistent.
Isn't very smart for output; just keeps everything as strings.

That will be remedied at a later time.

Was tested for functionality on entire set of GNDS v1.9 files. All converted.
...original total XML size: 2.9G
...new total HDF5 size: 21G
So, a little over a factor of 7 larger. :-/

Miscellaneous other code changes were put in place during this work.

Made a few diagnostic messages, and other minor constructs here and there, more consistent with one another.

Added a convert() that proved to be needed for a certain disambiguation. This was noticed while working on some HDF5 input capabilities.
The former was taken from GNDS, but the latter is much more descriptive.
This is the first of a few simplifications I've been planning to do with the
code base.

Having `Child<>` objects contain a flag, indicating "is a node of this name
allowable as a top-level GNDS node," originally seemed like a good idea.

With our later decision to autogenerate GNDS Standard Interfaces from GNDS
specs, having such a capability in the Core Interface really isn't necessary.

The Core Interface can be safely generic, uncluttered with the capability of
checking such a thing. Users will prefer our GNDS Standard Interfaces, which,
having been generated from GNDS specifications, will be designed so that GNDS
hierarchies will have the correct structure.

Our Code Generator is also being used to design another (non-GNDS) library,
and more such uses may follow. Checking if a root node has a valid *GNDS* name
clearly wouldn't be wanted for other libraries. For this reason, we'd already
switched off the check by default. At this point, however, we don't think that
having even the ability to check such a thing, in the Core Interface, is worth
the extra clutter that it creates in the code base.
The removed material was for a very old capability that was intended to help users make Meta<> and Child<> objects.

I hadn't given it much in the way of capabilities yet, but then it was all superseded by the ability to easily manipulate `Meta<>` and `Child<>` objects, e.g. with constructs such as `int{}/Meta<>(...)` to change the type associated with the `Meta<>` object.

The old keyword builder stuff never got much use, didn't have much to offer, and simply isn't needed any longer -- better capabilities exist, and in fact have existed for a long time. I just hadn't gotten around to removing the old code.

Removing it means a bit less code, one less executable to build in the test suite, and nothing really lost.
Gets rid of the annoying compiler warning about `tmpnam()`.
The new formulation required a number of changes in the logic here and there.
Made certain HDF5-related operations more efficient.
A while back, I renamed class BodyText to class BlockData. It turns out that the "body [text]" terminology still appeared here and there. This PR basically completes the terminology change. I renamed relevant constructs, and removed "body" terminology at least in regards to block data.
That fixed a problem with certain initializations being very time-consuming.
Properly split out and clarified "write" versus "print".
Removed some old debug messages.
Changed some terminology.
Trimmed some unused stuff from the KeyTuple (formerly KeywordTup) class.
Fix (I hope) error in g++ compilation.
I'll write much more in an upcoming pull request.
An earlier branch introduced the ability to write HDF5 files in different ways, according to two boolean flags: *reduced* and *typed*.

This branch enhances our JSON capabilities in the same manner, reflecting, with JSON, an analog of the options we have for HDF5.

The above was done in keeping with GNDStk's general philosophy of making its support for different file types as consistent with one another as reasonably possible.

Similarly to HDF5, the JSON class now has two static booleans: "reduced" and "typed".

The "reduced" flag tells the JSON writer whether or not to reduce -- basically, to fold into a shorter and simpler representation -- certain special constructs that GNDStk's Node uses internally in order to handle what are called cdata, pcdata, and comment nodes in XML. Basically, if reduced == true, then we make this simplification, which shortens the JSON output. If reduced == false, then the JSON will closely reflect what Node uses internally.

The "typed" flag tells the JSON writer whether or not it should use our type-guessing code to guess whether "strings" in certain contexts (in particular, metadata values and pcdata) actually contain what look like numbers. Examples: "12" looks like an int, "321 476" looks like a vector of ints, "3.14" looks like a double, and "3.14 2.72 1.41" looks like a vector of doubles. If typed == false, we use the original strings, with no type guessing. If typed == true, we apply the type guesser, and, where appropriate, get JSON numbers in place of strings in the JSON output.

This is a work in progress, with a couple of things that still need doing...

Still to do: modify the JSON *reading* code so that it recognizes the various different ways that the JSON *writing* code writes things, and can reliably reverse the writing process and recover a GNDStk Node.

Also still to do: at present, the nlohmann JSON library, which we use under the hood, doesn't provide a way to write JSON numbers (unquoted, as opposed to quoted JSON strings) beginning from an existing string representation that we might provide for the number. Consider this discussion that I started:

     nlohmann/json#3460

This is relevant to us because GNDStk provides very fine control over exactly how numbers, in particular floating-point numbers, are formatted, in terms of the number of significant digits, fixed vs. scientific form, etc. At the time of this writing, if we use typed == true, so that certain strings that look like numbers are written as numbers, the numbers will, unfortunately, be formatted by the JSON library itself. We'd like to have the capability of writing an original string (one that we've already determined looks like a number!) - but writing it as a JSON number (so, without quotes) rather than a JSON string.

In this commit, we also modified an old Tree/ test that depended on the previous default JSON writing behavior (still available via the appropriate flags, though no longer the default). And, we changed a variable name in json2class.cpp.

We refactored detail-node2hdf5.hpp (for HDF5) considerably, and then reflected this in the modified detail-node2json.hpp (for JSON) that we're primarily working on.

And, we've enhanced the JSON test code so that it tries out the new flags. This is a work in progress as well. In order to finish it, we need to finish the ability to read JSON files that were written not just in our original manner, but with any variations of the above-described flags.
Also some small tweaks and comment changes here and there.
Some tweaks to HDF5-related code.
Extended and improved some comments, to clarify things.
Additional tests for those, too.
There are still some instances of the strings in question, typically appearing amongst other text inside of string literals. At some point I'll look into fixing up those cases too, but that's not a priority right now.
Updated the HDF5 test suite to ensure that we test the modifications.
@staleyLANL staleyLANL requested a review from whaeck June 3, 2022 08:04
@staleyLANL staleyLANL self-assigned this Jun 3, 2022
@whaeck whaeck changed the base branch from master to develop June 30, 2022 15:24