Feature/gnds 2.0 alpha #90

Open · wants to merge 324 commits into base: master
Conversation

staleyLANL (Contributor)

Includes the Python interface.

Both Node-to-HDF5 and HDF5-to-Node (write and read) now have the ability to deal with original string-form content (as from an original XML file, in GNDStk's Node), and with "type-ified" versions thereof. For example, `"1.2 3.4"` is a string, but for HDF5 we can write it as two doubles. Also, importantly, content related to "special" nodes - in particular what we'd see in XML as CDATA, comments, and PCDATA - is now handled in a full, and I believe proper, manner, for both HDF5 input and output.
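
To make the "type-ified" idea concrete, here's a generic sketch (not GNDStk's actual conversion code) of splitting a string like `"1.2 3.4"` into doubles, so it can be written as a real floating-point dataset rather than a single string:

```cpp
// Generic sketch of the "type-ified" idea; nothing here is GNDStk-specific.
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

std::vector<double> toDoubles(const std::string &content)
{
   std::vector<double> values;
   std::istringstream stream(content);
   for (double value; stream >> value; )
      values.push_back(value);
   return values;
}

int main() {
   for (const double d : toDoubles("1.2 3.4"))
      std::cout << d << '\n';   // prints 1.2, then 3.4
}
```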

At some point I'll have to document all the HDF5 capabilities. For now, for anyone who wants to play around with HDF5, I recommend trying different combinations of the HDF5::reduced and HDF5::typed boolean flags; and then using perhaps the `h5dump` tool, or any other good HDF5 tool, for seeing what GNDStk's HDF5 capabilities can produce.

Also: I filled out the main HDF5 test code with tons of new tests. This was definitely needed at this point.

Also...

Changed some HDF5-related function names that are in the detail:: namespace. The earlier names had tried to use capitalization (e.g. HDF5 instead of hdf5) in a manner that was consistent with the non-detail:: API names. For detail stuff, however, certain capitalization and long names felt heavy and painful to the eyes.

Added a couple of tests for constructs that, it turned out, had not been interpreted correctly by the "type guesser" algorithm that the generic Node-to-HDF5 conversion uses.

Did a few things that were unrelated to the main work (HDF5) here, but I was thinking about them...

Tweaked a couple of autogen files, per some remarks about earlier pull requests.

Figured out how to provide certain *constexpr* `has()` functions for classes derived from `Component`. We anticipate that these may eventually prove to be quite useful. Importantly, these are fully in `class Component` itself. Nothing had to be added to the generated classes. Things like this are why `Component` is awesome.
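
For the general flavor of how such a `has()` can live entirely in the base class, here's an illustrative sketch; the detection key (`key_t`) and the class names are made up for the example and aren't taken from GNDStk:

```cpp
// Illustrative only (not GNDStk's actual implementation): a base class can
// expose a constexpr has() by inspecting, at compile time, something the
// derived class already declares - so nothing new is generated per class.
#include <type_traits>

// Detection helper: does T declare a nested type named key_t?
template<class T, class = void>
struct detect_key : std::false_type { };
template<class T>
struct detect_key<T, std::void_t<typename T::key_t>> : std::true_type { };

template<class DERIVED>
class Component {
public:
   static constexpr bool has() { return detect_key<DERIVED>::value; }
};

struct Foo : Component<Foo> { using key_t = int; };   // "has" the key
struct Bar : Component<Bar> { };                      // does not

static_assert( Foo::has());
static_assert(!Bar::has());
int main() { }
```
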
Renamed FileType::null to FileType::guess, and FileType::text to FileType::debug. This is a prerequisite for some upcoming work.
Could replace fileName with filePtr->getName(); basically, the former was redundant. This was possible in part because of some other recent work - for example, `>>` and `<<` (relative to streams and to strings) for Component.
Slightly changed some names and parameters here and there, to make things more consistent.
That's really the right thing to do (until/unless they fix it).
XML, JSON, etc. are now more consistent - no spurious newline in XML.
Updated test codes accordingly.

Fixed some fixmes.

General detail work, with the aim of tightening up the code.

Tweaked some comments.

More Node/Tree consolidation.
We use Node in most places, Tree only when necessary.

Removed some "inline"s when the function was a template anyway. Inline doesn't really mean "inline it" these days (compilers are good at making that decision), and it isn't necessary for header-only if the function is a template.
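
A quick illustration of the point (generic C++, nothing GNDStk-specific):

```cpp
// A function template defined in a header doesn't need "inline"; each
// instantiation already has linkage that allows it to appear in multiple
// translation units.
template<class T>
T square(const T &t) { return t * t; }   // no "inline" needed

// A non-template function defined in a header still does:
inline int answer() { return 42; }
```
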
Rearranged some code.
Tweaked some terminology.
In the generated classes, and in the code that works with them, `struct content` is now `struct Content`. That is, we just capitalized the name.

This simple change is consistent with NJOY classes generally having capitalized names.

More importantly, perhaps, I'm trying to make the code generator allow for as much flexibility as reasonably possible, given that users might be designing their own data format. (Not just using the code generator to build GNDS version-specific data structures.)

If someone actually writes a code-generator input spec that has a node called "content", it'll work now. :-) Not that we really expect that someone would call anything "content", but it *would* be a completely reasonable node name. By using upper-case terms for what we generate automatically - aside from terminology in someone's spec - people can use lower-case words in their specs and be confident that no conflict will arise.

That's the main thinking behind this simple name change.
Made miscellaneous small improvements in comments, names, etc.

BlockData now has implicit conversion to vector, when warranted.
It can be used where .get() was previously required.
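
The technique, in generic form (this is an illustration of the idea, not `BlockData`'s actual code):

```cpp
// A class holding numeric content can offer an implicit conversion to
// std::vector, so callers can assign it directly instead of calling an
// accessor such as .get().
#include <vector>

class Block {
   std::vector<double> data { 1.2, 3.4 };
public:
   operator const std::vector<double> &() const { return data; }   // implicit
   const std::vector<double> &get() const { return data; }         // still available
};

int main() {
   Block block;
   std::vector<double> values = block;   // no .get() needed
   (void)values;
}
```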

Fixed the performance issue when default-constructing certain generated classes under certain circumstances.

Fixed some outdated (so, wrong) comments and strings in a couple of the test codes.
That fixed a problem with certain initializations being very time-consuming.
Properly split out and clarified "write" versus "print".
Removed some old debug messages.
Changed some terminology.
Trimmed some unused stuff from the KeyTuple (formerly KeywordTup) class.
Fixed (I hope) an error in g++ compilation.
I'll write much more in an upcoming pull request.
An earlier branch introduced the ability to write HDF5 files in different ways, according to two boolean flags: *reduced* and *typed*.

This branch enhances our JSON capabilities in the same manner, reflecting, with JSON, an analog of the options we have for HDF5.

The above was done in keeping with GNDStk's general philosophy of making its support for different file types as consistent with one another as reasonably possible.

Similarly to HDF5, the JSON class now has two static booleans: "reduced" and "typed".

The "reduced" flag tells the JSON writer whether or not to reduce -- basically, to fold into a shorter and simpler representation -- certain special constructs that GNDStk's Node uses internally in order to handle what are called cdata, pcdata, and comment nodes in XML. Basically, if reduced == true, then we make this simplification, which shortens the JSON output. If reduced == false, then the JSON will closely reflect what Node uses internally.

The "typed" flag tells the JSON writer whether or not it should use our type-guessing code to guess whether "strings" in certain contexts (in particular, metadata values and pcdata) actually contain what look like numbers. Examples: "12" looks like an int, "321 476" looks like a vector of ints, "3.14" looks like a double, and "3.14 2.72 1.41" looks like a vector of doubles. If typed == false, we use the original strings, with no type guessing. If typed == true, we apply the type guesser, and, where appropriate, get JSON numbers in place of strings in the JSON output.

This is a work in progress, with a couple of things that still need doing...

Still to do: modify the JSON *reading* code so that it recognizes the various different ways that the JSON *writing* code writes things, and can reliably reverse the writing process and recover a GNDStk Node.

Also still to do: at present, the nlohmann JSON library, which we use under the hood, doesn't provide a way to write JSON numbers (unquoted, as opposed to quoted JSON strings) beginning from an existing string representation that we might provide for the number. Consider this discussion that I started:

     nlohmann/json#3460

This is relevant to us because GNDStk provides very fine control over exactly how numbers, in particular floating-point numbers, are formatted, in terms of the number of significant digits, fixed vs. scientific form, etc. At the time of this writing, if we use typed == true, so that certain strings that look like numbers are written as numbers, the numbers will, unfortunately, be formatted by the JSON library itself. We'd like to have the capability of writing an original string (one that we've already determined looks like a number!) - but writing it as a JSON number (so, without quotes) rather than a JSON string.
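
To see the limitation in isolation, here's a small example that uses nlohmann::json directly (only the library's ordinary interface; nothing GNDStk-specific):

```cpp
#include <iostream>
#include <nlohmann/json.hpp>

int main() {
   nlohmann::json j;
   j["as_number"] = 3.14;     // the library formats the number itself
   j["as_string"] = "3.14";   // our formatting survives, but it's quoted
   std::cout << j.dump(3) << '\n';
   // What we'd like: emit our own pre-formatted characters ("3.14") as an
   // unquoted JSON number; there's no direct hook in the library for that.
}
```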

In this commit, we also modified an old Tree/ test that depended on the previous default JSON writing behavior (still available via the appropriate flags, though no longer the default). And we changed a variable name in json2class.cpp.

We refactored detail-node2hdf5.hpp (for HDF5) considerably, and then reflected this in the modified detail-node2json.hpp (for JSON) that we're primarily working on.

And, we've enhanced the JSON test code so that it tries out the new flags. This is a work in progress as well. In order to finish it, we need to finish the ability to read JSON files that were written not just in our original manner, but with any variations of the above-described flags.
Also some small tweaks and comment changes here and there.
Some tweaks to HDF5-related code.
Extended and improved some comments, to clarify things.
Additional tests for those, too.
Make some constructors and conversion operators explicit.
Use component_t to disambiguate "documentation", which appears both in GNDStk's class Component and in some generated classes (because `<documentation>` is a node in GNDS 2.0 files).
(No luck yet getting the Python tests up and running again.)
Regenerated GNDS 2.0 alpha classes and Python bindings.
These reflect some improvements in the code generator.
The code generator itself is still undergoing work.
Also did some work with colors, the HDF5 format, and miscellaneous other things.
…enerator.

Code generator improvements themselves will be uploaded soon.
Some relate to the generation of Python bindings.

Some small details are changed in terms of C-language interface generation.

Two functions were added, regarding field names. These will eventually be used, in some form, by the Component prettyprinter (which, importantly, also underlies our Python `__repr__` for each class).
Some of this stuff should eventually be deprecated, as it amounted to earlier attempts to implement one or another GNDS standard.
Note that it doesn't yet add shortcuts to generated classes; that'll come soon.
Changed the way a map was implemented; it's faster, and leads to better printing of shortcut information.
Some refactoring related to the above.
Miscellaneous cosmetic and clarity-related changes.
Some small printing-related changes and fixes as well.
… offers.

There's lots more to be done in the above respect; this is just a start.

Also, provided some preliminary support for another approach to customization.

The original (and still in place) "customization" system involves inserting code from custom.hpp files directly into generated classes. Inserting additional bits and pieces of code into already-existing classes is actually a somewhat goofy concept. (Imagine someone customizing, say, std::complex by adding a new piece of information. They really should derive from it, or contain it in another class. Not try to mess with the existing class.)
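
To make that concrete, here's a purely illustrative sketch of the two alternatives, with `std::complex` standing in for any existing class (the class names are made up):

```cpp
#include <complex>
#include <string>

// Derivation: add the new piece of information alongside the original class.
struct LabeledComplex : std::complex<double> {
   using std::complex<double>::complex;
   std::string label;
};

// Containment: wrap the original class in another one.
struct TaggedComplex {
   std::complex<double> value;
   std::string tag;
};

int main() {
   LabeledComplex a(1.0, 2.0);
   a.label = "impedance";
   TaggedComplex b{ {3.0, 4.0}, "admittance" };
   (void)a; (void)b;
}
```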

I'm fleshing out *one* new way, at least (with possibly others to come), of doing customizations. There's more to be done, but I'll work with a particular user right now, to see how this works.

JSON specs now allow metadata to have converters. It turns out that this will help with the new customization business. Briefly: we want the ability to insert our own fields into classes (say, to contain data that we'll compute as functions of other fields), but without those new fields playing a role in I/O from/to files.

A new (and simple) "Noop" (no-op[eration]) class fills the role of representing "no conversion is taking place" for I/O on the new fields. Some new if-constexprs, elsewhere in the code, allow `convert()` functions to return a bool (they can still return void, as before). If a `convert()` returns bool, and is used in the context of a Node add() metadatum or child node, then a return value of false means, "don't actually add the new metadatum or child."

This is all in support of our new customization approach, because we don't necessarily want auxiliary computed fields to be written to files. (But we still want them in the multi-query, so they work with `print()` and such.)
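
Here's an illustrative sketch (not GNDStk's actual code) of how an if-constexpr can dispatch on whether a converter's `convert()` returns `bool` or `void`:

```cpp
#include <string>
#include <type_traits>

// A Noop-style converter: convert() returns false, meaning
// "don't actually add this metadatum or child" during I/O.
struct Noop {
   bool convert(const std::string &, std::string &) const { return false; }
};

// A traditional converter: convert() returns void, so the result is always used.
struct Copy {
   void convert(const std::string &from, std::string &to) const { to = from; }
};

// Dispatch on the converter's return type.
template<class CONVERTER>
bool applyConvert(const CONVERTER &c, const std::string &from, std::string &to)
{
   if constexpr (std::is_same_v<decltype(c.convert(from,to)), bool>)
      return c.convert(from,to);   // bool: false means "skip this field"
   else {
      c.convert(from,to);          // void: behave as before, always add
      return true;
   }
}

int main() {
   std::string out;
   const bool addNoop = applyConvert(Noop{}, "computed value", out);   // false
   const bool addCopy = applyConvert(Copy{}, "1.2 3.4", out);          // true
   (void)addNoop; (void)addCopy;
}
```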

Tweaked some SFINAE. This detail:: stuff really ought to be made more internally consistent, even though it's in detail::. Note: the `_v` and `_t` business is modeled after what's in std::. (Examples: `enable_if_t`, `is_convertible_v`.)

Adjusted some comments to reflect recent changes.