
WIP: User doc on packaging and plugins. Drafts #90

Closed

Conversation

mbeckerle
Contributor

First draft of pages about packaging DFDL schemas and about plugins.

All applications using Daffodil, at least via Runtime 1, should ideally take advantage of packaging DFDL schemas in Jar files, using 'sbt publish' and managed dependencies for inter-schema dependencies, etc.
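For concreteness, here is a minimal build.sbt sketch (the organization, names, and versions are all hypothetical) of a DFDL schema packaged as an ordinary jar, published with 'sbt publish', and depending on another schema jar through sbt's managed dependencies:

```scala
// Hypothetical coordinates; the schema files live under src/main/resources
// so 'sbt package' / 'sbt publish' bundles them into the jar.
organization := "com.example"
name         := "my-format-schema"
version      := "1.0.0"

// Another DFDL schema this one imports/includes, resolved like any other
// managed library dependency.
libraryDependencies += "com.example" % "base-format-schema" % "2.1.0"
```

Applications (and other schemas) can then depend on the published schema jar the same way they depend on any library.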

// You will want to change plugin settings to enable diagrams (they're off by default.)
//
// You need to view this page with Chrome or Firefox.
//
Member

Do we need this blurb? All our asciidoc is rendered to HTML, no one should need an extension to view this.

Also, it looks like all of this content could just be done with markdown; there are no adoc graphs or anything complicated.

Contributor Author

I was reserving the right to add the diagrams in the future. I do think they will be needed.

The boilerplate blurb can be deleted.

However, for those who like to write asciidoc using just vi and a browser, having the asciidoctor plugin in the browser allows using only those two tools, instead of an IDE, while still being able to mostly see the page in rendered form in real time, diagrams and all.

Instructions on that should be centralized elsewhere, however, not repeated in every asciidoc file. We have a page about asciidoc, and a one-line reference to that should be enough.

:page-layout: page
:url-asciidoctor: http://asciidoctor.org
:keywords: plugins layering UDF charset
// ///////////////////////////////////////////////////////////////////////////
Member

Is there a way with adoc pages to specify the page title? That way it's added to the tab title and to the top of the page.

Contributor Author

This I need to explore; I am hoping it is possible. I would like these pages to look more seamlessly like the rest of the pages. We may have to create a specific asciidoc stylesheet that achieves this, however.
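A minimal sketch of what that could look like, assuming the site renders these pages with jekyll-asciidoc (an assumption here): the level-0 document title at the top of the .adoc file is normally what gets picked up as the page title, alongside the attributes already in this draft.

```asciidoc
= Packaging DFDL Schemas for use in Daffodil Applications
:page-layout: page
:keywords: plugins layering UDF charset
```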

//
// //////////////////////////////////////////////////////////////////////////

= DFDL Language Extensions in Daffodil
Member

We already have https://daffodil.apache.org/dfdl-extensions; should this content be merged with that page?

Contributor Author

Yes. Not sure why I didn't see that.

Contributor

That other page talks about DFDL extensions while this page talks about extensions to Daffodil schemas (slightly more general). Probably better to merge that page into this page than the other way around.

- Layering Transformer (e.g., unzip/zip, verify/recompute checksums)
- User Defined Function (UDF) (e.g., convert mean-sea-level elevation to height-above-ellipsoid)

There is one additional kind of plugin that will be supported by Daffodil 3.4.0
Member

We'll need to remember to update this page when we release 3.4.0.

Contributor Author

Yes. I might comment this out for now.

Contributor

Daffodil 3.4.0 has been released, so you can update this PR accordingly.


When configuring an application, these jar files must be put on the CLASSPATH so that the executing instance of Daffodil for a specific configured data-processing flow finds them on the class path for the data format(s) that flow is processing.

For greater assurance/trust, the plugin jars could be digitally signed by their creators, and applications could verify these signatures (using public keys) as a startup condition.
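One way an sbt-built application can satisfy the CLASSPATH requirement is to declare the plugin jar as an ordinary dependency so it lands on the runtime classpath; a sketch with made-up coordinates:

```scala
libraryDependencies ++= Seq(
  "org.apache.daffodil" %% "daffodil-runtime1" % "3.4.0",
  // Hypothetical jar holding the layer/UDF/charset plugin classes along with
  // their service registration files, so Daffodil can discover them at run time.
  "com.example" % "my-format-plugins" % "1.0.0"
)
```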
Member

This all seems reasonable. Do you imagine additional pages will be created and linked to that describe the different plugins in more detail?

Contributor Author

Yes.

Short term, these might lead off to existing wiki pages on Confluence, with this page being more of just an index to start with. But eventually, yes, real documentation. Plugins may have javadoc/scaladoc to reference as well.

I also want to refer to Java code examples (currently on OpenDFDL; they probably need to move to an Apache Daffodil examples repo) so people can see the managed dependencies between schemas working, with all the schemas transitively showing up in the lib_managed directory, etc.

=== Advance Summary

- The best way to use DFDL schemas is to access them from Jar files
- Include pre-compiled binary DFDL schema files in the same Jar file as well.
Member

Should we suggest this? The main argument against this is that pre-compiled binaries are Daffodil-version specific. So if you update Daffodil you also need to rebuild the jars.

Contributor Author (Jun 3, 2022)

Perhaps not. One of the reasons I started writing this was to get to exactly this sort of issue.

The best practice is likely that build processes should take the signed DFDL schema jar and compile the schema to binary for that application, using the right version of Daffodil.

I am no longer hopeful that DFDL schema compilation time for Daffodil will ever be so fast as to be negligible. It needs to be fast enough for schema developers to tolerate, but a 300-file DFDL schema is never going to compile in under a second. So there will be an ongoing need for pre-compiled DFDL schemas.

Note that some of the plugins, layering transforms at least and possibly others, are not yet covered by the SAPI/JAPI compatibility discipline, so until those APIs are finalized, the APIs the plugins depend on could tie them to specific Daffodil versions as well. So users may have no choice but to rebuild those, too, using the specific deployment version of Daffodil.

That's part of why I was suggesting putting the compiled DFDL schema into the jar as well, because the compiled scala/java for plugins would naturally go in that jar, and that's also Daffodil version specific.

We still don't have a trivial way for sbt to do that, though. I'd like the basic build.sbt we create with a DFDL schema to contain an 'sbt dfdlCompile' command that creates the binary using the required version of Daffodil, and perhaps creates the whole combined jar file.
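Until something like 'sbt dfdlCompile' exists, a build could invoke a small helper along these lines to produce the version-specific binary. This is only a sketch using the Daffodil 3.x Scala API; the schema path and output file name are invented:

```scala
import java.io.FileOutputStream
import java.nio.channels.Channels
import org.apache.daffodil.sapi.Daffodil

object DfdlPrecompile {
  def main(args: Array[String]): Unit = {
    // Hypothetical schema location inside the schema project.
    val schema = new java.io.File("src/main/resources/com/example/format.dfdl.xsd")
    val pf = Daffodil.compiler().compileFile(schema)
    if (pf.isError) {
      pf.getDiagnostics.foreach(d => System.err.println(d.getMessage()))
      sys.exit(1)
    }
    val dp = pf.onPath("/")
    // Save the compiled parser; this binary is specific to the Daffodil
    // version used to produce it.
    val out = Channels.newChannel(new FileOutputStream("target/format.dfdl.bin"))
    try dp.save(out)
    finally out.close()
  }
}
```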


The organization of the files into these directory structures is not arbitrary.
It can be needed to avoid file name clashes and serves the same role as the Java package-name directory structure does for Java programs.
The directory hierarchy defines a Java package-like namespace structure for DFDL schemas.
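For illustration only (the organization and schema names here are invented), two schemas packaged this way occupy disjoint package-like paths, so their jars can be unpacked onto a single tree without collisions:

```
src/main/resources/
  com/example/baseformat/xsd/baseFormat.dfdl.xsd
  com/example/payloadformat/xsd/payload.dfdl.xsd
```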
Member

Aren't we trying to move away from this hierarchy?

Contributor Author

Well, this is tricky.

The reason to flatten the hierarchy was primarily for training: people working from command-line tools were struggling with the file-tree depth for simple examples. Too much typing of 'cd src/test/org/foo/bar/baz' to get to the test sources, then the opposite path to get back to src/main/..., etc. Really, that's why we did the flattening.

For larger schemas I think the hierarchy is still needed. But...

Daffodil has its classpath-oriented resolver so that schemas can reference includes/imports from places found via classpath search.

But if a DFDL schema is used as an XSD, to validate XML separately from the parse, and this is done by some other XML tool, then that tool may need to use a similar classpath-oriented resolver to find the various pieces of the schema, based on the schemaLocation attributes in the DFDL schema import/include statements. That is if they want to directly pull the schema files from jars.

For Java-based XML tools we could perhaps specify they use exactly the Daffodil resolver.

(There's actually some centralized Apache resolver project (part of XML commons?) intended to address exactly this weak spot in the w3c XML specs about how schema locations work. Possibly what we're doing in the Daffodil resolver with classpath search should become a part of that effort, or the relationship of the two things should be explored anyway.)

But for other technology bases than Java, well XML Catalogs are also a possibility, but we don't really test that, and there's even a JIRA ticket suggesting we deprecate catalog support entirely.

Hence, we need the "un-jar" technique to also work: "un-zip all the jars on top of the same directory tree", knowing that the directory structure ensures no name collisions. I believe that if the XML processing always resolves schemaLocations relative to the root of that tree, then all schema files would be resolvable. This is what we hope to be the lowest common denominator for schemaLocation resolvers.

I think we need to document this un-jar technique.

But as you know it gets worse because in some applications, people have to break up the set of files further. E.g., in cybersecurity people want each data flow to have exactly and only the DFDL schema files it needs as part of that flow's configurations, and commonly a particular flow allows only a subset of all the things in the format.

For now, I believe this reorganization of a schema also has to be done by hand. But we need to document how, and then, if automated tools that chase the includes/imports can be created to help with this, that would be an improvement.

I don't like the notion of the DFDL schema having to cater to the needs of the application too much. It makes reuse of the schema harder for other applications such as data integration.

//
// //////////////////////////////////////////////////////////////////////////

= Packaging DFDL Schemas for use in Daffodil Applications
Member

I wonder if this page wants to be merged into the standard project layout page? There seems to be quite a bit of overlap.

Also, a normal markdown file seems reasonable. This doesn't use any adoc features.

Contributor Author

Agreed.

Though I expect this to have a nested box diagram showing [header1[header2[payload]]] type compositions of schemas.

Contributor
@tuxji left a comment

+1

Just some minor comments.



To provide some new advanced format capabilities such as checksums, compressed or encoded data regions, and user-defined functions, DFDL schemas sometimes must use Daffodil-specific extensions and incorporate Daffodil plugins that provide the small algorithmic aspects needed by these formats.

There are 2 kinds of plugins today supported by Daffodil 3.3.0
Contributor

Consider ending both this sentence and the following sentence at line 44 with a colon as punctuation.

Different DFDL schemas for different kinds of data will need their own such plugins.
Hence the plugins, like the DFDL schema files themselves, are used in applications as part of a specific data-processing flow.

Keeping with the spirit of DFDL of describing a format declaratively, plugins need to be very small pieces of code (e.g., a character set definition should be about 10 lines of code).
Contributor

I think it's fine to break a sentence across multiple lines as long as the next sentence begins on a new line (not immediately after the period punctuating the previous sentence). We break long lines of code, so we should break long sentences as well.

@mbeckerle
Contributor Author

Replaced by PR #108

@mbeckerle mbeckerle closed this Mar 1, 2023