
WIP: User doc on packaging and plugins. Drafts #90

Closed

Conversation

mbeckerle
Contributor

First draft of pages about packaging DFDL schemas and about plugins.

All applications using Daffodil, at least via Runtime 1, should ideally take advantage of packaging DFDL schemas in Jar files, using 'sbt publish' and managed dependencies for inter-schema dependencies, etc.
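For concreteness, here is a minimal build.sbt sketch (the organization, names, and versions are all hypothetical) of a DFDL schema packaged as an ordinary jar, published with 'sbt publish', and depending on another schema jar through sbt's managed dependencies:

```scala
// Hypothetical coordinates; the schema files live under src/main/resources
// so 'sbt package' / 'sbt publish' bundles them into the jar.
organization := "com.example"
name         := "my-format-schema"
version      := "1.0.0"

// Another DFDL schema this one imports/includes, resolved like any other
// managed library dependency.
libraryDependencies += "com.example" % "base-format-schema" % "2.1.0"
```

Applications (and other schemas) can then depend on the published schema jar the same way they depend on any library.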

// You will want to change plugin settings to enable diagrams (they're off by default.)
//
// You need to view this page with Chrome or Firefox.
//
Member

Do we need this blurb? All our asciidoc is rendered to HTML, no one should need an extension to view this.

Also, it looks like all of this content could just be done with markdown; there are no adoc graphs or anything complicated.

Contributor Author

I was reserving the right to add the diagrams in the future. I do think they will be needed.

The boilerplate blurb can be deleted.

However, for those who like to write asciidoc using just vi and a browser, having the asciidoctor plugin in the browser allows using only those two tools, instead of an IDE, while still being able to mostly see the page in rendered form in real time, diagrams and all.

Instructions on that should be centralized elsewhere, however, not repeated in every asciidoc file. We have a page about asciidoc, and a one-line reference to that should be enough.

:page-layout: page
:url-asciidoctor: http://asciidoctor.org
:keywords: plugins layering UDF charset
// ///////////////////////////////////////////////////////////////////////////
Member

Is there a way with adoc pages to specify the page title? That way it's added to the tab title and to the top of the page.

Contributor Author

This I need to explore; I am hoping it is possible. I would like these pages to look more seamlessly like the rest of the pages. We may have to create a specific asciidoc stylesheet that achieves this, however.
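A minimal sketch of what that could look like, assuming the site renders these pages with jekyll-asciidoc (an assumption here): the level-0 document title at the top of the .adoc file is normally what gets picked up as the page title, alongside the attributes already in this draft.

```asciidoc
= Packaging DFDL Schemas for use in Daffodil Applications
:page-layout: page
:keywords: plugins layering UDF charset
```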

//
// //////////////////////////////////////////////////////////////////////////

= DFDL Language Extensions in Daffodil
Member

We already have https://daffodil.apache.org/dfdl-extensions; should this content be merged with that page?

Contributor Author

Yes. Not sure why I didn't see that.

Contributor

That other page talks about DFDL extensions while this page talks about extensions to Daffodil schemas (slightly more general). Probably better to merge that page into this page than the other way around.

- Layering Transformer (e.g., unzip/zip, verify/recompute checksums)
- User Defined Function (UDF) (e.g., convert mean-sea-level elevation to height-above-ellipsoid)

There is one additional kind of plugin that will be supported by Daffodil 3.4.0
Member

We'll need to remember to update this page when we release 3.4.0.

Contributor Author

Yes. I might comment this out for now.

Contributor

Daffodil 3.4.0 has been released, so you can update this PR accordingly.


When configuring an application, these jar files must be put on the CLASSPATH so that the executing instance of Daffodil for a specific configured data-processing flow finds them on the class path for the data format(s) that flow is processing.

For greater assurance/trust, the plugin jars could be digitally signed by their creators, and applications could verify these signatures (using public keys) as a startup condition.
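One way an sbt-built application can satisfy the CLASSPATH requirement is to declare the plugin jar as an ordinary dependency so it lands on the runtime classpath; a sketch with made-up coordinates:

```scala
libraryDependencies ++= Seq(
  "org.apache.daffodil" %% "daffodil-runtime1" % "3.4.0",
  // Hypothetical jar holding the layer/UDF/charset plugin classes along with
  // their service registration files, so Daffodil can discover them at run time.
  "com.example" % "my-format-plugins" % "1.0.0"
)
```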
Member

This all seems reasonable. Do you imagine additional pages will be created and linked to that describe the different plugins in more detail?

Contributor Author

Yes.

Short term, these might lead off to existing wiki pages on Confluence, with this page being more of just an index to start with. But eventually, yes, real documentation. Plugins may have javadoc/scaladoc to reference as well.

I also want to refer to Java code examples (currently on OpenDFDL; they probably need to move to an Apache Daffodil examples repo) so people can see the managed dependencies between schemas working, with all the schemas transitively showing up in the lib_managed directory, etc.

=== Advance Summary

- The best way to use DFDL schemas is to access them from Jar files
- Include pre-compiled binary DFDL schema files in the same Jar file as well.
Member

Should we suggest this? The main argument against this is that pre-compiled binaries are Daffodil-version specific. So if you update Daffodil you also need to rebuild the jars.

Contributor Author (Jun 3, 2022)

Perhaps not. One of the reasons I started writing this was to get to exactly this sort of issue.

The best practice is likely that build processes should take the signed DFDL schema jar and compile the schema to binary for that application, using the right version of Daffodil.

I am no longer hopeful that DFDL schema compilation time for Daffodil will ever be so fast as to be negligible. It needs to be fast enough for schema developers to tolerate, but a 300-file DFDL schema is never going to compile in under a second. So there will be an ongoing need for pre-compiled DFDL schemas.

Note that some of the plugins, layering transforms at least and possibly others, are not yet covered by the SAPI/JAPI compatibility discipline, so until those APIs are finalized, the APIs the plugins depend on could tie them to specific Daffodil versions as well. So users may have no choice but to rebuild those, too, using the specific deployment version of Daffodil.

That's part of why I was suggesting putting the compiled DFDL schema into the jar as well, because the compiled scala/java for plugins would naturally go in that jar, and that's also Daffodil version specific.

We still don't have a trivial way for sbt to do that, though. I'd like the basic build.sbt we create with a DFDL schema to contain an 'sbt dfdlCompile' command that creates the binary using the required version of Daffodil, and perhaps creates the whole combined jar file.
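Until something like 'sbt dfdlCompile' exists, a build could invoke a small helper along these lines to produce the version-specific binary. This is only a sketch using the Daffodil 3.x Scala API; the schema path and output file name are invented:

```scala
import java.io.FileOutputStream
import java.nio.channels.Channels
import org.apache.daffodil.sapi.Daffodil

object DfdlPrecompile {
  def main(args: Array[String]): Unit = {
    // Hypothetical schema location inside the schema project.
    val schema = new java.io.File("src/main/resources/com/example/format.dfdl.xsd")
    val pf = Daffodil.compiler().compileFile(schema)
    if (pf.isError) {
      pf.getDiagnostics.foreach(d => System.err.println(d.getMessage()))
      sys.exit(1)
    }
    val dp = pf.onPath("/")
    // Save the compiled parser; this binary is specific to the Daffodil
    // version used to produce it.
    val out = Channels.newChannel(new FileOutputStream("target/format.dfdl.bin"))
    try dp.save(out)
    finally out.close()
  }
}
```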


The organization of the files into these directory structures is not arbitrary.
It can be needed to avoid file name clashes and serves the same role as the Java package-name directory structure does for Java programs.
The directory hierarchy defines a Java package-like namespace structure for DFDL schemas.
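For illustration only (the organization and schema names here are invented), two schemas packaged this way occupy disjoint package-like paths, so their jars can be unpacked onto a single tree without collisions:

```
src/main/resources/
  com/example/baseformat/xsd/baseFormat.dfdl.xsd
  com/example/payloadformat/xsd/payload.dfdl.xsd
```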
Member

Aren't we trying to move away from this hierarchy?

Contributor Author

Well, this is tricky.

The reason to flatten the hierarchy was primarily for training: people working from command-line tools were struggling with the file-tree depth for simple examples. Too much typing of 'cd src/test/org/foo/bar/baz' to get to the test sources, then the opposite path to get back to src/main/..., etc. Really, that's why we did the flattening.

For larger schemas I think the hierarchy is still needed. But...

Daffodil has its classpath-oriented resolver so that schemas can reference includes/imports from places found via classpath search.

But if a DFDL schema is used as an XSD, to validate XML separately from the parse, and this is done by some other XML tool, then that tool may need to use a similar classpath-oriented resolver to find the various pieces of the schema, based on the schemaLocation attributes in the DFDL schema import/include statements. That is if they want to directly pull the schema files from jars.

For Java-based XML tools we could perhaps specify they use exactly the Daffodil resolver.

(There's actually some centralized Apache resolver project (part of XML commons?) intended to address exactly this weak spot in the w3c XML specs about how schema locations work. Possibly what we're doing in the Daffodil resolver with classpath search should become a part of that effort, or the relationship of the two things should be explored anyway.)

But for other technology bases than Java, well XML Catalogs are also a possibility, but we don't really test that, and there's even a JIRA ticket suggesting we deprecate catalog support entirely.

Hence, we need the "un-jar" technique to also work: "un-zip all the jars on top of the same directory tree", knowing that the directory structure ensures no name collisions. I believe that if the XML processing always resolves schemaLocations relative to the root of that tree, then all schema files would be resolvable. This is what we hope to be the lowest common denominator for schemaLocation resolvers.

I think we need to document this un-jar technique.

But as you know it gets worse because in some applications, people have to break up the set of files further. E.g., in cybersecurity people want each data flow to have exactly and only the DFDL schema files it needs as part of that flow's configurations, and commonly a particular flow allows only a subset of all the things in the format.

For now, I believe this reorganization of a schema also has to be done by hand. But we need to document how, and then, if automated tools that chase the includes/imports can be created to help with this, that would be an improvement.

I don't like the notion of the DFDL schema having to cater to the needs of the application too much. It makes reuse of the schema harder for other applications such as data integration.

//
// //////////////////////////////////////////////////////////////////////////

= Packaging DFDL Schemas for use in Daffodil Applications
Member

I wonder if this page wants to be merged into the standard project layout page? There seems to be quite a bit of overlap.

Also, a normal markdown file seems reasonable. This doesn't use any adoc features.

Contributor Author

Agreed.

Though I expect this to have a nested box diagram showing [header1[header2[payload]]] type compositions of schemas.

Contributor
@tuxji left a comment

+1

Just some minor comments.



To provide some new advanced format capabilities such as checksums, compressed or encoded data regions, and user-defined functions, DFDL schemas sometimes must use Daffodil-specific extensions and incorporate Daffodil plugins that provide the small algorithmic aspects needed by these formats.

There are 2 kinds of plugins today supported by Daffodil 3.3.0
Contributor

Consider ending both this sentence and the following sentence at line 44 with a colon as punctuation.

Different DFDL schemas for different kinds of data will need their own such plugins.
Hence the plugins, like the DFDL schema files themselves, are used in applications as part of a specific data-processing flow.

Keeping with the spirit of DFDL of describing a format declaratively, plugins need to be very small pieces of code (e.g., a character set definition should be about 10 lines of code).
Contributor

I think it's fine to break a sentence across multiple lines as long as the next sentence begins on a new line (not immediately after the period punctuating the previous sentence). We break long lines of code, so we should break long sentences as well.

@mbeckerle
Contributor Author

Replaced by PR #108

@mbeckerle mbeckerle closed this Mar 1, 2023