Derive output file from input-file/graph #2

Open
RickMoynihan opened this issue Mar 1, 2016 · 1 comment
@RickMoynihan
Member

Some grafter projects compose multiple pipelines in a -main function, associating each pipeline with an input file and an output file.

This boilerplate could be removed if the command line plugin was smarter.

The command line plugin should let you do something like this:

$ lein grafter run swirrl.pipelines/my-pipeline ./data/my-file inputarg2 -outputer=from-input-file

The outputer specifies the method the plugin will use to derive the output destination from each quad (or dataset in the case of a tabular->tabular transformation). The from-input-file outputer will derive the output file destination by calling (-> (meta quad) :grafter.tabular/dataset :grafter.tabular/data-source) on each quad and appending it to the file returned.*
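
A minimal sketch of how a from-input-file outputer might look. It is not grafter's API: quad->data-source, data-source->output-file and from-input-file are hypothetical names, the quad is modelled as a plain map carrying metadata, and swapping the input file's extension for the output format is an assumption about how the destination could be derived from the data source.

```clojure
(require '[clojure.string :as str])

;; ASSUMPTION: a quad is any value that can carry Clojure metadata (a plain
;; map here) and its metadata has the shape described in this issue.
(defn quad->data-source
  "Returns the data source recorded in the quad's metadata, or nil."
  [quad]
  (-> (meta quad) :grafter.tabular/dataset :grafter.tabular/data-source))

;; HYPOTHETICAL helper: drops a trailing file extension (if any) and adds
;; the output format as the new extension.
(defn data-source->output-file
  [data-source format]
  (str (str/replace data-source #"\.[^./\\]+$" "") "." format))

;; HYPOTHETICAL outputer: returns nil when the quad has no data source.
(defn from-input-file
  [quad format]
  (some-> (quad->data-source quad)
          (data-source->output-file format)))

;; Example quad; the subject/predicate/object/graph values are made up.
(def example-quad
  (with-meta {:s "http://example.org/s"
              :p "http://example.org/p"
              :o "http://example.org/o"
              :c "http://example.org/graph/my-data"}
    {:grafter.tabular/dataset
     {:grafter.tabular/data-source "./data/my-file.csv"}}))

(from-input-file example-quad "trig")
```

Returning nil for quads without metadata leaves the caller free to fall back to another outputer or to fail loudly.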

Another outputer will look at the quad's graph, and generate/append to an output file determined by a sensible but arbitrary conversion of a URI into a file name.
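
One possible "sensible but arbitrary" conversion, sketched under assumptions: graph-uri->file-name is a hypothetical name, and the rule used here (drop the scheme, collapse every run of characters outside letters, digits, dots and hyphens into one underscore) is just one choice among many. Distinct URIs can collide under this rule, so a real implementation might want something stricter.

```clojure
(require '[clojure.string :as str])

;; HYPOTHETICAL helper, not part of grafter.
(defn graph-uri->file-name
  "Turns a graph URI into a flat file name with the given format extension."
  [graph-uri format]
  (-> (str graph-uri)
      ;; drop the scheme, e.g. "http://"
      (str/replace #"^[a-zA-Z][a-zA-Z0-9+.-]*://" "")
      ;; collapse runs of unsafe characters into a single underscore
      (str/replace #"[^A-Za-z0-9.-]+" "_")
      ;; trim underscores left at either end
      (str/replace #"^_+|_+$" "")
      (str "." format)))

(graph-uri->file-name "http://example.org/graph/my-data" "trig")
```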

Additionally it would be useful to support the following arguments:

  • -f to set the desired output format (default trig)
  • -o to specify an explicit output file - this should probably be chosen in preference to the default outputer.
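
The precedence between the two arguments and the outputer could be sketched as below. This assumes the command line has already been parsed into a map; resolve-destination and the :output, :outputer and :format keys are hypothetical names, and the outputer is assumed to be a function of a quad and a format.

```clojure
;; HYPOTHETICAL: opts is the already-parsed command line, e.g.
;;   {:output "out.trig"}              for -o
;;   {:format "nt"}                    for -f (defaults to "trig")
;;   {:outputer (fn [quad format] …)}  for the chosen outputer
(defn resolve-destination
  "Returns the explicit output file if one was given, otherwise whatever
  the outputer derives for this quad, otherwise nil."
  [{:keys [output outputer format] :or {format "trig"}} quad]
  (or output
      (when outputer
        (outputer quad format))))
```

An explicit -o wins over any outputer here, which matches the preference suggested above.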

NOTE the above arguments are suggestions... and I also think outputer is a bit of a lame name... so if we can think of a better one we should use it.

* This is possible because graph-fn sets metadata about the input-file on each quad, where the input-file is assumed to be the dataset that graph-fn was called on.

@RickMoynihan
Member Author

Note also that the plugin will need to close all opened files, and it may also need to implement a batching strategy to avoid opening and closing a file on each and every quad.
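
One way such a batching strategy might look, as a sketch only: write-batched!, destination-fn and serialise-fn are hypothetical names, and writing one serialised quad per line is an assumption rather than how any grafter serialiser behaves. Each batch is grouped by destination, so a file is opened at most once per batch, and with-open guarantees the writer is closed even if serialisation throws.

```clojure
(require '[clojure.java.io :as io])

;; HYPOTHETICAL helper, not part of grafter.
;; destination-fn : quad -> file or path to append to
;; serialise-fn   : quad -> string to write for that quad
(defn write-batched!
  [destination-fn serialise-fn batch-size quads]
  (doseq [batch (partition-all batch-size quads)
          [destination batch-quads] (group-by destination-fn batch)]
    ;; one open/close per destination per batch, not per quad
    (with-open [^java.io.Writer w (io/writer destination :append true)]
      (doseq [quad batch-quads]
        (.write w (str (serialise-fn quad) "\n"))))))
```

A larger batch size means fewer open/close cycles at the cost of holding more quads in memory at once.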
