This repository has been archived by the owner on Jan 3, 2021. It is now read-only.

Various issues with custom metadata #25

Open
cygri opened this issue Oct 22, 2013 · 1 comment

Comments

@cygri
Owner

cygri commented Oct 22, 2013

This is a catch-all issue for Pubby's custom metadata features.

There are four ways in which metadata can end up in Pubby-served representations:

  1. DataURLServlet and ValuesDataURLServlet each add some hardcoded metadata triples (primaryTopic, label); this ends up in the RDF variants only.
  2. conf:rdfDocumentMetadata can be defined on each dataset to add some custom properties that will be asserted about the document in the RDF variants. It supports only triples with a fixed predicate and object.
  3. conf:metadataTemplate can be defined on each dataset to add a metadata graph based on a flexible template, where various “magic” IRIs in the template are replaced with values provided by the system. The generated triples show up in the RDF representations, and as a separate metadata table on the HTML representations.
  4. The generated HTML pages contain some “metadata” that is coded in the header and footer templates: site title, page title, links to RDF variants, link to the SPARQL endpoint, links to RDF browsers.

These are quite redundant. Ideally, there would be a single mechanism.
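
Mechanism 3 essentially substitutes system-provided values for placeholder IRIs in a template graph. A minimal sketch in Java, assuming triples are plain strings and using hypothetical placeholder IRIs (`about:metadata:document`, `about:metadata:resource`); these are not the names conf:metadataTemplate actually uses, and `Triple` and `MetadataTemplate` are illustrative, not Pubby classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical triple type: subject, predicate and object as plain strings.
record Triple(String subject, String predicate, String object) {}

// Hypothetical helper, not part of Pubby: instantiates a metadata template.
final class MetadataTemplate {
    private MetadataTemplate() {}

    // Returns a copy of the template in which every term that is a key in
    // `bindings` (a "magic" placeholder IRI) is replaced by its bound value.
    // All other terms are copied unchanged.
    static List<Triple> instantiate(List<Triple> template, Map<String, String> bindings) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : template) {
            result.add(new Triple(
                    bindings.getOrDefault(t.subject(), t.subject()),
                    bindings.getOrDefault(t.predicate(), t.predicate()),
                    bindings.getOrDefault(t.object(), t.object())));
        }
        return result;
    }
}
```

Per request, the servlet would supply bindings such as the document URL and the resource IRI; every other term in the template passes through unchanged.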

For 2 and 3, metadata is specified at the dataset level. This decision was made because different data sources may have different metadata (provenance, creator, etc.). But there are some issues with this:

  1. In the simplest case, one may want to specify metadata once at the configuration level, for example a license triple. This is currently not supported. Simple things should be simple, hard things possible.
  2. Pubby should only add metadata for a given dataset if that data source actually contributed to the result. Currently, all metadata from all datasets is always added to the response. Changing this is difficult because which data sources contributed is decided deep within a DataSource implementation, and by the time the servlets deal with metadata, that information is no longer easily visible.
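
Both points could in principle be handled together by collecting metadata at two levels and filtering the dataset level by contribution. A minimal sketch, assuming the set of contributing dataset IDs is somehow available to the servlet (which, as noted, it currently is not); `MetadataCollector` and its string-based statements are hypothetical, not Pubby classes:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical collector; metadata statements are plain strings for brevity.
final class MetadataCollector {
    private final List<String> configMetadata;               // e.g. a site-wide license statement
    private final Map<String, List<String>> datasetMetadata; // dataset ID -> its own metadata

    MetadataCollector(List<String> configMetadata, Map<String, List<String>> datasetMetadata) {
        this.configMetadata = List.copyOf(configMetadata);
        this.datasetMetadata = new LinkedHashMap<>(datasetMetadata);
    }

    // Configuration-level metadata is always included; dataset-level metadata
    // only for datasets that actually contributed to this response.
    List<String> collect(Set<String> contributingDatasets) {
        List<String> result = new ArrayList<>(configMetadata);
        for (Map.Entry<String, List<String>> e : datasetMetadata.entrySet()) {
            if (contributingDatasets.contains(e.getKey())) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }
}
```

The simple case then stays simple: a single license statement at the configuration level, with no per-dataset setup.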

There are a number of other issues:

  • There is an ugly hack where the metadata code tries to get hold of the query that was used to describe the resource. This is not thread-safe and turns DataSource into a leaky abstraction. It is also broken now because we may use multiple queries to assemble a single response. Maybe DataSource needs an additional ProvenanceLog argument on some/all methods?
  • The only way to make additional metadata show up in the HTML pages is to use mechanism 3 above or to modify the templates. I find 3 a bit heavyweight for things like stating a license. Also, why does the metadata table look so different from the main data table?
  • I find the use case for the metadata templates somewhat unclear. Who needs a detailed trace of the operations that were performed to create the representation? I understand the point of metadata on the document level (publisher, source, etc.), but on the representation level it seems like useless noise. Also, things that “peek under the hood”, like an account of the database queries performed, seem of limited value and potentially a security risk. Is the use case clearly articulated somewhere? What would be a template that most Pubby users would find useful?
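
The ProvenanceLog idea from the first bullet could look roughly like this. A minimal sketch, assuming one log instance is created per request and passed down, so no state is shared between threads and several queries per response simply become several entries. `ProvenanceLog`, `LoggingDataSource` and `StubDataSource` are hypothetical; Pubby's actual DataSource interface is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical per-request log. A new instance is created for each HTTP
// request, so concurrent requests never share it.
final class ProvenanceLog {
    private final List<String> entries = new ArrayList<>();

    // Records that `datasetId` ran `query` while building this response.
    void record(String datasetId, String query) {
        entries.add(datasetId + ": " + query);
    }

    List<String> entries() {
        return Collections.unmodifiableList(entries);
    }
}

// Hypothetical, simplified data source signature with the log as an extra argument.
interface LoggingDataSource {
    String describeResource(String iri, ProvenanceLog log);
}

// Hypothetical stub that reports its own query to the log instead of exposing
// it through shared fields for the metadata code to read afterwards.
final class StubDataSource implements LoggingDataSource {
    private final String datasetId;

    StubDataSource(String datasetId) {
        this.datasetId = datasetId;
    }

    @Override
    public String describeResource(String iri, ProvenanceLog log) {
        String query = "DESCRIBE <" + iri + ">";
        log.record(datasetId, query);
        return "description of " + iri; // stands in for the real RDF result
    }
}
```

Since each entry names the dataset that answered, a log like this would also reveal which datasets contributed to a response.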
@cygri
Owner Author

cygri commented Oct 26, 2013

Another issue:

  • The metadata rendering code is completely disconnected from the data rendering code, so improvements to one, like the new label display logic and property ordering code, do not carry over to the other.
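
One way to reconnect the two would be to build both the data table and the metadata table with the same code, so label display and property ordering live in one place. A minimal sketch; `PropertyRow`, `PropertyTableBuilder`, the label map and the ordering comparator are hypothetical and do not reflect how Pubby's label display or property ordering actually work:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical row of an HTML property table.
record PropertyRow(String property, String label, String value) {}

// Hypothetical shared builder, used for both data and metadata tables.
final class PropertyTableBuilder {
    private final Map<String, String> labels;       // property IRI -> display label
    private final Comparator<PropertyRow> ordering; // property ordering policy

    PropertyTableBuilder(Map<String, String> labels, Comparator<PropertyRow> ordering) {
        this.labels = Map.copyOf(labels);
        this.ordering = ordering;
    }

    // Turns (property IRI, value) pairs into labelled, ordered rows,
    // falling back to the raw IRI when no label is known.
    List<PropertyRow> build(List<Map.Entry<String, String>> propertyValues) {
        List<PropertyRow> rows = new ArrayList<>();
        for (Map.Entry<String, String> pv : propertyValues) {
            String property = pv.getKey();
            rows.add(new PropertyRow(property, labels.getOrDefault(property, property), pv.getValue()));
        }
        rows.sort(ordering);
        return rows;
    }
}
```

With a shared builder, the two tables would differ only in which triples are passed to `build`.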
