Various issues with custom metadata #25

cygri · 2013-10-22T11:20:11Z

This is a catch-all issue for Pubby's custom metadata features.

There are four ways metadata can end up in Pubby-served representations:

1. DataURLServlet and ValuesDataURLServlet each add some hardcoded metadata triples (primaryTopic, label); this ends up in the RDF variants only.
2. conf:rdfDocumentMetadata can be defined on each dataset to add some custom properties that will be asserted about the document in the RDF variants. It supports triples with fixed predicate and object only.
3. conf:metadataTemplate can be defined on each dataset to add a metadata graph based on a flexible template, where various “magic” IRIs in the template are replaced with values provided by the system. The generated triples show up in the RDF representations, and as a separate metadata table on the HTML representations.
4. The generated HTML pages contain some “metadata” that is coded in the header and footer templates: site title, page title, links to RDF variants, link to SPARQL endpoint, link to RDF browsers.

These are quite redundant. Ideally, there would be a single mechanism.

For 2. and 3., the specification of metadata happens on the dataset level. This decision was made because different data sources may have different metadata (provenance, creator, etc.). But there are some issues with this:

- In the easiest case, one may want to simply specify metadata on the configuration level, for example a license triple. This is currently not supported. Simple things should be simple, hard things possible.
- Pubby should only add metadata for a given dataset if that data source actually contributed to the result. Currently, all metadata from all datasets is always added to the response. Changing this is difficult because the distinction happens deep within some DataSource implementation, and at the point where we deal with metadata (in the servlets) it is no longer easily visible.
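
One possible shape for the second point, as a rough sketch only: if the data layer reported how many triples each dataset contributed, the servlet could filter the configured metadata against that report. The class `MetadataSelector`, its methods, and the use of plain strings for metadata statements are all hypothetical and do not exist in Pubby.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names exist in Pubby.
final class MetadataSelector {
    // Dataset name -> metadata statements configured for it.
    // Statements are plain strings here to keep the sketch self-contained.
    private final Map<String, List<String>> metadataByDataset = new LinkedHashMap<>();

    void configure(String dataset, List<String> statements) {
        metadataByDataset.put(dataset, new ArrayList<>(statements));
    }

    // Returns metadata only for datasets that contributed at least one triple.
    // The counts are assumed to be reported by the data layer for this response.
    List<String> select(Map<String, Integer> tripleCountByDataset) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : metadataByDataset.entrySet()) {
            Integer count = tripleCountByDataset.get(entry.getKey());
            if (count != null && count > 0) {
                result.addAll(entry.getValue());
            }
        }
        return result;
    }
}
```

The hard part described above remains: something has to carry the per-dataset contribution information out of the DataSource implementation and up to the servlet.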

There are a number of other issues:

- There is an ugly hack where the metadata code tries to get hold of the query that was used to describe the resource. This is not thread-safe and turns DataSource into a leaky abstraction. It is also broken now because we may use multiple queries to assemble a single response. Maybe DataSource needs an additional ProvenanceLog argument on some/all methods?
- The only way to make additional metadata show up in the HTML pages is by using number 3 above, or by modifying the templates. I find 3 a bit heavyweight for things like stating a license. Why do the metadata tables look so different?
- I find the use case for the metadata templates somewhat unclear. Who needs a detailed trace of the operations that were performed to create the representation? I understand the point of metadata on the document level (publisher, source, etc.), but on the representation level it seems like useless noise. Also, things that “peek under the hood”, like an account of the database queries performed, seem of limited value and potentially a security risk. Is the use case clearly articulated somewhere? What would be a template that most Pubby users would find useful?
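
To make the ProvenanceLog idea from the first point concrete, here is a minimal sketch. It assumes a log object created per request and passed into each data source call, so that no query state lives on the data source itself and multiple queries per response are recorded naturally. `ProvenanceLog` is the name proposed above; its methods, `LoggingDataSource`, and `FixedDataSource` are hypothetical and do not reflect Pubby's actual DataSource interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical per-request log. One instance per request means no shared
// mutable state between threads.
final class ProvenanceLog {
    private final List<String> queries = new ArrayList<>();

    void recordQuery(String query) {
        queries.add(query);
    }

    // Read-only view of all queries recorded so far, in call order.
    List<String> queries() {
        return Collections.unmodifiableList(queries);
    }
}

// Hypothetical variant of a data source method that takes the log as an argument.
interface LoggingDataSource {
    String describe(String iri, ProvenanceLog log);
}

// Stand-in implementation that records the query it would have run.
final class FixedDataSource implements LoggingDataSource {
    @Override
    public String describe(String iri, ProvenanceLog log) {
        String query = "DESCRIBE <" + iri + ">";
        log.recordQuery(query); // recorded on the caller's log, not on this object
        return "description of " + iri; // placeholder for the real result model
    }
}
```

With this shape, the servlet would create one log per request, pass it to every call, and read `queries()` afterwards when building the metadata.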

cygri · 2013-10-26T17:42:18Z

Another issue:

The metadata rendering code is completely disconnected from the data rendering code, meaning that the latter doesn't get the benefit of work on the former, like the new label display logic and property ordering code.
