
Quantifying conversion efficacy

csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

(|e_p| - |v_p|) / |v_p|  vs.  (|e_l| - |v_l|) / |v_l|

e = enhance, v = verbatim, p = param, l = layer
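
A hypothetical worked example, reading both sides as relative differences (the figures below are illustrative only, not measured): if the verbatim params carry 20 assertions and the enhancement params 60, while the verbatim layer yields 10,000 triples and the enhancement layer 25,000, then:

```latex
\frac{|e_p| - |v_p|}{|v_p|} = \frac{60 - 20}{20} = 2.0
\qquad\text{vs.}\qquad
\frac{|e_l| - |v_l|}{|v_l|} = \frac{25{,}000 - 10{,}000}{10{,}000} = 1.5
```

i.e., a 200% growth in conversion parameters bought a 150% growth in triples produced.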

DRAFT queries to retrieve per-layer triple counts:

```sparql
PREFIX void:       <http://rdfs.org/ns/void#>
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>

SELECT *
WHERE {
  GRAPH <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?versioned
       a conversion:VersionedDataset;
       void:subset ?layerD
    .
    ?layerD
       a conversion:LayerDataset;
       conversion:conversion_identifier ?layer
    .
    {OPTIONAL {?layerD conversion:num_triples ?triples}}
    UNION
    {OPTIONAL {?layerD void:subset [ a conversion:Dataset;
                                     conversion:num_triples ?triples ]}}
  }
}
```
```sparql
PREFIX void:       <http://rdfs.org/ns/void#>
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>

SELECT ?layerD (COUNT(*) AS ?count)
WHERE {
  GRAPH <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?versioned
       a conversion:VersionedDataset;
       void:subset ?layerD
    .
    ?layerD
       a conversion:LayerDataset;
       conversion:conversion_identifier ?layer
    .
    {OPTIONAL {?layerD conversion:num_triples ?triples}}
    UNION
    {OPTIONAL {?layerD void:subset [ a conversion:Dataset;
                                     conversion:num_triples ?triples ]}}
  }
}
GROUP BY ?layerD
ORDER BY DESC(?count)
```
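
A further sketch (not part of the draft above) of one way to approximate the layer side of the metric: the relative growth in triples from the verbatim layer to an enhancement layer of the same versioned dataset. The layer-identifier filter is an assumption; adjust the values compared against conversion:conversion_identifier (here "raw" for verbatim) to whatever your triple store actually records, and note that conversion:num_triples must be a numeric literal for the arithmetic to work.

```sparql
PREFIX void:       <http://rdfs.org/ns/void#>
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>

# Relative triple growth (|e_l| - |v_l|) / |v_l| per versioned dataset.
SELECT ?versioned ?rawTriples ?enhTriples
       ((?enhTriples - ?rawTriples) / ?rawTriples AS ?relativeGain)
WHERE {
  GRAPH <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?versioned a conversion:VersionedDataset;
               void:subset ?rawLayer, ?enhLayer .
    ?rawLayer  a conversion:LayerDataset;
               conversion:conversion_identifier ?rawId;
               conversion:num_triples ?rawTriples .
    ?enhLayer  a conversion:LayerDataset;
               conversion:conversion_identifier ?enhId;
               conversion:num_triples ?enhTriples .
    # Assumption: the verbatim layer is identified as "raw";
    # any other identifier is treated as an enhancement layer.
    FILTER(?rawId = "raw" && ?enhId != "raw")
  }
}
ORDER BY DESC(?relativeGain)
```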

TODO: also measure elapsed time from download to last conversion (or to last publish into the full named graph). (Use case: data-gov 4383.)
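
A placeholder sketch of what that timing query could look like. The predicates here are assumptions, not what csv2rdf4lod necessarily asserts: dcterms:created stands in for the download (retrieval) time and dcterms:modified for the last conversion time; swap in the real provenance predicates once they are confirmed, and compute the elapsed time client-side.

```sparql
PREFIX dcterms:    <http://purl.org/dc/terms/>
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>

# Hypothetical: download and last-conversion times per versioned dataset.
SELECT ?versioned ?downloaded ?converted
WHERE {
  ?versioned a conversion:VersionedDataset;
             dcterms:created  ?downloaded;   # assumed: retrieval time
             dcterms:modified ?converted .   # assumed: last conversion time
}
ORDER BY DESC(?converted)
```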

Related

Bio thread

hi all,

peter, nice article, it matches my experience well.

one thing to note is that, in this context, the mapping is (hopefully) a one-shot deal that can then be used into the future without much change, e.g. the bio* efforts that map to sequence database records. also, if one has a standard target that everything is mapped to, this helps too. my experience was mapping third-party gene expression experiments (data and annotation) to MAGE-ML. then there was a standard mapping that didn't have to change, from MAGE-ML to our Rosetta Resolver application, which provided the UI.

cheers,
michael

-----Original Message-----
From: [email protected] [mailto:public-semweb-[email protected]] On Behalf Of Mork, Peter D.S.
Sent: Wednesday, September 14, 2011 9:29 AM
To: HCLS IG
Subject: RE: How much does data integration cost ?

This article
(http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.5.6098&rep=rep1&type=pdf)
doesn't give absolute numbers, but it does describe what portions of a data integration task eat up the most time.

Peter Mork


-----Original Message-----
From: [email protected] [mailto:public-semweb-[email protected]] On Behalf Of Andrea Splendiani
Sent: Wednesday, September 14, 2011 12:25 PM
To: HCLS IG
Subject: How much does data integration cost ?

Hi,

I was wondering if anybody on this list has some figures on how much time/resources are spent on data integration, as a percentage of the overall 'task' performed.
I often get the impression that 'data integration' is an obscure entity for many final users. For instance, people concerned with getting results out of data usually refer to the overall process only as 'analysis', and data integration is often an ill-defined entity shadowed by a better-defined statistical analysis.
I know this varies across organizations/tasks and that the distinction between 'data integration' and the rest is a bit fuzzy; however, to a first approximation, what is the size of the problem that the Semantic Web is trying to tackle?
Obviously, I would be interested in the Life Sciences and Health Care context.

best,
Andrea Splendiani

From Bob Ferris via the Pedantic Web group

DQM-Vocabulary http://lists.w3.org/Archives/Public/public-semweb-lifesci/2011Oct/0044.html

http://ckan.org/2011/01/20/data-quality-what-is-it/
