
Commit 1da4ac2

initial SDTH ReSpec skeleton
1 parent e0ac4ed commit 1da4ac2

11 files changed: +381 -0 lines changed

.github/workflows/ci-push-master.yml

+1
@@ -77,3 +77,4 @@ jobs:
     - name: Sync S3 dev bucket
       run: |
         aws s3 sync sdtl-outputs-${{env.SHORT_SHA}}\sphinx\build\dirhtml s3://docs.ddialliance.org/SDTL/dev/model
+        aws s3 sync SDTH\doc\ s3://docs.ddialliance.org/SDTL/dev/SDTH

SDTH/doc/assets/ddi.svg

+1

SDTH/doc/assets/sdth_diagram.mmd

+51
@@ -0,0 +1,51 @@
```mermaid
classDiagram

    class Program["sdth:Program"] {
        rdfs:label
    }
    class ProgramStep["sdth:ProgramStep"] {
        rdfs:label
        sdth:hasSourceCode
        sdth:hasSDTL
    }
    class VariableInstance["sdth:VariableInstance"] {
        rdfs:label
        sdth:hasName
    }
    class DataframeInstance["sdth:DataframeInstance"] {
        rdfs:label
        sdth:hasName
    }

    class FileInstance["sdth:FileInstance"] {
        rdfs:label
        sdth:hasName
    }

    ProgramStep <-- Program : sdth_hasProgramStep
    ProgramStep <-- ProgramStep : sdth_hasProgramStep

    ProgramStep --> VariableInstance : sdth_usesVariable
    ProgramStep --> VariableInstance : sdth_assignsVariable
    ProgramStep --> DataframeInstance : sdth_consumesDataframe
    ProgramStep --> DataframeInstance : sdth_producesDataframe

    ProgramStep --> FileInstance : sdth_loadsFile
    ProgramStep --> FileInstance : sdth_savesFile

    DataframeInstance --> VariableInstance : sdth_hasVariableInstance
    FileInstance --> VariableInstance : sdth_hasVariableInstance

    DataframeInstance --> DataframeInstance : sdth_derivedFrom
    DataframeInstance --> DataframeInstance : sdth_elaborationOf

    FileInstance --> FileInstance : sdth_derivedFrom
    FileInstance --> FileInstance : sdth_elaborationOf
    VariableInstance --> VariableInstance : sdth_derivedFrom
    VariableInstance --> VariableInstance : sdth_elaborationOf
```

SDTH/doc/assets/sdth_diagram.png

354 KB

SDTH/doc/content/background.markdown

+6
@@ -0,0 +1,6 @@
# Background and Motivation

SDTH was created to provide a simple way to describe the key features of a data transformation script. SDTH answers questions such as: “Which variables affected variable X?” and “Which variables were affected by file Y?”

SDTH focuses on how data entities (variables, dataframes, and files) are related to each other and to the steps in a program that change them.
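
The two questions above amount to walking `sdth:derivedFrom` links in opposite directions. The following is a minimal Python sketch, assuming the `sdth:derivedFrom` and `sdth:hasVariableInstance` links have already been extracted into plain Python structures. The instance names (`height_1`, `file_survey`, and so on) and the helper functions are hypothetical illustrations, not part of SDTH.

```python
from collections import defaultdict

# Hypothetical sdth:derivedFrom links: (derived instance, source instance).
DERIVED_FROM = [
    ("bmi_1", "weight_1"),
    ("bmi_1", "height_1"),
    ("obese_1", "bmi_1"),
]

# Hypothetical sdth:hasVariableInstance links: file -> variable instances.
HAS_VARIABLE = {"file_survey": {"height_1", "weight_1"}}


def _reachable(start, edges):
    """Return every node reachable from start by following (src, dst) edges."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
    seen, stack = set(), [start]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def ancestors(variable):
    """Which variable instances affected `variable`?"""
    return _reachable(variable, DERIVED_FROM)


def descendants(variable):
    """Which variable instances were affected by `variable`?"""
    reversed_links = [(source, derived) for derived, source in DERIVED_FROM]
    return _reachable(variable, reversed_links)


def variables_affected_by_file(file_name):
    """Which variable instances were affected by the variables in a file?"""
    result = set()
    for variable in HAS_VARIABLE.get(file_name, set()):
        result |= descendants(variable)
    return result
```

Here `ancestors("obese_1")` follows the links from a derived instance back to its sources, while `variables_affected_by_file("file_survey")` starts from the variable instances held by the file and follows the same links in reverse.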

SDTH/doc/content/examples.markdown

+1
@@ -0,0 +1 @@
# Examples
+11
@@ -0,0 +1,11 @@
# Introduction

## SDTH and PROV

## Instances of data objects

SDTH requires a change in the way that one thinks about a computer program. We usually think of variables and dataframes as containers into which we put digital representations of numbers, text, or logical values. As the program is executed, the values assigned to a variable or dataframe change, but its identity remains the same. We may think of these sets of values as states of the variable/dataframe. Since variables and dataframes can change, they do not qualify as entities in the PROV framework. SDTH focuses on the contents of data objects, not their containers.

We call the set of values assigned to a variable or dataframe at a moment in time an “instance” of a data object. Every time a command in a program changes the contents of a data object, a new instance is created. Many instances of data objects created during the execution of a program are ephemeral and disappear when the program ends. Only instances of data objects preserved on non-volatile media, which SDTH calls “files”, persist after the program has executed.

The connection between references to data objects in the program and instances of these objects in SDTH is maintained by making variable and dataframe names properties of entities in SDTH. Queries of SDTH can report variable/dataframe names, instances, or both.
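
The instance model described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Tracker` class, its method names, and the `name#counter` identifier scheme are hypothetical and do not come from SDTH. Each assignment creates a new instance that carries the variable name as a property (playing the role of `sdth:hasName`), so a query can report names, instances, or both.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VariableInstance:
    instance_id: str  # hypothetical identifier scheme: name + counter
    name: str         # plays the role of sdth:hasName
    values: tuple     # the contents at one moment in time


@dataclass
class Tracker:
    instances: list = field(default_factory=list)

    def assign(self, name, values):
        """Record a new instance each time a variable's contents change."""
        count = sum(1 for inst in self.instances if inst.name == name)
        inst = VariableInstance(f"{name}#{count + 1}", name, tuple(values))
        self.instances.append(inst)
        return inst

    def instances_of(self, name):
        """All instances that share one variable name, oldest first."""
        return [inst for inst in self.instances if inst.name == name]


tracker = Tracker()
tracker.assign("age", [34, 51])
tracker.assign("age", [35, 52])  # same name, new contents, new instance
tracker.assign("income", [40000, 52000])

age_ids = [inst.instance_id for inst in tracker.instances_of("age")]
print(age_ids)  # ['age#1', 'age#2']
```

The name `age` stays the same across both assignments, while each assignment yields a distinct, immutable instance. That is the distinction between the container and its contents that the text draws.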
