Add codespell support: workflow, config + make it fix typos #174

Open · wants to merge 7 commits into base: master

8 changes: 8 additions & 0 deletions .codespellrc
@@ -0,0 +1,8 @@
[codespell]
# Ref: https://github.com/codespell-project/codespell#using-a-config-file
# Some files contain comments in German, so the entire databus-collection-manager.js is skipped
skip = .git,*.svg,*.css,*.min.*,.codespellrc,*.css.map,dist,databus-collection-manager.js
check-hidden = true
# lines with umlauts or ist -- German
ignore-regex = .*([üä]|\b[Ii]st\b|finde lokale|Streitbeilegungsverfahren).*
# ignore-words-list =
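
To reproduce this check locally, a minimal sketch (assuming Python and pip are available; `codespell` reads the `skip` and `ignore-regex` settings from `.codespellrc` in the working directory):

```
# Install codespell and run it from the repository root;
# it picks up the configuration in .codespellrc automatically.
pip install codespell
codespell

# Optionally write unambiguous corrections back to the files,
# mirroring the "make it fix typos" part of this PR:
codespell --write-changes
```
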
4 changes: 2 additions & 2 deletions .env
@@ -1,7 +1,7 @@
####### DATABUS SETTINGS #######
# RESOURCE BASE URL - This should match the DNS entry pointing to your server
DATABUS_RESOURCE_BASE_URL=http://localhost:3000
# do not use qoutation marks for the value here and no trailing space
# do not use quotation marks for the value here and no trailing space
DATABUS_NAME=My Local Databus
DATABUS_ORG_ICON=
DATABUS_BANNER_COLOR= # Default is #81b8b2
@@ -31,5 +31,5 @@ DATABUS_PROXY_SERVER_OWN_CERT_KEY="key.pem"
# It is necessary to know this, in order to set up ACME etc.
# Note: the host name should be identical to DATABUS_RESOURCE_BASE_URL,
# but without specifying a port, protocol i.e. HTTP(S) etc.
# do not use qoutation marks for the value here and no trailing space
# do not use quotation marks for the value here and no trailing space
DATABUS_PROXY_SERVER_HOSTNAME=my-databus.org
23 changes: 23 additions & 0 deletions .github/workflows/codespell.yml
@@ -0,0 +1,23 @@
# Codespell configuration is within .codespellrc
---
name: Codespell

on:
  push:
    branches: [master]
  pull_request:
    branches: [master]

permissions:
  contents: read

jobs:
  codespell:
    name: Check for spelling errors
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Codespell
        uses: codespell-project/actions-codespell@v2
2 changes: 1 addition & 1 deletion README.md
@@ -31,7 +31,7 @@ Databus is designed as a lightweight and agile solution and fits seamlessly into
## Deployment Levels
We identified these deployment levels with our partners:
1. **Open community**: Set up a data space in the Databus Network and jointly curate it with community contributions spanning across several organisations (see DBpedia and Open Energy)
2. **Organisation**: Implement your enterprise’s data strategy and optimise efficiency, integration of external data and re-use; manage research data university-wide for scientific sustainability and FAIR. Databus hooks into single sign-on authentication like Siemens ID or DLR ID
2. **Organisation**: Implement your enterprise’s data strategy and optimise efficiency, integration of external data and reuse; manage research data university-wide for scientific sustainability and FAIR. Databus hooks into single sign-on authentication like Siemens ID or DLR ID
3. **Department, group or team**: Systematise data workflows internally; transparently record scientific results from beginning to end.
4. **Collaborative projects**: Efficiently coordinate data with partners in large projects or in multi-project environments.
5. **Application, Product or Pipeline**: Streamline and automate data flow and data dependencies within a target application, product or pipeline. It's an essential tool for agile and data-driven decision making and shines in managing input/output for data-intensive applications such as: Search, AI, Deep Learning, Natural Language Processing (NLP), Knowledge Graph Construction and Evolution, Databases, Continuous Integration and Microservice Orchestration.
6 changes: 3 additions & 3 deletions docs/api.md
@@ -145,7 +145,7 @@ PUT /$username/$group/$artifact/$version
| 200 | `OK` | Artifact version updated |
| 201 | `CREATED` | Artifact version created |
| 400 | `BAD REQUEST` | Request or request data was formatted incorrectly |
| 403 | `FORBIDDEN` | Invalid API Token or request targetting the namespace of another user |
| 403 | `FORBIDDEN` | Invalid API Token or request targeting the namespace of another user |
| 500 | `INTERNAL SERVER ERROR` | Internal server error |

### Remove Version
@@ -161,7 +161,7 @@ DELETE /$username/$group/$artifact/$version
| Status Codes | Status | Description |
| :--- | :--- | :--- |
| 204 | `NO CONTENT` | Artifact version deleted successfully |
| 403 | `FORBIDDEN` | Invalid API Token or request targetting the namespace of another user |
| 403 | `FORBIDDEN` | Invalid API Token or request targeting the namespace of another user |
| 500 | `INTERNAL SERVER ERROR` | Internal server error |

## Generic
@@ -193,5 +193,5 @@ PUT /system/publish
| :--- | :--- | :--- |
| 200 | `OK` | Content created or updated |
| 400 | `BAD REQUEST` | Request or request data was formatted incorrectly |
| 403 | `FORBIDDEN` | Invalid API Token or request targetting the namespace of another user |
| 403 | `FORBIDDEN` | Invalid API Token or request targeting the namespace of another user |
| 500 | `INTERNAL SERVER ERROR` | Internal server error |
6 changes: 3 additions & 3 deletions docs/artifact.md
@@ -79,7 +79,7 @@ dct:title
sh:path dct:title ;
sh:severity sh:Violation ;
sh:maxLength 100 ;
sh:message "dct:title must have less than 100 characters and each language must occure only once."@en ;
sh:message "dct:title must have less than 100 characters and each language must occur only once."@en ;
sh:uniqueLang true ;
] .
```
@@ -117,7 +117,7 @@ dct:abstract
sh:property [
sh:path dct:abstract ;
sh:severity sh:Violation ;
sh:message "dct:abstract must have less than 300 characters and each language must occure only once."@en ;
sh:message "dct:abstract must have less than 300 characters and each language must occur only once."@en ;
sh:uniqueLang true;
sh:maxLength 300 ;
] .
@@ -156,7 +156,7 @@ dct:description
sh:property [
sh:path dct:description ;
sh:severity sh:Violation ;
sh:message "Each language of dct:description must occure only once."@en ;
sh:message "Each language of dct:description must occur only once."@en ;
sh:uniqueLang true ;
] .
```
2 changes: 1 addition & 1 deletion docs/auto-completion.md
@@ -5,7 +5,7 @@ When trying to publish data on the Databus, the HTTP API accepts not only fully
## Properties


The following table shows a list of inferrable properties that can optionally be omitted in the input.
The following table shows a list of inferable properties that can optionally be omitted in the input.

### Version
| Property | Value inferred from |
2 changes: 1 addition & 1 deletion docs/collection.md
@@ -109,7 +109,7 @@ dct:abstract
sh:property [
sh:path dct:abstract ;
sh:severity sh:Violation ;
sh:message "dct:abstract must have less than 300 characters and each language must occure only once. "@en ;
sh:message "dct:abstract must have less than 300 characters and each language must occur only once. "@en ;
sh:uniqueLang true;
sh:maxLength 300 ;
] .
2 changes: 1 addition & 1 deletion docs/content-variants.md
@@ -9,7 +9,7 @@ The main rule for content variant setup is the following:

This ensures that each file in the databus:Version can be selected individually by querying for its unique tuple of *format*, *compression type* and *content variants*.

A content variant is a key-value pair with the key being a sub-property of `databus:contentVariant` and the value being a (preferrably short) string that can be chosen freely. Content variants could describe either a property of the file or its content.
A content variant is a key-value pair with the key being a sub-property of `databus:contentVariant` and the value being a (preferably short) string that can be chosen freely. Content variants could describe either a property of the file or its content.


**Examples:**
4 changes: 2 additions & 2 deletions docs/distribution.md
@@ -160,7 +160,7 @@ missing
a sh:PropertyShape ;
sh:targetClass databus:Part ;
sh:severity sh:Violation ;
sh:message """Required property databus:compression MUST occur exactly once AND have xsd:string as value AND should not inlcude a '.' in front """@en ;
sh:message """Required property databus:compression MUST occur exactly once AND have xsd:string as value AND should not include a '.' in front """@en ;
sh:pattern "^[a-z0-9]{1,8}$" ;
sh:path databus:compression;
sh:minCount 1 ;
@@ -344,7 +344,7 @@ TODO ??
## Content variants
TODO ??

The shape `<#parts-are-distinguishable-by-cv>` relies on a ordering of results in the *GROUP BY* and consequentially *GROUP_CONCAT* instruction that is agnostic of the ordering of properties in the data. This seems to work for Apache JENA and Virtuoso but has not been tested with other SPARQL engines.
The shape `<#parts-are-distinguishable-by-cv>` relies on a ordering of results in the *GROUP BY* and consequently *GROUP_CONCAT* instruction that is agnostic of the ordering of properties in the data. This seems to work for Apache JENA and Virtuoso but has not been tested with other SPARQL engines.


Example (JSON-LD):
2 changes: 1 addition & 1 deletion docs/group.md
@@ -118,7 +118,7 @@ dct:abstract
sh:property [
sh:path dct:abstract ;
sh:severity sh:Violation ;
sh:message "dct:abstract must have less than 300 characters and each language must occure only once. "@en ;
sh:message "dct:abstract must have less than 300 characters and each language must occur only once. "@en ;
sh:uniqueLang true;
sh:maxLength 300 ;
] .
2 changes: 1 addition & 1 deletion docs/guides/data-download-guide.md
@@ -166,7 +166,7 @@ SELECT ?file WHERE

### Download and convert selected data

In order to download the data we need to pass the query as the _`-s`_ argument. Additionaly we need to specify where the query needs to be asked to. This is done using the `-e` argument. Furthermore if we want to convert the files to _.nt_ we need to specify if in the _`-f`_ parameter and finally we need to tell the client the desired compression.
In order to download the data we need to pass the query as the _`-s`_ argument. Additionally we need to specify where the query needs to be asked to. This is done using the `-e` argument. Furthermore if we want to convert the files to _.nt_ we need to specify if in the _`-f`_ parameter and finally we need to tell the client the desired compression.

```
java -jar target/databus-client-v2.1-beta.jar \
2 changes: 1 addition & 1 deletion docs/guides/publish-guide.md
@@ -41,7 +41,7 @@ and then copy raw links to file data:

For example a link to our readme as of July 2023 will be: [https://raw.githubusercontent.com/dbpedia/databus/68f976e29e2db15472f1b664a6fd5807b88d1370/README.md](https://raw.githubusercontent.com/dbpedia/databus/68f976e29e2db15472f1b664a6fd5807b88d1370/README.md)

**!NOTE! If you use links referring not to commit, but to branch, the files there may be changing over time, which will break corrspondence with the file hashes stored in Databus**
**!NOTE! If you use links referring not to commit, but to branch, the files there may be changing over time, which will break correspondence with the file hashes stored in Databus**

#### Google Drive

2 changes: 1 addition & 1 deletion docs/model/how-to.md
@@ -63,7 +63,7 @@ The metadata publisher has complete control over the names of the Databus identi

1. **The account name**

* The account name is chosen on account creation, i.e. when registering at the particular Datbaus instane. It is advised to use your personal username or the name of your institution/company. In other words, the account name is the identifier of the data owner/publisher. E.g. DBpedia publishes the regular releases under the account name _dbpedia_.
* The account name is chosen on account creation, i.e. when registering at the particular Datbaus instance. It is advised to use your personal username or the name of your institution/company. In other words, the account name is the identifier of the data owner/publisher. E.g. DBpedia publishes the regular releases under the account name _dbpedia_.

2. **The group name**

2 changes: 1 addition & 1 deletion docs/mods.md
@@ -54,7 +54,7 @@ While the [Databus Model](model.md) is quite minimal and supports only necessary

There are currently some basic examples for Databus Mods, applicable to various file types, showcasing for what Databus Mods can be used:

1. Mimetype Mod: On the publishing of any file, this mod finds the correspnding mimetype and saves it
1. Mimetype Mod: On the publishing of any file, this mod finds the corresponding mimetype and saves it
2. VOID Mod: Collects [VOID](https://www.w3.org/TR/void/) metadata for RDF files and saves them in an SPARQL endpoint.
3. Filemetrics Mod: Collects some addidional metrics not captured by the minimal model for any file, e.g. checking if it is sorted, the uncompressed size and some more

2 changes: 1 addition & 1 deletion docs/quickstart-examples.md
@@ -37,7 +37,7 @@ This site describes the minimal required metadata for publishing a dataset (meta
}
```

If the Databus should NOT infer a certain metadatum (for example not auto-generating the `abtract` from the `description` field), it can be set explicitly and the Databus will accept it (if it fits its criteria). For a full list of inferrable properties check out the [autocompletion page](/docs/auto-completion.md)
If the Databus should NOT infer a certain metadatum (for example not auto-generating the `abstract` from the `description` field), it can be set explicitly and the Databus will accept it (if it fits its criteria). For a full list of inferable properties check out the [autocompletion page](/docs/auto-completion.md)

#### Property Description

2 changes: 1 addition & 1 deletion docs/roadmap.md
@@ -6,7 +6,7 @@ license can be any URI at the moment, however, these URIs are not validated and

### Mappings

We implemted a prototypical CSV to RDF conversion with TARQL in the Databus Download Client. We to integrate a full RML engine. At the moment, "[FunMap: Efficient Execution of Functional Mappings for Knowledge Graph Creation](https://arxiv.org/abs/2008.13482)" by DBpedia Member TIB seems the best candidate.&#x20;
We implemented a prototypical CSV to RDF conversion with TARQL in the Databus Download Client. We to integrate a full RML engine. At the moment, "[FunMap: Efficient Execution of Functional Mappings for Knowledge Graph Creation](https://arxiv.org/abs/2008.13482)" by DBpedia Member TIB seems the best candidate.&#x20;

### More Download As Options

8 changes: 4 additions & 4 deletions docs/uniquesellingpoints.md
@@ -6,7 +6,7 @@ description: >-

# 🚀 Unique Features (Draft)

## High-degree of Automation and Re-use
## High-degree of Automation and Reuse

Over three years, we implemented and designed the Databus to build a solid foundation for automating many tedious processes in Knowledge Engineering including upload, download, low-level conversions, quality tests, generating statistics and tracking provenance (includes private key signature for authenticity). We fireproofed it using the DBpedia Snapshot release process. The result is that we **saved 92% cost in work hours while being 10 times more productive** (increase in release frequency). In particular:

@@ -16,7 +16,7 @@ Over three years, we implemented and designed the Databus to build a solid found
* concepts were inspired by solid frameworks such as Maven, Git/Github, Linked Data, Steam
* License URLs are mapped to [Dalicc](https://dalicc.net) to make them machine-understandable (see [Roadmap](roadmap.md))
* Mappings are collected centrally to transform data and can be re-used (see [Roadmap](roadmap.md))
* Additional metadata is computed by re-usable apps called Mods, which detect compression and format (TrueType Mod), count triples and statistics (VOID Mod), online checks (OnlineCheck Mod), syntax and encoding analysis.
* Additional metadata is computed by reusable apps called Mods, which detect compression and format (TrueType Mod), count triples and statistics (VOID Mod), online checks (OnlineCheck Mod), syntax and encoding analysis.
* We envision a lot more smart Mods that provide a new class of applications built on Databus metadata, such as data search, classifying data with ontologies and contextualization, automatic patching/repair and automatic selection of data to train AI.

## Low-code Application Deployment
@@ -69,7 +69,7 @@ Analogous to [Feature Creep](https://en.wikipedia.org/wiki/Feature\_creep) in so

## Interoperability

The DataID model is a stable vocabulary bult on DCAT (a W3C vocabulary to describe datasets), DCMI (Dublin Core Metadata Initiative vocabulary to describe resources) and Prov-O (W3C provenance vocabulary) and forms the core of the Databus. DataID and Databus patches several shortcomings of DCAT and DCMI:
The DataID model is a stable vocabulary built on DCAT (a W3C vocabulary to describe datasets), DCMI (Dublin Core Metadata Initiative vocabulary to describe resources) and Prov-O (W3C provenance vocabulary) and forms the core of the Databus. DataID and Databus patches several shortcomings of DCAT and DCMI:

* DataID/Databus contains the "right" kind of information to re-publish to other repositories automatically, including platforms such as Kaggle, CKAN, Zenodo as well as the automated generation of Data Management Plan (DMP) deliverables for e.g. Horizon Europe research projects (these were implemented by third-parties and not included with the Databus software).
* Databus distinguishes between version of a dataset and the dataset artifact, an important individuation that allows to discover updates, i.e. new versions of the same dataset (artifact). Databus also distinguishes between compression (\~ dozen of [lossless compression formats](https://commons.apache.org/proper/commons-compress/)) and [IANA Media Types](https://www.iana.org/assignments/media-types/media-types.xhtml) or mimetypes (over 1400 formats) and other formats that describe the actual format of files.
@@ -79,7 +79,7 @@ The DataID model is a stable vocabulary bult on DCAT (a W3C vocabulary to descri
## Standardized, de-central and scalable

* implemented using the open W3C standards Linked Data, RDF, SPARQL, OWL, SHACL, DCAT, Prov-O complemented by our own stable DCAT extension DataID.
* Building the Databus with Linked Data and SPARQL easily allows the Databus initative to scale regarding performance and extensibility. Databus provides stable, resolvable identifiers for account, group, dataset, version, distribution, file and collections, so it easy to:
* Building the Databus with Linked Data and SPARQL easily allows the Databus initiative to scale regarding performance and extensibility. Databus provides stable, resolvable identifiers for account, group, dataset, version, distribution, file and collections, so it easy to:
* comprise dataset collections of identifiers from different Databuses residing at different levels in one organisation (personal, team, project, department, whole organisation) and external Databus deployments.
* use identifiers in other applications to provide additional information such as additional metadata, annotations and software-data dependencies
* federate SPARQL queries over the Databus SPARQL endpoint with other Databuses, other SPARQL endpoints using Databus identifiers and Mods (our metadata enrichment extensions)
2 changes: 1 addition & 1 deletion docs/uridesign.md
@@ -25,7 +25,7 @@ The URIs in your input have to follow a specific pattern in order to be accepted
### Artifact URI Rules *(databus:Artifact)*

* An artifact URI has exactly three path segments.
* The first path segment identifiees the publisher, the second segment the group, while the third segment the published artifact.
* The first path segment identifies the publisher, the second segment the group, while the third segment the published artifact.

* An example of a valid artifact URI:* https://databus.example.org/john/animals/cats

2 changes: 1 addition & 1 deletion docs/usage/quickstart-examples.md
@@ -73,7 +73,7 @@ curl -X 'POST' \
}'
```

If the Databus should NOT infer a certain metadatum (for example not auto-generating the `abtract` from the `description` field), it can be set explicitly and the Databus will accept it (if it fits its criteria). For a full list of inferrable properties check out the [autocompletion page](../auto-completion.md)
If the Databus should NOT infer a certain metadatum (for example not auto-generating the `abstract` from the `description` field), it can be set explicitly and the Databus will accept it (if it fits its criteria). For a full list of inferable properties check out the [autocompletion page](../auto-completion.md)

#### Property Description

2 changes: 1 addition & 1 deletion docs/usage/web-interface/collections.md
@@ -11,7 +11,7 @@ In the following section, we will cover the user interface of the collection edi
The DBpedia Databus Collections are useful in many ways.

* You can share a specific dataset with your community or colleagues.
* You can re-use dataset others created
* You can reuse dataset others created
* You can plug collections into Databus-ready applications and avoid spending time on the download and setup process
* You can point to a specific piece of data (e.g. for testing) with a single URI in your publications
* You can help others to create data queries more easily