Add codespell support: workflow, config + make it fix typos #174

Open · wants to merge 7 commits into base: master

8 changes: 8 additions & 0 deletions .codespellrc
@@ -0,0 +1,8 @@
[codespell]
# Ref: https://github.com/codespell-project/codespell#using-a-config-file
# Some files contain comments in German, so the entire databus-collection-manager.js is skipped
skip = .git,*.svg,*.css,*.min.*,.codespellrc,*.css.map,dist,databus-collection-manager.js
check-hidden = true
# lines with umlauts or ist -- German
ignore-regex = .*([üä]|\b[Ii]st\b|finde lokale|Streitbeilegungsverfahren).*
# ignore-words-list =
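
To reproduce this check locally, a minimal sketch (assuming Python and pip are available; `codespell` reads the `skip` and `ignore-regex` settings from `.codespellrc` in the working directory):

```
# Install codespell and run it from the repository root;
# it picks up the configuration in .codespellrc automatically.
pip install codespell
codespell

# Optionally write unambiguous corrections back to the files,
# mirroring the "make it fix typos" part of this PR:
codespell --write-changes
```
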
4 changes: 2 additions & 2 deletions .env
@@ -1,7 +1,7 @@
####### DATABUS SETTINGS #######
# RESOURCE BASE URL - This should match the DNS entry pointing to your server
DATABUS_RESOURCE_BASE_URL=http://localhost:3000
# do not use qoutation marks for the value here and no trailing space
# do not use quotation marks for the value here and no trailing space
DATABUS_NAME=My Local Databus
DATABUS_ORG_ICON=
DATABUS_BANNER_COLOR= # Default is #81b8b2
@@ -31,5 +31,5 @@ DATABUS_PROXY_SERVER_OWN_CERT_KEY="key.pem"
# It is necessary to know this, in order to set up ACME etc.
# Note: the host name should be identical to DATABUS_RESOURCE_BASE_URL,
# but without specifying a port, protocol i.e. HTTP(S) etc.
# do not use qoutation marks for the value here and no trailing space
# do not use quotation marks for the value here and no trailing space
DATABUS_PROXY_SERVER_HOSTNAME=my-databus.org
23 changes: 23 additions & 0 deletions .github/workflows/codespell.yml
@@ -0,0 +1,23 @@
# Codespell configuration is within .codespellrc
---
name: Codespell

on:
  push:
    branches: [master]
  pull_request:
    branches: [master]

permissions:
  contents: read

jobs:
  codespell:
    name: Check for spelling errors
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Codespell
        uses: codespell-project/actions-codespell@v2
2 changes: 1 addition & 1 deletion README.md
@@ -31,7 +31,7 @@ Databus is designed as a lightweight and agile solution and fits seamlessly into
## Deployment Levels
We identified these deployment levels with our partners:
1. **Open community**: Set up a data space in the Databus Network and jointly curate it with community contributions spanning across several organisations (see DBpedia and Open Energy)
2. **Organisation**: Implement your enterprise’s data strategy and optimise efficiency, integration of external data and re-use; manage research data university-wide for scientific sustainability and FAIR. Databus hooks into single sign-on authentication like Siemens ID or DLR ID
2. **Organisation**: Implement your enterprise’s data strategy and optimise efficiency, integration of external data and reuse; manage research data university-wide for scientific sustainability and FAIR. Databus hooks into single sign-on authentication like Siemens ID or DLR ID
3. **Department, group or team**: Systematise data workflows internally; transparently record scientific results from beginning to end.
4. **Collaborative projects**: Efficiently coordinate data with partners in large projects or in multi-project environments.
5. **Application, Product or Pipeline**: Streamline and automate data flow and data dependencies within a target application, product or pipeline. It's an essential tool for agile and data-driven decision making and shines in managing input/output for data-intensive applications such as: Search, AI, Deep Learning, Natural Language Processing (NLP), Knowledge Graph Construction and Evolution, Databases, Continuous Integration and Microservice Orchestration.
6 changes: 3 additions & 3 deletions docs/api.md
@@ -145,7 +145,7 @@ PUT /$username/$group/$artifact/$version
| 200 | `OK` | Artifact version updated |
| 201 | `CREATED` | Artifact version created |
| 400 | `BAD REQUEST` | Request or request data was formatted incorrectly |
| 403 | `FORBIDDEN` | Invalid API Token or request targetting the namespace of another user |
| 403 | `FORBIDDEN` | Invalid API Token or request targeting the namespace of another user |
| 500 | `INTERNAL SERVER ERROR` | Internal server error |

### Remove Version
@@ -161,7 +161,7 @@ DELETE /$username/$group/$artifact/$version
| Status Codes | Status | Description |
| :--- | :--- | :--- |
| 204 | `NO CONTENT` | Artifact version deleted successfully |
| 403 | `FORBIDDEN` | Invalid API Token or request targetting the namespace of another user |
| 403 | `FORBIDDEN` | Invalid API Token or request targeting the namespace of another user |
| 500 | `INTERNAL SERVER ERROR` | Internal server error |

## Generic
@@ -193,5 +193,5 @@ PUT /system/publish
| :--- | :--- | :--- |
| 200 | `OK` | Content created or updated |
| 400 | `BAD REQUEST` | Request or request data was formatted incorrectly |
| 403 | `FORBIDDEN` | Invalid API Token or request targetting the namespace of another user |
| 403 | `FORBIDDEN` | Invalid API Token or request targeting the namespace of another user |
| 500 | `INTERNAL SERVER ERROR` | Internal server error |
6 changes: 3 additions & 3 deletions docs/artifact.md
@@ -79,7 +79,7 @@ dct:title
sh:path dct:title ;
sh:severity sh:Violation ;
sh:maxLength 100 ;
sh:message "dct:title must have less than 100 characters and each language must occure only once."@en ;
sh:message "dct:title must have less than 100 characters and each language must occur only once."@en ;
sh:uniqueLang true ;
] .
```
@@ -117,7 +117,7 @@ dct:abstract
sh:property [
sh:path dct:abstract ;
sh:severity sh:Violation ;
sh:message "dct:abstract must have less than 300 characters and each language must occure only once."@en ;
sh:message "dct:abstract must have less than 300 characters and each language must occur only once."@en ;
sh:uniqueLang true;
sh:maxLength 300 ;
] .
@@ -156,7 +156,7 @@ dct:description
sh:property [
sh:path dct:description ;
sh:severity sh:Violation ;
sh:message "Each language of dct:description must occure only once."@en ;
sh:message "Each language of dct:description must occur only once."@en ;
sh:uniqueLang true ;
] .
```
2 changes: 1 addition & 1 deletion docs/auto-completion.md
@@ -5,7 +5,7 @@ When trying to publish data on the Databus, the HTTP API accepts not only fully
## Properties


The following table shows a list of inferrable properties that can optionally be omitted in the input.
The following table shows a list of inferable properties that can optionally be omitted in the input.

### Version
| Property | Value inferred from |
2 changes: 1 addition & 1 deletion docs/collection.md
@@ -109,7 +109,7 @@ dct:abstract
sh:property [
sh:path dct:abstract ;
sh:severity sh:Violation ;
sh:message "dct:abstract must have less than 300 characters and each language must occure only once. "@en ;
sh:message "dct:abstract must have less than 300 characters and each language must occur only once. "@en ;
sh:uniqueLang true;
sh:maxLength 300 ;
] .
2 changes: 1 addition & 1 deletion docs/content-variants.md
@@ -9,7 +9,7 @@ The main rule for content variant setup is the following:

This ensures that each file in the databus:Version can be selected individually by querying for its unique tuple of *format*, *compression type* and *content variants*.

A content variant is a key-value pair with the key being a sub-property of `databus:contentVariant` and the value being a (preferrably short) string that can be chosen freely. Content variants could describe either a property of the file or its content.
A content variant is a key-value pair with the key being a sub-property of `databus:contentVariant` and the value being a (preferably short) string that can be chosen freely. Content variants could describe either a property of the file or its content.


**Examples:**
4 changes: 2 additions & 2 deletions docs/distribution.md
@@ -160,7 +160,7 @@ missing
a sh:PropertyShape ;
sh:targetClass databus:Part ;
sh:severity sh:Violation ;
sh:message """Required property databus:compression MUST occur exactly once AND have xsd:string as value AND should not inlcude a '.' in front """@en ;
sh:message """Required property databus:compression MUST occur exactly once AND have xsd:string as value AND should not include a '.' in front """@en ;
sh:pattern "^[a-z0-9]{1,8}$" ;
sh:path databus:compression;
sh:minCount 1 ;
@@ -344,7 +344,7 @@ TODO ??
## Content variants
TODO ??

The shape `<#parts-are-distinguishable-by-cv>` relies on a ordering of results in the *GROUP BY* and consequentially *GROUP_CONCAT* instruction that is agnostic of the ordering of properties in the data. This seems to work for Apache JENA and Virtuoso but has not been tested with other SPARQL engines.
The shape `<#parts-are-distinguishable-by-cv>` relies on a ordering of results in the *GROUP BY* and consequently *GROUP_CONCAT* instruction that is agnostic of the ordering of properties in the data. This seems to work for Apache JENA and Virtuoso but has not been tested with other SPARQL engines.


Example (JSON-LD):
2 changes: 1 addition & 1 deletion docs/group.md
@@ -118,7 +118,7 @@ dct:abstract
sh:property [
sh:path dct:abstract ;
sh:severity sh:Violation ;
sh:message "dct:abstract must have less than 300 characters and each language must occure only once. "@en ;
sh:message "dct:abstract must have less than 300 characters and each language must occur only once. "@en ;
sh:uniqueLang true;
sh:maxLength 300 ;
] .
2 changes: 1 addition & 1 deletion docs/guides/data-download-guide.md
@@ -166,7 +166,7 @@ SELECT ?file WHERE

### Download and convert selected data

In order to download the data we need to pass the query as the _`-s`_ argument. Additionaly we need to specify where the query needs to be asked to. This is done using the `-e` argument. Furthermore if we want to convert the files to _.nt_ we need to specify if in the _`-f`_ parameter and finally we need to tell the client the desired compression.
In order to download the data we need to pass the query as the _`-s`_ argument. Additionally we need to specify where the query needs to be asked to. This is done using the `-e` argument. Furthermore if we want to convert the files to _.nt_ we need to specify if in the _`-f`_ parameter and finally we need to tell the client the desired compression.

```
java -jar target/databus-client-v2.1-beta.jar \
2 changes: 1 addition & 1 deletion docs/guides/publish-guide.md
@@ -41,7 +41,7 @@ and then copy raw links to file data:

For example a link to our readme as of July 2023 will be: [https://raw.githubusercontent.com/dbpedia/databus/68f976e29e2db15472f1b664a6fd5807b88d1370/README.md](https://raw.githubusercontent.com/dbpedia/databus/68f976e29e2db15472f1b664a6fd5807b88d1370/README.md)

**!NOTE! If you use links referring not to commit, but to branch, the files there may be changing over time, which will break corrspondence with the file hashes stored in Databus**
**!NOTE! If you use links referring not to commit, but to branch, the files there may be changing over time, which will break correspondence with the file hashes stored in Databus**

#### Google Drive

2 changes: 1 addition & 1 deletion docs/model/how-to.md
@@ -63,7 +63,7 @@ The metadata publisher has complete control over the names of the Databus identi

1. **The account name**

* The account name is chosen on account creation, i.e. when registering at the particular Datbaus instane. It is advised to use your personal username or the name of your institution/company. In other words, the account name is the identifier of the data owner/publisher. E.g. DBpedia publishes the regular releases under the account name _dbpedia_.
* The account name is chosen on account creation, i.e. when registering at the particular Datbaus instance. It is advised to use your personal username or the name of your institution/company. In other words, the account name is the identifier of the data owner/publisher. E.g. DBpedia publishes the regular releases under the account name _dbpedia_.

2. **The group name**

2 changes: 1 addition & 1 deletion docs/mods.md
@@ -54,7 +54,7 @@ While the [Databus Model](model.md) is quite minimal and supports only necessary

There are currently some basic examples for Databus Mods, applicable to various file types, showcasing for what Databus Mods can be used:

1. Mimetype Mod: On the publishing of any file, this mod finds the correspnding mimetype and saves it
1. Mimetype Mod: On the publishing of any file, this mod finds the corresponding mimetype and saves it
2. VOID Mod: Collects [VOID](https://www.w3.org/TR/void/) metadata for RDF files and saves them in an SPARQL endpoint.
3. Filemetrics Mod: Collects some addidional metrics not captured by the minimal model for any file, e.g. checking if it is sorted, the uncompressed size and some more

2 changes: 1 addition & 1 deletion docs/quickstart-examples.md
@@ -37,7 +37,7 @@ This site describes the minimal required metadata for publishing a dataset (meta
}
```

If the Databus should NOT infer a certain metadatum (for example not auto-generating the `abtract` from the `description` field), it can be set explicitly and the Databus will accept it (if it fits its criteria). For a full list of inferrable properties check out the [autocompletion page](/docs/auto-completion.md)
If the Databus should NOT infer a certain metadatum (for example not auto-generating the `abstract` from the `description` field), it can be set explicitly and the Databus will accept it (if it fits its criteria). For a full list of inferable properties check out the [autocompletion page](/docs/auto-completion.md)

#### Property Description

2 changes: 1 addition & 1 deletion docs/roadmap.md
@@ -6,7 +6,7 @@ license can be any URI at the moment, however, these URIs are not validated and

### Mappings

We implemted a prototypical CSV to RDF conversion with TARQL in the Databus Download Client. We to integrate a full RML engine. At the moment, "[FunMap: Efficient Execution of Functional Mappings for Knowledge Graph Creation](https://arxiv.org/abs/2008.13482)" by DBpedia Member TIB seems the best candidate.&#x20;
We implemented a prototypical CSV to RDF conversion with TARQL in the Databus Download Client. We to integrate a full RML engine. At the moment, "[FunMap: Efficient Execution of Functional Mappings for Knowledge Graph Creation](https://arxiv.org/abs/2008.13482)" by DBpedia Member TIB seems the best candidate.&#x20;

### More Download As Options

8 changes: 4 additions & 4 deletions docs/uniquesellingpoints.md
@@ -6,7 +6,7 @@ description: >-

# 🚀 Unique Features (Draft)

## High-degree of Automation and Re-use
## High-degree of Automation and Reuse

Over three years, we implemented and designed the Databus to build a solid foundation for automating many tedious processes in Knowledge Engineering including upload, download, low-level conversions, quality tests, generating statistics and tracking provenance (includes private key signature for authenticity). We fireproofed it using the DBpedia Snapshot release process. The result is that we **saved 92% cost in work hours while being 10 times more productive** (increase in release frequency). In particular:

@@ -16,7 +16,7 @@ Over three years, we implemented and designed the Databus to build a solid found
* concepts were inspired by solid frameworks such as Maven, Git/Github, Linked Data, Steam
* License URLs are mapped to [Dalicc](https://dalicc.net) to make them machine-understandable (see [Roadmap](roadmap.md))
* Mappings are collected centrally to transform data and can be re-used (see [Roadmap](roadmap.md))
* Additional metadata is computed by re-usable apps called Mods, which detect compression and format (TrueType Mod), count triples and statistics (VOID Mod), online checks (OnlineCheck Mod), syntax and encoding analysis.
* Additional metadata is computed by reusable apps called Mods, which detect compression and format (TrueType Mod), count triples and statistics (VOID Mod), online checks (OnlineCheck Mod), syntax and encoding analysis.
* We envision a lot more smart Mods that provide a new class of applications built on Databus metadata, such as data search, classifying data with ontologies and contextualization, automatic patching/repair and automatic selection of data to train AI.

## Low-code Application Deployment
@@ -69,7 +69,7 @@ Analogous to [Feature Creep](https://en.wikipedia.org/wiki/Feature\_creep) in so

## Interoperability

The DataID model is a stable vocabulary bult on DCAT (a W3C vocabulary to describe datasets), DCMI (Dublin Core Metadata Initiative vocabulary to describe resources) and Prov-O (W3C provenance vocabulary) and forms the core of the Databus. DataID and Databus patches several shortcomings of DCAT and DCMI:
The DataID model is a stable vocabulary built on DCAT (a W3C vocabulary to describe datasets), DCMI (Dublin Core Metadata Initiative vocabulary to describe resources) and Prov-O (W3C provenance vocabulary) and forms the core of the Databus. DataID and Databus patches several shortcomings of DCAT and DCMI:

* DataID/Databus contains the "right" kind of information to re-publish to other repositories automatically, including platforms such as Kaggle, CKAN, Zenodo as well as the automated generation of Data Management Plan (DMP) deliverables for e.g. Horizon Europe research projects (these were implemented by third-parties and not included with the Databus software).
* Databus distinguishes between version of a dataset and the dataset artifact, an important individuation that allows to discover updates, i.e. new versions of the same dataset (artifact). Databus also distinguishes between compression (\~ dozen of [lossless compression formats](https://commons.apache.org/proper/commons-compress/)) and [IANA Media Types](https://www.iana.org/assignments/media-types/media-types.xhtml) or mimetypes (over 1400 formats) and other formats that describe the actual format of files.
@@ -79,7 +79,7 @@ The DataID model is a stable vocabulary bult on DCAT (a W3C vocabulary to descri
## Standardized, de-central and scalable

* implemented using the open W3C standards Linked Data, RDF, SPARQL, OWL, SHACL, DCAT, Prov-O complemented by our own stable DCAT extension DataID.
* Building the Databus with Linked Data and SPARQL easily allows the Databus initative to scale regarding performance and extensibility. Databus provides stable, resolvable identifiers for account, group, dataset, version, distribution, file and collections, so it easy to:
* Building the Databus with Linked Data and SPARQL easily allows the Databus initiative to scale regarding performance and extensibility. Databus provides stable, resolvable identifiers for account, group, dataset, version, distribution, file and collections, so it easy to:
* comprise dataset collections of identifiers from different Databuses residing at different levels in one organisation (personal, team, project, department, whole organisation) and external Databus deployments.
* use identifiers in other applications to provide additional information such as additional metadata, annotations and software-data dependencies
* federate SPARQL queries over the Databus SPARQL endpoint with other Databuses, other SPARQL endpoints using Databus identifiers and Mods (our metadata enrichment extensions)
2 changes: 1 addition & 1 deletion docs/uridesign.md
@@ -25,7 +25,7 @@ The URIs in your input have to follow a specific pattern in order to be accepted
### Artifact URI Rules *(databus:Artifact)*

* An artifact URI has exactly three path segments.
* The first path segment identifiees the publisher, the second segment the group, while the third segment the published artifact.
* The first path segment identifies the publisher, the second segment the group, while the third segment the published artifact.

* An example of a valid artifact URI:* https://databus.example.org/john/animals/cats

2 changes: 1 addition & 1 deletion docs/usage/quickstart-examples.md
@@ -73,7 +73,7 @@ curl -X 'POST' \
}'
```

If the Databus should NOT infer a certain metadatum (for example not auto-generating the `abtract` from the `description` field), it can be set explicitly and the Databus will accept it (if it fits its criteria). For a full list of inferrable properties check out the [autocompletion page](../auto-completion.md)
If the Databus should NOT infer a certain metadatum (for example not auto-generating the `abstract` from the `description` field), it can be set explicitly and the Databus will accept it (if it fits its criteria). For a full list of inferable properties check out the [autocompletion page](../auto-completion.md)

#### Property Description

2 changes: 1 addition & 1 deletion docs/usage/web-interface/collections.md
@@ -11,7 +11,7 @@ In the following section, we will cover the user interface of the collection edi
The DBpedia Databus Collections are useful in many ways.

* You can share a specific dataset with your community or colleagues.
* You can re-use dataset others created
* You can reuse dataset others created
* You can plug collections into Databus-ready applications and avoid spending time on the download and setup process
* You can point to a specific piece of data (e.g. for testing) with a single URI in your publications
* You can help others to create data queries more easily