Add query patterns (#273)
* Add new QuerySource

* Add pattern queries

* Update tests

* Update schema

* Update test file

* Update documentation

* Fix default value of config

* Remove exception

* Fix hashcode

* Revert "Fix hashcode"

This reverts commit 8e2a09c.

* Revert "Remove exception"

This reverts commit f9ce17a.

* Refactoring of QueryList

* QueryList is now an interface
* added new abstract class FileBasedQueryList
* old FileBasedQueryList is now FileReadingQueryList
* InMemQueryList.java is now FileCachingQueryList
* add StringListQueryList
* remove StringListQuerySource.java

* Remove logging statement

* Update doc

* Update graalvm suite

* Fix native compilation

* Update logging statement

* Rename query pattern caching to save

* Update doc

* Update doc

* Fix most change requests

* Fix test

* Enable stdout for graal script

* Fix schema

* Fix schema 2

* Cleanup

* Add detection of normal queries

* Update doc

* Generate random variable if it already exists

* Add additional variable prefix

Co-authored-by: Alexander Bigerl <[email protected]>

* Change placeholder pattern

* Add hash to instance file

* Update doc

* don't mangle file and limit filename identifier

---------

Co-authored-by: Alexander Bigerl <[email protected]>
nck-mlcnv and bigerl authored Sep 13, 2024
1 parent 38064d9 commit 956a59d
Showing 19 changed files with 529 additions and 61 deletions.
52 changes: 52 additions & 0 deletions docs/configuration/queries.md
@@ -16,6 +16,7 @@ The `queries` property is an object that contains the following properties:
| order | no | `linear` | The order in which the queries are executed. If set to `linear` the queries will be executed in their order inside the file. If `format` is set to `folder`, queries will be sorted by their file name first. | `random` or `linear` |
| seed | no | `0` | The seed for the random number generator that selects the queries. If multiple workers use the same query handler, their seed will be the sum of the given seed and their worker id. | `12345` |
| lang | no | `SPARQL` | Not used for anything at the moment. | |
| template | no | | If set, queries from `path` will be treated as query templates. See [Query Templates](#query-templates) for more information. | |
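
The seed rule from the `seed` row can be illustrated with a small Python sketch. It is an illustration only: `worker_rng` and `pick_queries` are hypothetical names, and Python's generator will not reproduce the sequences produced by the tool's own random number generator.

```python
import random


def worker_rng(seed: int, worker_id: int) -> random.Random:
    """Per-worker generator seeded with seed + worker_id, as the table describes."""
    return random.Random(seed + worker_id)


def pick_queries(n_queries: int, n_picks: int, seed: int, worker_id: int) -> list[int]:
    """Draw n_picks query indices at random (hypothetical helper)."""
    rng = worker_rng(seed, worker_id)
    return [rng.randrange(n_queries) for _ in range(n_picks)]
```

Because only the sum matters, a worker with id 2 and seed 12345 draws the same sequence as a worker with id 0 and seed 12347.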

## Format

@@ -98,3 +99,54 @@ tasks:
lang: "SPARQL"
# ... additional worker properties
```

## Query Templates
Query templates are queries containing placeholders for some terms.
Replacement candidates are identified by querying a given endpoint.
Because the candidates are taken from actual query solutions, the instantiated queries yield results against any endpoint that holds the same data.

The placeholders are written in the form of `%%[a-zA-Z0-9_]+%%`, which means that any character sequence consisting
of letters, numbers, and underscores, enclosed by `%%` will be interpreted as a placeholder.
Query templates originate from WatDiv,
whose placeholders have a [similar form](https://dsg.uwaterloo.ca/watdiv/basic-testing.shtml).
If a placeholder name equals the name of a variable that already occurs in the query,
a different variable name is used for that placeholder during candidate generation.

Query templates and normal queries can be mixed in the same file or folder.
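
Telling templates apart from normal queries can be sketched in Python with the placeholder pattern given above. This is a minimal sketch, not the project's implementation; `placeholder_names` and `is_template` are hypothetical helper names.

```python
import re

# Placeholder syntax as documented above: %%name%% with letters, digits, underscores.
PLACEHOLDER = re.compile(r"%%([a-zA-Z0-9_]+)%%")


def placeholder_names(query: str) -> list[str]:
    """Return the distinct placeholder names in order of first appearance."""
    names: list[str] = []
    for name in PLACEHOLDER.findall(query):
        if name not in names:
            names.append(name)
    return names


def is_template(query: str) -> bool:
    """A query counts as a template if it contains at least one placeholder."""
    return PLACEHOLDER.search(query) is not None
```

A query without any `%%...%%` sequence is treated as a normal query, which is what allows both kinds to live in the same file or folder.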

An example template:
`SELECT * WHERE {?s %%var1%% ?o . ?o <http://exa.com> %%var2%%}`

This template will then be converted to:
`SELECT ?var1 ?var2 WHERE {?s ?var1 ?o . ?o <http://exa.com> ?var2}`

This SELECT query is then sent to the configured SPARQL endpoint (e.g., DBpedia).
The solutions for this query are used to instantiate the template.
The resulting query instances may look like the following:
- `SELECT * WHERE {?s <http://prop/1> ?o . ?o <http://exa.com> "123"}`
- `SELECT * WHERE {?s <http://prop/1> ?o . ?o <http://exa.com> "12"}`
- `SELECT * WHERE {?s <http://prop/2> ?o . ?o <http://exa.com> "1234"}`
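
The conversion and instantiation steps above can be sketched in Python. This is a simplified illustration under stated assumptions, not the project's implementation: it handles only plain `SELECT ... WHERE` templates via regular expressions, the function names are hypothetical, the `_1`, `_2` suffix used on a name clash is an assumed scheme, and each solution is passed in as a dict of already serialized terms instead of being fetched from an endpoint.

```python
import re

PLACEHOLDER = re.compile(r"%%([a-zA-Z0-9_]+)%%")
VARIABLE = re.compile(r"[?$]([a-zA-Z0-9_]+)")
PROJECTION = re.compile(r"SELECT\s+.*?\s+WHERE", re.IGNORECASE)


def to_select_query(template: str) -> tuple[str, dict[str, str]]:
    """Build the candidate SELECT query and a placeholder -> variable mapping."""
    used = set(VARIABLE.findall(template))  # variables already in the query
    mapping: dict[str, str] = {}
    for name in PLACEHOLDER.findall(template):
        if name in mapping:
            continue
        var, i = name, 0
        while var in used:  # name clash: pick a different variable name (assumed scheme)
            i += 1
            var = f"{name}_{i}"
        used.add(var)
        mapping[name] = var
    body = PLACEHOLDER.sub(lambda m: "?" + mapping[m.group(1)], template)
    projection = " ".join("?" + v for v in mapping.values())
    select = PROJECTION.sub(lambda _: f"SELECT {projection} WHERE", body, count=1)
    return select, mapping


def instantiate(template: str, mapping: dict[str, str], solution: dict[str, str]) -> str:
    """Replace each placeholder with the term bound to its variable in one solution."""
    return PLACEHOLDER.sub(lambda m: solution[mapping[m.group(1)]], template)
```

Applied to the example template, `to_select_query` yields the SELECT query shown above, and calling `instantiate` with the solution `{"var1": "<http://prop/1>", "var2": '"123"'}` yields the first instance in the list.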

### Configuration
The `template` attribute has the following properties:

| property | required | default | description | example |
|----------|----------|---------|---------------------------------------------------------------------|-----------------------------|
| endpoint | yes | | The endpoint to query. | `http://dbpedia.org/sparql` |
| limit | no | `2000` | The maximum number of instances per query template. | `100` |
| save | no | `true` | If set to `true`, query instances will be saved in a separate file. | `false` |

If the `save` attribute is set to `true`,
the instances will be saved in a separate file in the same directory as the query templates.
If the query templates are stored in a folder, the instances will be saved in the parent directory.

Example of query configuration with query templates:
```yaml
queries:
  path: "./example/suite/queries/"
  format: "folder"
  template:
    endpoint: "http://dbpedia.org/sparql"
    limit: 100
    save: true
```
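
How the documented defaults and constraints of the `template` block combine can be sketched in Python. This is a hedged illustration, not the project's configuration loader: `TemplateConfig` and `parse_template_config` are hypothetical names, the defaults come from the table above, and the lower bound of 1 for `limit` comes from the `Template` definition in `schema/iguana-schema.json`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateConfig:
    endpoint: str
    limit: int = 2000  # documented default
    save: bool = True  # documented default


def parse_template_config(raw: dict) -> TemplateConfig:
    """Validate a parsed `template` mapping and fill in the documented defaults."""
    unknown = set(raw) - {"endpoint", "limit", "save"}
    if unknown:
        raise ValueError(f"unknown template properties: {sorted(unknown)}")
    if "endpoint" not in raw:
        raise ValueError("template.endpoint is required")
    cfg = TemplateConfig(**raw)
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(cfg.limit, bool) or not isinstance(cfg.limit, int) or cfg.limit < 1:
        raise ValueError("template.limit must be an integer >= 1")
    return cfg
```

For the configuration above, `parse_template_config({"endpoint": "http://dbpedia.org/sparql", "limit": 100, "save": True})` returns a config with `limit` set to 100, while omitting `limit` and `save` falls back to 2000 and `True`.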
6 changes: 5 additions & 1 deletion example-suite.yml
@@ -74,7 +74,11 @@ tasks:
number: 16
requestType: post query
queries:
path: "./example/queries.txt"
path: "./example/query_pattern.txt"
pattern:
endpoint: "https://dbpedia.org/sparql"
limit: 1000
save: false
timeout: 180s
completionTarget:
duration: 1000s
2 changes: 1 addition & 1 deletion graalvm/queries.txt
@@ -1 +1 @@
placeholder
SELECT * WHERE {?s %%var1%% ?o . ?o %%var3%% %%var2%%}
4 changes: 4 additions & 0 deletions graalvm/suite.yml
@@ -59,6 +59,10 @@ tasks:
order: "random"
seed: 123
lang: "SPARQL"
template:
endpoint: "http://dbpedia.org/sparql"
limit: 1
save: false
timeout: 2s
connection: Blazegraph
completionTarget:
2 changes: 2 additions & 0 deletions pom.xml
@@ -315,6 +315,8 @@
-O3
-H:-UseCompressedReferences
-H:+UnlockExperimentalVMOptions
--enable-http
--enable-https
</buildArgs>
<metadataRepository>
<enabled>true</enabled>
29 changes: 26 additions & 3 deletions schema/iguana-schema.json
@@ -183,8 +183,8 @@
}
},
"required": [
"type",
"directory"
"type",
"directory"
],
"title": "CSVStorage"
},
@@ -335,9 +335,29 @@
"type": "object",
"unevaluatedProperties": false,
"required": [
"duration"
"duration"
]
},
"Template": {
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "endpoint": {
      "type": "string"
    },
    "limit": {
      "type": "integer",
      "minimum": 1
    },
    "save": {
      "type": "boolean"
    }
  },
  "required": [
    "endpoint"
  ],
  "title": "Template"
},
"QueryMixes": {
"properties": {
"number": {
@@ -379,6 +399,9 @@
"lang": {
"type": "string",
"enum": [ "", "SPARQL" ]
},
"template": {
"$ref": "#/definitions/Template"
}
},
"required": [