HTTP worker refactoring (#221)
* SPARQLProtocolWorker is a draft for a better, more reliable worker tailored to the SPARQL Protocol. Each worker uses a single HttpClient and handles its work-completion conditions itself.
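
A minimal sketch of the per-worker client idea, assuming java.net.http; the class shape and method names here are illustrative, not the actual SPARQLProtocolWorker API:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Sketch only: each worker owns exactly one HttpClient, so connection
// state is never shared between workers.
class WorkerSketch {
    private final int workerId; // uniquely identifies this worker
    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    WorkerSketch(int workerId) {
        this.workerId = workerId;
    }

    HttpResponse<byte[]> executeQuery(URI endpoint, String query) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/sparql-query")
                .POST(HttpRequest.BodyPublishers.ofString(query))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofByteArray());
    }
}
```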

* Add workerId and ExecutionStats to SPARQLProtocolWorker

Refactored SPARQLProtocolWorker to record a workerId and execution statistics for each worker. The workerId uniquely identifies each worker. An ExecutionStats inner class was created to track start time, duration, HTTP status code, content length, number of bindings, and number of solutions for each worker's task.
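
The stats described above could look roughly like this record; the field names are illustrative, not the actual inner class:

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the per-execution data listed in the commit message.
record ExecutionStats(
        Instant startTime,
        Duration duration,
        int httpStatusCode,
        long contentLength,
        long numberOfBindings,
        long numberOfSolutions
) {}
```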

* Refactor SPARQLProtocolWorker to handle query streams.

This commit changes the query-building mechanism in SPARQLProtocolWorker.java from StringBuilder to InputStream, to support processing of large queries and to reduce the overhead of using String for the query ID. The worker now reads queries directly from QueryHandler's data stream, and a number of HTTP request methods were modified to accommodate this change. The refactor also adds a new method to QueryHandler that returns a QueryHandle record, a container for a query's index and an InputStream for its content.
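
A minimal sketch of the described record, assuming it holds the stream directly:

```java
import java.io.InputStream;

// Container for a query's index and its content as a stream, as described
// above; the exact signature is an assumption.
record QueryHandle(int index, InputStream queryInputStream) {}
```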

* Add streaming support for handling large queries

Introduced InputStream support in QueryList and QuerySource to handle large queries more efficiently. Changes have been made to IndexedQueryReader, QuerySource, QueryHandler, and several other classes to accommodate the new streaming feature. Previously, all queries were loaded into memory, which could cause an OutOfMemoryError for large query sets. Whether queries are streamed to the client still depends on the SPARQL worker in use.
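
The streaming idea, sketched under the assumption of a file-backed source that knows each query's byte offset and length (not the actual IndexedQueryReader code):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hands out a bounded InputStream over one query's byte range instead of
// materializing the whole query as a String.
class StreamingQuerySourceSketch {
    private final Path file;

    StreamingQuerySourceSketch(Path file) {
        this.file = file;
    }

    InputStream openQuery(long offset, long length) throws IOException {
        InputStream in = Files.newInputStream(file);
        in.skipNBytes(offset);  // position at the query's first byte
        return limit(in, length); // cap reads at the query's length
    }

    private static InputStream limit(InputStream in, long max) {
        return new InputStream() {
            private long remaining = max;

            @Override
            public int read() throws IOException {
                if (remaining <= 0) return -1;
                int b = in.read();
                if (b >= 0) remaining--;
                return b;
            }

            @Override
            public void close() throws IOException {
                in.close();
            }
        };
    }
}
```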

* Refactored BigByteArrayOutputStream

* Hashing and large response body support for SPARQLProtocolWorker

* remove dangling javadoc comment

* Scaffold ResponseBodyProcessor. This class keeps track of already-handled responses to avoid repeated processing. It uses a concurrent hash map to store the responses, identified by unique keys. This approach aims to improve the efficiency of handling response bodies in multi-threaded scenarios.
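
The deduplication idea in a nutshell; the key type and the parse step are assumptions for illustration:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A response body is processed at most once per key, even when several
// workers hand in the same body concurrently.
class ResponseBodyProcessorSketch {
    private final Map<Long, Object> results = new ConcurrentHashMap<>();

    Object process(long key, byte[] body) {
        // computeIfAbsent guarantees a single computation per key
        return results.computeIfAbsent(key, k -> parse(body));
    }

    private Object parse(byte[] body) {
        return body.length; // stand-in for real result parsing
    }
}
```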

* Use an unsynchronized ByteArrayOutputStream for BigByteArrayInputStream/BigByteArrayOutputStream and completely rewrite BigByteArrayInputStream. This should increase the performance of both streams significantly.
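
For context, java.io.ByteArrayOutputStream synchronizes every write; a sketch of an unsynchronized variant (illustrative, not the actual implementation):

```java
import java.util.Arrays;

// Single-threaded buffer without the lock of java.io.ByteArrayOutputStream;
// exposing the internal array lets a matching input stream read it without copying.
final class UnsyncBaosSketch {
    private byte[] buf = new byte[8192];
    private int count;

    void write(byte[] src, int off, int len) {
        if (count + len > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, count + len));
        }
        System.arraycopy(src, off, buf, count, len);
        count += len;
    }

    byte[] buffer() { return buf; } // direct access, no defensive copy
    int size() { return count; }
}
```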

* Add Language Processor and SparqlJsonResultCountingParser

Implemented the AbstractLanguageProcessor interface to process InputStreams. A new SAX parser (SaxSparqlJsonResultCountingParser) was introduced for SPARQL JSON results; it returns the number of solutions, the number of bound values, and the variables.
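
The counting logic, sketched here with Jackson's streaming parser instead of the actual SAX machinery; the input is a SPARQL JSON result of the form {"head":{"vars":[...]},"results":{"bindings":[{...},...]}}:

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

final class CountingSketch {
    record Counts(long solutions, long boundValues, List<String> variables) {}

    static Counts count(InputStream body) throws IOException {
        long solutions = 0, boundValues = 0;
        List<String> variables = new ArrayList<>();
        try (JsonParser p = new JsonFactory().createParser(body)) {
            while (p.nextToken() != null) {
                if (p.currentToken() != JsonToken.START_ARRAY) continue;
                if ("vars".equals(p.currentName())) {
                    while (p.nextToken() != JsonToken.END_ARRAY) variables.add(p.getText());
                } else if ("bindings".equals(p.currentName())) {
                    // each object in "bindings" is one solution; each of its
                    // top-level fields is one bound value
                    while (p.nextToken() != JsonToken.END_ARRAY) {
                        if (p.currentToken() != JsonToken.START_OBJECT) continue;
                        solutions++;
                        int depth = 1;
                        while (depth > 0) {
                            JsonToken t = p.nextToken();
                            if (t == JsonToken.START_OBJECT) depth++;
                            else if (t == JsonToken.END_OBJECT) depth--;
                            else if (t == JsonToken.FIELD_NAME && depth == 1) boundValues++;
                        }
                    }
                }
            }
        }
        return new Counts(solutions, boundValues, variables);
    }
}
```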

* Completed ResponseBodyProcessor and integrated it into SPARQLProtocolWorker

* Worker integration and removal of a lot of code

* small fixes

* changes to the SPARQLProtocolWorker

* delegated the executeQuery method
* reuse the BigByteArrayOutputStream (bbaos) if it has not been consumed yet
* removed the assertion that the Content-Length header value matches the actual content length
* better logging for malformed URLs

* Add basic logging for Suite class

* remove JUnit 4 and add surefire plugin

The Surefire plugin is used for better control over the system resources available to the tests, because the BigByteArrayStream tests can consume a lot of them.

* update iguana-schema.json

* Update config file validation and change suiteID generation

This also removes some unused, redundant code. The suiteID has also been changed to a string consisting of an epoch timestamp in seconds and the hash code of the configuration file.
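
The described ID scheme, sketched; the separator and the exact hash function are assumptions, only "epoch seconds plus the configuration file's hash code" comes from this commit:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.Arrays;

final class SuiteIdSketch {
    static String generate(Path configFile) throws IOException {
        long epochSeconds = Instant.now().getEpochSecond();
        int configHash = Arrays.hashCode(Files.readAllBytes(configFile));
        return epochSeconds + "-" + configHash; // e.g. "1699963200-1377459123"
    }
}
```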

* Remove CLIProcessManager.java

* Update schema file and re-enable tests

The validation function has also been made public for easier testing.

* Remove test files for IndexedQueryReader

See issue #214.

* Add start and end time for each worker.

Adjusted the tests as well and integrated the change into the StresstestResultProcessor and the storages.

* Remove unused dependencies

* Document a possible problem with the SPARQLProtocolWorker and the connected client

---------

Co-authored-by: Alexander Bigerl <[email protected]>
3 people authored Nov 14, 2023
1 parent f44b101 commit 82dd89e
Showing 218 changed files with 7,377 additions and 10,250 deletions.
1 change: 0 additions & 1 deletion docs/develop/extend-metrics.md
@@ -46,7 +46,6 @@ The following gives you an examples on how to work with the `data` parameter:
 }

 @Override
-@Nonnull
 public Model createMetricModel(StresstestMetadata task, Map<String, List<QueryExecutionStats>> data) {
     for (String queryID : task.queryIDS()) {
         // This list contains every query execution statistics of one query from
2 changes: 1 addition & 1 deletion docs/develop/extend-queryhandling.md
@@ -66,7 +66,7 @@ implements the following methods:
 public class MyQuerySource extends QuerySource {
     public MyQuerySource(String filepath) {
         // your constructor
-        // filepath is the value, specified in the "location"-key inside the configuration file
+        // filepath is the value, specified in the "path"-key inside the configuration file
     }

     @Override
4 changes: 2 additions & 2 deletions docs/usage/configuration.md
@@ -316,14 +316,14 @@ For example, instead of:

 ```yaml
 storages:
-  - className: "org.aksw.iguana.rp.storage.impl.NTFileStorage"
+  - className: "org.aksw.iguana.rp.storage.impl.RDFFileStorage"
 ```

 you can use the shortname NTFileStorage:

 ```yaml
 storages:
-  - className: "NTFileStorage"
+  - className: "RDFFileStorage"
 ```


97 changes: 56 additions & 41 deletions example-suite.yml
@@ -7,6 +7,7 @@ connections:
     user: "dba"
     password: "dba"
     endpoint: "http://localhost:8890/sparql"
+    dataset: DatasetName
   - name: "Virtuoso6"
     user: "dba"
     password: "dba"
@@ -19,49 +20,63 @@
     updateEndpoint: "http://localhost:3030/ds/update"

 tasks:
-  - className: "org.aksw.iguana.cc.tasks.impl.Stresstest"
-    configuration:
-      # 1 hour (time Limit is in ms)
-      timeLimit: 360000
-      # warmup is optional
-      warmup:
-        # 1 minutes (is in ms)
-        timeLimit: 600000
-        workers:
-          - threads: 1
-            className: "HttpGetWorker"
-            queries:
-              location: "queries_warmup.txt"
-            timeOut: 180000
-      workers:
-        - threads: 16
-          className: "HttpGetWorker"
-          queries:
-            location: "queries_easy.txt"
-          timeOut: 180000
-        - threads: 4
-          className: "HttpGetWorker"
-          queries:
-            location: "queries_complex.txt"
-          fixedLatency: 100
-          gaussianLatency: 50
-          parameterName: "query"
-          responseType: "application/sparql-results+json"
-
-    # both are optional and can be used to load and start as well as stop the connection before and after every task
-    preScriptHook: "./triplestores/{{connection}}/start.sh {{dataset.file}} {{dataset.name}} {{taskID}}"
-    postScriptHook: "./triplestores/{{connection}}/stop.sh"
-
-#optional otherwise the same metrics will be used as default
-metrics:
-  - className: "QMPH"
-  - className: "QPS"
-  - className: "NoQPH"
-  - className: "AvgQPS"
-  - className: "NoQ"
+  # 1 hour (time Limit is in ms)
+  - type: stresstest
+    warmupWorkers:
+      # 1 minutes (is in ms)
+      - type: SPARQLProtocolWorker
+        number: 16
+        queries:
+          path: "/home/bigerl/IdeaProjects/IGUANA/LICENSE"
+        timeout: 0.02s
+        connection: Virtuoso7
+        completionTarget:
+          duration: 1000s
+    workers:
+      - type: "SPARQLProtocolWorker"
+        number: 16
+        queries:
+          path: "/home/bigerl/IdeaProjects/IGUANA/LICENSE"
+        timeout: 3m
+        connection: Virtuoso7
+        completionTarget:
+          duration: 1000s
+        requestType: get query
+      - number: 4
+        type: "SPARQLProtocolWorker"
+        connection: Virtuoso7
+        completionTarget:
+          duration: 1000s
+        queries:
+          path: "/home/bigerl/IdeaProjects/IGUANA/LICENSE"
+        timeout: 100s
+        acceptHeader: "application/sparql-results+json"
+  - type: stresstest
+    workers:
+      - type: "SPARQLProtocolWorker"
+        connection: Virtuoso7
+        number: 16
+        requestType: get query
+        queries:
+          path: "/home/bigerl/IdeaProjects/IGUANA/LICENSE"
+        timeout: 180s
+        completionTarget:
+          duration: 1000s
+      - number: 4
+        requestType: get query
+        connection: Virtuoso7
+        completionTarget:
+          duration: 1000s
+        type: "SPARQLProtocolWorker"
+        queries:
+          path: "/home/bigerl/IdeaProjects/IGUANA/LICENSE"
+        timeout: 100s
+        parseResults: true
+        acceptHeader: "application/sparql-results+json"

 #optional otherwise an nt file will be used
 storages:
-  - className: "NTFileStorage"
+  - type: "RDF file"
+    path: "some.ttl"
   #configuration:
   #fileName: YOUR_RESULT_FILE_NAME.nt
95 changes: 55 additions & 40 deletions pom.xml
@@ -46,7 +46,7 @@
         <maven.compile.target>17</maven.compile.target>
         <maven.compile.source>17</maven.compile.source>

-        <log4j.version>2.17.1</log4j.version>
+        <log4j.version>2.19.0</log4j.version>
     </properties>

     <distributionManagement>
@@ -78,27 +78,11 @@
         <artifactId>jena-querybuilder</artifactId>
         <version>${jena.version}</version>
     </dependency>
-    <dependency>
-        <groupId>junit</groupId>
-        <artifactId>junit</artifactId>
-        <version>4.13.1</version>
-        <scope>test</scope>
-    </dependency>
-    <dependency>
-        <groupId>org.apache.httpcomponents</groupId>
-        <artifactId>httpclient</artifactId>
-        <version>4.5.13</version>
-    </dependency>
-    <dependency>
-        <groupId>commons-configuration</groupId>
-        <artifactId>commons-configuration</artifactId>
-        <version>1.10</version>
-    </dependency>
     <dependency>
         <groupId>org.apache.commons</groupId>
         <artifactId>commons-exec</artifactId>
         <version>1.3</version>
     </dependency>
     <dependency>
         <groupId>org.apache.logging.log4j</groupId>
         <artifactId>log4j-slf4j-impl</artifactId>
@@ -119,25 +103,15 @@
         <artifactId>log4j-1.2-api</artifactId>
         <version>${log4j.version}</version>
     </dependency>
-    <dependency>
-        <groupId>org.simpleframework</groupId>
-        <artifactId>simple</artifactId>
-        <version>5.1.6</version>
-    </dependency>
-    <dependency>
-        <groupId>org.reflections</groupId>
-        <artifactId>reflections</artifactId>
-        <version>0.9.9</version>
-    </dependency>
-    <dependency>
-        <groupId>commons-codec</groupId>
-        <artifactId>commons-codec</artifactId>
-        <version>1.15</version>
-    </dependency>
     <dependency>
         <groupId>com.fasterxml.jackson.dataformat</groupId>
         <artifactId>jackson-dataformat-yaml</artifactId>
-        <version>2.11.2</version>
+        <version>2.12.5</version>
     </dependency>
+    <dependency>
+        <groupId>com.fasterxml.jackson.datatype</groupId>
+        <artifactId>jackson-datatype-jsr310</artifactId>
+        <version>2.12.5</version>
+    </dependency>
     <dependency>
         <groupId>com.networknt</groupId>
@@ -161,17 +135,47 @@
         <version>5.9.2</version>
         <scope>test</scope>
     </dependency>
-    <dependency>
-        <groupId>org.junit.vintage</groupId>
-        <artifactId>junit-vintage-engine</artifactId>
-        <version>5.9.2</version>
-        <scope>test</scope>
-    </dependency>
     <dependency>
         <groupId>com.opencsv</groupId>
         <artifactId>opencsv</artifactId>
         <version>5.7.1</version>
     </dependency>
+    <dependency>
+        <groupId>org.lz4</groupId>
+        <artifactId>lz4-pure-java</artifactId>
+        <version>1.8.0</version>
+    </dependency>
+    <dependency>
+        <groupId>org.apache.hbase</groupId>
+        <artifactId>hbase-common</artifactId>
+        <version>2.5.5</version>
+    </dependency>
+    <dependency>
+        <groupId>com.beust</groupId>
+        <artifactId>jcommander</artifactId>
+        <version>1.82</version>
+    </dependency>
+    <dependency>
+        <groupId>com.github.tomakehurst</groupId>
+        <artifactId>wiremock-jre8-standalone</artifactId>
+        <version>2.35.0</version>
+        <scope>test</scope>
+    </dependency>
+    <dependency>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-surefire-plugin</artifactId>
+        <version>3.1.2</version>
+    </dependency>
+    <dependency>
+        <groupId>org.springframework.data</groupId>
+        <artifactId>spring-data-commons</artifactId>
+        <version>3.1.2</version>
+    </dependency>
+    <dependency>
+        <groupId>org.springframework</groupId>
+        <artifactId>spring-context</artifactId>
+        <version>6.0.11</version>
+    </dependency>
 </dependencies>


@@ -194,6 +198,16 @@
             </configuration>
         </plugin>

+        <plugin>
+            <groupId>org.apache.maven.plugins</groupId>
+            <artifactId>maven-surefire-plugin</artifactId>
+            <version>3.1.2</version>
+            <configuration>
+                <argLine>-Xmx16384M</argLine>
+            </configuration>
+        </plugin>
+
+
         <plugin>
             <groupId>org.apache.maven.plugins</groupId>
             <artifactId>maven-shade-plugin</artifactId>
@@ -210,7 +224,8 @@
                     </goals>
                     <configuration>
                         <transformers>
-                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
+                            <transformer
+                                    implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                 <mainClass>org.aksw.iguana.cc.controller.MainController</mainClass>
                             </transformer>
                         </transformers>