HTTP worker refactoring (#221)
* SPARQLProtocolWorker is a draft for a better, more reliable worker tailored to the SPARQL Protocol. Each worker uses a single HttpClient and handles its work-completion conditions itself.
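
A minimal sketch of the per-worker client idea, assuming java.net.http; the class shape and method names here are illustrative, not the actual SPARQLProtocolWorker API:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Sketch only: each worker owns exactly one HttpClient, so connection
// state is never shared between workers.
class WorkerSketch {
    private final int workerId; // uniquely identifies this worker
    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    WorkerSketch(int workerId) {
        this.workerId = workerId;
    }

    HttpResponse<byte[]> executeQuery(URI endpoint, String query) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/sparql-query")
                .POST(HttpRequest.BodyPublishers.ofString(query))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofByteArray());
    }
}
```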

* Add workerId and ExecutionStats to SPARQLProtocolWorker

Refactored SPARQLProtocolWorker to record a workerId and execution statistics for each worker. The workerId uniquely identifies each worker. An ExecutionStats inner class was created to track start time, duration, HTTP status code, content length, number of bindings, and number of solutions for each worker's task.
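
The stats described above could look roughly like this record; the field names are illustrative, not the actual inner class:

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the per-execution data listed in the commit message.
record ExecutionStats(
        Instant startTime,
        Duration duration,
        int httpStatusCode,
        long contentLength,
        long numberOfBindings,
        long numberOfSolutions
) {}
```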

* Refactor SPARQLProtocolWorker to handle query streams.

This commit changes the query-building mechanism in SPARQLProtocolWorker.java from StringBuilder to InputStream, to support processing of large queries and to reduce the overhead of using String for the query ID. The worker now reads queries directly from QueryHandler's data stream, and a number of HTTP request methods were modified to accommodate this change. The refactor also adds a new method to QueryHandler that returns a QueryHandle record, a container for a query's index and an InputStream for its content.
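
A minimal sketch of the described record, assuming it holds the stream directly:

```java
import java.io.InputStream;

// Container for a query's index and its content as a stream, as described
// above; the exact signature is an assumption.
record QueryHandle(int index, InputStream queryInputStream) {}
```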

* Add streaming support for handling large queries

Introduced InputStream support in QueryList and QuerySource to handle large queries more efficiently. Changes have been made to IndexedQueryReader, QuerySource, QueryHandler, and several other classes to accommodate the new streaming feature. Previously, all queries were loaded into memory, which could cause an OutOfMemoryError for large query sets. Whether queries are streamed to the client still depends on the SPARQL worker in use.
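
The streaming idea, sketched under the assumption of a file-backed source that knows each query's byte offset and length (not the actual IndexedQueryReader code):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hands out a bounded InputStream over one query's byte range instead of
// materializing the whole query as a String.
class StreamingQuerySourceSketch {
    private final Path file;

    StreamingQuerySourceSketch(Path file) {
        this.file = file;
    }

    InputStream openQuery(long offset, long length) throws IOException {
        InputStream in = Files.newInputStream(file);
        in.skipNBytes(offset);  // position at the query's first byte
        return limit(in, length); // cap reads at the query's length
    }

    private static InputStream limit(InputStream in, long max) {
        return new InputStream() {
            private long remaining = max;

            @Override
            public int read() throws IOException {
                if (remaining <= 0) return -1;
                int b = in.read();
                if (b >= 0) remaining--;
                return b;
            }

            @Override
            public void close() throws IOException {
                in.close();
            }
        };
    }
}
```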

* Refactored BigByteArrayOutputStream

* Hashing and large response body support for SPARQLProtocolWorker

* remove dangling javadoc comment

* Scaffold ResponseBodyProcessor. This class keeps track of already-handled responses to avoid repeated processing. It uses a concurrent hash map to store the responses, identified by unique keys. This approach aims to improve the efficiency of handling response bodies in multi-threaded scenarios.
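
The deduplication idea in a nutshell; the key type and the parse step are assumptions for illustration:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A response body is processed at most once per key, even when several
// workers hand in the same body concurrently.
class ResponseBodyProcessorSketch {
    private final Map<Long, Object> results = new ConcurrentHashMap<>();

    Object process(long key, byte[] body) {
        // computeIfAbsent guarantees a single computation per key
        return results.computeIfAbsent(key, k -> parse(body));
    }

    private Object parse(byte[] body) {
        return body.length; // stand-in for real result parsing
    }
}
```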

* Use an unsynchronized ByteArrayOutputStream for BigByteArrayInputStream/BigByteArrayOutputStream and completely rewrite BigByteArrayInputStream. This should increase the performance of both streams significantly.
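
For context, java.io.ByteArrayOutputStream synchronizes every write; a sketch of an unsynchronized variant (illustrative, not the actual implementation):

```java
import java.util.Arrays;

// Single-threaded buffer without the lock of java.io.ByteArrayOutputStream;
// exposing the internal array lets a matching input stream read it without copying.
final class UnsyncBaosSketch {
    private byte[] buf = new byte[8192];
    private int count;

    void write(byte[] src, int off, int len) {
        if (count + len > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, count + len));
        }
        System.arraycopy(src, off, buf, count, len);
        count += len;
    }

    byte[] buffer() { return buf; } // direct access, no defensive copy
    int size() { return count; }
}
```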

* Add Language Processor and SparqlJsonResultCountingParser

Implemented the AbstractLanguageProcessor interface to process InputStreams. A new SAX parser (SaxSparqlJsonResultCountingParser) was introduced for SPARQL JSON results; it returns the number of solutions, the number of bound values, and the variables.
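
The counting logic, sketched here with Jackson's streaming parser instead of the actual SAX machinery; the input is a SPARQL JSON result of the form {"head":{"vars":[...]},"results":{"bindings":[{...},...]}}:

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

final class CountingSketch {
    record Counts(long solutions, long boundValues, List<String> variables) {}

    static Counts count(InputStream body) throws IOException {
        long solutions = 0, boundValues = 0;
        List<String> variables = new ArrayList<>();
        try (JsonParser p = new JsonFactory().createParser(body)) {
            while (p.nextToken() != null) {
                if (p.currentToken() != JsonToken.START_ARRAY) continue;
                if ("vars".equals(p.currentName())) {
                    while (p.nextToken() != JsonToken.END_ARRAY) variables.add(p.getText());
                } else if ("bindings".equals(p.currentName())) {
                    // each object in "bindings" is one solution; each of its
                    // top-level fields is one bound value
                    while (p.nextToken() != JsonToken.END_ARRAY) {
                        if (p.currentToken() != JsonToken.START_OBJECT) continue;
                        solutions++;
                        int depth = 1;
                        while (depth > 0) {
                            JsonToken t = p.nextToken();
                            if (t == JsonToken.START_OBJECT) depth++;
                            else if (t == JsonToken.END_OBJECT) depth--;
                            else if (t == JsonToken.FIELD_NAME && depth == 1) boundValues++;
                        }
                    }
                }
            }
        }
        return new Counts(solutions, boundValues, variables);
    }
}
```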

* Completed ResponseBodyProcessor and integrated it into SPARQLProtocolWorker

* Worker integration and removal of a lot of code

* small fixes

* changes to the SPARQLProtocolWorker

* delegated the executeQuery method
* reuse the BigByteArrayOutputStream (bbaos) if it has not been consumed yet
* removed the assertion that the Content-Length header value matches the actual content length
* better logging for malformed URLs

* Add basic logging for Suite class

* remove JUnit 4 and add surefire plugin

The Surefire plugin is used for better control over the system resources available to the tests, because the BigByteArrayStream tests can consume a lot of them.

* update iguana-schema.json

* Update config file validation and change suiteID generation

This also removes some unused, redundant code. The suiteID has also been changed to a string consisting of an epoch timestamp in seconds and the hash code of the configuration file.
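
The described ID scheme, sketched; the separator and the exact hash function are assumptions, only "epoch seconds plus the configuration file's hash code" comes from this commit:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.Arrays;

final class SuiteIdSketch {
    static String generate(Path configFile) throws IOException {
        long epochSeconds = Instant.now().getEpochSecond();
        int configHash = Arrays.hashCode(Files.readAllBytes(configFile));
        return epochSeconds + "-" + configHash; // e.g. "1699963200-1377459123"
    }
}
```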

* Remove CLIProcessManager.java

* Update schema file and re-enable tests

The validation function has also been made public for easier testing.

* Remove test files for IndexedQueryReader

See issue #214.

* Add start and end time for each worker.

Adjusted the tests as well and integrated the change into the StresstestResultProcessor and the storages.

* Remove unused dependencies

* Document a possible problem with the SPARQLProtocolWorker and the connected client

---------

Co-authored-by: Alexander Bigerl <[email protected]>
3 people authored Nov 14, 2023
1 parent f44b101 commit 82dd89e
Showing 218 changed files with 7,377 additions and 10,250 deletions.
1 change: 0 additions & 1 deletion docs/develop/extend-metrics.md
@@ -46,7 +46,6 @@ The following gives you an examples on how to work with the `data` parameter:
 }

 @Override
-@Nonnull
 public Model createMetricModel(StresstestMetadata task, Map<String, List<QueryExecutionStats>> data) {
     for (String queryID : task.queryIDS()) {
         // This list contains every query execution statistics of one query from
2 changes: 1 addition & 1 deletion docs/develop/extend-queryhandling.md
@@ -66,7 +66,7 @@ implements the following methods:
 public class MyQuerySource extends QuerySource {
     public MyQuerySource(String filepath) {
         // your constructor
-        // filepath is the value, specified in the "location"-key inside the configuration file
+        // filepath is the value, specified in the "path"-key inside the configuration file
     }

     @Override
4 changes: 2 additions & 2 deletions docs/usage/configuration.md
@@ -316,14 +316,14 @@ For example, instead of:

 ```yaml
 storages:
-  - className: "org.aksw.iguana.rp.storage.impl.NTFileStorage"
+  - className: "org.aksw.iguana.rp.storage.impl.RDFFileStorage"
 ```

 you can use the shortname NTFileStorage:

 ```yaml
 storages:
-  - className: "NTFileStorage"
+  - className: "RDFFileStorage"
 ```


97 changes: 56 additions & 41 deletions example-suite.yml
@@ -7,6 +7,7 @@ connections:
     user: "dba"
     password: "dba"
     endpoint: "http://localhost:8890/sparql"
+    dataset: DatasetName
   - name: "Virtuoso6"
     user: "dba"
     password: "dba"
@@ -19,49 +20,63 @@
     updateEndpoint: "http://localhost:3030/ds/update"

 tasks:
-  - className: "org.aksw.iguana.cc.tasks.impl.Stresstest"
-    configuration:
-      # 1 hour (time Limit is in ms)
-      timeLimit: 360000
-      # warmup is optional
-      warmup:
-        # 1 minutes (is in ms)
-        timeLimit: 600000
-        workers:
-          - threads: 1
-            className: "HttpGetWorker"
-            queries:
-              location: "queries_warmup.txt"
-            timeOut: 180000
-      workers:
-        - threads: 16
-          className: "HttpGetWorker"
-          queries:
-            location: "queries_easy.txt"
-          timeOut: 180000
-        - threads: 4
-          className: "HttpGetWorker"
-          queries:
-            location: "queries_complex.txt"
-          fixedLatency: 100
-          gaussianLatency: 50
-          parameterName: "query"
-          responseType: "application/sparql-results+json"
-
-    # both are optional and can be used to load and start as well as stop the connection before and after every task
-    preScriptHook: "./triplestores/{{connection}}/start.sh {{dataset.file}} {{dataset.name}} {{taskID}}"
-    postScriptHook: "./triplestores/{{connection}}/stop.sh"
-
-#optional otherwise the same metrics will be used as default
-metrics:
-  - className: "QMPH"
-  - className: "QPS"
-  - className: "NoQPH"
-  - className: "AvgQPS"
-  - className: "NoQ"
+  # 1 hour (time Limit is in ms)
+  - type: stresstest
+    warmupWorkers:
+      # 1 minutes (is in ms)
+      - type: SPARQLProtocolWorker
+        number: 16
+        queries:
+          path: "/home/bigerl/IdeaProjects/IGUANA/LICENSE"
+        timeout: 0.02s
+        connection: Virtuoso7
+        completionTarget:
+          duration: 1000s
+    workers:
+      - type: "SPARQLProtocolWorker"
+        number: 16
+        queries:
+          path: "/home/bigerl/IdeaProjects/IGUANA/LICENSE"
+        timeout: 3m
+        connection: Virtuoso7
+        completionTarget:
+          duration: 1000s
+        requestType: get query
+      - number: 4
+        type: "SPARQLProtocolWorker"
+        connection: Virtuoso7
+        completionTarget:
+          duration: 1000s
+        queries:
+          path: "/home/bigerl/IdeaProjects/IGUANA/LICENSE"
+        timeout: 100s
+        acceptHeader: "application/sparql-results+json"
+  - type: stresstest
+    workers:
+      - type: "SPARQLProtocolWorker"
+        connection: Virtuoso7
+        number: 16
+        requestType: get query
+        queries:
+          path: "/home/bigerl/IdeaProjects/IGUANA/LICENSE"
+        timeout: 180s
+        completionTarget:
+          duration: 1000s
+      - number: 4
+        requestType: get query
+        connection: Virtuoso7
+        completionTarget:
+          duration: 1000s
+        type: "SPARQLProtocolWorker"
+        queries:
+          path: "/home/bigerl/IdeaProjects/IGUANA/LICENSE"
+        timeout: 100s
+        parseResults: true
+        acceptHeader: "application/sparql-results+json"

 #optional otherwise an nt file will be used
 storages:
-  - className: "NTFileStorage"
+  - type: "RDF file"
+    path: "some.ttl"
   #configuration:
   #fileName: YOUR_RESULT_FILE_NAME.nt
95 changes: 55 additions & 40 deletions pom.xml
@@ -46,7 +46,7 @@
         <maven.compile.target>17</maven.compile.target>
         <maven.compile.source>17</maven.compile.source>

-        <log4j.version>2.17.1</log4j.version>
+        <log4j.version>2.19.0</log4j.version>
     </properties>

     <distributionManagement>
@@ -78,27 +78,11 @@
         <artifactId>jena-querybuilder</artifactId>
         <version>${jena.version}</version>
     </dependency>
-    <dependency>
-        <groupId>junit</groupId>
-        <artifactId>junit</artifactId>
-        <version>4.13.1</version>
-        <scope>test</scope>
-    </dependency>
-    <dependency>
-        <groupId>org.apache.httpcomponents</groupId>
-        <artifactId>httpclient</artifactId>
-        <version>4.5.13</version>
-    </dependency>
-    <dependency>
-        <groupId>commons-configuration</groupId>
-        <artifactId>commons-configuration</artifactId>
-        <version>1.10</version>
-    </dependency>
     <dependency>
         <groupId>org.apache.commons</groupId>
         <artifactId>commons-exec</artifactId>
         <version>1.3</version>
     </dependency>
     <dependency>
         <groupId>org.apache.logging.log4j</groupId>
         <artifactId>log4j-slf4j-impl</artifactId>
@@ -119,25 +103,15 @@
         <artifactId>log4j-1.2-api</artifactId>
         <version>${log4j.version}</version>
     </dependency>
-    <dependency>
-        <groupId>org.simpleframework</groupId>
-        <artifactId>simple</artifactId>
-        <version>5.1.6</version>
-    </dependency>
-    <dependency>
-        <groupId>org.reflections</groupId>
-        <artifactId>reflections</artifactId>
-        <version>0.9.9</version>
-    </dependency>
-    <dependency>
-        <groupId>commons-codec</groupId>
-        <artifactId>commons-codec</artifactId>
-        <version>1.15</version>
-    </dependency>
     <dependency>
         <groupId>com.fasterxml.jackson.dataformat</groupId>
         <artifactId>jackson-dataformat-yaml</artifactId>
-        <version>2.11.2</version>
+        <version>2.12.5</version>
     </dependency>
+    <dependency>
+        <groupId>com.fasterxml.jackson.datatype</groupId>
+        <artifactId>jackson-datatype-jsr310</artifactId>
+        <version>2.12.5</version>
+    </dependency>
     <dependency>
         <groupId>com.networknt</groupId>
@@ -161,17 +135,47 @@
         <version>5.9.2</version>
         <scope>test</scope>
     </dependency>
-    <dependency>
-        <groupId>org.junit.vintage</groupId>
-        <artifactId>junit-vintage-engine</artifactId>
-        <version>5.9.2</version>
-        <scope>test</scope>
-    </dependency>
     <dependency>
         <groupId>com.opencsv</groupId>
         <artifactId>opencsv</artifactId>
         <version>5.7.1</version>
     </dependency>
+    <dependency>
+        <groupId>org.lz4</groupId>
+        <artifactId>lz4-pure-java</artifactId>
+        <version>1.8.0</version>
+    </dependency>
+    <dependency>
+        <groupId>org.apache.hbase</groupId>
+        <artifactId>hbase-common</artifactId>
+        <version>2.5.5</version>
+    </dependency>
+    <dependency>
+        <groupId>com.beust</groupId>
+        <artifactId>jcommander</artifactId>
+        <version>1.82</version>
+    </dependency>
+    <dependency>
+        <groupId>com.github.tomakehurst</groupId>
+        <artifactId>wiremock-jre8-standalone</artifactId>
+        <version>2.35.0</version>
+        <scope>test</scope>
+    </dependency>
+    <dependency>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-surefire-plugin</artifactId>
+        <version>3.1.2</version>
+    </dependency>
+    <dependency>
+        <groupId>org.springframework.data</groupId>
+        <artifactId>spring-data-commons</artifactId>
+        <version>3.1.2</version>
+    </dependency>
+    <dependency>
+        <groupId>org.springframework</groupId>
+        <artifactId>spring-context</artifactId>
+        <version>6.0.11</version>
+    </dependency>
 </dependencies>


@@ -194,6 +198,16 @@
             </configuration>
         </plugin>

+        <plugin>
+            <groupId>org.apache.maven.plugins</groupId>
+            <artifactId>maven-surefire-plugin</artifactId>
+            <version>3.1.2</version>
+            <configuration>
+                <argLine>-Xmx16384M</argLine>
+            </configuration>
+        </plugin>
+
+
         <plugin>
             <groupId>org.apache.maven.plugins</groupId>
             <artifactId>maven-shade-plugin</artifactId>
@@ -210,7 +224,8 @@
                     </goals>
                     <configuration>
                         <transformers>
-                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
+                            <transformer
+                                    implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                 <mainClass>org.aksw.iguana.cc.controller.MainController</mainClass>
                             </transformer>
                         </transformers>