
Virtuoso S1TAT Error Query did not complete due to ANYTIME timeout #1340

Closed
nicolastoira opened this issue Jan 27, 2025 · 5 comments

@nicolastoira

I'm using the SPARQLWrapper Python library to call the Virtuoso /sparql endpoint to trigger a query against the database. For example:

from SPARQLWrapper import SPARQLWrapper, POST

sparql = SPARQLWrapper(endpoint="http://graph:8890/sparql", returnFormat=return_format, defaultGraph=default_graph_uri)
sparql.setQuery(query)
if http_method == HTTPMethod.POST:
    sparql.setMethod(POST)

The problem I'm facing is related to long-running queries: I'm not able to define a consistent timeout with the Virtuoso parameters. I tried configuring MaxQueryExecutionTime and VDBDisconnectTimeout to different values, but that does not change the behavior. After approximately 16 minutes (presumably after 1000 seconds), the SPARQLWrapper call returns an EndPointInternalError with status code 500 and the message Virtuoso S1TAT Error Query did not complete due to ANYTIME timeout.

I also tried to increase the timeout with sparql.setTimeout(timeout=3600) on the SPARQLWrapper object but that didn't change the results.
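For completeness, here is a self-contained sketch of the failing pattern; the query is just a stand-in for my real long-running query, and the exception class is where SPARQLWrapper surfaces the 500 response:

from SPARQLWrapper import SPARQLWrapper, JSON
from SPARQLWrapper.SPARQLExceptions import EndPointInternalError

sparql = SPARQLWrapper("http://graph:8890/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("SELECT (COUNT(*) AS ?c) WHERE { ?s ?p ?o }")  # stand-in for the real query
sparql.setTimeout(3600)  # the client-side timeout I tried to raise

try:
    results = sparql.query().convert()
except EndPointInternalError as e:
    # raised after ~16 minutes with:
    # "Virtuoso S1TAT Error Query did not complete due to ANYTIME timeout"
    print(e)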

What I would like to understand is how this particular timeout on calls to the Virtuoso /sparql endpoint is configured. Which value should I change, and how? Why does changing the above parameters not lead to any visible effect?

Thanks a lot.

@TallTed
Collaborator

TallTed commented Jan 27, 2025

I'm guessing that the Virtuoso instance you're querying has a lower timeout setting than you're trying to set on the query side. (Server-side settings take precedence.)

I will also note that the client-side value is set in milliseconds — i.e., timeout=3600 is 3.6 seconds, timeout=1000 is 1 second. A timeout of 16 minutes would be 16*60*1000 or 960000 milliseconds.

I would normally first refer you to this OpenLink Community Forum post on the subject, but there is something odd happening there. (Admins have been alerted, and have since fixed it!)

You can see that same content in the Internet Archive.

Please let us know if that information is sufficient to explain what's going on, and to resolve it if you are (or are in contact with) the administrator of that Virtuoso instance.

@timhaynesopenlink
Collaborator

Hi,

I think there is some confusion as to what timeout applies where.

First, VDBDisconnectTimeout is for the virtual database (VDB) layer; it does not apply to an incoming SPARQL query unless that query involves the VDB layer behind the scenes. It can be set aside for now.

Second, MaxQueryExecutionTime is a server-side cap on the query execution time. According to the docs,

If the SPARQL section of the Virtuoso configuration file contains a MaxQueryExecutionTime parameter and its value is greater than or equal to 1000, then the actual "anytime" timeout is the minimum of the requested value and the value in the configuration file.

Third, Virtuoso's Anytime query system allows the client to ask for a desired maximum runtime via the ?timeout= URL parameter. By default this is 0, i.e. no specific upper bound.

The behaviour matrix is as follows (see the sketch after this list):

  • if the client either sets ?timeout=0 or doesn't set it, and the server has a MaxQueryExecutionTime set, we see it as asking for all or nothing, so you get either HTTP 200 or 500 back
  • if the client sets ?timeout= to something, and the server has a MaxQueryExecutionTime set, we see it as being open to receiving a partial result set, so you get either HTTP 200 or 206
  • if the server's MaxQueryExecutionTime is unset or 0, it allows potentially infinite runtime up to the ?timeout=, which may also be 0 or unset, in which case the query will run indefinitely
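To make that matrix concrete, here is a minimal sketch using the Python requests library; the endpoint URL, query, and timeout value are placeholders, and the value is in milliseconds (per the earlier comment):

# Sketch only: illustrates the three cases above against a placeholder endpoint.
import requests

resp = requests.get(
    "http://localhost:8890/sparql",
    params={
        "query": "SELECT (COUNT(*) AS ?c) WHERE { ?s ?p ?o }",
        "format": "json",
        "timeout": "120000",   # Anytime timeout requested by the client (milliseconds)
    },
)

if resp.status_code == 200:
    print("complete result set")
elif resp.status_code == 206:
    print("partial result set: the Anytime timeout was hit")
else:
    # e.g. HTTP 500 with "Virtuoso S1TAT Error Query did not complete due to ANYTIME timeout"
    print("failed:", resp.status_code)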

For live public production purposes, we advise you to specify timeouts.

The short answer here is: for development / internal use, just comment-out the MaxQueryExecutionTime in virtuoso.ini and you should be OK.
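For reference, that parameter lives in the [SPARQL] section of virtuoso.ini; commenting it out looks roughly like this (the value shown is only an example):

[SPARQL]
;MaxQueryExecutionTime = 60    ; commented out, so the INI imposes no anytime cap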

The complication is that it seems SPARQLWrapper has a bug preventing you from enjoying timeouts to the full.

I added some debug output to dump the request object:

diff --git a/SPARQLWrapper/Wrapper.py b/SPARQLWrapper/Wrapper.py
index e4ec6c0..c350548 100644
--- a/SPARQLWrapper/Wrapper.py
+++ b/SPARQLWrapper/Wrapper.py
@@ -36,6 +36,8 @@ from urllib.request import (
 )  # don't change the name: tests override it
 from xml.dom.minidom import Document, parse

+import json
+
 from SPARQLWrapper import __agent__

 if TYPE_CHECKING:
@@ -920,6 +922,9 @@ class SPARQLWrapper(object):
         """
         request = self._createRequest()

+        print("DEBUG: request=", request)
+        print(json.dumps(request.__dict__))
+
         try:
             if self.timeout:
                 response = urlopener(request, timeout=self.timeout)

which shows two things:

  1. SPARQLWrapper thinks a query timeout applies to the HTTP(S) connection

  2. the outgoing query URL is of the form

DEBUG: request= <urllib.request.Request object at 0x7f8b31ce28d0>
{"_full_url": "http://localhost:8889/sparql?query=%0A++++PREFIX+gts%3A+%3Chttp%3A//resource.geosciml.org/ontology/timescale/gts%23%3E%0A%0A++++select+count%28%2A%29+where+%7B%3Fs+%3Fp+%3Fo+%7D%0A++++&format=json&output=json&results=json", "fragment": null, "type": "http", "host": "localhost:8889", "selector": "/sparql?query=%0A++++PREFIX+gts%3A+%3Chttp%3A//resource.geosciml.org/ontology/timescale/gts%23%3E%0A%0A++++select+count%28%2A%29+where+%7B%3Fs+%3Fp+%3Fo+%7D%0A++++&format=json&output=json&results=json", "headers": {"User-agent": "sparqlwrapper 2.0.1a0 (rdflib.github.io/sparqlwrapper)", "Accept": "application/x-sparqlstar-results+json,application/sparql-results+json,application/json,text/javascript,application/javascript"}, "unredirected_hdrs": {}, "_data": null, "_tunnel_host": null, "origin_req_host": "localhost", "unverifiable": false}

which lacks mention of the word "timeout".

Accordingly, I have raised an issue with SPARQLWrapper to ask for Virtuoso Anytime support via ?timeout= in the URLs, one way or another.
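Until that lands, one possible interim workaround (which I have not verified here) is to add the parameter yourself via SPARQLWrapper's addParameter(), assuming the endpoint honours extra query-string parameters and that they are passed through unmodified:

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:8889/sparql")   # placeholder endpoint
sparql.setReturnFormat(JSON)
sparql.setQuery("SELECT (COUNT(*) AS ?c) WHERE { ?s ?p ?o }")

# "timeout" here is the server-side Anytime URL parameter (milliseconds),
# not the client-side socket timeout set by setTimeout().
sparql.addParameter("timeout", "120000")

results = sparql.query().convert()

Whether a 206 partial result then comes back cleanly through convert() is something to verify against your endpoint.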

HTH

@nicolastoira
Author

nicolastoira commented Jan 28, 2025

Thank you for the explanations and recommendations. I did some extra tests and I'm still seeing a strange behavior. These are the cases I tested:

  • MayQueryExecutionTime is configured to a low value, e.g. 60 seconds. The triggered query does not terminate with a timeout after 60 seconds but times out after 16 minutes as before.
  • MaxQueryExecutionTime is configured to a high value, e.g. 3600 seconds. As explained before, this has no effect and the query still times out after about 16 minutes.
  • MayQueryExecutionTime is not configured when starting the database. That seems to remove the 16-minute timeout at least, and the query returns a result after about 20 minutes.

Therefore, removing the parameter seems at least to remove the upper limit imposed by the default timeout. Still, it would be useful to be able to define a configurable timeout to limit the execution time of the query, and setting MaxQueryExecutionTime to a higher value does not have any effect. Do you think there is a way to set such a limit?

I didn't change any other parameter related to the SPARQLWrapper and therefore assumed that the default timeout is set to 0.

UPDATE: actually, it looks like if MaxQueryExecutionTime is higher than 1000, no timeout is applied at all, i.e. the query runs indefinitely.

@TallTed
Collaborator

TallTed commented Jan 28, 2025

I want to confirm that you did test settings of MaxQueryExecutionTime in all cases, rather than MayQueryExecutionTime as seen in your first and third bullets (i.e., Max vs May).

@nicolastoira
Author

Yes, I confirm. Sorry about that, it was just a typo on my side.
