Missed training leads to error when the next request is received #400

Open
MichaelRoeder opened this issue May 23, 2024 · 2 comments
Labels
bug Something isn't working


@MichaelRoeder
Member

Error description

If a training attempt fails, subsequent requests with the same pre-trained model path lead to an exception. Steps to reproduce:

  1. Send a slightly faulty request: it points to a pre-trained model path that does not exist yet and contains a small error (e.g., a wrong path to the embeddings):
curl -X 'GET' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  --data '{
    "pos": ["http://www.wikidata.org/entity/Q3895", "http://www.wikidata.org/entity/Q180855"],
    "neg": ["http://www.wikidata.org/entity/Q483915", "http://www.wikidata.org/entity/Q1359568", "http://www.wikidata.org/entity/Q169167", "http://www.wikidata.org/entity/Q192334", "http://www.wikidata.org/entity/Q695087", "http://www.wikidata.org/entity/Q20165"],
    "model": "Drill",
    "path_embeddings": "mutagenesis_embeddings/Keci_entity_embeddings.csv",
    "path_to_pretrained_drill": "pretrained_drill",
    "num_of_training_learning_problems": 10,
    "num_of_target_concepts": 3,
    "max_runtime": 60000,
    "iter_bound": 100
  }' \
  http://localhost:8000/cel
  2. The server decides to skip the training. Internally, however, it still creates the directory for the pre-trained model. From the server log:
######### CEL Arguments ###############
Knowledgebase/Triplestore:<ontolearn.triple_store.TripleStore object at 0x78ef01cd7af0>
Input data: {'pos': ['http://www.wikidata.org/entity/Q3895', 'http://www.wikidata.org/entity/Q180855'], 'neg': ['http://www.wikidata.org/entity/Q483915', 'http://www.wikidata.org/entity/Q1359568', 'http://www.wikidata.org/entity/Q169167', 'http://www.wikidata.org/entity/Q192334', 'http://www.wikidata.org/entity/Q695087', 'http://www.wikidata.org/entity/Q20165'], 'model': 'Drill', 'path_embeddings': 'mutagenesis_embeddings/Keci_entity_embeddings.csv', 'path_to_pretrained_drill': 'pretrained_drill', 'num_of_training_learning_problems': 10, 'num_of_target_concepts': 3, 'max_runtime': 60000, 'iter_bound': 100}
######### CEL Arguments ###############
No pre-trained model...
No loading because embeddings not provided
Learning OWL Class Expression at most 100 iteration:   0%|          | 0/100 [00:00<?, ?it/s]
######## Current Search Tree 11 ###########

... answering the request continues as usual from here on ...
  3. As a user, you may notice the error and correct it, so you send the same request again, this time with the corrected path_embeddings value.
  4. The server now throws an exception. Because the directory already exists, it tries to load the model from it:
root@96b21be83ded:/# cd pretrained_drill/
root@96b21be83ded:/pretrained_drill# ls
seen_examples.json

However, the directory contains no drill.pth file, so loading fails:

######### CEL Arguments ###############
Knowledgebase/Triplestore:<ontolearn.triple_store.TripleStore object at 0x78ef01cd7af0>
Input data: {'pos': ['http://www.wikidata.org/entity/Q3895', 'http://www.wikidata.org/entity/Q180855'], 'neg': ['http://www.wikidata.org/entity/Q483915', 'http://www.wikidata.org/entity/Q1359568', 'http://www.wikidata.org/entity/Q169167', 'http://www.wikidata.org/entity/Q192334', 'http://www.wikidata.org/entity/Q695087', 'http://www.wikidata.org/entity/Q20165'], 'model': 'Drill', 'path_embeddings': '/data/output/Keci_entity_embeddings.csv', 'path_to_pretrained_drill': 'pretrained_drill', 'num_of_training_learning_problems': 10, 'num_of_target_concepts': 3, 'max_runtime': 60000, 'iter_bound': 1}
######### CEL Arguments ###############
INFO:     127.0.0.1:52386 - "GET /cel HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/Ontolearn/ontolearn/scripts/run.py", line 91, in cel
    owl_learner = get_learner(data)
  File "/Ontolearn/ontolearn/scripts/run.py", line 74, in get_learner
    return get_drill(data)
  File "/Ontolearn/ontolearn/scripts/run.py", line 58, in get_drill
    drill.load(directory=data["path_to_pretrained_drill"])
  File "/Ontolearn/ontolearn/learners/drill.py", line 252, in load
    self.heuristic_func.net.load_state_dict(torch.load(directory + "/drill.pth", torch.device('cpu')))
  File "/usr/local/lib/python3.10/site-packages/torch/serialization.py", line 791, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/usr/local/lib/python3.10/site-packages/torch/serialization.py", line 271, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/usr/local/lib/python3.10/site-packages/torch/serialization.py", line 252, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'pretrained_drill/drill.pth'
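The failing sequence can be reproduced without a server or PyTorch. The following is a hypothetical simulation, not Ontolearn code: `handle_request` is an assumed stand-in for `get_drill` in run.py that mimics the behaviour suggested by the log and traceback (the train/load decision depends only on whether the `path_to_pretrained_drill` directory exists, the directory is created even when training is skipped, and loading opens `directory + "/drill.pth"` like `Drill.load`). The `embeddings_ok` flag and the file contents are placeholders.

```python
import json
import os
import tempfile


def handle_request(path_to_pretrained: str, embeddings_ok: bool) -> str:
    """Hypothetical stand-in for get_drill() in run.py (not Ontolearn code).

    Assumption: whether to train is decided only by the existence of the
    directory, as described in the issue.
    """
    if not os.path.exists(path_to_pretrained):
        # "No pre-trained model...": the directory and seen_examples.json
        # are written first (placeholder content).
        os.makedirs(path_to_pretrained)
        with open(os.path.join(path_to_pretrained, "seen_examples.json"), "w") as f:
            json.dump([], f)
        if not embeddings_ok:
            # "No loading because embeddings not provided": training is
            # skipped, so no drill.pth is ever saved.
            return "skipped"
        # Successful training would save the model here (dummy bytes).
        with open(os.path.join(path_to_pretrained, "drill.pth"), "wb") as f:
            f.write(b"weights")
        return "trained"
    # The directory exists, so the model is loaded; like Drill.load(), this
    # opens directory + "/drill.pth" without checking that the file exists.
    with open(path_to_pretrained + "/drill.pth", "rb") as f:
        f.read()
    return "loaded"


with tempfile.TemporaryDirectory() as tmp:
    model_dir = os.path.join(tmp, "pretrained_drill")
    print(handle_request(model_dir, embeddings_ok=False))  # faulty request
    print(sorted(os.listdir(model_dir)))  # only seen_examples.json is left
    try:
        handle_request(model_dir, embeddings_ok=True)  # corrected request
    except FileNotFoundError as err:
        print("FileNotFoundError:", err.filename)
```

Under these assumptions, the second call fails even though its input is now valid, because the leftover directory from the first call steers it into the load branch.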
@MichaelRoeder MichaelRoeder added the bug Something isn't working label May 23, 2024
@Demirrr
Member

Demirrr commented May 24, 2024

Dear @MichaelRoeder

Thank you for opening an issue with such thorough details 👍

Since path_embeddings does not point to the CSV file containing the entity embeddings, the folder created at path_to_pretrained_drill contains only seen_examples.json.
Therefore, because pretrained_drill exists but pretrained_drill/drill.pth does not, a FileNotFoundError is thrown.

My question is: how would you like the system to behave in the aforementioned scenario?

@MichaelRoeder
Member Author

MichaelRoeder commented May 24, 2024

The main issue is that the service gets stuck in a state in which I cannot use it anymore. (Well, I could use a different pretrained path, but it may take a user like me too much time to figure that out... 😅)

If I understand the workflow correctly, the program decides whether to train based on the existence of the given pretrained_drill path. An easy solution would be to extend this decision as follows: start the training if either the directory does not exist OR the directory does not contain a model file.
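A minimal sketch of that rule, assuming the model file is named `drill.pth` as in the traceback. `should_train`, `get_drill_sketch`, and the `train`/`load` callables are hypothetical names used for illustration only; they are not Ontolearn's actual API.

```python
import os
import tempfile
from typing import Callable

MODEL_FILE = "drill.pth"  # file name taken from the traceback of Drill.load()


def should_train(path_to_pretrained: str) -> bool:
    """Proposed rule: train if the directory is missing OR holds no model file."""
    directory_missing = not os.path.isdir(path_to_pretrained)
    model_missing = not os.path.isfile(os.path.join(path_to_pretrained, MODEL_FILE))
    return directory_missing or model_missing


def get_drill_sketch(path_to_pretrained: str,
                     train: Callable[[str], None],
                     load: Callable[[str], None]) -> str:
    """Hypothetical dispatcher; `train` and `load` stand in for the real calls."""
    if should_train(path_to_pretrained):
        train(path_to_pretrained)
        return "train"
    load(path_to_pretrained)
    return "load"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        leftover = os.path.join(tmp, "pretrained_drill")
        os.makedirs(leftover)
        # Simulate the leftover state from the issue: only seen_examples.json.
        open(os.path.join(leftover, "seen_examples.json"), "w").close()
        print(should_train(leftover))  # True: retrain instead of crashing
```

Checking for the model file rather than only the directory means that a leftover directory holding just seen_examples.json, as in the listing above, leads to retraining instead of a FileNotFoundError.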

However, other solutions are possible (e.g., get better users 😉)
