@vvmruder After discussion in the PSC, could you provide a time estimate for the changes necessary to realise the 'Initialising things once' part above?
On our side, we will check with the user group whether everybody uses only the standard and interlis source configurations.
@vvmruder, @svamaa, @voisardf
We had some more discussion in the PSC concerning the task "initialising things once". It is important for us that routine operations such as updating data of particular themes or updating real estate information can be performed without a server restart. However, changes in configuration such as the change of the data source of a topic (database/database schema) may require a server restart.
Intro
Coming back to #1544, we can see that many redundant steps are executed. This was mainly introduced to have a configurable server that can be altered at runtime, meaning the data integrator can change content in the database without restarting the server. It turned out that this use case is rare; it is probably not used at all. Instead, new data is provided by a regular deploy, which means the underlying data and the server are set up again from scratch.
Initialising things once
One of the most interesting points that was not touched in the recent refactorings and performance improvements is the initialisation of the processor. It is initialised every time an ÖREB-related endpoint is called. If we agree on the statement made in the intro, one of the most effective performance gains would be to refactor pyramid_oereb so that it initialises the processor only once at boot time. This would cut out all the initialisation work that is done in this method.
The first 3 pointers have already been improved so that they initialise only the real estate source and omit the rest. This was done to improve performance. We could improve it in general if we do the processor initialisation once at server start-up.
=> This has to be discussed, as it is an organisational decision to make pyramid_oereb recognise the configuration and data sources only at boot time and not on every request.
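As a rough illustration of the idea, a processor could be built once and then reused by every request. The sketch below is not pyramid_oereb code: `Processor`, `get_processor`, and the settings dictionary are hypothetical stand-ins, and the real initialisation is far more involved.

```python
import threading


class Processor:
    """Hypothetical stand-in for the real processor."""

    # Counts how often the (expensive) initialisation runs.
    instances_created = 0

    def __init__(self, settings):
        Processor.instances_created += 1
        self.settings = settings

    def process(self, egrid):
        # A real processor would query its sources here, on every call.
        return f"extract for {egrid}"


_lock = threading.Lock()
_processor = None


def get_processor(settings):
    """Return the shared processor, building it on first use only."""
    global _processor
    if _processor is None:
        with _lock:
            # Re-check inside the lock so two threads cannot both build it.
            if _processor is None:
                _processor = Processor(settings)
    return _processor
```

With this shape, only the configuration is frozen at start-up. Whether data updates stay visible without a restart would depend on the sources still querying the database on each request, which is an assumption of this sketch.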
Parallelisation
I see one potential place where we could hook in to take proper advantage of parallel processing:
https://github.com/openoereb/pyramid_oereb/blob/master/pyramid_oereb/core/readers/extract.py#L51-L104
Here all iterative querying of the sources is bundled, and here we could take action. However, we should discuss which technique we want to use and whether this should be configurable.
asyncio
Since this project builds on recent Python versions, we are able to use asyncio in combination with SQLAlchemy. This is probably the best solution in terms of a future-proof setup. However, it comes with some downsides. Asyncio is not fully supported across the Python stack and the libraries we may depend on, so a major task would be to research where we might be blocked from using it.
The biggest upside of this solution is its scalability and the resources it would save.
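To make the asyncio option more concrete, here is a minimal sketch of querying several sources concurrently with a cap on how many run at once. It uses only the standard library; `read_source` is a hypothetical placeholder for an awaitable database query, and the theme names are invented for the example.

```python
import asyncio


async def read_source(theme, delay=0.01):
    # Hypothetical placeholder: a real version would await a database query.
    await asyncio.sleep(delay)
    return f"{theme}: ok"


async def read_all(themes, max_concurrent=4):
    # The semaphore limits how many queries are in flight at the same time.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def limited(theme):
        async with semaphore:
            return await read_source(theme)

    # gather returns the results in the order of the input themes.
    return await asyncio.gather(*(limited(theme) for theme in themes))


results = asyncio.run(read_all(["LandUsePlans", "Forests", "Noise"]))
```

All queries share one thread here, which is where the resource saving would come from, provided every library in the call chain is awaitable.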
multiprocessing / threading
A well-known way of implementing iterative parallel tasks. We could easily implement that. The main disadvantage here is the forking. Threads in one solution, or processes in the other, may put much more load on the bare-metal server in the end. So we should discuss how we can avoid overloading either the database or our servers. In my opinion, we could avoid that with some additional configuration where one can set the number of threads or processes allowed.
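The configurable limit suggested above could look roughly like the following sketch, shown here for the threading variant. `read_source` is a hypothetical placeholder for a blocking query, and `max_workers` stands for a value that would come from the configuration; neither exists in pyramid_oereb under these names.

```python
from concurrent.futures import ThreadPoolExecutor


def read_source(theme):
    # Hypothetical placeholder for a blocking database query.
    return f"{theme}: ok"


def read_all(themes, max_workers=4):
    # max_workers caps the number of threads, and with it the number of
    # simultaneous queries hitting the database.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map yields the results in the order of the input themes.
        return list(pool.map(read_source, themes))
```

Swapping in `ProcessPoolExecutor` from the same module would give the process-based variant with the same interface, at the cost of starting separate processes.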
SQLAlchemy Session Management
Another thing we need to research is the way we currently implement our session sharing:
https://github.com/openoereb/pyramid_oereb/blob/master/pyramid_oereb/core/adapter.py#L12-L73
It is a home-made way to keep long-running servers from collecting too many open DB sessions. Currently I am not aware of the influence it would have in a parallel context, whether with threading, multiprocessing, or asyncio.