
Interesting comments


From our email

Pan

The challenges for horizontally scaling a web app are usually storage and session management.

Depending on where your K8s cluster runs, you can get different storage solutions. In order to share the course contents, you need a storage type that supports ReadWriteMany so that multiple pods can mount it read/write. There are a few options on the list of Kubernetes volume types, but most likely you will need to spin up your own service if you are using GCP, AWS, or on-premise hardware. If WeBWorK supported object storage such as S3, then we could simplify the storage requirement.

You can put /opt/webwork/courses on a PV/PVC. You can also put the OPL (/opt/webwork/libraries/webwork-open-problem-library) on a PV/PVC so that you can update it independently, but you have to download it manually.
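
As a rough sketch of what Pan describes, a ReadWriteMany claim for the shared course data might look like the following. The claim name, storage class, and size are assumptions; the storage class must map to a provisioner that actually supports ReadWriteMany (for example an NFS-backed class).

```yaml
# Hypothetical PVC for /opt/webwork/courses (names and sizes are placeholders).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: webwork-courses
spec:
  accessModes:
    - ReadWriteMany            # required so multiple pods can mount read/write
  storageClassName: nfs-client # assumption: an RWX-capable class exists in the cluster
  resources:
    requests:
      storage: 20Gi            # placeholder size
```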

It is good that WeBWorK puts keys in the URL and keeps sessions in the database, so we don't need to worry about session management.

I have been wanting to move our instances for a long time, but we have some challenges here at UBC. We use Shibboleth for authentication. Shib requires a module in Apache and a shibd process running. I don't like putting multiple processes in the same container, so I need to spend some time to see how shibd works in a clustered environment. We also use the R integration, and I'm not sure how R scales.

Tani

I am quite interested in seeing if it is feasible to set up WW inside a Docker/Kubernetes setup (or maybe the Red Hat alternatives) for production use (starting in February 2019). However, getting a container-based solution running by then is far less critical than getting a production system up, even if it needs to be either a physical server or a VM in the short term. Like Pan, I expect to need to integrate with Shibboleth for SSO, so that could also be an issue which could delay a container-based approach in the short term.

Some related thoughts:

The biggest technical challenge in the short term is probably the shared data/files which are not in the database. It seems that that is what Pan addressed (but much of what he wrote is still gibberish to me).

I recalled and found two forum threads about clustering, and Danny Glin seems to be the "expert" in terms of having an operational WW cluster with redundancy/fail-over. His experience and thoughts may help in determining how to handle these issues in a Docker/Kubernetes approach.

  • http://webwork.maa.org/moodle/mod/forum/discuss.php?d=548
  • http://webwork.maa.org/moodle/mod/forum/discuss.php?d=4350

Over time, it seems to me that WW probably needs to be subdivided into smaller pieces for efficient scaling. Some things which can probably be moved out of the main web-facing WW container (the only one handling web requests) in the future are:

  • PDF file generation (can be asynchronous, and if offloaded from the main container, that container may be able to exclude LaTeX and cut down on its size)
  • Serving static files - there is something on the Wiki, if I recall, about using a different webserver (not the one interfacing to the main WW Perl code) to handle that.
  • Problem rendering - Mike mentioned there being hooks to do this in the first thread listed above. This could certainly reduce load on web-facing containers and transfer it to back-end containers.
  • Assignment initialization - I'm pretty sure I recall that this was mentioned as a bottleneck when WW systems are under load in time-dependent settings, so if, in a clustered setting, this could be farmed out to containers which are not public web servers, it may help performance.
  • I'm sure that Mike and others may have additional ideas of parts of the workflow which can be split off the web-facing container.

Mike

These are the folders where data is stored by the server (and they are owned by the server in most installations):

  • webwork/courses
  • webwork/webwork2/logs
  • webwork/webwork2/tmp — this is temporary data
  • webwork/webwork2/DATA (much of this is temporary data — so it may not need to be persisted)
  • webwork/webwork2/htdocs/tmp (often this directory is redirected to /var/webwork/tmp or something like that — it stores images and PDFs generated by the server)

The rest of htdocs doesn't change that much — it is only read.

webwork/webwork2/htdocs/DATA is modified by OPL-update but not by the webserver; it should be persisted if the webwork-open-problem-library is persisted.
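
To tie Mike's list back to Pan's PV/PVC idea, a Deployment could mount the persistent directories from such claims. The sketch below is an assumption about how it might be wired up: the image name and claim names are placeholders, and the paths assume the standard /opt/webwork layout from the list above.

```yaml
# Hypothetical Deployment mounting the shared/persistent WeBWorK directories.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webwork
spec:
  replicas: 2
  selector:
    matchLabels:
      app: webwork
  template:
    metadata:
      labels:
        app: webwork
    spec:
      containers:
        - name: webwork
          image: webwork:latest                      # placeholder image
          volumeMounts:
            - name: courses
              mountPath: /opt/webwork/courses        # shared course data
            - name: logs
              mountPath: /opt/webwork/webwork2/logs  # persisted, per Mike's note below
      volumes:
        - name: courses
          persistentVolumeClaim:
            claimName: webwork-courses               # the RWX claim sketched earlier
        - name: logs
          persistentVolumeClaim:
            claimName: webwork-logs                  # assumed second claim
```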

The session key is maintained in the database in the table courseName_key. The record contains the session key and a timestamp. The session key expires after 20 minutes or so (set in defaults.config, I believe, and can be overridden in localOverrides.conf).

Database tables are described in webwork2/lib/WeBWorK/DB/Record. For historical reasons each course has its own collection of tables, hence the table names are prefixed by courseName…

The password is only used when initializing a session; after that the session key is passed either in a POST command or in the URL for a GET (RESTful) request. This protects the password somewhat.

I think the logs should be persisted.
The htdocs/tmp (or /var/webwork/tmp) is where the links for equations (when using the images mode) and auxiliary .png files are stored for each course. It is probably wise to preserve that for now, but it is essentially a cache — so the files are recreated if they are missing. Recreating the files (or the links to other files) could slow things down slightly, but it's not a big deal. If a swap occurs between the time that a tmp file or file link is created and the time the browser sends a request for it there might be a problem — which would disappear if the problem page were refreshed.
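
Since this directory is essentially a regenerable cache, one alternative to sharing it (my own suggestion here, not something Mike recommends; he suggests preserving it for now) would be a per-pod emptyDir. The trade-off is exactly the "swap" problem described above, since a file generated by one pod may then be requested from another. The fragment below would slot into the pod spec sketched earlier, and the mountPath assumes the stock layout.

```yaml
# Fragment of the pod spec above: give each pod its own scratch copy of htdocs/tmp.
# Contents are regenerated on demand, so nothing is lost if the pod restarts.
      containers:
        - name: webwork
          volumeMounts:
            - name: htdocs-tmp
              mountPath: /opt/webwork/webwork2/htdocs/tmp
      volumes:
        - name: htdocs-tmp
          emptyDir: {}
```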

Nearly all of the directories are aliased in site.conf

e.g. the logs directory, the htdocs directory, the htdocs/tmp directory etc. etc.

So instead of rearranging directories you can just change the pointers in site.conf. This might make some things more compatible with existing setups.

💡 I think the basics for shared and persistent data are there (the PV/PVC and their mounting into the deployment's container) as the mechanism; what remains is bootstrapping some of those volumes with initial data and handling permissions.
