Performance issues when tracking logged in users #5254

Open
harishbalachandran opened this issue Aug 6, 2016 · 13 comments

Comments

@harishbalachandran

Summary

The topic tree, videos, and exercises are not loading on the tablets that are connected to the KA Lite hotspot using the Wi-Fi dongle.

26 students were accessing KA Lite, and about 16 of them could not access the content because their topic tree, videos, and exercises were not loading.

**Note: The server was upgraded from 0.16.5 to 0.16.8 using the following two commands:

  1. sudo pip install --upgrade ka-lite
  2. sudo kalite manage setup

We did not face these problems in version 0.16.5 and close to 35 users were able to access the content very smoothly.**

System information

OS: Ubuntu version 14.04 LTS
Installer: "pip install ka-lite" and "sudo kalite manage setup"

Version: 0.16.8 upgraded from 0.16.5

Screenshots

img_20160806_170816
img_20160806_170831
img_20160806_170908
img_20160806_170914
img_20160806_170918
img_20160806_170921
img_20160806_170939
img_20160806_171004
img_20160806_171006
img_20160806_171016

@benjaoming benjaoming added the bug label Aug 7, 2016
@benjaoming benjaoming added this to the 0.16.9 milestone Aug 7, 2016
@benjaoming
Contributor

We'll try to recreate this, thanks for reporting @harishbalachandran !

@aronasorman
Collaborator

Hi @harishbalachandran, we tried replicating this on our side. This sounds like a performance issue, especially since only 16 of the 26 users experienced it.

0.16.8 has additional performance tweaking options available. Can you add the following lines to your ~/.kalite/settings.py file:

CONTENT_DB_SQLITE_PRAGMAS = [("journal_mode", "MEMORY"),
                             ("temp_store", "2")]

That makes the content db run its journals in memory, speeding up reads and writes to the content database. Since this bug is related to the KA topic tree (stored in the content db), it should provide some speedups for your use case.
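For anyone who wants to check what these pragmas do before editing settings.py, here is a minimal sketch using Python's standard sqlite3 module. It is not KA Lite's own code: the `apply_pragmas` helper and the throwaway database file are assumptions made for illustration, and only the `CONTENT_DB_SQLITE_PRAGMAS` value is taken from the comment above. It is written for Python 3.

```python
import os
import sqlite3
import tempfile

# Same value as suggested for ~/.kalite/settings.py above.
CONTENT_DB_SQLITE_PRAGMAS = [("journal_mode", "MEMORY"),
                             ("temp_store", "2")]


def apply_pragmas(conn, pragmas):
    """Hypothetical helper: set each pragma, then read back what SQLite reports."""
    effective = {}
    for name, value in pragmas:
        conn.execute("PRAGMA {}={}".format(name, value))
        effective[name] = conn.execute("PRAGMA {}".format(name)).fetchone()[0]
    return effective


def demo():
    # Use a file-backed database, since an in-memory database would not
    # show a meaningful change in journal_mode.
    with tempfile.TemporaryDirectory() as db_dir:
        conn = sqlite3.connect(os.path.join(db_dir, "content_demo.sqlite"))
        try:
            return apply_pragmas(conn, CONTENT_DB_SQLITE_PRAGMAS)
        finally:
            conn.close()


if __name__ == "__main__":
    print(demo())
```

With `journal_mode` set to `MEMORY`, SQLite keeps the rollback journal in RAM instead of on disk, and `temp_store` set to `2` keeps temporary tables and indices in memory. The trade-off is that a crash or power loss in the middle of a write can leave the database file corrupted.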

If the issue still persists, it would help if @amarkamthe can take a look at the device in question and gather some more statistics for us.

@benjaoming
Contributor

@harishbalachandran - do you have any additional info for us to work with?

As @aronasorman noted, it's probably a performance issue.

Can you retrieve the server.log file from your hotspot device? What hardware is KA Lite running on, in terms of processor and memory?

@benjaoming benjaoming changed the title from "The topic tree, videos and exercise are not loading on the tablets for version 0.16.8 <High Priority>" to "Investigating possibly downgraded performance in 0.16.8" Aug 17, 2016
@harishbalachandran
Author

@benjaoming @aronasorman ,
@amarkamthe and I are testing with the above tweak and will share the server.log when we have enough data to demonstrate this behavior. Until then, please keep this open.

@benjaoming
Contributor

@harishbalachandran I would never close such a beautiful issue report without finding an explanation / fix :)

@benjaoming
Contributor

Hey there, did you manage to find more data on the issue? I'm bumping it to 0.17, but don't feel discouraged, it's just that there are some other important fixes that we need to get out there! :)

@benjaoming benjaoming modified the milestones: 0.17.0, 0.16.9 Sep 5, 2016
@harishbalachandran
Author

Hey @benjaoming, I guess this is pretty much the same as #5317. I faced this during the upgrade from 0.16.5 to 0.16.8, and now we are facing the same for any upgrade from one version to another.

@benjaoming
Contributor

benjaoming commented Oct 11, 2016

@harishbalachandran IMO #5317 is a release blocker and very important to get fixed (thanks for reporting it!!), whereas this issue ATM is impossible to fix because it's too hard to understand what the problem is.

Did you try @aronasorman 's suggested SQLite pragma settings?

@benjaoming
Contributor

Commit 688b0e2, which is merged into develop and will be released in 0.17, is relevant to certain scenarios where content recommendation can get stuck in endless loops. This can affect any page: when User A visits a page that triggers content recommendation (the Home page), the server freezes for all other users, whatever page they are visiting.

@L-Amstutz

We have deployed dozens of RACHEL Plus servers in schools in Liberia, where a math classroom typically has about 50 students. Two students share one laptop, so up to 25 laptops are logged into KA Lite at once. We have upgraded to KA Lite 0.16.9 and have also moved the entire KA Lite database from the internal eMMC to the 50GB HDD, which supposedly has faster write speeds, but we are still having performance issues. We may be somewhat unique in that we depend heavily on the student tracking and progress stats, so that we can correlate any improvement in math scores with the amount of time spent and exercises completed in KA Lite.
When students are NOT logged in to KA Lite, they can all browse videos and even practice the exercises without much of a noticeable performance hit. But when students log in to their KA Lite accounts so that teachers and admins can monitor their progress, the system slows to a crawl once about 12 to 15 clients are active simultaneously, especially when they are all doing math exercises. It takes a very LONG time to load the next problem after submitting an answer, and the browser connection often times out. Sometimes students are even forced to log out and log back in. It does appear that their usage stats are being recorded.
I've run the "dstat" tool on the server to monitor CPU, paging, memory, disk I/O, and networking. Preliminary results appear to show that the bottlenecks are disk writes and CPU usage caused by python2 processes. We are attempting to get a roomful of laptops and volunteers together to stage a live test here in the USA, but may have to wait until my next trip to Africa to capture stats from a real classroom with real students. When we get dependable results, I will share the logs. Hopefully this will help locate the source of the slowdowns.

@benjaoming
Contributor

@L-Amstutz - thanks for sharing detailed feedback! It does sound like we need to inspect our memory and processing footprints in more detail, with special attention to content logging and the calculation of recommendations from the topic tree.

I would think we should pay special attention to:

  1. Our logging calls via JavaScript. They hit a couple of different API endpoints that register student progress.
  2. Pages that use the data and whether they cause endless loops etc.

When re-creating the issue with many users hitting the same server, we should also try to isolate the issue once it starts occurring to better understand its nature. For instance, does it happen - as you say - that loading the next exercise is slow, or is this page slow because another page view has made the system stall?

As I said in my above comment, I've already implemented a fix to a problem that did indeed use 100% CPU once it started occurring. So at this point, there's already hope that the 0.17 series has fixed it... but it seems it's going to be difficult to verify :)

@L-Amstutz

L-Amstutz commented Dec 27, 2016 via email

@benjaoming
Contributor

benjaoming commented Dec 27, 2016

Hi @L-Amstutz - I'm not familiar with dstat, but if it only does analysis per process, then it cannot help us troubleshoot because everything is happening inside the python2 process.

You could try looking at your ~/.kalite/server.log if you're running anything but Windows. It could show you the requests made at the time when things start to go slow. The log contains timestamps and URL paths requested.

@benjaoming benjaoming modified the milestones: 0.17.0, 0.17.1 Mar 19, 2017
@benjaoming benjaoming modified the milestones: 0.18, 0.17.1 Apr 7, 2017
@benjaoming benjaoming changed the title from "Investigating possibly downgraded performance in 0.16.8" to "Performance issues when tracking logged in users" Apr 7, 2017