
Anecdotally slower performance #417

Open
BigLep opened this issue Jun 3, 2022 · 21 comments

@BigLep

BigLep commented Jun 3, 2022

Hi @andrew,

Not blocking, but just passing on that over the last couple of months I have found the ecosystem dashboard has gotten anecdotally slower. Links are taking longer to load (to the point that I open multiple links in parallel to avoid future waits). Similarly, some of the .json URLs I used to hit that would resolve within 30 seconds now don't complete before the apparent application timeout. I have worked around this by reducing the page size of my requests.

Steve

@andrew
Collaborator

andrew commented Jun 10, 2022

I've got a new instance deployed here that feels much snappier: http://ipfs2.ecosystem-dashboard.com

It's not 100% ready for the switchover, but feel free to have a click around.

@andrew
Collaborator

andrew commented Jun 11, 2022

Everything is set up now on ipfs2 and it should be keeping in sync with changes on GitHub. Perhaps you could try it out in your next triage session?

@andrew
Collaborator

andrew commented Jun 13, 2022

Currently making some database config tweaks; ipfs2 will be unavailable for a couple of hours.

@BigLep
Author

BigLep commented Jun 17, 2022

Hi @andrew - just checking in here on what you advise I do for triage sessions going forward. I was going to flip things to ipfs2, but it doesn't look to be up.

@andrew
Collaborator

andrew commented Jun 23, 2022

Yeah, it looks like everything got really slow for a while, almost as if the server went to sleep. I will investigate.

@andrew
Collaborator

andrew commented Jun 24, 2022

Even on this new server the database is totally overwhelmed! I've restarted it and things are working again, but it's going to need some more tweaks to make sure it doesn't fall over again. I have a full day on Monday that I can work on it.
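
A rough sketch of the standard Heroku Postgres health checks that could confirm this kind of overload (the app name here is a placeholder, not necessarily the real instance name):

    # sketch: built-in heroku pg commands for a quick look at database health
    heroku pg:info -a <app-name>       # plan, connection count, table count, data size
    heroku pg:diagnose -a <app-name>   # flags long-running queries, bloat, low cache hit rate
    heroku pg:ps -a <app-name>         # lists the queries currently running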

@andrew
Collaborator

andrew commented Jun 27, 2022

I'm running some background cleanup scripts on all the instances to remove a lot of unused database records. It may take a few hours and the dbs will be a bit slow, but my hope is to reduce the database size significantly and unlock some more performance without any code changes.

@andrew
Collaborator

andrew commented Jun 27, 2022

Before running cleanup:
[Screenshot taken 2022-06-27 showing database table and index sizes before cleanup]

The events table and its indexes have grown very large and consume a lot of resources. The repository dependencies table is also very large and has a lot of indexes.
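
For reference, a table/index size breakdown like the one in that screenshot can be reproduced from the command line with a sketch along these lines, assuming heroku pg:psql's -c flag and the built-in Postgres statistics views (the app name is taken from the Heroku link later in this thread):

    # sketch: ten largest tables with their total and index sizes
    heroku pg:psql -a ecosystem-research -c "
      SELECT relname                                       AS table_name,
             pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
             pg_size_pretty(pg_indexes_size(relid))        AS index_size
      FROM pg_catalog.pg_statio_user_tables
      ORDER BY pg_total_relation_size(relid) DESC
      LIMIT 10;"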

@andrew
Collaborator

andrew commented Jun 30, 2022

Cleanup is complete, and I've also made a number of significant performance improvements across various parts of the app that should reduce database load on ipfs.ecosystem-dashboard.com (https://ecosystem-research.herokuapp.com). I'll be monitoring it closely over the next week.

Ignore ipfs2.ecosystem-dashboard.com for now.

@BigLep
Author

BigLep commented Jul 14, 2022

@andrew : in case it wasn't known, I can't get the dashboard to load for me today (2022-07-14). I've tried multiple URLs. I'm planning to sing its praises during an IPFS Thing talk tomorrow (2022-07-15). I'm hopeful it will be up in case anyone in the audience checks it out.

Edit: I'm able to get some URLs to load now.

@andrew
Collaborator

andrew commented Jul 14, 2022

There was a change earlier in the week to the pmf stats that has put a big load on the database. I'll see if I can tweak some things later tonight.

@andrew
Collaborator

andrew commented Jul 14, 2022

@BigLep I have killed all the db connections and restarted everything. I think the next course of action will be to separate the pmf stats from the issue triage, as the database can't handle doing both in one app.

@BigLep
Author

BigLep commented Jul 20, 2022

Thanks @andrew for the update. Just passing on that for this week's triages we have been getting "Application error" for all URLs.

@SgtPooki
Member

I was also getting "Application error" a lot and almost opened a second issue, but things recently started working again, and much more quickly.

Side note: since I have access to the Heroku instance, I was trying to gather logs to determine the issue, but it was not quick or simple for me to do so. Neither of the following commands gave me any more information about what was causing the errors:

  • heroku logs --tail -a ecosystem-research | grep "503"
  • heroku logs --tail -a ecosystem-research | grep "Application Error"

Any tips you (@andrew) have on troubleshooting would be great =D
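
One thing that may help when grepping, assuming the standard Heroku router error codes: request timeouts show up in the router logs as code=H12 rather than as a literal "503" or "Application Error" string, so something like the following tends to surface them:

    heroku logs --tail -a ecosystem-research | grep "code=H12"   # H12 = request timeout at the router
    heroku logs --tail -a ecosystem-research | grep "at=error"   # any router-level error line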

@andrew
Collaborator

andrew commented Jul 20, 2022

I gave the whole thing a big kick about 10 mins after seeing @BigLep's comment, and by big kick I mean:

heroku pg:killall

followed by

heroku restart

The problem is that there are some overnight background tasks that are completely stomping the database, and it's not recovering. Killing all the very long-running db connections is a blunt way of bringing the web app back online.
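
A slightly less blunt variant of that kick, assuming the standard heroku pg commands (the pid is a placeholder taken from the pg:ps output):

    heroku pg:ps -a ecosystem-research           # shows each backend's pid, state and current query
    heroku pg:kill <pid> -a ecosystem-research   # terminates just that one long-running connection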

@SgtPooki the heroku logs don't help much here, as it can be hard to see what is causing the timeouts. I've added New Relic as an add-on, which has much more info on slow actions, db queries, etc.

You should be able to find the "new relic apm" link on this page: https://dashboard.heroku.com/apps/ecosystem-research/resources (the "heroku postgres" link on that page also has some basic insights that might be helpful)

I'm going to do some more investigation tomorrow morning; I haven't had a lot of free time to keep on top of this recently, as my other job has been pretty full on.

@andrew
Collaborator

andrew commented Jul 22, 2022

Yesterday I made some significant changes to the pmf calculations, which should reduce the load on the database and keep the web UI performant.

@BigLep
Author

BigLep commented Sep 22, 2022

@andrew: I'm getting queries that are timing out again. I'm trying to pull down event data, and even reducing the page size to 100 is still leading to timed-out results: https://ipfs.ecosystem-dashboard.com/events.json?range=144&per_page=100&page=1

Does it need to be "kicked" again?
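
For what it's worth, one possible stopgap in the same spirit as the earlier page-size workaround (the per_page value and filenames below are arbitrary, and this may still time out if the range filter itself is the expensive part):

    # sketch: pull the same range in several small pages, saving each to its own file
    for page in 1 2 3 4; do
      curl -s "https://ipfs.ecosystem-dashboard.com/events.json?range=144&per_page=25&page=${page}" -o "events-${page}.json"
    done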

andrew self-assigned this Sep 23, 2022
@andrew
Collaborator

andrew commented Sep 23, 2022

The events table has grown very, very large and query time now exceeds Heroku's 30-second timeout limit. You can get the endpoint to load by removing the range parameter, but that may not help in your case.

What I'm thinking we may need to do is move older events (say, over 1 year old) into a separate table (archived_events, for example) to keep all the website endpoints performant.
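
For illustration, the archival that proposal describes might look roughly like this one-off sketch (the created_at column is an assumption based on Rails conventions rather than the actual schema, and in practice the delete would likely need to be batched and followed by a vacuum):

    # sketch: copy events older than a year into archived_events, then remove them from events
    heroku pg:psql -a ecosystem-research -c "
      BEGIN;
      CREATE TABLE IF NOT EXISTS archived_events (LIKE events INCLUDING ALL);
      INSERT INTO archived_events
        SELECT * FROM events WHERE created_at < now() - interval '1 year';
      DELETE FROM events WHERE created_at < now() - interval '1 year';
      COMMIT;"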

@BigLep
Author

BigLep commented Sep 23, 2022

Got it - makes sense. Moving events over a year old definitely seems good/fine to me. In the last 1.5 years, I haven't needed to go back further than a year.

@andrew
Collaborator

andrew commented Oct 5, 2022

I'm going on holiday tomorrow, so I won't get a chance to split the events table for a couple of weeks.
