
Anecdotally slower performance #417

Open
BigLep opened this issue Jun 3, 2022 · 21 comments

@BigLep

BigLep commented Jun 3, 2022

Hi @andrew,

Not blocking, but just passing on that over the last couple of months I have found the ecosystem dashboard has gotten anecdotally slower. Links are taking longer to load (to the point that I open multiple links in parallel to avoid future waits). Similarly, some of the .json URLs I used to hit that would resolve within 30 seconds now don't complete before the apparent application timeout. I have worked around this by reducing the page size of my requests.

Steve

@andrew
Collaborator

andrew commented Jun 10, 2022

I've got a new instance deployed here that feels much snappier: http://ipfs2.ecosystem-dashboard.com

It's not 100% ready for the switchover, but feel free to have a click around.

@andrew
Collaborator

andrew commented Jun 11, 2022

Everything is set up now on ipfs2 and it should be keeping in sync with changes on GitHub. Perhaps you could try it out in your next triage session?

@andrew
Collaborator

andrew commented Jun 13, 2022

Currently making some database config tweaks; ipfs2 will be unavailable for a couple of hours.

@BigLep
Author

BigLep commented Jun 17, 2022

Hi @andrew - just checking in here on what you advise I do for triage sessions going forward. I was going to flip things to ipfs2, but it doesn't look to be up.

@andrew
Collaborator

andrew commented Jun 23, 2022

Yeah, it looks like everything got really slow for a while, almost as if the server went to sleep. I will investigate.

@andrew
Collaborator

andrew commented Jun 24, 2022

Even on this new server the database is totally overwhelmed! I've restarted it and things are working again, but it's going to need some more tweaks to make sure it doesn't fall over again. I have a full day on Monday that I can work on it.
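
A rough sketch of the standard Heroku Postgres health checks that could confirm this kind of overload (the app name here is a placeholder, not necessarily the real instance name):

    # sketch: built-in heroku pg commands for a quick look at database health
    heroku pg:info -a <app-name>       # plan, connection count, table count, data size
    heroku pg:diagnose -a <app-name>   # flags long-running queries, bloat, low cache hit rate
    heroku pg:ps -a <app-name>         # lists the queries currently running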

@andrew
Collaborator

andrew commented Jun 27, 2022

I'm running some background cleanup scripts on all the instances to remove a lot of unused database records. It may take a few hours and the dbs will be a bit slow, but my hope is to reduce the database size significantly and unlock some more performance without any code changes.

@andrew
Collaborator

andrew commented Jun 27, 2022

Before running cleanup:
[Screenshot taken 2022-06-27 showing database table and index sizes before cleanup]

The events table and its indexes have grown very large and consume a lot of resources. The repository dependencies table is also very large and has a lot of indexes.
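
For reference, a table/index size breakdown like the one in that screenshot can be reproduced from the command line with a sketch along these lines, assuming heroku pg:psql's -c flag and the built-in Postgres statistics views (the app name is taken from the Heroku link later in this thread):

    # sketch: ten largest tables with their total and index sizes
    heroku pg:psql -a ecosystem-research -c "
      SELECT relname                                       AS table_name,
             pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
             pg_size_pretty(pg_indexes_size(relid))        AS index_size
      FROM pg_catalog.pg_statio_user_tables
      ORDER BY pg_total_relation_size(relid) DESC
      LIMIT 10;"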

@andrew
Collaborator

andrew commented Jun 30, 2022

Cleanup is complete, and I've also made a number of significant performance improvements across various parts of the app that should reduce database load on ipfs.ecosystem-dashboard.com (https://ecosystem-research.herokuapp.com). I'll be monitoring it closely over the next week.

Ignore ipfs2.ecosystem-dashboard.com for now.

@BigLep
Author

BigLep commented Jul 14, 2022

@andrew : in case it wasn't known, I can't get the dashboard to load for me today (2022-07-14). I've tried multiple URLs. I'm planning to sing its praises during an IPFS Thing talk tomorrow (2022-07-15). I'm hopeful it will be up in case anyone in the audience checks it out.

Edit: I'm able to get some URLs to load now.

@andrew
Collaborator

andrew commented Jul 14, 2022

There was a change earlier in the week to the pmf stats that has put a big load on the database. I'll see if I can tweak some things later tonight.

@andrew
Collaborator

andrew commented Jul 14, 2022

@BigLep I have killed all the db connections and restarted everything. I think the next course of action will be to separate the pmf stats from the issue triage, as the database can't handle doing both in one app.

@BigLep
Author

BigLep commented Jul 20, 2022

Thanks @andrew for the update. Just passing on that for this week's triages we have been getting "Application error" for all URLs.

@SgtPooki
Member

I was also getting "Application error" a lot and almost opened a second issue, but things recently started working again, and much more quickly.

Side note: since I have access to the Heroku instance, I was trying to gather logs to determine the issue, but it was not quick or simple for me to do so. Neither of the following commands gave me any more information about what was causing the errors:

  • heroku logs --tail -a ecosystem-research | grep "503"
  • heroku logs --tail -a ecosystem-research | grep "Application Error"

Any tips you (@andrew) have on troubleshooting would be great =D
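
One thing that may help when grepping, assuming the standard Heroku router error codes: request timeouts show up in the router logs as code=H12 rather than as a literal "503" or "Application Error" string, so something like the following tends to surface them:

    heroku logs --tail -a ecosystem-research | grep "code=H12"   # H12 = request timeout at the router
    heroku logs --tail -a ecosystem-research | grep "at=error"   # any router-level error line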

@andrew
Collaborator

andrew commented Jul 20, 2022

I gave the whole thing a big kick about 10 mins after seeing @BigLep's comment, and by big kick I mean:

heroku pg:killall

followed by

heroku restart

The problem is that there are some overnight background tasks that are completely stomping the database, and it's not recovering. Killing all the very long-running db connections is a blunt way of bringing the web app back online.
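
A slightly less blunt variant of that kick, assuming the standard heroku pg commands (the pid is a placeholder taken from the pg:ps output):

    heroku pg:ps -a ecosystem-research           # shows each backend's pid, state and current query
    heroku pg:kill <pid> -a ecosystem-research   # terminates just that one long-running connection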

@SgtPooki the heroku logs don't help much here, as it can be hard to see what is causing the timeouts. I've added New Relic as an add-on, which has much more info on slow actions, db queries, etc.

You should be able to find the "new relic apm" link on this page: https://dashboard.heroku.com/apps/ecosystem-research/resources (the "heroku postgres" link on that page also has some basic insights that might be helpful)

I'm going to do some more investigation tomorrow morning; I haven't had a lot of free time to keep on top of this recently, as my other job has been pretty full on.

@andrew
Collaborator

andrew commented Jul 22, 2022

Yesterday I made some significant changes to the pmf calculations, which should reduce the load on the database and keep the web UI performant.

@BigLep
Author

BigLep commented Sep 22, 2022

@andrew: I'm getting queries that are timing out again. I'm trying to pull down event data, and even reducing the page size to 100 is still leading to timed-out results: https://ipfs.ecosystem-dashboard.com/events.json?range=144&per_page=100&page=1

Does it need to be "kicked" again?
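
For what it's worth, one possible stopgap in the same spirit as the earlier page-size workaround (the per_page value and filenames below are arbitrary, and this may still time out if the range filter itself is the expensive part):

    # sketch: pull the same range in several small pages, saving each to its own file
    for page in 1 2 3 4; do
      curl -s "https://ipfs.ecosystem-dashboard.com/events.json?range=144&per_page=25&page=${page}" -o "events-${page}.json"
    done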

andrew self-assigned this Sep 23, 2022
@andrew
Collaborator

andrew commented Sep 23, 2022

The events table has grown very, very large and query time now exceeds Heroku's 30-second timeout limit. You can get the endpoint to load by removing the range parameter, but that may not help in your case.

What I'm thinking we may need to do is move older events (say, over 1 year old) into a separate table (archived_events, for example) to keep all the website endpoints performant.
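
For illustration, the archival that proposal describes might look roughly like this one-off sketch (the created_at column is an assumption based on Rails conventions rather than the actual schema, and in practice the delete would likely need to be batched and followed by a vacuum):

    # sketch: copy events older than a year into archived_events, then remove them from events
    heroku pg:psql -a ecosystem-research -c "
      BEGIN;
      CREATE TABLE IF NOT EXISTS archived_events (LIKE events INCLUDING ALL);
      INSERT INTO archived_events
        SELECT * FROM events WHERE created_at < now() - interval '1 year';
      DELETE FROM events WHERE created_at < now() - interval '1 year';
      COMMIT;"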

@BigLep
Author

BigLep commented Sep 23, 2022

Got it - makes sense. Moving events over a year old definitely seems good/fine to me. In the last 1.5 years, I haven't needed to go back further than a year.

@andrew
Collaborator

andrew commented Oct 5, 2022

I'm going on holiday tomorrow, so I won't get a chance to split the events table for a couple of weeks.
