Other Updates, but pushshift.io NOT WORK! #285
Comments
PushShift is currently broken, due to API restrictions that Reddit staff are implementing. As a result, I will be unable to support any further PushShift development until (and if) they work something out with Reddit. |
So the CSV download is effectively dead now? The --full_csv flag seems to imply it will bypass the need for PushShift but it fails in the same way. Is there an easy bypass? |
The Though that might be easier to hack: as far as I can tell, in that case PushShift is only needed to get the metadata from a Reddit post to create an instance of the
PS: @vincenzogianfelice I am appalled by the entitlement displayed in your comment. This software is provided entirely free of charge; the least you could do is be nice to the developer. |
Yes, this functionality is currently broken, likely both in this version and in the TypeScript rewrite. Reddit comments and submissions generally change or are lost over a long enough window, and because the official Reddit API is (or was) extremely slow for individual lookups, PushShift was implemented as the sole solution for single targets.
People using this functionality generally have more saved posts than the official API will return (capped at 1000), so typical CSV downloads will have many thousands of posts to scan. Frankly, the Reddit API is unsuitable for this task. Due to the harsh rate limiting of their API, and also because of its generally slow response times, processing a CSV directly through official means would take a significant amount of time. Ignoring the API response times and skipping the actual download calls, which use additional API queries in some cases, the optimistic run time just to retrieve 1000 individual posts within API limits is 30+ minutes. This also ignores any old deleted or edited posts, where the data will be completely unrecoverable.
In the rewritten TS version, PushShift functionality was mandatory in order to reliably build relationships between saved comments and their parents, in the event that the parent submission had been removed from the live site.
This probably isn't the best place to discuss, but I may as well dump it here on the most recent issue caused by Reddit actions:
Suffice it to say that I'm unlikely to expend much effort towards bringing these features back in the short term. I have very limited time to work on my passion projects these days, and I would prefer not to waste that time stepping into adversarial relationships with social media site developers. If they get things sorted out with PushShift, then everything should start working again and I'll be more encouraged to move forward with completing the rewrite, which also heavily utilizes PS.
If not... well, it will likely be impossible to reimplement the lost functionality to the level people expect from the application. The code to add a bandage fix exists - scattered around - within RMD already, and I'm very open to accepting Pull Requests, but I probably won't be the one implementing it. The fix would only get RMD limping along, and honestly that seems likely to only raise more complaints and issues. At this point I'll be keeping an eye out for future Reddit API developments, and should anything come up, I'll be happy to revisit this. |
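For anyone weighing up a bandage fix, the run-time estimate in the comment above can be sanity-checked with a rough back-of-the-envelope sketch. The function name and the 30 requests-per-minute budget below are assumptions made for illustration only. The budget was picked because it lands near the "30+ minutes" figure quoted above, and it is not taken from Reddit's documentation or from RMD's code.

```python
def min_runtime_minutes(num_lookups: int,
                        requests_per_minute: float,
                        calls_per_lookup: int = 1) -> float:
    """Lower bound on wall-clock minutes for rate-limited individual lookups.

    Ignores network latency, retries, and download time, so real runs
    would take longer than this figure.
    """
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    total_calls = num_lookups * calls_per_lookup
    return total_calls / requests_per_minute


# Hypothetical rate budget, NOT a documented Reddit limit.
ASSUMED_REQUESTS_PER_MINUTE = 30

if __name__ == "__main__":
    for posts in (1000, 5000, 20000):
        minutes = min_runtime_minutes(posts, ASSUMED_REQUESTS_PER_MINUTE)
        print(f"{posts} posts: at least {minutes:.1f} minutes")
```

Under that assumed budget, 1000 single-post lookups need a little over 33 minutes before any media is downloaded, and the bound grows linearly with the size of the CSV. Posts that need extra API queries can be modelled by raising `calls_per_lookup`.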
Good news, it seems like they sorted things out with PushShift and it is coming back in the following month. Bad news is that
|
The last update was a month ago, but when trying to download from pushshift.io it doesn't work! Why don't you fix this situation that has been going on for months? Thank you