Github Discussions with GraphQL API #324
Comments
@carlosparadis I have some questions about creating a parser for the downloaded file. Lines 150 to 187 in 810c183
Looking at the example response on the REST API endpoint for issue events, there is quite a lot left out. https://docs.github.com/en/rest/issues/events?apiVersion=2022-11-28#get-an-issue-event Is this up to my own discretion of what is and isn't important? |
There are refresher endpoints in R/github.R if you want to learn how that is done (I believe it is the search endpoint). There may be another notebook too, other than comments. However, to implement refresh you need an API endpoint that lets you select at least a starting date for the comments. Does this endpoint give that?
Second, have you looked through GitHub to see if this is the only way to download Discussion comments? GitHub sometimes offers multiple API endpoints, so you want to be careful here that you don't end up in the wrong endpoint.
Third, you may want to just try the request for a JSON in the browser. You can construct the request as a URL; please don't paste the URL here with your API key, but do place the URL here as an example with a PLACEHOLDER, as @beydlern did.
What gets parsed depends on what we discuss here that may be relevant for the various analyses Kaiaulu does, so the easiest way would be for you to suggest something for us to agree on (I do need you to make sure you are considering all possible endpoints).
Also, the motivation in your issue specification (purpose section) sounds a bit strange to me. That sounds more like GitHub's motivation than our own. Our own motivation ties closer to @daomcgill's work. Dao downloads mailing list data. And mailing lists can be about developer communication, user communication, or more. Back in the day, a lot of people used mailing lists for both. This goes back to before issue trackers even existed, let alone GitHub. Nowadays, issues exist, so the developer mailing list in a lot of projects moved on to issue trackers. The equivalent for the user mailing list is Discussions on GitHub (but other projects may use something else). Therefore, the purpose of this capability being available in Kaiaulu is so we can mine user interaction with projects on GitHub.
Some research may be interested in understanding how projects interact with users, for example for analysis of community health (there are hundreds of studies that analyze StackOverflow questions!) |
I also suggest you take a look at the user side of Discussions so you understand the data you are getting (or not). Please don't create random questions, as that will pollute the kaiaulu repo, but you can always create a sandbox repo on your own account to play with and delete it later: https://github.com/sailuh/kaiaulu/discussions/new/choose Notice how there are 5 types of categories. It is easier for us to discuss what data makes sense if you explain it here, from what I can see already. |
One last note: Before you spend too much time on code and API, you should make sure the endpoint is the correct one: I just noticed the API asks for a Team Slug. I have no idea what that is. If you go to the "Discussions" tab on Kaiaulu, you will notice there is no notion of Teams. It is just plain and simple discussions. So the URL I gave may be for another type of Discussions. https://docs.github.com/en/search?query=discussions You should check GitHub Docs and google to see if it is even possible to obtain the data in the first place! |
I'll likely have to rename this issue and rework a lot of the process. After looking into it, the REST API endpoint isn't actually for Discussions but for GitHub Teams Discussions, a completely different thing, which was my mistake. |
Sounds good! |
@RavenMarQ @carlosparadis Just mentioning this as I look into GraphQL API. |
@carlosparadis I've become more familiar with how the query works for GraphQL, and have started creating the functions. While I work on the functions, I wanted to run past you the information I am retrieving with the query.
Is there any information that is missing from this query, or included but not needed? Here is an example response I got for the first discussion post listed in Kaiaulu's discussions:
|
@crepesAlot thank you for the update! I guess one question here is: when you create a discussion it can be a Q&A, but it can also be one of the other categories. How does this affect the data format? Could you create a fork (not on Kaiaulu) and experiment with the Discussions tab on your fork to see what you get out of the API? For instance, the Poll, Q&A, and the other categories look like their JSON would be different. |
@carlosparadis I actually found that the format doesn't change at all. It still retrieves the title, body and any comments under the discussion without any issues regardless of its category. The field for answers simply returns null. I'm also hopeful that the refresher function will be relatively easy to create as not only can I get the time a discussion was created but also filter them more easily. A list of discussions signatures from the documentation:
They also list all the information you can pull from Github Discussions here: |
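As a minimal sketch of the observation above (the response has the same shape for every category, and the answer field is simply null when there is no accepted answer), the following Python flattens one discussion node into a record. The field names (`title`, `body`, `createdAt`, `answer`, `comments.nodes`) and the sample nodes are assumptions made for illustration, since the actual query is not shown here; this is not Kaiaulu's `github_parse_discussions`.

```python
from typing import Any, Dict, Optional


def flatten_discussion(node: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one discussion node into a single-level record.

    The field names are hypothetical; adjust them to match the real query.
    """
    # `answer` is None when the discussion has no accepted answer.
    answer: Optional[Dict[str, Any]] = node.get("answer")
    comments = (node.get("comments") or {}).get("nodes") or []
    return {
        "title": node.get("title"),
        "body": node.get("body"),
        "created_at": node.get("createdAt"),
        "answer_body": answer["body"] if answer else None,
        "n_comments": len(comments),
    }


# Made-up sample nodes: a Q&A discussion with an answer and a poll without one.
qa_node = {
    "title": "How do I configure X?",
    "body": "I cannot find the setting.",
    "createdAt": "2024-01-01T00:00:00Z",
    "answer": {"body": "Set it in the config file."},
    "comments": {"nodes": [{"body": "Set it in the config file."}]},
}
poll_node = {
    "title": "Which feature next?",
    "body": "Vote below.",
    "createdAt": "2024-01-02T00:00:00Z",
    "answer": None,
    "comments": {"nodes": []},
}

qa_record = flatten_discussion(qa_node)
poll_record = flatten_discussion(poll_node)
```

Both records end up with the same keys, so rows from different categories can share one table, with the answer column left empty where no answer exists.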
You can go ahead and proceed with the code for this! I guess one open question, if this is the same output for all responses, is: what about the upvoted answers? Can't we obtain the number of upvotes? |
Unfortunately there doesn't seem to be a way to get the number of upvotes, the closest thing would be getting the reactions to comments, such as a thumbs up, but that is separate from upvotes. |
@carlosparadis Having some difficulties with the
I haven't been able to figure out what the problem is. I ran the following lines to try to find it.
Shouldn't this mean that the file exists and that it has write permissions? Or have I made some major misunderstanding with how the functions work? |
Did you try opening the function definition and running it one line at a time? I think the filepath constructed inside it, or the path relative to where you are running from, may just be incorrect. |
For github.R:
- github_api_discussions
- github_parse_discussions
- github_parse_discussion_comments
For config.R:
- get_github_discussions_path
For conf/kaiaulu.yml:
- Added new discussion field in github issue_tracker for a save filepath for discussions JSON file
For github_api_showcase.Rmd:
- Demonstrating usage for the newly added functions.
@carlosparadis Thankfully, looks like we solved the problems with the notebooks and gh tool. |
Changed the function to paginate as needed to download all available entries.
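A rough sketch of what paginating until all entries are downloaded can look like, written in Python for illustration rather than taken from the Kaiaulu R code. The `pageInfo` fields `hasNextPage` and `endCursor` follow GitHub's GraphQL connection convention; `fetch_page` and the fake pages are hypothetical stand-ins for a real API call.

```python
from typing import Any, Callable, Dict, List, Optional

Page = Dict[str, Any]


def fetch_all_discussions(
    fetch_page: Callable[[Optional[str]], Page]
) -> List[Dict[str, Any]]:
    """Collect the nodes of every page by following the end cursor."""
    nodes: List[Dict[str, Any]] = []
    cursor: Optional[str] = None  # None requests the first page
    while True:
        page = fetch_page(cursor)
        nodes.extend(page["nodes"])
        info = page["pageInfo"]
        if not info["hasNextPage"]:
            break
        cursor = info["endCursor"]  # resume after the last item seen
    return nodes


# Hypothetical stand-in for the API: two pages keyed by the cursor that requests them.
FAKE_PAGES: Dict[Optional[str], Page] = {
    None: {
        "nodes": [{"number": 3}, {"number": 2}],
        "pageInfo": {"hasNextPage": True, "endCursor": "c1"},
    },
    "c1": {
        "nodes": [{"number": 1}],
        "pageInfo": {"hasNextPage": False, "endCursor": None},
    },
}


def fake_fetch_page(cursor: Optional[str]) -> Page:
    return FAKE_PAGES[cursor]


all_nodes = fetch_all_discussions(fake_fetch_page)
```

Passing the fetch function in as an argument keeps the loop testable without network access or an API token.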
- Added refresher function for discussions - Non-fixed warning, refresher function could cause error if JSON file is improperly named.
- Moved the Download Discussions function and explanations from `vignettes/github_api_showcase.Rmd` to the `vignettes/download_github_comments.Rmd` notebook
- Updated filepaths to unify formatting - Added discussion save filepath to github issue tracker
- Updated `vignettes/download_github_comments.Rmd` notebook - Added documentation for refresh function - Set create_file_directory function verbose param to FALSE as default
- Reconfigured how `create_file_directory` function obtains paths from config file
@carlosparadis I was unable to create a refresh function for the discussions, so I removed the function from |
Can you give me the endpoint where you found the limitation? Or was it through trial and error? If so, what is the closest documentation you can find about this? |
I found this limitation through trial and error. |
Purpose
GitHub Discussions is a public forum that allows for collaborative communication without needing to be tied to a specific project or related to code. It provides a more centralized space to hold discussions.
The data that can be mined from the discussions and discussion comments can be of great interest to anyone interested in the relationship between users and a project's community.
As such, we now need a way to retrieve any comments from this new endpoint.
About Github Discussions: https://docs.github.com/en/discussions/quickstart
Process
To do this, we're using the GraphQL API.
There is only a single endpoint: https://api.github.com/graphql
Instead of `GET` requests as with the REST API, GraphQL uses queries. The queries will only return the specified data.
`gh` is a client Kaiaulu relies on to access GitHub's REST and GraphQL APIs; this is what will be used to access GraphQL's single endpoint.
Limitations
As of now, there are several points of interest that cannot be retrieved with the GraphQL API endpoint.
Refresh functionality:
While the queries allow the discussions to be fetched in order of date, there is no way to filter or download discussions by their creation dates.
It can only be filtered by a cursor variable (end_cursor), which is what the query uses to paginate discussions.
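To make the limitation concrete, here is a hedged Python sketch of a request body for the single endpoint. The query text is an assumption, not the query used in Kaiaulu: it orders discussions by creation date through `orderBy` and moves through the list with the `after` cursor, and it has no date argument to pass. The selected fields and the page size of 50 are illustrative choices.

```python
import json
from typing import Any, Dict, Optional

GRAPHQL_ENDPOINT = "https://api.github.com/graphql"

# Assumed query shape: ordering by creation date is expressed in the query,
# but the only way to move through the results is the `after` cursor.
DISCUSSIONS_QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    discussions(first: 50, after: $cursor,
                orderBy: {field: CREATED_AT, direction: DESC}) {
      pageInfo { hasNextPage endCursor }
      nodes { title createdAt }
    }
  }
}
"""


def build_payload(owner: str, name: str, cursor: Optional[str] = None) -> Dict[str, Any]:
    """Build the JSON body to POST to the GraphQL endpoint."""
    return {
        "query": DISCUSSIONS_QUERY,
        "variables": {"owner": owner, "name": name, "cursor": cursor},
    }


# First page: no cursor yet. Later pages reuse the endCursor of the previous response.
first_payload = build_payload("sailuh", "kaiaulu")
next_payload = build_payload("sailuh", "kaiaulu", cursor="PLACEHOLDER_END_CURSOR")
first_body = json.dumps(first_payload)
```

The body would be sent as a POST request with an authorization token, which is omitted here on purpose.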
Task List