Refresh Capabilities for Bugzilla Issue Downloader (Milestone 2) #285
This is not finished and has not been tested yet.
As discussed on the call, I will be combining the two functions that download issues and issues with comments into one function, and adding a parameter to the combined function to choose between the two behaviors.
The `download_bugzilla_rest_issues_comments` function now takes three more parameters: `project_key`, `comments`, and `verbose`. It can now download issues alone or issues with comments, depending on the `comments` parameter.
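A minimal sketch of how the `comments` flag could toggle the query is below. The parameter names come from this thread, but the body is an assumption, not the actual Kaiaulu implementation (a real version would perform the API call and save the file; returning the query keeps the sketch runnable):

```r
# Hypothetical sketch: how a `comments` flag can switch between downloading
# issues only and issues with comments. Not the actual Kaiaulu code.
download_bugzilla_rest_issues_comments <- function(bugzilla_site, project_key,
                                                   save_folder_path,
                                                   comments = FALSE,
                                                   verbose = FALSE) {
  query <- paste0("product=", project_key, "&limit=20&offset=0")
  if (comments) {
    # Ask the REST API to bundle comments with each bug in one response.
    query <- paste0(query, "&include_fields=_default,comments")
  }
  if (verbose) message("Query: ", query)
  query
}
```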
API endpoints: look through the linked documentation or elsewhere to find the documentation about the comments endpoint. You can also look through old pull requests and issues to find the comments endpoint.
Refer to this comment for the downloader logic overlap: #290 (comment)
I found some information about how `include_fields` is used; here is an example API call for reference.
The API call combines `_default` and `comments`, which means it returns the list of bugs and their comments all in one go. I could not find any of this on the documentation website, so I don't know how the other group found this out.
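As an illustration of the combined request, the query could be assembled like this (the timestamp and limit values are made up for the example):

```r
# Hypothetical example of the combined request: `_default` returns the
# standard bug fields, and `comments` adds each bug's comment thread.
base  <- "https://bugzilla.redhat.com/rest/bug"
query <- paste0("creation_time=2015-01-01T00:00:00Z",
                "&limit=20&offset=0",
                "&include_fields=_default,comments")
url <- paste0(base, "?", query)
url
# To actually download: download.file(url, "bugs_with_comments.json")
```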
Thank you for digging this out. It's amazing how hard it is to find information and how things interact in strange ways!
@carlosparadis I am working on making the generic API call function that takes a query parameter. Example API calls:
To fix this, I am going to update the query in a loop, but I was wondering where I should put the loop. There are two ways to update it. One way:
For this one, the loop is in the generic function. The other way:
Here, the loop is in the `by_date` function. I was wondering which way is best. Let me know if you need me to go into more detail about this.
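The two placements being weighed can be sketched roughly as follows. The function and parameter names are taken from this thread; the bodies are assumptions, with placeholders where the real API call would go:

```r
# Option 1 (hypothetical): the pagination loop lives inside the generic
# download function, which walks the offset itself.
download_generic <- function(bugzilla_site, query, save_folder_path) {
  offset <- 0
  repeat {
    page_query <- paste0(query, "&offset=", offset)
    # ... perform the API call with page_query and save the page ...
    issues_returned <- 0  # placeholder: number of issues in the response
    if (issues_returned == 0) break
    offset <- offset + issues_returned
  }
}

# Option 2 (hypothetical): the generic function makes exactly one call,
# and the by_date wrapper owns the loop by rewriting the query each pass.
download_by_date <- function(bugzilla_site, start_timestamp, save_folder_path) {
  offset <- 0
  repeat {
    query <- paste0("creation_time=", start_timestamp, "&offset=", offset)
    # ... call the single-shot generic downloader with `query` ...
    issues_returned <- 0  # placeholder
    if (issues_returned == 0) break
    offset <- offset + issues_returned
  }
}
```

Option 2 keeps the generic function a single-purpose building block that other queries can reuse, which is the reuse goal mentioned later in this thread.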
Just so I am clear, the offset is already handled by the existing Bugzilla function, right? The question is where you will place the loop? From what you said, the more future-proof way is the second option, modifying the query (i.e. in `download_bugzilla_rest_issues_comments_by_date`).
If the rationale I provided sounds contradictory to the change you will make, I likely misunderstood the options, so feel free to iterate further with me if you feel it is needed or if you see a pro/con I did not enumerate.
I see, so I will be using the second option then. Also yes, the function handles moving the offset parameter. |
I separated the API call into a different function so that it can be used by other queries in the future. I also finished the code for the refresher.
I added comments to the downloader functions and fixed the parser.
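The refresh flow described in this issue (download everything on the first run, otherwise resume from the newest saved issue) could be sketched as below. To keep the sketch self-contained, the downloader and latest-date helper are passed in as functions; in Kaiaulu these would correspond to `download_bugzilla_rest_issue_comments_by_date` and `parse_bugzilla_latest_date`, and the real signatures differ:

```r
# Hypothetical sketch of the refresher: resume from the newest saved issue
# if the save folder already has data, otherwise download ALL issues.
refresh_bugzilla_issues_comments <- function(save_folder_path,
                                             download_by_date,
                                             latest_saved_date) {
  existing_files <- list.files(save_folder_path)
  if (length(existing_files) > 0) {
    # Resume: find the creation date of the most recent downloaded issue.
    start_timestamp <- latest_saved_date(save_folder_path)
  } else {
    # First run: start from the epoch so every issue is downloaded.
    start_timestamp <- "1970-01-01T00:00:00Z"
  }
  download_by_date(start_timestamp)
}
```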
I recently found out that when you run the downloader on Bugzilla sites other than Red Hat, it returns a differently formatted .json file. The biggest difference is that Red Hat's json file has `limit`, `offset`, and `total_matches` fields, whereas other project sites do not. For example, when you run this API call for Red Hat, the three fields mentioned above are listed at the bottom of the json file. When compared with this API call for GCC, the three fields are not in the json file. The fields are also missing for this Yocto Project API call and this Mozilla API call.

Another difference between Red Hat and other Bugzilla websites is the inconsistency of comments. The format of the comments field in the Red Hat API call is different from the format of the comments field in the Mozilla API call. For example, the Red Hat API call has a `creator_id` field in the comments, whereas the Mozilla API call does not.

The first problem is the motivation for issue #300. Since `offset` is not returned by all of the json files, it is confusing that it is being used in the API call. To make the API calls more intuitive and to match how other API calls are made (GitHub), we want to change the API call to advance the `start_timestamp` instead of the `offset`.

The second problem is the reason why issue #299 exists. Since different API calls return different comment fields, the Bugzilla issues comments parser does not work for all json formats. More research on how the comments field looks on different Bugzilla websites would help to figure out a solution.
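A small defensive check along these lines could tell the two response shapes apart before the downloader relies on the pagination fields (a sketch using the `jsonlite` package, not the actual fix):

```r
library(jsonlite)  # assumed available for parsing the REST response

# Hypothetical sketch: detect whether a Bugzilla REST response carries the
# Red Hat-style pagination fields before relying on them.
has_pagination_fields <- function(response_json) {
  parsed <- jsonlite::fromJSON(response_json)
  all(c("limit", "offset", "total_matches") %in% names(parsed))
}

redhat_style <- '{"bugs": [], "limit": 20, "offset": 0, "total_matches": 5}'
other_style  <- '{"bugs": []}'
has_pagination_fields(redhat_style)  # TRUE
has_pagination_fields(other_style)   # FALSE
```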
Due to the missing fields, the stopping condition for the download loop had to change. Before, the downloader would use the returned pagination fields to decide when to stop.
When the offset is above the total number of issues, there are no issues to be returned, so the loop can instead stop as soon as a page comes back empty.
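That stopping rule (halt on the first empty page) is sketched below; the response handling is an assumption, with the page fetcher passed in so the sketch is runnable:

```r
# Hypothetical sketch: paginate by offset and stop on the first empty page,
# so the loop works even when `total_matches` is absent from the response.
download_all_pages <- function(fetch_page, limit = 20) {
  offset <- 0
  pages <- list()
  repeat {
    bugs <- fetch_page(offset, limit)  # returns the list of bugs for one page
    if (length(bugs) == 0) break       # empty page: nothing left to download
    pages[[length(pages) + 1]] <- bugs
    offset <- offset + limit
  }
  pages
}

# Usage with a fake fetcher backed by 3 issues in total:
fake_fetch <- function(offset, limit) {
  all_bugs <- list("bug1", "bug2", "bug3")
  if (offset >= length(all_bugs)) return(list())
  all_bugs[(offset + 1):min(offset + limit, length(all_bugs))]
}
length(download_all_pages(fake_fetch, limit = 2))  # 2 pages
```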
We could also not use the limit and offset, and instead just rely on the timestamp, which may be more sane. #300
- One notebook was refactored to use the getter functions from R/config.R (#230 contains the getter functions in R/config.R).
- Two getters were added to replace hard-coded paths: get_bugzilla_issue_path() and get_bugzilla_issue_comment_path()
- The project configuration section of a notebook was incorrectly using the project directory (kaiaulu/) as its working directory rather than the directory it resides in (/vignettes/).
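For illustration, such getters might look like the sketch below. The getter names come from this thread, but the configuration field names are made up and do not reflect Kaiaulu's actual schema:

```r
# Hypothetical sketch of path getters that replace hard-coded notebook paths.
# `config` is assumed to be a parsed project configuration list; the field
# names below are illustrative only.
get_bugzilla_issue_path <- function(config) {
  config[["issue_tracker"]][["bugzilla"]][["issue_path"]]
}
get_bugzilla_issue_comment_path <- function(config) {
  config[["issue_tracker"]][["bugzilla"]][["issue_comment_path"]]
}

config <- list(issue_tracker = list(bugzilla = list(
  issue_path = "../rawdata/bugzilla/issues/",
  issue_comment_path = "../rawdata/bugzilla/issue_comments/"
)))
get_bugzilla_issue_path(config)  # "../rawdata/bugzilla/issues/"
```

Centralizing paths behind getters means a notebook never hard-codes a folder, so moving data only requires editing the configuration file.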
1. Purpose

The purpose of this issue is to create a refresh capability for the Bugzilla downloader and parser. This means updating the `download_bugzilla_rest_issue_comments` and `parse_bugzilla_rest_issues_comments` functions. These will be used in a `refresh_bugzilla_issues_comments` function so that Bugzilla issues can be constantly updated by a cron job.

2. Process

I will be using mostly existing code to base my changes on. I will be updating `download_bugzilla_rest_issue_comments` by adding a `comments` parameter which allows the user to download issues with or without comments. I also added a `verbose` parameter for more details on the execution status. I also separated the formation of the query and the API call into two different functions. The `download_bugzilla_rest_issue_comments` function now takes in a parameter called `query`, which is a REST API query that it uses to form an API call and download the data into a json file.

Then, I created a new function called `download_bugzilla_rest_issue_comments_by_date` which takes in a `start_timestamp` parameter and forms a REST API query with that timestamp. It then calls `download_bugzilla_rest_issue_comments` in a loop until there are no more issues on the page.

To make the refresher, I will use the function `refresh_bugzilla_issues_comments`, which will check the save folder path for files. If there are files, it will find the most recent issue by using the `parse_bugzilla_latest_date` function. Then, it will download the issues between the most recent issue's creation date and today. If there are no files in the save folder path, it will download ALL issues for the Bugzilla page.

3. Endpoints

The endpoint being used is `/rest/bug`. The query also includes `creation_time`, `limit`, and `offset`.

The `creation_time` query is specific to the second. This means it is possible to download duplicate issues if they were created in the same second. However, this is extremely unlikely, so it is acceptable.

If the user wants to download issues and comments (`comments = TRUE`), then the query will also include `include_fields=_default,comments`. For more information on the `include_fields` query, you can go here.

4. Task List
5. Unfinished Parts
There are a few important parts that I was unable to get to. The issues-with-comments parser does not work with websites other than Red Hat. More information about this is found in issue #299.
Another unfinished task was moving away from the `offset` query and instead advancing the `start_timestamp` query. More information about this is found in issue #300.
Relevant information can also be found here.