Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Outreachy Round 27] Create ReferenceCounterApi class to hit new Reference Counter API #5565

Conversation

gabina
Copy link
Member

@gabina gabina commented Dec 26, 2023

What this PR does

This PR is part of the "Improve how Wiki Education Dashboard counts references added" project (read issue #5547).

The purpose of this PR is to add a ReferenceCounterApi class, which is in charge of making requests to the new Toolforge Reference Counter API.

The ReferenceCounterApi is quite simple. It's intended to be used to retrieve the number of references in a given revision id, across every wiki (language/project) except for wikidata (this is because wikidata works pretty different from other wikis, and there is a specific gem to get references from it).

The main entry point for this class is get_number_of_references_from_revision_id method. It takes the revision id, and returns the number of references existing in the revision, or -1 if there was any problem hitting the API (as a side effect, it logs errors in Sentry).

This PR also adds basic specs for ReferenceCounterApi.

Open questions and concerns

For now, ReferenceCounterApi class processes one revision ID per request due to limitations in the Toolforge Reference Counter API, which doesn't support batching (querying more than one revision ID in a single request). If we we observe a decline in performance as a result of this approach, we will explore modifications to the Toolforge API to enable batch processing.

@gabina gabina marked this pull request as draft December 26, 2023 19:09
Comment on lines 43 to 49
log_error(e, update_service: @update_service,
sentry_extra: { project_code: @project_code,
language_code: @language_code, rev_id: })
Copy link
Member Author

@gabina gabina Dec 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did use the ApiErrorHandling here because I'm rescuing a real error.
We may want to add a retry strategy here, as maybe the errors in TYPICAL_ERRORS are frequent? Do you know if there is any common strategy that is used in the project?

@gabina
Copy link
Member Author

gabina commented Dec 27, 2023

@Aminehassou feel free to take a look at the PR when you have the chance!

@gabina gabina marked this pull request as ready for review December 27, 2023 17:38
@gabina gabina changed the title [WIP] [Outreachy Round 27] Create ReferenceCounterApi class to hit new Reference Counter API [Outreachy Round 27] Create ReferenceCounterApi class to hit new Reference Counter API Dec 27, 2023
@gabina
Copy link
Member Author

gabina commented Dec 27, 2023

CI is failing, but I don't think those errors are related to my changes.

Finished in 27 minutes 17 seconds (files took 8.55 seconds to load)
1839 examples, 3 failures, 17 pending

Failed examples:

rspec ./spec/features/timeline_editing_spec.rb:194 # timeline editing restores content for all blocks with "Discard All Changes"
rspec ./spec/lib/alerts/unsubmitted_course_alert_manager_spec.rb:56 # UnsubmittedCourseAlertManager when the alerts do not exist does not create alerts for courses that were created too recently
rspec ./spec/lib/importers/revision_score_importer_spec.rb:205 # RevisionScoreImporter.update_revision_scores_for_all_wikis imports data and calcuates an article completeness score for available wikis

There is still an error related to the browser. I don't know if it's related to the new --headless=new argument.

  2) timeline editing restores content for all blocks with "Discard All Changes"
     Got 0 failures and 2 other errors:

     2.1) Failure/Error: find(".week-1 .block-kind-#{Block::KINDS['milestone']}").hover

          Selenium::WebDriver::Error::MoveTargetOutOfBoundsError:
            move target out of bounds
              (Session info: chrome=120.0.6099.109)
          # #0 0x55b1483ccd33 <unknown>
          # #1 0x55b148089dbb <unknown>
          # #2 0x55b14812e799 <unknown>
          # #3 0x55b148103342 <unknown>
          # #4 0x55b148122297 <unknown>
          # #5 0x55b1481030e3 <unknown>
          # #6 0x55b1480cb044 <unknown>
          # #7 0x55b1480cc44e <unknown>
          # #8 0x55b148391861 <unknown>
          # #9 0x55b148395785 <unknown>
          # #10 0x55b14837f285 <unknown>
          # #11 0x55b14839641f <unknown>
          # #12 0x55b14836320f <unknown>
          # #13 0x55b1483ba028 <unknown>
          # #14 0x55b1483ba1f7 <unknown>
          # #15 0x55b1483cbed4 <unknown>
          # #16 0x7f0950294ac3 <unknown>
          # ./spec/features/timeline_editing_spec.rb:207:in `block (2 levels) in <top (required)>'
          # ------------------
          # --- Caused by: ---
          # Selenium::WebDriver::Error::MoveTargetOutOfBoundsError:
          #   move target out of bounds
          #     (Session info: chrome=120.0.6099.109)
          #   #0 0x55b1483ccd33 <unknown>

     2.2) Failure/Error: Capybara::Screenshot.new.screenshot_and_save_page if example.exception

          NoMethodError:
            undefined method `new' for Capybara::Screenshot:Module

                Capybara::Screenshot.new.screenshot_and_save_page if example.exception
                                    ^^^^
          # ./spec/rails_helper.rb:101:in `block (2 levels) in <top (required)>'

@ragesoss
Copy link
Member

ragesoss commented Jan 2, 2024

I think I've fixed all the unrelated spec failures. If you rebase this, hopefully it will be passing now.

@gabina gabina force-pushed the 5547-outreachy-round-27-create-new-class-to-make-requests-to-new-reference-counter-toolforge-api branch from 5f16bb7 to 25da718 Compare January 2, 2024 20:06
@gabina
Copy link
Member Author

gabina commented Jan 4, 2024

@ragesoss you can take a look at this whenever you have the chance, CI is green now!

Copy link
Member

@ragesoss ragesoss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks solid overall!

level: 'warning', extra: { project_code: @project_code,
language_code: @language_code, rev_id:,
status_code: response.status, content: parsed_response }
return -1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why return -1 (rather than nil, for example)? This class is mainly dealing with integers that represent reference counts, so using an integer as an error/warning signal is likely to lead to confusion at some point.

Copy link
Member Author

@gabina gabina Jan 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially, I considered using -1 as a means to distinguish (at the database level) between revisions where there was an error attempting to retrieve the number of references, and revisions for which we have not yet calculated that.
However, this approach was based on the assumption that the references field in the database would be an integer, which is not the case (we'll be using the existing features field). I believe I can find an alternative method to distinguish between these two cases after refactoring the revision score importer.
I changed this to return nil instead in ca7a4b2
Edit: the good commit is c2bf974

@gabina gabina force-pushed the 5547-outreachy-round-27-create-new-class-to-make-requests-to-new-reference-counter-toolforge-api branch from ca7a4b2 to c2bf974 Compare January 6, 2024 02:39
Copy link
Contributor

@Aminehassou Aminehassou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good. Good work!

Comment on lines +68 to +71
rescue StandardError => e
@errors << e
return {}
end
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followed the same approach as for the LiftWingApi: errors are accumulated and logged at the batch level.
However, warnings related to responses not being 200 OK keep being logged individually. I think that makes sense for now, because the reference-counter API is under our own control, so we have to monitor this to be able to fix all the necessary bugs. If at some point we find benign frequent non-200 responses and we don't want to log all of them, we can easily change this.

@ragesoss ragesoss merged commit 14ff46e into WikiEducationFoundation:master Jan 16, 2024
1 check failed
@gabina gabina deleted the 5547-outreachy-round-27-create-new-class-to-make-requests-to-new-reference-counter-toolforge-api branch January 16, 2024 20:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants