-
Notifications
You must be signed in to change notification settings - Fork 660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Outreachy Round 27] Refactor LiftWingApi
and RevisionScoreImporter
#5578
[Outreachy Round 27] Refactor LiftWingApi
and RevisionScoreImporter
#5578
Conversation
…e is an error hitting the API
revision.wp10 = score.dig('wp10') | ||
revision.features = score.dig('features') | ||
# only modify the existing deleted value if revision was deleted | ||
revision.deleted = true if score.dig('deleted') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spec modified in c045872 was to be sure that revision.deleted
is not updated if score.dig('deleted')
isn't set to true. If score.dig('deleted')
is nil (because score
is the empty hash {}), then revision.deleted
is not updated.
lib/lift_wing_api.rb
Outdated
# Given an array of revision ids, it returns an array with useful metrics for those | ||
# revision ids. | ||
# Format result example: | ||
# { 'rev_id0' => { 'wp10' => 0.2915228958136511656e2, 'features' => features_value, | ||
# 'deleted' => false, 'prediction' => 'Stub' } | ||
# ... | ||
# 'rev_idn' => { 'wp10' => 0.285936675221734978e2, 'features' => features_value, | ||
# 'deleted' => false, 'prediction' => 'D' } | ||
# } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this as I don't think there is any other way to specify types in ruby
There is a mistake though: Given an array of revision ids, it returns a hash with useful metrics for those
lib/lift_wing_api.rb
Outdated
# https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing | ||
class LiftWingApi | ||
class LiftWingApi # rubocop:disable Metrics/ClassLength |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to disable the rule because the class is too long.
One option could be to move the WEIGHTING
constants outside the class? Other thing that would take more effort would be to split the LfitWingApi
class in two different classes: one to get data form the api, and other one to process it (calculating the wp10
for example).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The better place to do this is .rubocop.yml
so that we have a central list of all the too-big classes.
lib/lift_wing_api.rb
Outdated
def get_single_revision_data(rev_id) | ||
private | ||
|
||
# Returns a has with wp10, features, deleted, and prediction, or empty hash if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo here: Returns a hash
lib/lift_wing_api.rb
Outdated
body = { rev_id:, extended_output: true }.to_json | ||
response = lift_wing_server.post(quality_query_url, body) | ||
parsed_response = Oj.load(response.body) | ||
|
||
return equivalent_ores_error_response(rev_id, parsed_response) if parsed_response.key? 'error' | ||
# If the responses contains an error, do not try to calculate wp10 or features. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo here: If the responses contain
@@ -57,22 +57,6 @@ | |||
end | |||
end | |||
|
|||
it 'saves wp10 scores by article' do | |||
VCR.use_cassette 'revision_scores/by_article' do | |||
described_class.update_revision_scores_for_all_wikis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed all the specs calling update_revision_scores_for_all_wikis
because removed the method.
LiftWingApi
and RevisionScoreImporter
LiftWingApi
and RevisionScoreImporter
'error' => { 'message' => error_response['error'], 'type' => type } | ||
} } } } } | ||
end | ||
|
||
# TODO: monitor production for errors, understand them, put benign ones here | ||
TYPICAL_ERRORS = [].freeze | ||
|
||
def log_error_batch(rev_ids) | ||
return if @errors.empty? | ||
|
||
log_error(@errors.first, update_service: @update_service, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that we always log the first error, even when there might be multiple distinct errors. I guess we work under the assumption that if several requests fail, it's likely for the same reason, which may hold true in most cases. However, I suggest resetting the @errors
array in get_revision_data
. Without this adjustment, if there are two batches of requests, each with a single request failing, we would log the first error twice and miss the second one entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done on cde5e1d
lib/lift_wing_api.rb
Outdated
@weighting ||= WEIGHTING_BY_LANGUAGE[@wiki.language] | ||
end | ||
|
||
def weighted_mean_score(score) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this whole set of weighting
code might make more sense as a separate file, since turning articlequality predictions into a single score is not a concept inherent to the LiftWing API.
Perhaps it can be extracted to a module
that LiftWingApi includes (similar to ConstantUpdate
and its included module BatchUpdateLogging
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks solid. I left a couple of suggestions; other than that, it looks ready to me. @Aminehassou would you like to have a look as well? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outside of the things Sage already mentioned, LGTM!
@@ -70,15 +75,24 @@ | |||
it 'handles timeout errors' do | |||
stub_request(:any, 'https://api.wikimedia.org') | |||
.to_raise(Errno::ETIMEDOUT) | |||
expect_any_instance_of(described_class).to receive(:log_error).once | |||
expect(lift_wing_api_class_en_wiki).to receive(:log_error).once |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed this to avoid using expect_any_instance_of
here, as the spec then makes statements on subject0
, which is a particular instance. I don't like making expectations to any instance and to a particular instance in the same spec, I think it's confusing.
What this PR does
This PR is part of the "Improve how Wiki Education Dashboard counts references added" project (read issue #5547).
It makes some refactors on
LiftWingApi
andRevisionScoreImporter
classes.The
RevisionScoreImporter
is responsible for populating fields such aswp10
,features
,wp10_previous
,features_previous
, anddeleted
in the revisions. Currently, this involves using the LiftWing API to retrieve data. However, as part of the "Improve how Wiki Education Dashboard counts references added" project, we also need to use the new reference-counter API (see #5565).The primary idea is to maintain the
RevisionScoreImporter
as the class responsible for populating all revision score fields, regardless of whether the values come from the LiftWing API or the Toolforge reference-counter API. This approach offers benefits, as there is valuable logic that should be shared in the population process, irrespective of the data source. For example, theget_parent_revisions
method retrieves parent revision IDs, necessary for updating bothwp10_previous
(from the LiftWing API) andfeatures_previous
(from the Toolforge reference-counter API).Before this PR, the importer contained a lot of logic specific to the articlequality model data and how to process it (such as calculating the
wp10
value from the scores).The purpose of this refactor is to move elements too specific to the LiftWing API into the existing
LiftWingApi
class. This way,RevisionScoreImporter
becomes a class with the logic to populate fields from bothLiftWingApi
andReferenceCounterApi
.Open questions and concerns
< anything you learned that you want to share, or questions you're wondering about related to this PR >