-
-
Notifications
You must be signed in to change notification settings - Fork 928
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fetch oldest authored_at correctly instead of oldest authored_at per page #4460
base: master
Are you sure you want to change the base?
Fetch oldest authored_at correctly instead of oldest authored_at per page #4460
Conversation
@@ -87,27 +87,37 @@ class VersionsControllerTest < ActionController::TestCase | |||
end | |||
|
|||
assert_select ".gem__version__date sup", text: "*", count: 1 | |||
|
|||
assert_select ".t-list__heading", text: /1 versions since January 01, 2000/, count: 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"1 versions" and not "1 version" is an existing bug that could probably be fixed using https://guides.rubyonrails.org/i18n.html#pluralization in
rubygems.org/config/locales/en.yml
Line 710 in a166302
versions_since: "%{count} versions since %{since}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well spotted. Would you mind to open PR to fix this please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR for this is #4466
def set_oldest_version_date | ||
oldest_created_at = @rubygem.versions.order(:created_at).first | ||
oldest_built_at = @rubygem.versions.order(:built_at).first | ||
@oldest_version_date = [oldest_created_at, oldest_built_at].compact.map(&:authored_at).min |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to move this logic into SQL side using MIN function and get the oldest version date in 1 query directly? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this should be possible.
I used rubygem id=183850 as an example, which is https://rubygems.org/gems/sorbet-static. It has the most versions (10991) in the database as of the Feb 12 dump on https://rubygems.org/pages/data.
I first tried this
SELECT
min(least (created_at, built_at))
FROM
versions
WHERE
rubygem_id = 183850;
However, this doesn't exactly replicate the behavior in the application code
rubygems.org/app/models/version.rb
Lines 201 to 213 in 89b2171
def authored_at | |
return built_at if rely_on_built_at? | |
created_at | |
end | |
# Originally used to prevent showing misidentified dates for gems predating RubyGems, | |
# this method also covers cases where a Gem::Specification date is obviously invalid due to build-time considerations. | |
def rely_on_built_at? | |
return false if created_at.to_date != RUBYGEMS_IMPORT_DATE | |
built_at && built_at <= RUBYGEMS_IMPORT_DATE | |
end |
In fact, for sorbet-static
, the SQL query above returns 2019-05-17 00:00:00
but Rubygem.find(183850).versions.map(&:authored_at).min
returns Sat, 18 May 2019
, so there is a one day difference.
If we want to duplicate the application code into SQL, I believe this achieves the same thing
SELECT
min(
CASE WHEN DATE(created_at) = '2009-07-25'
AND built_at < created_at THEN
built_at
ELSE
created_at
END) AS authored_at
FROM
versions
WHERE
rubygem_id = 183850;
So to generate this query, I could replace this code with
@oldest_version_date = @rubygem.versions.minimum("case when DATE(created_at) = '#{Version::RUBYGEMS_IMPORT_DATE}' and built_at < created_at then built_at else created_at end")
or if we prefer Arel
versions = Version.arel_table
created_at_date = Arel::Nodes::NamedFunction.new(
'DATE',
[versions[:created_at]]
)
rely_on_built_at = created_at_date.eq(Version::RUBYGEMS_IMPORT_DATE).and(versions[:built_at].lt(versions[:created_at]))
authored_at_case_statement = Arel::Nodes::Case.new
.when(rely_on_built_at).then(versions[:built_at])
.else(versions[:created_at])
@oldest_version_date = @rubygem.versions.minimum(authored_at_case_statement)
Do you think it's worth the additional complexity of the case statement to replicate the same behavior as Version#authored_at
? I don't fully understand where built_at
comes from, especially when a gem doesn't predate RUBYGEMS_IMPORT_DATE
, so I'm not sure what the implications are of blindly using least(created_at, built_at)
and would probably lean towards duplicating the authored_at
logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also checked the performance of these 2 queries vs the min+least or case statement. Surprisingly, at least on my local database, while the query planner thinks the separate checks on created_at
and built_at
have lower cost, the execution time of either the min+least or case statement are both much shorter.
rubygems_development=# explain analyze select min(built_at) from versions where rubygem_id=183850;
QUERY PLAN >
-------------------------------------------------------------------------------------------------------------------------------------------------->
Result (cost=41.96..41.97 rows=1 width=8) (actual time=2365.886..2365.896 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.43..41.96 rows=1 width=8) (actual time=2365.874..2365.875 rows=1 loops=1)
-> Index Scan using index_versions_on_built_at on versions (cost=0.43..429251.91 rows=10336 width=8) (actual time=2365.871..2365.872 >
Index Cond: (built_at IS NOT NULL)
Filter: (rubygem_id = 183850)
Rows Removed by Filter: 1071279
Planning Time: 1.692 ms
Execution Time: 2366.057 ms
(9 rows)
rubygems_development=# explain analyze select min(created_at) from versions where rubygem_id=183850;
QUERY PLAN >
-------------------------------------------------------------------------------------------------------------------------------------------------->
Result (cost=41.88..41.89 rows=1 width=8) (actual time=2718.618..2718.619 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.43..41.88 rows=1 width=8) (actual time=2718.605..2718.606 rows=1 loops=1)
-> Index Scan using index_versions_on_created_at on versions (cost=0.43..428426.01 rows=10336 width=8) (actual time=2718.604..2718.60>
Index Cond: (created_at IS NOT NULL)
Filter: (rubygem_id = 183850)
Rows Removed by Filter: 1068171
Planning Time: 22.439 ms
Execution Time: 2719.168 ms
(9 rows)
rubygems_development=# explain analyze select min(least(created_at, built_at)) from versions where rubygem_id=183850;
QUERY PLAN >
-------------------------------------------------------------------------------------------------------------------------------------------------->
Aggregate (cost=30761.62..30761.63 rows=1 width=8) (actual time=187.097..187.098 rows=1 loops=1)
-> Bitmap Heap Scan on versions (cost=196.53..30709.94 rows=10336 width=16) (actual time=20.778..185.133 rows=10991 loops=1)
Recheck Cond: (rubygem_id = 183850)
Heap Blocks: exact=6022
-> Bitmap Index Scan on index_versions_on_rubygem_id (cost=0.00..193.95 rows=10336 width=0) (actual time=18.283..18.284 rows=10991 loop>
Index Cond: (rubygem_id = 183850)
Planning Time: 0.541 ms
Execution Time: 187.278 ms
(8 rows)
rubygems_development=# explain analyze select min(case when created_at::date = '2009-07-25'::date and built_at < created_at then built_at else created_at end) as authored_at from versions where rubygem_id=183850;
QUERY PLAN >
-------------------------------------------------------------------------------------------------------------------------------------------------->
Aggregate (cost=30813.30..30813.31 rows=1 width=8) (actual time=94.130..94.132 rows=1 loops=1)
-> Bitmap Heap Scan on versions (cost=196.53..30709.94 rows=10336 width=16) (actual time=4.185..90.765 rows=10991 loops=1)
Recheck Cond: (rubygem_id = 183850)
Heap Blocks: exact=6022
-> Bitmap Index Scan on index_versions_on_rubygem_id (cost=0.00..193.95 rows=10336 width=0) (actual time=3.377..3.378 rows=10991 loops=>
Index Cond: (rubygem_id = 183850)
Planning Time: 2.007 ms
Execution Time: 94.376 ms
(8 rows)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@albertchae thanks for detailed research on this. I'm also using sorbet-static
for "extreme" cases performance testing, good choice!
I did quick check in production DB and I see similar pattern. It is most likely caused by uneven distribution of versions (thanks for this extreme sorbet-static
gem). I did quick test also on more common case like nokogiri
gem and select min(case when created_at::date = '2009-07-25'::date and built_at < created_at then built_at else created_at end) as authored_at from versions where rubygem_id=?
query works well in all cases and if I understand it well, it will keep the same value logic as today (but with bug fixed).
So for now, I would prefer to add @oldest_version_date = @rubygem.versions.minimum("case when DATE(created_at) = '#{Version::RUBYGEMS_IMPORT_DATE}' and built_at < created_at then built_at else created_at end")
(maybe hide the SQL into model somehow) in controller.
And for the future, we can check if there's any way to synchronise dates in DB with RUBYGEMS_IMPORT_DATE
constant and normalize them in DB so one MIN(...)
clause will be enough.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for now, I would prefer to add
@oldest_version_date = @rubygem.versions.minimum("case when DATE(created_at) = '#{Version::RUBYGEMS_IMPORT_DATE}' and built_at < created_at then built_at else created_at end")
(maybe hide the SQL into model somehow) in controller.
Sounds good. I pushed up a commit to add this as a scope to the Version
model. One small note, I changed the query to use <=
to match the application code exactly, but I think we get the same result with or without the =
And for the future, we can check if there's any way to synchronise dates in DB with
RUBYGEMS_IMPORT_DATE
constant and normalize them in DB so oneMIN(...)
clause will be enough.
This sounds like a good long term fix!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW I pushed it up as a separate commit for ease of review, but let me know if you'd like me to squash them. I also edited the PR description to match the new implementation.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #4460 +/- ##
=======================================
Coverage 97.15% 97.15%
=======================================
Files 391 391
Lines 8259 8264 +5
=======================================
+ Hits 8024 8029 +5
Misses 235 235 ☔ View full report in Codecov by Sentry. |
From rubygems#4460 (comment) Added a test to verify this string is being pluralized properly. Also added every locale with varying degrees of confidence - es/pt-BR: I know enough to be fairly confident this is correct - ja: I know enough to be fairly confident this is correct and verified with native speaker - fr: I verified with native speaker - de/nl: I used online translation tools to check singular vs plural - zh-CN/zh-TW: I don't think they pluralize, but I just copied the existing string into both `other` and `one` which maintains behavior before this PR anyway.
From #4460 (comment) Added a test to verify this string is being pluralized properly. Co-authored-by: Albert Chae <[email protected]>
0d2f04e
to
d286515
Compare
app/models/version.rb
Outdated
created_at | ||
END | ||
SQL | ||
)&.in_time_zone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The minimum query by itself returns a Time
but #authored_at
returns an ActiveSupport::TimeWithZone
. So I added in_time_zone
to convert the Time
➡️ ActiveSupport::TimeWithZone
.
But I don't think it is strictly necessary because the equality check in the tests pass without the conversion and the rendering on the versions page is the same as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is no timezone conversion currently happening in the codebase and all times are just printed to the page as stored in DB. IMHO not needed as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense, removed in_time_zone
…page Fixes rubygems#4448 Previously, the "since date" was using oldest `version.authored_at` for a given page. This change fixes it to get the oldest `version.authored_at` in the database. Because `authored_at` depends on either `built_at` or `created_at`, I fetch both and take the min. This should only be 2 additional queries of 1 record each. I modified existing tests to check for the "versions since" text and also have 2 pages of actual content.
Replicate `Version#authored_at` logic in a SQL case statement and call SQL `minimum` on that to get the oldest `authored_at` Add tests for the model using similar structure as tests for `authored_at`
d286515
to
b146132
Compare
app/models/version.rb
Outdated
@@ -53,6 +53,19 @@ class Version < ApplicationRecord # rubocop:disable Metrics/ClassLength | |||
validate :metadata_attribute_length | |||
validate :no_dashes_in_version_number, on: :create | |||
|
|||
scope :oldest_authored_at, lambda { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to make this class level method instead of scope
? Scope is usually used to return collection (is chainable).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah that makes sense, changed to a class method and moved closer to #authored_at
…ethod Also now that `.oldest_authored_at` is a class method and not a scope, moved it next to `#authored_at
@simi thanks for all the reviews and comments so far. Was there anything else you wanted me to address for this PR? |
Fixes #4448
Previously, the "since date" was using oldest
version.authored_at
for a givenpage. This change fixes it to get the oldest
version.authored_at
inthe database. Because
authored_at
depends on eitherbuilt_at
orcreated_at
, I added a scope to replicate this behavior viaSQL and get the
minimum
. This should only be 1 additional query for one column.I modified existing tests to check for the "versions since" text and
also have 2 pages of actual content.