-
Notifications
You must be signed in to change notification settings - Fork 12
feat: [FC-7879] add new fields/indices to Content-/UserDate to support assignments #343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat: [FC-7879] add new fields/indices to Content-/UserDate to support assignments #343
Conversation
|
Thanks for the pull request, @Serj-N! This repository is currently maintained by Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review. 🔘 Get product approvalIf you haven't already, check this list to see if your contribution needs to go through the product review process.
🔘 Provide contextTo help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:
🔘 Get a green buildIf one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green. DetailsWhere can I find more information?If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources: When can I expect my changes to be merged?Our goal is to get community contributions seen and reviewed as efficiently as possible. However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:
💡 As a result it may take up to several weeks or months to complete a review and merge your PR. |
edx_when/models.py
Outdated
| field = models.CharField(max_length=255, default='') | ||
| active = models.BooleanField(default=True) | ||
| block_type = models.CharField(max_length=255, null=True) | ||
| assignment_title = models.CharField(max_length=255, blank=True, default='', db_index=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add an explanation or performance implication of adding index (not the general explanation but some real example of the field usage)? Just to be aware of the use case.
It will be useful to see the filtering to assess the necessity to trade an additional effort to do the indexing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, the fields assignment_title and first_component_block_id are not used for filtering, they are only serialized in AllCourseDatesSerializer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also curious about the indices that have been added on these new fields in both models. What's the value add? Is there a particular performance improvement that we need on reads for which the indexes will be a substantial help (where they have not been considered necessary on the other columns in the table)? The fields are only used for serialization and not filtering, so why do they need to be indexed?
Because all the index attributes do cause a performance cost on write, I am definitely curious about the advantage gain they are bringing here. I'm comfortable putting indices on the columns if they are necessary, I just want to know that the reason to been considered and judged valuable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with you. Removed unused indexes 746c3dd
edx_when/models.py
Outdated
| block_type = models.CharField(max_length=255, null=True) | ||
| assignment_title = models.CharField(max_length=255, blank=True, default='', db_index=True) | ||
| course_name = models.CharField(max_length=255, blank=True, default='') | ||
| subsection_name = models.CharField(max_length=255, blank=True, default='', db_index=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, there are no use cases for this particular field, perhaps it was added with some future functionality in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding a new database column – a new indexed database column, no less – for no planned future use seems like a bad idea to me. If there is particular future functionality that is on a roadmap, that's different, but if it's just speculative, does it need to be added? I assume if nothing is populating the column, then the indexing is cost free (I'm not sure, actually, does the DB bother to rebuild the index when you add more '' strings to an empty column?), but even so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed indexes from these fields 746c3dd
edx_when/models.py
Outdated
| is_content_gated = models.BooleanField(default=False) | ||
|
|
||
| class Meta: | ||
| """Django Metadata.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the docstring according the project's code style standard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not found a single case of class Meta with a proper dosctring in the project, and in fact, I don't think it is needed here. Removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, pylint complained, so I added back more elaborate versions of the docstrings.
edx_when/models.py
Outdated
| actor = models.ForeignKey( | ||
| get_user_model(), null=True, default=None, blank=True, related_name="actor", on_delete=models.CASCADE | ||
| ) | ||
| first_component_block_id = UsageKeyField(null=True, blank=True, max_length=255, db_index=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same as above about performance - could you please add some comments in the PR about the frequency of filtering by this field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, the field is only used for serialization, not filtering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it's not used for filtering then this field shouldn't be indexed either, should it? I'm not sure because the comment in the code specifically says the index is there to speed up lookups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
General comment - code coverage should be improved. |
fb73c11 to
c2c5f16
Compare
c2c5f16 to
a107530
Compare
edx_when/models.py
Outdated
| models.Index(fields=('assignment_title', 'course_id'), name='edx_when_assignment_course_idx'), | ||
| models.Index(fields=('subsection_name', 'course_id'), name='edx_when_subsection_course_idx'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔍 Composite Indexing: The Left-Prefix Principle
This is a crucial point for database performance! To ensure we get the most benefit from our multi-column indexes, we must respect the left-prefix principle (or left-most column rule).
An index on (A, B, C) can be used efficiently for filtering on:
A(A, B)(A, B, C)
What it Cannot Do
The index will not be used efficiently (or at all, depending on the database and query) if the query skips the leading column(s):
- Filtering on
BorCalone. - Filtering on
(B, C). - Filtering on
(A, C). (It can use theApart, but the query optimizer will still have to scan the entire range ofBvalues for that specificA.)
Actionable Takeaway
When defining a composite index, the columns should be ordered based on the query patterns: place the column(s) most frequently used in the WHERE clause first.
| models.Index(fields=('assignment_title', 'course_id'), name='edx_when_assignment_course_idx'), | |
| models.Index(fields=('subsection_name', 'course_id'), name='edx_when_subsection_course_idx'), | |
| models.Index(fields=('course_id', 'assignment_title',), name='edx_when_assignment_course_idx'), | |
| models.Index(fields=( 'course_id', 'subsection_name'), name='edx_when_subsection_course_idx'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. These fields won't be used for filtering, I removed them (746c3dd). If there will be need to use them - index can be added later
746c3dd to
2880086
Compare
|
@qasimgulzar @deborahgu Are you happy with the latest round of change? @cmltaWt0 you should also resolve your requested changes if you are satisfied. |
Sure! Thanks for ping. |
deborahgu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me.
qasimgulzar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
|
@deborahgu would you mind merging, the author doesn't have access to do so. |
|
I also don't have access to merge, @e0d. |
UsamaSadiq
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A version update will be needed to release this change on PyPI. Will that be done in a follow up PR?
|
@UsamaSadiq that's a good call. I think we should do it as part of this PR. @Serj-N can you update the version? We'll need to update requirements in edx-platform and elsewhere as appropriate. |
|
@e0d @UsamaSadiq Sure, but could you please specify what version it should be? I see the latest version is 3.1.0. Should the new version be 3.2.0? |
@Serj-N Yes, this introduced migration will also be applied in edx-platform so it'll be best to bump a minor version to highlight the change it is introducing. You can bump the package version to |
…models-to-support-assignments
|
@e0d @UsamaSadiq Done, version bumped to 3.2.0. |
Note
This is a newly reopened version of an earlier PR, which should most likely be closed due to contribution rights.
Description
This PR Extends the
ContentDateandUserDatemodels in edx-when to support enhanced assignment metadata and future scheduling use cases.ContentDatemodel changes:assignment_title,course_name, andsubsection_namefields to improve tracking and display of assignment metadata.UserDatemodel changes:first_component_block_idfor referencing the starting block of a user’s assignment attempt.is_content_gatedto establish whether the user has access to the corresponding block. For convenience, exposed aslearner_has_accessproperty.