-
Notifications
You must be signed in to change notification settings - Fork 7
Enhance tfact_problem_events with Submission Details #1548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…le rows. Added a new tfact model to handle the studentmodule problem events for each platform.
Addressed linter errors.
@@ -101,6 +101,191 @@ with mitxonline_problem_events as ( | |||
select * from {{ ref('dim_platform') }} | |||
) | |||
|
|||
, content as ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be a dedupe query in this file, as simply doing a UNION ALL on data from studentmodule would introduce duplicates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the dbt test to enforce uniqueness between these columns in tfact_problem_events:
["openedx_user_id", "courserun_readable_id", "problem_block_fk", "event_timestamp"]
When I built the tfact_problem_events table locally I did not get a failure on that test but I should still add some dedupe logic to make sure we don't have dupes with production data. I added dedupe logic to the model using the same fields as the dedupe key. I am now just testing to make sure it builds successfully.
I meant to leave comments earlier. If we plan to integrate the studentmodulehistory data into tfact_problem_events, we should ensure there are deduplication queries in place. Otherwise, we end up getting duplicate records, since studentmodulehistory and tracking log contain overlapping data. Also, I'm not sure why we would want to add the same data in two different places if there is a new |
Updated table definition. Updated queries in studentmodule macro. Refactored tfact_problem_events to pull in studentmodule data from intermediate table.
Adding in guards to address null or empty maps with the correctness and answer history arrays. Building the studentmodule tfact model as an actual table instead of a view.
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances the tfact_problem_events data by incorporating submission details from studentmodule and studentmodulehistoryextended sources. Key changes include:
- Introduction of a new model (tfact_studentmodule_problems.sql) that generates problem event records from multiple platforms.
- Modifications to tfact_problem_events.sql to union in data from the new studentmodule events and perform deduplication.
- Updates to the models YAML configuration and the transformation macro to support the new submission details.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
src/ol_dbt/models/dimensional/tfact_studentmodule_problems.sql | New model aggregating studentmodule problem events. |
src/ol_dbt/models/dimensional/tfact_problem_events.sql | Updated to union studentmodule events and deduplicate records. |
src/ol_dbt/models/dimensional/_dim__models.yml | Added metadata for tfact_studentmodule_problems and adjusted tests. |
src/ol_dbt/macros/transform_studentmodule_data.sql | New macro to transform studentmodule data for problem events. |
Comments suppressed due to low confidence (1)
src/ol_dbt/models/dimensional/_dim__models.yml:487
- Duplicate column 'event_type' is defined in the tfact_studentmodule_problems model; consider consolidating or renaming one occurrence to avoid ambiguity.
- name: event_type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request enhances tfact_problem_events
by integrating detailed submission data from studentmodule
and its history. A new macro and an intermediate model tfact_studentmodule_problems
are introduced. There are a few critical and high-severity issues in the YML schema definitions (duplicate column, type mismatches, incorrect test key) that need to be addressed.
, case | ||
when correct_item is not null and cardinality(map_keys(cast(correct_item as map(varchar, json)))) >= 1 | ||
then map_keys(cast(correct_item as map(varchar, json)))[1] | ||
else null | ||
end as problem_id | ||
, case | ||
when correct_item is not null and cardinality(map_keys(cast(correct_item as map(varchar, json)))) >= 1 | ||
then json_extract_scalar( | ||
cast(correct_item as map(varchar, json))[map_keys(cast(correct_item as map(varchar, json)))[1]], | ||
'$.correctness' | ||
) | ||
else null | ||
end as correctness -- noqa: PRS | ||
, case | ||
when answer_item is not null and cardinality(map_keys(cast(answer_item as map(varchar, json)))) >= 1 | ||
then json_format( | ||
cast(answer_item as map(varchar, json))[map_keys(cast(answer_item as map(varchar, json)))[1]] | ||
) | ||
else null | ||
end as answers_json -- noqa: PRS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for extracting problem_id
, correctness
, and answers_json
relies on map_keys(...)[1]
, assuming that the relevant map (correct_item
or answer_item
) always has at least one key and that the first key is the desired one. If a map could legitimately contain multiple keys, this logic would only pick the first part. Consider adding a comment to clarify this assumption about the JSON structure.
What are the relevant tickets?
https://github.com/mitodl/hq/issues/7195
Description (What does it do?)
Combine data from tracking logs, studentmodule, and studentmodulehistoryextended in order to enhance the tfact_problem_events data used for reporting student activity.
Affected tables:
tfact_problem_events
How can this be tested?