Add features based on file paths in the title and description #4270
benjaminmah wants to merge 42 commits into mozilla:master
Metrics of the newly trained model: metrics.log
suhaibmujahid
left a comment
Do you see significant improvement when adding this feature?
I've previously attached the metrics of the model here:
Here are the metrics of the original/current model: metrics_original.log
There is a slight improvement (roughly +1%) in each of the metrics.
marco-c
left a comment
Looks good in general, but could you add a few tests for the new class?
I've converted this PR to a draft, as I realized the extraction of file paths still needs some polishing. For example, there are cases where it may mistake a URL or a numbered step (e.g. "1.Step 1", "2.Step2") for a file path. Once that is done, I'll be sure to add a few tests for this feature!
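To illustrate the kind of filtering described above, here is a minimal sketch, not the code in this PR. The names `URL_RE`, `PATH_RE`, and `extract_file_paths` are hypothetical, and the rule "a path has at least one `/` and ends in a short file extension" is an assumption made for the example. URLs with an explicit scheme are stripped first, and numbered steps are skipped because they contain no `/`.

```python
import re

# Hypothetical patterns, chosen for illustration only.
# Anything that looks like "scheme://..." is treated as a URL.
URL_RE = re.compile(r"\b[a-zA-Z][a-zA-Z0-9+.-]*://\S+")

# Assumed definition of a file path: an optional leading "/", one or more
# "segment/" parts, and a final segment with a short file extension.
PATH_RE = re.compile(
    r"(?<![\w/.-])/?(?:[\w.-]+/)+[\w-]+\.[A-Za-z][A-Za-z0-9]{0,5}(?![\w/-])"
)


def extract_file_paths(text: str) -> list[str]:
    """Return candidate file paths found in free text."""
    # Drop URLs first so their path portion is not picked up as a file path.
    without_urls = URL_RE.sub(" ", text)
    # Numbered steps such as "1.Step 1" contain no "/", so they never match.
    return PATH_RE.findall(without_urls)


text = "1.Step 1, 2.Step2: open https://example.com/a/b.html, then see dom/base/nsDocument.cpp crash."
print(extract_file_paths(text))  # ['dom/base/nsDocument.cpp']
```

A known limitation of this sketch: a URL written without a scheme (for example `www.example.com/a/b.html`) would still be matched as a path.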
Current metrics: metrics.log
Seems to perform slightly worse than the current model and It is worth noting that the first instance of the file path feature model correctly classified the above bug as
The current model now classifies
Added two tests here: a5a9c0f
It seems the tests failed; I'll revise them ASAP.
What is the difference in average precision / recall? Is there any component which gets much better or much worse?
Here are the metrics from the model with the
Here are the metrics from the currently deployed model (which does not include the
For the 0.9 CF, the precision increased by 0.02 and recall increased by 0.01. Overall, most metrics seem to increase for specific product-component pairs. Feel free to consult the detailed metrics for the few cases where either the precision or recall dropped with the new model.
marco-c
left a comment
Given your latest changes, was there any effect on the metrics?
Training the model with the file path feature included and excluded, I got the following results:
Overall, there seems to be a marginal increase in precision and recall when the file path feature is included.
7eaab61 to 2022eb4
suhaibmujahid
left a comment
@benjaminmah could you please resolve conflicts, do a self-review and check the impact on performance?
Resolves #4269.
Introduces a new feature that extracts file paths mentioned in the title and description of a bug and splits each one into sub-paths and individual directories/files.
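As a rough illustration of that splitting step, the sketch below shows one plausible reading of "sub-paths and individual directories/files". It is not the implementation in this PR: the function name `split_file_path` and the choice of prefix sub-paths are assumptions made for the example.

```python
from pathlib import PurePosixPath


def split_file_path(path: str) -> dict[str, list[str]]:
    """Split a file path into prefix sub-paths and individual components."""
    # Individual directories/files; a leading "/" root is dropped.
    parts = [p for p in PurePosixPath(path).parts if p != "/"]
    # Assumption: "sub-paths" means every prefix of the path.
    sub_paths = ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return {"sub_paths": sub_paths, "components": parts}


result = split_file_path("dom/base/nsDocument.cpp")
print(result["sub_paths"])   # ['dom', 'dom/base', 'dom/base/nsDocument.cpp']
print(result["components"])  # ['dom', 'base', 'nsDocument.cpp']
```

Tokens like these could then be fed to a vectorizer alongside the existing title and description features, so that a bug mentioning `dom/base/...` shares signal with other bugs touching the same directory.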