
Paper: Computational Resource Optimisation in Feature Selection under Class Imbalance Conditions #947

Open · wants to merge 118 commits into base: 2024

Conversation

@AmadiGabriel commented Jun 8, 2024

If you are creating this PR in order to submit a draft of your paper, please name your PR with Paper: <title>. An editor will then add a paper label and GitHub Actions will be run to check and build your paper.

See the project readme for more information.

Editor: Chris Calloway @cbcunc

Reviewers:

@ameyxd self-assigned this Jun 8, 2024
@ameyxd added the paper label Jun 8, 2024
@mepa changed the title from "paper: Computational Resource Optimisation in Feature Selection under Class Imbalance Conditions" to "Paper: Computational Resource Optimisation in Feature Selection under Class Imbalance Conditions" Jun 9, 2024

github-actions bot commented Jun 9, 2024

Curvenote Preview

Directory: papers/amadi_udu
Preview: 🔍 Inspect
Checks: 80 checks passed (4 optional)
Updated (UTC): Jul 5, 2024, 9:51 AM

@cbcunc (Member) commented Jun 20, 2024

Review reminder sent to @janeadams

@janeadams commented Jun 26, 2024

A succinct and interesting read on evaluating permutation feature importance (PFI) impacts on three different classification models (Random Forest, LightGBM, and SVM) with varying proportions of subsampled data featuring unbalanced classes. I have minor comments, but overall I think this is a great contribution.

  • The dual axes in the processing time figure were odd to me at first; it might be valuable to explain that SVM's poor performance relative to the other two methods is likely due to its poor parallelizability.
  • The "decrease in AUC" figures are confusing in that negative x-axis values must therefore indicate an increase in AUC (correct me if I am misunderstanding). This forces the reader to work through a "double negative makes a positive," which adds possibly unnecessary complexity to interpretation. I would recommend either 1) changing the axis/measure to simply "change in AUC" and/or 2) adding annotations directly onto the white space with an arrow indicating "poorer performance this direction" or similar; see the sketch after this list.

I particularly appreciated the pre-filtering step of using hierarchical clustering of features to account for potential collinearities. I also appreciated that the authors used multiple datasets and evaluated a range of sample proportions. This is a nice example of how many scientific computing Python libraries can come together into a single interesting experiment.
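For readers unfamiliar with that pre-filtering step, a minimal sketch of such a pipeline follows. This is not the authors' code: the synthetic dataset, clustering threshold, and model choice are assumptions for illustration, built on scipy.cluster.hierarchy and sklearn.inspection.permutation_importance:

```python
# Sketch: cluster features on Spearman correlation to handle collinearity,
# keep one representative per cluster, then score PFI by ROC AUC.
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the paper's datasets
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)

# Hierarchical clustering on the Spearman correlation matrix
corr = spearmanr(X).correlation
corr = (corr + corr.T) / 2      # enforce symmetry
np.fill_diagonal(corr, 1.0)
linkage = hierarchy.ward(squareform(1 - np.abs(corr)))
cluster_ids = hierarchy.fcluster(linkage, t=0.5, criterion="distance")

# Keep the first feature in each cluster as its representative
keep = [np.flatnonzero(cluster_ids == c)[0] for c in np.unique(cluster_ids)]

X_tr, X_te, y_tr, y_te = train_test_split(X[:, keep], y, stratify=y,
                                          random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Permutation importance = mean drop in AUC when each feature is shuffled
pfi = permutation_importance(model, X_te, y_te, scoring="roc_auc",
                             n_repeats=10, random_state=0)
print(dict(zip(keep, pfi.importances_mean)))
```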

@AmadiGabriel (Author) commented:

Thank you for the encouraging comments and observations on the paper, @janeadams. We are currently addressing the comments raised by @apaleyes and expect to respond to all outstanding observations early next week, updating the paper accordingly.

Labels: paper, ready-for-review