-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ElMD compositional featurizer #726
base: master
Are you sure you want to change the base?
Conversation
Hi @sgbaird thanks for the PR! This looks great and it is encouraging to see it basically in a working form already. I think it is preferable to have the ElM2D code directly in matminer if it is not too complicated, and from briefly looking over the code it doesn't look like it is too complex to include (from the perspective of keeping matminer self contained and not too sprawling at least.. definitely not saying it is easy to port the code over from wasserstein haha) We will need to make some changes to the locations of some of the classes and such to keep things organized but this is an excellent start if it is working on your end. Here are some structural (not formatting) changes I think we'd need to make to take this PR to completion: PCA classWould it be possible to use the PCA class from sklearn instead? Imports (
|
@ardunn, this is great! Thanks for looking through the code. Using FeaturizeThere are a few options for
(1) isn't particularly interesting (encodes the composition and a machine-learned scalar representation of each element), but they are "features" in the strictest sense - a fixed compound always maps to the same vector. Do you lean towards one of the above? I'll leave this one up to your knowledge of Adjust UMAP embedding implementationIf we go with (3), it would probably be better (if not essential) for me to implement a version that allows |
External dependenciesFor FeaturesI see two main ways of doing this:
The |
Hey @sgbaird any thoughts on this? Should I close? |
Sorry, I kind of dropped the ball on this. Thank you for the reminder! I think I'll reference it as an external dependency for now. I'll plan on getting this worked out in the next month. Will that work out? |
Yeah no problem! We don't have any time restrictions or anything. Just wanted to see if it was still something you wanted to do :) |
Summary
Add featurization scheme for ElM2D and corresponding UMAP embeddings.
TODO (if any)
@ardunn
I spent a few hours refactoring my refactor of ElM2D (chem_wasserstein) to try to conform the code to be (at least closer) to matminer, but I'm realizing I may not be able to justify the time it will take to push the PR all the way through. I'm open to your feedback (especially after you've glanced at the basic skeleton of the changes), but I'm thinking it might be more reasonable for now to use
chem_wasserstein
as an external, optional dependency.