MODNet for precalculated features and feature selection for a selected list of features #90
-
Hi, I want to use MODNet to predict the activity of supported metal catalysts. I have already calculated all the features for the metal and the support separately.
Replies: 6 comments
-
Hi @github-ML-fan, sounds like an interesting project!
-
Hi, interesting project!
In this case, I would still suggest to use
-
Sounds great. Thank you very much, @ml-evs, for the quick response with code. I thought of adding the null features because the mol% values cover only the supported metals, which are crucial for activity.
-
Thanks for the reply, @ppdebreuck. I forgot to mention that I used mol% only for the metals that are supported on the bulk compound. I calculated the molar average and standard deviation of matminer features (and others) for the metals and the support separately. I am not sure whether I can featurize the whole catalyst, since the support has one crystal system and the metals, which are present as nano/bulk particles, have another. But I would like to use the method you suggested and would love to have an example script.
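The molar average and standard deviation mentioned above can be sketched in plain Python. This is only an illustration: the helper name `weighted_mean_std`, the example values, and the choice of a population-style weighted standard deviation are assumptions, not necessarily what was used for the catalyst dataset.

```python
import math


def weighted_mean_std(values, fractions):
    """Return the fraction-weighted mean and standard deviation of a feature.

    values    : one elemental feature value per metal (hypothetical inputs)
    fractions : the matching mol% (or mole fractions); normalised internally
    """
    if len(values) != len(fractions):
        raise ValueError("values and fractions must have the same length")
    total = sum(fractions)
    if total <= 0:
        raise ValueError("fractions must sum to a positive number")
    weights = [f / total for f in fractions]
    mean = sum(w * v for w, v in zip(weights, values))
    # Population-style weighted variance (assumption: no sample-size correction)
    variance = sum(w * (v - mean) ** 2 for w, v in zip(weights, values))
    return mean, math.sqrt(variance)


# Made-up feature values for two supported metals at 75 and 25 mol%
mean, std = weighted_mean_std([1.0, 3.0], [75.0, 25.0])
```

Applying the same helper to the metals and to the support separately gives two sets of columns that can then be kept apart by name.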
-
Hi @github-ML-fan

from pymatgen.core import Composition
from modnet.preprocessing import MODData
from modnet.hyper_opt import FitGenetic
import pandas as pd
import numpy as np


def main():
    # define material ids and compositions (these can also be structures)
    my_ids = ["id1", "id2", "id3"]
    surface = [Composition("Li2O"), Composition("MgO"), Composition("TiO2")]
    particle = [Composition("Zr"), Composition("Co"), Composition("Au")]

    # some pandas dataframe containing your own features
    my_features = pd.DataFrame({"f1": [0.12, 0.16, 0.56]}, index=my_ids)

    # define the MODDatas
    surface_data = MODData(materials=surface, structure_ids=my_ids)
    particle_data = MODData(materials=particle, structure_ids=my_ids)

    # featurization
    surface_data.featurize()
    particle_data.featurize()

    # simple name change so that particle features have different names
    # than surface features
    particle_data.df_featurized.columns = [
        x + "_particle" for x in particle_data.df_featurized.columns
    ]

    # join all features, including the custom ones
    new_df_featurized = (
        surface_data.df_featurized.join(particle_data.df_featurized)
    ).join(my_features)

    # final MODData used for feature selection and fitting
    final_data = MODData(
        # materials are not used, as you provide the features yourself
        materials=[None for _ in range(len(surface))],
        targets=np.array([[1, 2, 3]]).T,
        target_names=["my_property"],
        df_featurized=new_df_featurized,
        structure_ids=my_ids,
    )
    final_data.feature_selection()

    # train model
    ga = FitGenetic(final_data)
    model = ga.run()

    # predict on new samples: test_data is not defined in this example and
    # must be a MODData built with the same featurization pipeline as the
    # training data
    # predictions = model.predict(test_data)


if __name__ == "__main__":
    main()
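The script notes that test data should follow the same featurization pipeline as the training data. One way to check this before predicting is to compare column names, sketched here with plain pandas. The helper `align_to_training` and all column names and values are hypothetical and not part of MODNet.

```python
import pandas as pd


def align_to_training(train_features, test_features):
    """Return test_features restricted to, and ordered like, the training columns.

    Raises ValueError if a training column is missing from the test frame,
    which would indicate that the two featurization pipelines differ.
    """
    missing = [c for c in train_features.columns if c not in test_features.columns]
    if missing:
        raise ValueError(f"test data lacks training features: {missing}")
    return test_features.loc[:, list(train_features.columns)]


# Made-up feature tables standing in for joined df_featurized dataframes
train = pd.DataFrame(
    {"feat_a": [1.0, 2.0], "feat_a_particle": [3.0, 4.0], "f1": [0.12, 0.16]},
    index=["id1", "id2"],
)
test = pd.DataFrame(
    {"f1": [0.30], "extra": [9.9], "feat_a_particle": [5.0], "feat_a": [6.0]},
    index=["new1"],
)
aligned = align_to_training(train, test)
```

The aligned frame could then be passed as `df_featurized` when building the test `MODData`, in the same way the training `MODData` is built in the script.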
-
Thank you so much!!
Hi @github-ML-fan
Here is an example where I build features from two compounds (surface + particle) and also add some custom features. I hope this is more or less in line with your problem.
The important part is to work with the `data.df_featurized` dataframe and the `join()` method to add features as desired. This dataframe is then used for feature selection and for fitting, e.g. with `model = ga.run()`.
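The `join()` pattern can be tried on plain pandas dataframes before involving MODNet. The following is a minimal sketch with made-up feature tables. The helper `combine_features`, the column name `feat_a`, and all values are placeholders chosen for illustration.

```python
import pandas as pd

ids = ["id1", "id2", "id3"]

# Placeholder tables standing in for the two df_featurized dataframes
surface_feats = pd.DataFrame({"feat_a": [1.0, 2.0, 3.0]}, index=ids)
particle_feats = pd.DataFrame({"feat_a": [4.0, 5.0, 6.0]}, index=ids)
custom_feats = pd.DataFrame({"f1": [0.12, 0.16, 0.56]}, index=ids)


def combine_features(surface, particle, custom):
    """Join three feature tables on their shared index.

    Suffixes keep surface and particle columns apart, since both come
    from the same featurizers and would otherwise share names.
    """
    surface = surface.add_suffix("_surface")
    particle = particle.add_suffix("_particle")
    combined = surface.join(particle).join(custom)
    # A NaN here means the tables do not share exactly the same ids
    if combined.isna().any().any():
        raise ValueError("feature tables do not share the same ids")
    return combined


combined = combine_features(surface_feats, particle_feats, custom_feats)
```

Because `join()` aligns rows on the index, all tables should use the same ids, here the same `structure_ids` passed to each `MODData`.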