Does MODNet have facilities for including state variables such as temperature or pressure? #69
Comments
Hi @sgbaird, At this stage there is nothing that "easily" includes state variables (at least not explicitly). Two quick solutions exist, though. If the property is available on a fixed range (e.g. a temperature-dependent property), it can be used as a vector (multi-property) target. Another (explicit) way is to append the state to the generated features.
Hi @ppdebreuck, Thanks for the quick reply! For the first one, it sounds like you mean add the temperature as an additional target property? For the second one, perhaps I can just append a column to the modnet/modnet/preprocessing.py Lines 530 to 556 in 719e028
Maybe something like the following:
from modnet.preprocessing import MODData
from modnet.models import MODNetModel
# Creating MODData
data = MODData(materials = structures,
targets = targets,
)
data.featurize()
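# Add temperature as a feature column; DataFrame.append returns a new frame and adds rows, so assign instead
data.df_featurized["T"] = temperatures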
data.feature_selection(n=200)
# Creating MODNetModel
model = MODNetModel(target_hierarchy,
weights,
num_neurons=[[256],[64,64],[32]],
)
model.fit(data)
# Predicting on unlabeled data
data_to_predict = MODData(new_structures)
data_to_predict.featurize()
# Add the same state column, with the same name, to the prediction data
data_to_predict.df_featurized["T"] = new_temperatures
df_predictions = model.predict(data_to_predict) # returns dataframe containing the predictions on new_structures
(modified from Getting Started)
I haven't tried this yet, but if it seems reasonable I will probably give it a go later today.
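One thing worth checking with the "append a column" approach is that each state value ends up on the row of the sample it belongs to. The following is a minimal sketch of that alignment step, using plain dictionaries as a stand-in for the featurized table. The sample IDs, feature names, and the `add_state_column` helper are hypothetical and are not part of MODNet.

```python
# Hypothetical stand-in for a featurized table: one row of features per sample ID.
features = {
    "sample_a": {"feat_1": 0.12, "feat_2": 3.4},
    "sample_b": {"feat_1": 0.56, "feat_2": 1.2},
}
# Hypothetical state variable values, keyed by the same sample IDs (order differs on purpose).
temperatures = {"sample_b": 500.0, "sample_a": 300.0}


def add_state_column(features, state, name):
    """Return a copy of `features` with `state` added as an extra column.

    Raises KeyError if any sample lacks a state value, so rows cannot
    silently end up without (or with the wrong) state.
    """
    missing = sorted(set(features) - set(state))
    if missing:
        raise KeyError(f"no {name!r} value for samples: {missing}")
    # Look up each state value by sample ID rather than by position.
    return {sid: {**row, name: state[sid]} for sid, row in features.items()}


augmented = add_state_column(features, temperatures, "T")
```

Matching by ID rather than by position avoids a silent mismatch when the temperature list and the feature rows are ordered differently. The same column name has to be used for the training data and the data to predict.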
For solution (1), yes, the idea would be to have one target per temperature, as in the thermodynamical data notebook.
# Creating MODNetModel
model = MODNetModel([[["S_5K","S_300K","S_500K"]]],
                    {"S_5K":1,"S_300K":1,"S_500K":1},
                    num_neurons=[[256],[64],[64],[32]],
                    )
This comes with a few limitations: the temperature dependence is only implicit, the temperatures are fixed, a value should be available at each temperature for every sample, and training is slower.
I would indeed try what you suggested.
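To make the "one target per temperature" idea and its limitations concrete, here is a minimal sketch that reshapes long-form measurements into one target column per fixed temperature. The data values, sample IDs, and the `to_wide_targets` helper are hypothetical; only the `S_5K`-style target names follow the snippet above.

```python
from collections import defaultdict

# Hypothetical long-form measurements: (sample ID, temperature in K, entropy value).
measurements = [
    ("mat_1", 5, 0.1), ("mat_1", 300, 41.0), ("mat_1", 500, 63.5),
    ("mat_2", 5, 0.2), ("mat_2", 300, 38.7),  # no 500 K value for mat_2
]
fixed_temperatures = [5, 300, 500]


def to_wide_targets(measurements, temperatures, prefix="S"):
    """Build one target per fixed temperature, e.g. S_5K, S_300K, S_500K.

    Samples that lack a value at any of the fixed temperatures are dropped,
    which is the "should be available for each sample" limitation.
    """
    by_sample = defaultdict(dict)
    for sample_id, temperature, value in measurements:
        by_sample[sample_id][temperature] = value

    wide = {}
    for sample_id, values in by_sample.items():
        if all(t in values for t in temperatures):
            wide[sample_id] = {f"{prefix}_{t}K": values[t] for t in temperatures}
    return wide


targets = to_wide_targets(measurements, fixed_temperatures)
```

In this toy data only `mat_1` survives, because `mat_2` has no value at 500 K. That loss of samples is why this option becomes impractical when each sample is measured at its own set of temperatures.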
Created in response to ppdebreuck#69. No license was given in the original repository. See ziyan1996/VickersHardnessPrediction#1
It took some time, but I got it figured out and made an example notebook (see the PR above).
Cool! Indeed, option (1) was infeasible here. Thanks for this addition. A simple hyperparameter optimization might be worth adding as an example:
from modnet.hyper_opt import FitGenetic
ga = FitGenetic(train)
model = ga.run(refit=0, nested=0, size_pop=10, num_generations=3, n_jobs=20)
# size_pop, num_generations and n_jobs can be increased if computational power is available
This avoids dealing with the model setup (number of neurons etc.), takes around 5 minutes to run, and lowers the MAE to roughly 2.2.
Btw, are any benchmarking results available on this dataset?
@ppdebreuck thanks! Can you use both
As for benchmarking, in VickersHardnessPrediction/hv_predictions.py they split data according to:
train_test_split(train_size=0.9, test_size=0.1, random_state=100, shuffle=True)
They use XGBoost with recursive feature elimination (RFE) on physical descriptors. In the paper, they report an MSE of
Btw looks like
pip install git+https://github.com/ppdebreuck/modnet@master
I know that MEGNet is geared towards state variables, but MEGNet only takes structures as inputs, not compositions. It can be paired with something like BOWSR, but I'd only imagine that working for single-phase structures (i.e. a somewhat nonsensical physical representation if alloys are involved).
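For comparing against that benchmark, both the split and the metric need to match. The sketch below shows a seeded 90/10 shuffle split and the two metrics mentioned in the thread (MAE and MSE) in plain Python. The function names are hypothetical, and the split will not reproduce the exact partition produced by scikit-learn's `train_test_split` with `random_state=100`, since the random number generators differ.

```python
import random


def split_indices(n_samples, train_fraction=0.9, seed=100):
    """Shuffle sample indices with a fixed seed and split them into train/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(round(train_fraction * n_samples))
    return indices[:n_train], indices[n_train:]


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation, in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared deviation, in squared units of the target."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


train_idx, test_idx = split_indices(20)
```

Note that an MAE and an MSE are not directly comparable: the MSE is in squared units and weights large errors more heavily, so both numbers should be computed on the same test split before drawing conclusions.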
Thanks for the info! Yep, we need to clean things up a bit and make a new release on PyPI when we find time :p
Ah, gotcha. Thank you!
I was looking through the docs and example notebooks and didn't see where this information might fit in with the typical pipelines, but maybe I missed something.