Cannot reproduce the h5 files of the zinc_properties example #25

Open
sozenoid opened this issue Mar 11, 2020 · 1 comment

@sozenoid

Hello,
I've been trying to reproduce the results of the zinc_properties example provided in the default repository under ./chemical_vae/models/zinc_properties.

Basically, I just cd to the zinc_properties directory and run
python3 -m chemvae.train_vae
for 120 epochs with the default exp.json file, and I end up with the three files zinc_decoder.h5, zinc_encoder.h5 and zinc_prop_pred.h5.

Now, if I try to use those files in the Jupyter notebook /chemical_vae/examples/intro_to_chemvae.ipynb, the "encode then decode" test shown below does not work: it neither recovers the original SMILES that was encoded nor generates similar SMILES using a noise of 5.0. With the original h5 files, everything works as expected.

# Using the VAE
## Decode/Encode

# vae, mu and np are defined by the notebook's setup cells (see the sketch below)
smiles_1 = mu.canon_smiles('CSCC(=O)NNC(=O)c1c(C)oc(C)c1C')
# smiles_1 = mu.canon_smiles('Cc1cc2c(cc1S(=O)(=O)NC1CCC(C)CC1)OCCN2C')

# one-hot encode the SMILES, map it to the latent space, then decode it back
X_1 = vae.smiles_to_hot(smiles_1, canonize_smiles=True)
z_1 = vae.encode(X_1)
X_r = vae.decode(z_1)

print('{:20s} : {}'.format('Input', smiles_1))
print('{:20s} : {}'.format('Reconstruction', vae.hot_to_smiles(X_r, strip=True)[0]))
print('{:20s} : {} with norm {:.3f}'.format('Z representation', z_1.shape, np.linalg.norm(z_1)))
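
For reference, the setup the snippet above relies on looks roughly like the following (a sketch from memory of the notebook's first cells; the directory argument should point at the folder holding the h5 files and exp.json, i.e. either the shipped models/zinc_properties or the folder with my retrained weights):

import numpy as np
from chemvae.vae_utils import VAEUtils
from chemvae import mol_utils as mu

# load zinc_encoder.h5, zinc_decoder.h5, zinc_prop_pred.h5 and exp.json
# from the given directory into a single helper object
vae = VAEUtils(directory='../models/zinc_properties')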

Were the provided h5 files obtained using the .csv and .json files found in the same zinc_properties directory of the GitHub repository?

Thank you very much for your work; it is very interesting.
Best regards,
Hugues

@sozenoid
Author

Removing the "limit_data" field from exp.json seems to go a long way towards improving the results.
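
For anyone else hitting this, the fix amounts to deleting the "limit_data" key from exp.json before retraining; from what I can tell, that key caps how many molecules are loaded for training. A minimal sketch of doing that programmatically (only the key name comes from the shipped file, the rest is plain json handling):

import json

# read the experiment config shipped with the zinc_properties example
with open('exp.json') as f:
    params = json.load(f)

# remove the cap on the number of training molecules, if present,
# so training uses the full ZINC csv instead of a small subset
params.pop('limit_data', None)

with open('exp.json', 'w') as f:
    json.dump(params, f, indent=2)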
