Dear OGB team,

I have found that one SMILES string in the lsc-pcqm4mv2 dataset, 'O[Si]123O[Si]3(O1)(O2)O' (position 51128), is invalid according to recent versions of RDKit (specifically, version 2024.09.5). The error can be reproduced as follows:
```python
from ogb.lsc import PygPCQM4Mv2Dataset
from ogb.utils import smiles2graph


def debug_smiles2graph(smiles_string):
    try:
        return smiles2graph(smiles_string)
    except Exception as e:
        print(f"Exception occurred in smiles: {smiles_string}")
        print(e)
        raise e


mol_ds = PygPCQM4Mv2Dataset(root='../data/pcqm4mv2_invariants', smiles2graph=debug_smiles2graph)
```
Running this snippet produces the following output:
```
Processing...
Converting SMILES strings into graphs...
1%|▏ | 50930/3746620 [00:18<21:40, 2842.28it/s][11:26:20] Explicit valence for atom # 1 Si, 5, is greater than permitted
1%|▏ | 51128/3746620 [00:18<22:08, 2781.28it/s]
Exception occurred in smiles: O[Si]123O[Si]3(O1)(O2)O
'NoneType' object has no attribute 'GetAtoms'
```
If we now try to parse this SMILES string with RDKit directly, without the ogb package:
```python
from rdkit import Chem

problematic_smiles = 'O[Si]123O[Si]3(O1)(O2)O'
mol_generated = Chem.MolFromSmiles(problematic_smiles)
print(mol_generated is None)
```
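this prints `True`: RDKit fails to parse the string and returns `None` instead of a molecule, so this version of RDKit treats the SMILES string as invalid.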
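The same `None` check can be applied to a whole list of SMILES strings to see whether other entries are affected. The following is only a minimal sketch and not part of ogb: `find_unparsable_smiles` is a hypothetical helper name, and its `parse` argument is an assumption added so that the check can be run with any parser. When `parse` is omitted, it falls back to RDKit's `Chem.MolFromSmiles`, which returns `None` when parsing fails.

```python
def find_unparsable_smiles(smiles_list, parse=None):
    """Return (position, smiles) pairs for which the parser returns None.

    `parse` is a hypothetical hook: any callable that maps a SMILES string
    to a molecule object, or to None when the string cannot be parsed.
    """
    if parse is None:
        # Imported lazily so the helper can also be used with another parser.
        from rdkit import Chem
        parse = Chem.MolFromSmiles

    failures = []
    for position, smiles in enumerate(smiles_list):
        if parse(smiles) is None:
            failures.append((position, smiles))
    return failures
```

Calling `find_unparsable_smiles(['CCO', 'O[Si]123O[Si]3(O1)(O2)O'])` with the default parser should flag the second string on RDKit versions that reject it; which strings are reported depends on the installed RDKit version.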