PCQM4Mv2 invalid SMILES #498

Open
rballeba opened this issue Mar 5, 2025 · 0 comments

Dear OGB team,

I have found that one SMILES string in the lsc-pcqm4mv2 dataset, 'O[Si]123O[Si]3(O1)(O2)O' (position 51128), is invalid according to recent versions of RDKit (specifically, version 2024.09.5). The error can be reproduced as follows:

from ogb.lsc import PygPCQM4Mv2Dataset
from ogb.utils import smiles2graph

def debug_smiles2graph(smiles_string):
    try:
        return smiles2graph(smiles_string)
    except Exception as e:
        print(f"Exception occurred in smiles: {smiles_string}")
        print(e)
        raise e

mol_ds = PygPCQM4Mv2Dataset(root='../data/pcqm4mv2_invariants', smiles2graph=debug_smiles2graph)

When executed, the snippet above produces the following output:

Processing...
Converting SMILES strings into graphs...
  1%|▏         | 50930/3746620 [00:18<21:40, 2842.28it/s][11:26:20] Explicit valence for atom # 1 Si, 5, is greater than permitted
  1%|▏         | 51128/3746620 [00:18<22:08, 2781.28it/s]
Exception occurred in smiles: O[Si]123O[Si]3(O1)(O2)O
'NoneType' object has no attribute 'GetAtoms'

If we now try to parse this SMILES string with RDKit directly, without the ogb package:

from rdkit import Chem
problematic_smiles = 'O[Si]123O[Si]3(O1)(O2)O'
mol_generated = Chem.MolFromSmiles(problematic_smiles)
print(mol_generated is None)

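we get True as output: RDKit returns None for this string, so it considers the SMILES invalid. This matches the valence error in the log above, since each Si atom in the string has five explicit bonds.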
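
To check whether other entries are affected without stopping at the first failure, a scan along the following lines may help. This is only a minimal sketch and not part of ogb: `find_invalid_smiles`, `_rdkit_parse`, and the `parse` parameter are hypothetical names introduced for illustration. The default parser assumes RDKit is installed and relies solely on `Chem.MolFromSmiles` returning `None` for strings it cannot parse, as shown above.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def _rdkit_parse(smiles: str) -> Optional[object]:
    """Parse with RDKit; the result is None when RDKit rejects the string."""
    # Imported lazily so the scan can also be run with a stand-in parser.
    from rdkit import Chem

    return Chem.MolFromSmiles(smiles)


def find_invalid_smiles(
    smiles_list: Iterable[str],
    parse: Callable[[str], Optional[object]] = _rdkit_parse,
) -> List[Tuple[int, str]]:
    """Return (position, smiles) for every string the parser rejects."""
    invalid = []
    for position, smiles in enumerate(smiles_list):
        # A None result is treated as "invalid", mirroring the check above.
        if parse(smiles) is None:
            invalid.append((position, smiles))
    return invalid
```

Called as `find_invalid_smiles(['CCO', 'O[Si]123O[Si]3(O1)(O2)O'])` with the RDKit version mentioned above, this would be expected to report the second string at position 1. The positions are indices into whatever list is passed in, so they only correspond to dataset positions if the SMILES are supplied in dataset order.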
