Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Save atom hybridization #408

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Save atom hybridization #408

wants to merge 8 commits into from

Conversation

jthorton
Copy link
Contributor

Fixes #407 by saving the atom hybridization in the to_dict method and uses it in the from_dict method. This should now cause an error when loading molecules which were serialised using the old to_dict method though so we might need to think about how to support backwards compatibility with old serialised objects.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks!

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we'll need to do a migration I think.

I.e. what if you deserialize and there's only 7 entries in atom?

gufe/components/smallmoleculecomponent.py Outdated Show resolved Hide resolved
Copy link

codecov bot commented Nov 22, 2024

Codecov Report

Attention: Patch coverage is 97.05882% with 2 lines in your changes missing coverage. Please review.

Project coverage is 98.66%. Comparing base (e535c95) to head (1e40b08).

Files with missing lines Patch % Lines
gufe/settings/models.py 75.00% 1 Missing ⚠️
gufe/tokenization.py 91.66% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #408   +/-   ##
=======================================
  Coverage   98.66%   98.66%           
=======================================
  Files          36       36           
  Lines        2097     2103    +6     
=======================================
+ Hits         2069     2075    +6     
  Misses         28       28           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

# Conflicts:
#	gufe/components/smallmoleculecomponent.py
@jthorton
Copy link
Contributor Author

pre-commit.ci autofix

@jthorton
Copy link
Contributor Author

Oh dear looks like we need someone to run the lint in a different PR first!

@IAlibay
Copy link
Member

IAlibay commented Nov 25, 2024

Oh dear looks like we need someone to run the lint in a different PR first!

Yeah I think Mike is doing that 😅

@jthorton
Copy link
Contributor Author

Yeah it looks like it was done in #421 but something might have gone wrong it looks like pre-commit is failing on main and it was to change a lot of files here, tagging @mikemhenry.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question on what happens when you don't have the property set and you try to write.

@@ -223,6 +237,7 @@ def _to_dict(self) -> dict:
_ATOMCHIRAL_TO_INT[atom.GetChiralTag()],
atom.GetAtomMapNum(),
atom.GetPropsAsDict(includePrivate=False),
_HYBRIDIZATION_TO_INT[atom.GetHybridization()],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens in this case if this isn't set?

I.e. if you read and old file (which now skips the hybridization reading), then write it out again - what does this pick up?

Comment on lines +356 to +370
For backwards compatibility make sure we can create an SMC with missing hybridization info.
"""
phenol_dict = phenol.to_dict()
new_atoms = []
for atom in phenol_dict["atoms"]:
# remove the hybridization atomic info which should be at index 7
new_atoms.append(tuple([atom_info for i, atom_info in enumerate(atom) if i != 7]))
phenol_dict["atoms"] = new_atoms
new_phenol = SmallMoleculeComponent.from_dict(phenol_dict)
# they should be different objects due to the missing hybridization info
assert new_phenol != phenol
# make sure the rdkit objects are different
for atom_hybrid, atom_no_hybrid in zip(phenol.to_rdkit().GetAtoms(), new_phenol.to_rdkit().GetAtoms()):
assert atom_hybrid.GetHybridization() != atom_no_hybrid.GetHybridization()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@IAlibay this test covers loading an old SMC with the missing info it looks like Rdkit defaults to setting the hybridization as undefined.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks - in that case we should let users know they've hit a bug (just so that they don't end up silently dealing with this). Could you add a warning please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SmallMoleculeComponent round trip does not save atom hybridization
2 participants