Obsolete MeSH supplementary concepts as xrefs #9750

Closed
bgyori opened this issue Apr 29, 2023 · 21 comments

Comments

@bgyori

bgyori commented Apr 29, 2023

I found that a number of HP terms refer to obsolete MeSH supplementary concepts as xrefs. Here is the complete list:

| HPO ID | HPO name | MeSH ID |
| --- | --- | --- |
| HP:0000835 | Adrenal hypoplasia | C538429 |
| HP:0001647 | Bicuspid aortic valve | C562388 |
| HP:0001838 | Rocker bottom foot | C536345 |
| HP:0002895 | Papillary thyroid carcinoma | C536915 |
| HP:0004813 | Post-transfusion thrombocytopenia | C562868 |
| HP:0006897 | Abducens palsy | C564661 |
| HP:0010445 | Primum atrial septal defect | C548006 |
| HP:0011540 | Congenitally corrected transposition of the great arteries | C535426 |
| HP:0011675 | Arrhythmia | C562490 |
| HP:0011743 | Adrenal gland agenesis | C538429 |
| HP:0012108 | Open angle glaucoma | C562750 |
| HP:0030078 | Lung adenocarcinoma | C538231 |
| HP:0040198 | Non-medullary thyroid carcinoma | C536915 |
| HP:0100001 | Malignant mesothelioma | C562839 |

Is there a preferred way to deal with these?

When searching for these terms in MeSH, there are usually close matches, but they are not always trivially exact matches. As an example, for "Papillary thyroid carcinoma" there is a MeSH term "Thyroid Cancer, Papillary" (https://meshb.nlm.nih.gov/record/ui?ui=D000077273), which sounds like a broader term, but the MeSH definition "An ADENOCARCINOMA that originates from follicular cells of the THYROID GLAND..." would suggest the two are actually equivalent.
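A check like the one described above could be sketched roughly as follows. This is a minimal illustration, not the method actually used to produce the list: it assumes the xrefs appear in the OBO file as `xref: MSH:<id>` lines, and that a set of currently valid MeSH identifiers is available from somewhere else. The function names `mesh_xrefs` and `obsolete_xrefs` are hypothetical, and the sample stanzas are abbreviated from two rows of the table.

```python
# Assumption: MeSH xrefs are written as "xref: MSH:<id>" in OBO stanzas.
SAMPLE_OBO = """\
[Term]
id: HP:0000835
name: Adrenal hypoplasia
xref: MSH:C538429

[Term]
id: HP:0002895
name: Papillary thyroid carcinoma
xref: MSH:C536915
"""


def mesh_xrefs(obo_text):
    """Collect MSH xrefs per term ID from OBO-format text."""
    xrefs = {}
    current = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest.
            current = None
            in_term = line == "[Term]"
        elif in_term and line.startswith("id: "):
            current = line[len("id: "):]
        elif in_term and current and line.startswith("xref: MSH:"):
            # Drop any trailing qualifiers or comments after the ID.
            mesh_id = line[len("xref: MSH:"):].split()[0]
            xrefs.setdefault(current, []).append(mesh_id)
    return xrefs


def obsolete_xrefs(xrefs, current_mesh_ids):
    """Return only the terms whose xrefs are missing from current_mesh_ids."""
    flagged = {}
    for term_id, mesh_ids in xrefs.items():
        missing = [m for m in mesh_ids if m not in current_mesh_ids]
        if missing:
            flagged[term_id] = missing
    return flagged


# Illustrative only: a real run would load the full set of current MeSH IDs.
current_ids = {"D000077273"}
flagged = obsolete_xrefs(mesh_xrefs(SAMPLE_OBO), current_ids)
print(flagged)
```

Flagging an xref this way only shows that the identifier is no longer current; choosing a replacement still needs a manual or curated decision, as the thyroid carcinoma example shows.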

@pnrobinson
Contributor

We recommend using the official UMLS mapping (https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/HPO/index.html)
@drseb @mellybelly @matentzn
We should probably remove xrefs from UMLS terminologies from the HPO file entirely and instead refer to the UMLS. First, it would be good to write a detailed tutorial of how to get the UMLS files, which is fairly complicated and will scare off many users.

@bgyori
Author

bgyori commented Apr 29, 2023

I can see how referring to UMLS reduces redundancy and possible inconsistencies but I think that from a user perspective it creates complications since it is significantly more difficult to work with. So for what it's worth, I would much prefer having some xrefs (I am particularly interested in MeSH) directly available in HPO.

@cthoyt
Contributor

cthoyt commented Apr 29, 2023

I agree that working with UMLS is not so convenient and it's nice to maintain the mappings inside HPO (from a user's perspective)

But also it seems like this issue is about MeSH, not UMLS

@cthoyt
Contributor

cthoyt commented Jul 17, 2023

Update: we did a bit of an analysis on UMLS, MeSH, and HPO to see what value each adds. It turns out that there are non-redundant mappings from both HPO and UMLS that are valuable, and therefore it would be problematic to remove all of them wholesale

https://github.com/biopragmatics/semra/blob/main/notebooks/umls-inference-analysis.ipynb

@pnrobinson
Contributor

I think the problem is that we do not have resources to support these mappings and so the XREFs in the hp.owl file are all about ten years old, and are by no means comprehensive. The UMLS team is doing this regularly and so they have by far the highest quality mappings. We are also working on a new SNOMED mapping that will live outside the hp file. We should delete the UMLS and SNOMED refs so that there is one source of truth.
The difficult thing is actually extracting the UMLS data and it would be great for us to write a tutorial (I do not know how to do it myself haha)

@matentzn
Contributor

I know what to do here, but I will wait for the SNOMED mappings to trickle in, and then I will deal with everything at once.

I guess this is the key thing with regard to the UMLS overlap:

image

The UMLS mappings seem very few. I am surprised there is so much difference between UMLS and HPO with respect to MeSH.

@pnrobinson
Contributor

@matentzn
whatever the overlap, we are simply not supporting this anymore == we should remove and refer people to UMLS and make a tutorial.

I think that it is better to put xrefs into separate files with SSSOM; the OWL edit file is not a good place to keep this information.
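The separate-file approach suggested here presumably refers to SSSOM (Simple Standard for Sharing Ontological Mappings), which stores mappings as a TSV table. The sketch below shows roughly what writing such a table could look like. It is simplified: real SSSOM files usually also carry a metadata header, which is omitted. The single row is only the candidate match raised earlier in this thread, not a curated mapping, and the `mesh:` prefix and `write_sssom` name are assumptions made for illustration.

```python
import csv
import io

# Core SSSOM columns; real files usually carry more columns and metadata.
COLUMNS = ["subject_id", "predicate_id", "object_id", "mapping_justification"]

# Illustrative row only: the candidate match discussed earlier in this thread.
ROWS = [
    {
        "subject_id": "HP:0002895",
        "predicate_id": "skos:exactMatch",
        "object_id": "mesh:D000077273",
        "mapping_justification": "semapv:ManualMappingCuration",
    },
]


def write_sssom(rows, handle):
    """Write mapping rows as a tab-separated table with a header line."""
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_sssom(ROWS, buffer)
print(buffer.getvalue())
```

Keeping mappings in a table like this lets them be regenerated or compared against UMLS without touching the ontology edit file.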

@matentzn
Contributor

Yes, I agree @pnrobinson - I will make a coherent proposal that makes everyone happy when it comes!

@pnrobinson
Contributor

@matentzn
Would it be possible to remove all of the HPO MeSH IDs, move them to an external file, and then compare to UMLS and coordinate with @kanems? @kanems would that be useful from a UMLS perspective? A long time ago we did a manual mapping to MeSH. It is really hard for most of our users to use the UMLS mapping tools, so maybe we can find a way of regularly extracting mappings, or is that a licence issue?

@kanems

kanems commented Feb 13, 2024

If users of HPO want to know what MeSH IDs are equivalent to an HPO phenotype, that is within the scope of MedGen's subset of UMLS processing. Both are at the 'level 0' set of licensing rules, so I don't see any reason why HPO couldn't use our processed UMLS subset to refresh the HPO-MeSH mappings.
The MedGen data processing pulls in the data from UMLS following their 2x/year releases and uses that as our truth table for HPO-CUI relationships. We then generate reports in our FTP space that map CUIs in UMLS to other vocabulary IDs, including MeSH, HPO, OMIM, Mondo, OrphaNet and GARD IDs. This is the MedGenIDMappings.txt.gz file on FTP.
Not all HPO IDs will get a MeSH mapping, though.
1- If the HPO term is created new in between UMLS releases, we assign a temporary MedGen CUI (CN########) and then look for the CUI replacement in the next release.
2- Sometimes >1 HPO ID is mapped to a CUI (hence my persistence about those potentially redundant HPO records, I wanted to only push for UMLS to review their mappings for pairs where HPO was certain the terms were unique concepts), but MedGen will respect HPO's structure and keep those on different records in MedGen (so one gets a CUI and possibly MeSH equivalent, the other gets a CN CUI until UMLS changes something). This would mean that while UMLS may say 1 MeSH ID is equivalent to 2 HPO IDs, we would only report 1 as being equivalent.
3- Not all CUIs get a MeSH ID, so some HPO IDs are not going to match a MeSH concept.
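The join described in the points above can be sketched as follows. This is a rough illustration under stated assumptions, not MedGen's or HPO's actual pipeline: it assumes the mappings file is pipe-delimited with the columns `CUI|pref_name|source_id|source|` and that the source labels are `HPO` and `MeSH`. The function name `hpo_to_mesh` is hypothetical, and every identifier in the sample is a made-up placeholder.

```python
from collections import defaultdict

# Placeholder rows in the assumed layout; none of these IDs are real.
SAMPLE_LINES = [
    "#CUI|pref_name|source_id|source|",
    "C_EXAMPLE_1|example concept one|HP:EXAMPLE1|HPO|",
    "C_EXAMPLE_1|example concept one|D_EXAMPLE1|MeSH|",
    "CN_EXAMPLE_2|example concept two|HP:EXAMPLE2|HPO|",  # temporary CUI, no MeSH
    "C_EXAMPLE_3|example concept three|D_EXAMPLE3|MeSH|",  # MeSH only, no HPO
]


def hpo_to_mesh(lines):
    """Map each HPO ID to the MeSH IDs that share its CUI, if any."""
    by_cui = defaultdict(lambda: {"HPO": set(), "MeSH": set()})
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.rstrip("\n").split("|")
        if len(fields) < 4:
            continue  # skip malformed rows
        cui, _name, source_id, source = fields[:4]
        if source in ("HPO", "MeSH"):
            by_cui[cui][source].add(source_id)

    mapping = {}
    for ids in by_cui.values():
        if ids["MeSH"]:
            for hpo_id in ids["HPO"]:
                mapping[hpo_id] = sorted(ids["MeSH"])
    return mapping


print(hpo_to_mesh(SAMPLE_LINES))
```

For the real file, the lines could be read with `gzip.open(path, "rt")` and passed to the same function. HPO IDs on a temporary `CN` CUI, or on a CUI without a MeSH ID, simply drop out of the result, which matches the caveats listed above.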

The Mondo team is already in the process of using the IDMappings file to update Mondo-CUI mappings based on MedGen's processing and curation; perhaps that could be reconfigured/reworked to pull the HPO-CUI-MeSH mappings?

There is a whole different level of mapping, though, if the concern is what HPO terms describe disease entities in MeSH. We bring in and report the HPO-OMIM disease mappings, and thus if a MeSH ID is equivalent to a MIM number, that could be extracted from comparing a couple of MedGen reports but... unless that's the specific request here, I don't want to get into that much more complicated approach.

@matentzn
Contributor

@kanems That is fantastic - I will just extend our pipeline then to support this! Thank you so much!

> There is a whole different level of mapping, though, if the concern is what HPO terms describe disease entities in MeSH.

Yes, for sure. This is not what we are discussing here, and for this we would be using our HPOA files HPO->OMIM/ORDO->MONDO etc. This is not what this issue is about.

Alright I will deal with this then! Thanks so much!

@matentzn
Contributor

@pnrobinson
Contributor

@matentzn @bgyori -- It looks as if the above issue has been closed. Can we also close this issue, and if not, what is still left to do?

@matentzn
Contributor

matentzn commented May 8, 2024

How do you want those mappings to be redistributed? Shall we just add a file to the HPO repo? Update the MeSH xrefs in HPO? Both?

@pnrobinson
Contributor

Ideally we would figure out how to create mappings from the UMLS resource and publish instructions on the website. I do not think there is a need for us to create an extra downloadable file.

@matentzn
Contributor

matentzn commented May 8, 2024

That was the point of this ticket - we already have that file now! We get it through @kanems! The question is only now how we inform people about it..

There is no easy way to "tell people to get the information from UMLS" - it is always a bit painful..

@bgyori
Author

bgyori commented May 8, 2024

@matentzn where is that file?

@matentzn
Contributor

matentzn commented May 8, 2024

Here: https://github.com/monarch-initiative/medgen/releases/tag/2024-05-05

@pnrobinson
Contributor

@matentzn the mapping is great. Can we add documentation to the HPO website about it?
Then I guess we should also remove the legacy mappings from the hp-edit.owl file?

matentzn added a commit that referenced this issue Jul 1, 2024
The Mesh-HPO mappings are now outsourced to MedGen, see #9750.

They can be found here: https://obophenotype.github.io/human-phenotype-ontology/developers/mappings/
@matentzn
Contributor

matentzn commented Jul 1, 2024

Documentation: https://obophenotype.github.io/human-phenotype-ontology/developers/mappings/
Removed MSH xrefs: #10605

@pnrobinson feel free to merge the above and close.

@pnrobinson
Contributor

Merged, thanks!
