OMIM stands for "Online Mendelian Inheritance in Man", and is an online catalog of human genes and genetic disorders. The official site is: https://omim.org/
The purpose of this repository is to transform OMIM data for ingest into Mondo. Mainly, it generates omim.ttl and other release artefacts.
Disclaimer: This repository and the data artefacts it creates are unofficial. For official, up-to-date OMIM data, please visit omim.org.
- Run:
cp .env.example .env
- Change the value of API_KEY to your own. If you don't have one, you can request one at https://omim.org/downloads. This will probably be sufficient for downloading the necessary text files, but if not, you can also request access to the REST API: https://omim.org/api.
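To illustrate the KEY=VALUE format that the .env file uses, here is a minimal sketch of reading it with the standard library only. The `parse_env` helper and the sample contents are assumptions made for illustration; the repository may load the file differently (for example, through a dotenv library).

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Hypothetical file contents; the real key is issued by omim.org.
sample = "# OMIM credentials\nAPI_KEY=your-key-here\n"
config = parse_env(sample)
print(config["API_KEY"])
```

Only the API_KEY variable name comes from this README; everything else here is a stand-in.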
- RealPython blog install guide: My preferred guide for installing on Windows or Mac
- Python documentation for installing on Windows
- Python documentation for installing on Mac
- Run:
make install
- There is a known possible issue with the psutil dependency on some systems. If you get an error related to it when installing, ignore it, as it does not seem to be needed to run any of the tools. If, however, you get a psutil error when running anything, please let us know by creating an issue.
Run: sh run.sh make all
Running this will create new release artefacts in the root directory.
You can also run make build or python -m omim2obo. These are all the same command. This will download files from omim.org and run the build.
Offline/cache option: python -m omim2obo --use-cache
If there's an issue downloading the files, or you are offline, or you just want to use the cache anyway, you can pass the --use-cache flag.
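The general idea behind a cache option can be sketched as follows. This is not the repository's implementation: `load_text`, its parameters, and the file names are hypothetical, and the sketch assumes that a normal run saves each download so that a later cached run can reuse it.

```python
import tempfile
from pathlib import Path
from typing import Callable


def load_text(cache_path: Path, fetch: Callable[[], str], use_cache: bool = False) -> str:
    """Return file contents from the cache if requested, otherwise fetch and save them."""
    if use_cache:
        # Offline path: read whatever an earlier run saved.
        return cache_path.read_text(encoding="utf-8")
    text = fetch()  # stands in for a download; injected so the sketch runs offline
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(text, encoding="utf-8")
    return text


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cache" / "example.txt"
    # First run "downloads" and fills the cache.
    first = load_text(path, fetch=lambda: "downloaded")
    # Second run never calls fetch; it reads the cached copy.
    second = load_text(path, fetch=lambda: "never called", use_cache=True)

print(first, second)
```

Passing the download step in as a function keeps the cache logic separate from any network code, which also makes it easy to test.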
Details
Command: sh run.sh make get-pmids
Currently, the only feature is get_codes_by_yyyy_mm, which returns a list of OMIM codes and their prefixes from https://omim.org/statistics/update.
make scrape y=<YEAR> m=<MONTH>
make scrape y=<YEAR> m=<MONTH> > <path/to/outputFile>
- Get codes for May 2021, printed to terminal:
make scrape y=2021 m=5
- Get codes for May 2021 and output to a file "myfile.txt":
make scrape y=2021 m=5 > myfile.txt
Command:
make scrape y=2021 m=5
Response:
[('#', '619340'),
('#', '619355'),
('*', '619357'),
('*', '619358'),
('*', '619359'),
('#', '619325'),
('#', '619328'),
('*', '100850'),
...
('#', '613102')]
Calling get_codes_by_yyyy_mm() returns a list of (prefix, code) tuples.
from omim2obo.omim_code_scraper import get_codes_by_yyyy_mm
code_tuples = get_codes_by_yyyy_mm('2021/05')
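Because the result is a plain list of (prefix, code) tuples, it is easy to post-process. As a minimal sketch, the following groups codes by prefix. The `group_by_prefix` helper is hypothetical, and the sample tuples are copied from the example response above so that the block runs without network access.

```python
from collections import defaultdict

# Sample tuples copied from the example response; a real call would
# produce this list via get_codes_by_yyyy_mm('2021/05').
code_tuples = [('#', '619340'), ('#', '619355'), ('*', '619357'), ('*', '619358')]


def group_by_prefix(pairs):
    """Map each prefix to the list of codes that carry it, preserving order."""
    grouped = defaultdict(list)
    for prefix, code in pairs:
        grouped[prefix].append(code)
    return dict(grouped)


by_prefix = group_by_prefix(code_tuples)
print(by_prefix['#'])
```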
- omim.ttl: OMIM ontologized
- omim.sssom.tsv: SSSOM mapping file
- mondo-omim-genes.robot.tsv: ROBOT template for adding OMIM genes to Mondo
- review.tsv: Special cases to consider for manual review
Notice: These are generated from the latest downloadable data files from omim.org, which are updated daily, rather than from what is shown on the omim.org/entry/MIM# pages. The data files and the entry pages aren't always in sync, and for a period of time one may be slightly more up to date than the other.
Columns:
- classCode: integer: ID of review case class
- classShortName: string (camelCase): describing the review case class
- value: any: Some form of data to review
- comment: string (optional)
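A file with these columns could be read as sketched below. The sketch assumes that review.tsv is tab-delimited and has a header row with exactly these column names. The `read_review_rows` helper and the sample rows are made up for illustration and are not taken from a real review.tsv.

```python
import csv
import io

# Hypothetical rows for illustration only; not real review data.
sample_tsv = (
    "classCode\tclassShortName\tvalue\tcomment\n"
    "1\texampleCase\tOMIM:000000\t\n"
    "1\texampleCase\tOMIM:000001\tneeds a second look\n"
)


def read_review_rows(handle):
    """Read review rows, converting classCode to int and empty comments to None."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["classCode"] = int(row["classCode"])
        row["comment"] = row["comment"] or None  # comment is optional
        rows.append(row)
    return rows


rows = read_review_rows(io.StringIO(sample_tsv))
for row in rows:
    print(row["classCode"], row["classShortName"], row["value"], row["comment"])
```

To read an actual file, pass an open file handle instead of the `io.StringIO` wrapper, for example `open("review.tsv", newline="", encoding="utf-8")`.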
This review case involves what would otherwise be considered a valid disease-gene relationship, except that, quite unusually, the label includes 'digenic' even though the entry has only 1 association. OMIM doesn't guarantee the data quality of its disease-gene associations marked 'digenic', so for any of these entries, one of two things could be the case: (a) it is not 'digenic', in which case OMIM should remove that from the label, and Mondo can either make an explicit exception to add the relationship or wait until OMIM fixes the issue, at which point it will be added automatically; or (b) it is in fact 'digenic', and OMIM should add the missing 2nd gene association.