Taxon-term table option to switch valid/satisfiable criteria; emulate gain/loss of a term for taxa #81
Comments
For PAINT, we really need something that treats a taxon ID as representing a species, rather than a group of species. For living species like humans and Drosophila, these are the same thing. But for a taxonomy group, like vertebrates, we need it to represent the common ancestor species of all living vertebrates. We want to interpret the taxon constraint as indicating whether that term is valid in a given ancestral species, not a group of living species.

I'm not sure how your code works, Jim, but I'm assuming it might do something like this. Following the example above, the constraint is only_in_taxon vertebrata (corresponding to Euteleostomi in PANTHER). To see if the term is valid for human, I assume you check to see if human is a member/subclass of the NCBITaxonomy group vertebrata. If so, you could do the same thing for ancestral species (other taxa). For example, if you wanted to check Eutheria, you'd check to see if Eutheria in NCBITaxonomy is a subclass of vertebrata. If so, the term would be valid according to that particular constraint.

To do this, we'd need to have taxonomy IDs for all the ancestral species nodes in PANTHER. If we don't have that, we can make it.
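The subclass check described in this comment can be sketched as follows. This is only an illustration, not gaferencer's actual implementation: the `PARENT` map is a hand-written, heavily abbreviated fragment of the NCBITaxonomy hierarchy (intermediate ranks are omitted), and `is_subclass_of` and `valid_only_in_taxon` are hypothetical helper names.

```python
# Abbreviated child -> parent fragment of the taxonomy (assumption: intermediate
# ranks are skipped, and labels are used instead of NCBITaxon IDs for readability).
PARENT = {
    "Homo sapiens": "Eutheria",
    "Eutheria": "Euteleostomi",
    "Euteleostomi": "Vertebrata",
    "Vertebrata": "Bilateria",
    "Drosophila melanogaster": "Bilateria",
}


def is_subclass_of(taxon, ancestor):
    """Return True if `taxon` is `ancestor` itself or lies beneath it in the tree."""
    current = taxon
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)  # None once we walk past the top of the fragment
    return False


def valid_only_in_taxon(taxon, constraint_taxon):
    """Hypothetical check: a term with `only_in_taxon X` is valid for a taxon
    (living or ancestral) only if that taxon is X or a subclass of X."""
    return is_subclass_of(taxon, constraint_taxon)


for name in ["Homo sapiens", "Eutheria", "Bilateria", "Drosophila melanogaster"]:
    print(name, valid_only_in_taxon(name, "Vertebrata"))
```

Under this reading, the ancestral node Eutheria passes the Vertebrata constraint in the same way human does, while Bilateria does not, because Bilateria sits above Vertebrata rather than beneath it.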
@thomaspd Thankfully, we do have NCBITaxon IDs mapped to most of the PANTHER species (256 out of 288) including for the ancestral species. These NCBITaxon IDs get passed into the gaferencer tool to make the taxon-term table. Just double-checked and Bilateria, Eutheria, and Euteleostomi NCBITaxon IDs are all mapped. So the ancestor-to-ancestor checks should be using the NCBITaxonomy class hierarchy.
@thomaspd Yep, it just clicked for me! Considering this as the gain and loss of functions, it wouldn't make sense to say that the Bilateria ancestor species itself (as opposed to its set of descendant species) had inflammatory response (GO:0006954) just because its descendant Vertebrata gained it later. So this should be '0' for Bilateria and "partial" wouldn't make sense either.
Tagging self on.
@balhoff I believe this line here contains the meat:
We could pass an option/flag into gaferencer to use an altered version of this. Though I don't yet know the syntax to express what we really need. We might need to talk it through on a call.
We discussed changing the table format:
That scenario (above) will match the output of the taxon-constraints Protégé plugin.
For our PAINT purposes, I'll add an extra step in the post-processing of this table, which we handle here, that fills in the blank cells. This final, complete taxon-term table will then be what is consumed by the PAINT tool. In the Bilateria inflammatory response example, this taxon-term cell will be blank coming out of the new gaferencer. My post-step logic would assign it For other terms that have a
@thomaspd @huaiyumi What should the default value be if a term doesn't have any taxon constraints defined? Should the PAINT tool allow curators to make annotations to these terms? If yes, then the default should be |
Answer from @thomaspd and @huaiyumi is that yes, terms not having any constraints in the I'll make a ticket in that repo for implementing the change. |
Currently, the gaferencer taxa command generates a taxon-term table that marks whether a term is "valid" for a taxon, given the taxon constraints present in the ontology. Valid is denoted 1; otherwise 0 is used for not valid. But we might need to expand the possible values and maybe define what we mean by "valid"?

The use case for this issue is looking at the taxon-term row for GO:0006954 (inflammatory response) and finding that the column for taxon Bilateria (NCBITaxon:33213) is marked valid (1), despite the taxon constraint of GO:0006954 only_in_taxon Vertebrata. With Bilateria being an ancestor of Vertebrata (NCBITaxon:7742), one would think this should be marked invalid (0), since Bilateria is not "underneath" Vertebrata on the species tree:

On this tree (taken from PANTHER), Vertebrata is roughly equivalent to Euteleostomi (NCBITaxon:117571). I also marked the current taxon-term value of GO:0006954 for most species: a green check for '1' or a red X for '0'. I stopped after several and just drew that green line, which denotes that every species below it is valid for GO:0006954.
So, what does "valid" mean? Given the above results, it looks like it means the term can be annotated to some taxon (e.g. Bilateria descendant taxon Vertebrata can use GO:0006954). But for PAINT annotation purposes (the consumer of this awesome table) we really need "valid" to mean the term can be annotated to all taxa (e.g. GO:0006954 can be in all Vertebrata, so 1, but not all Bilateria, so 0).

@balhoff As we discussed, to solve this we could use a third value to denote "partial validity": a taxon with some but not all subtaxa valid for the term. Are we still thinking empty string or blank?
1 - Taxon and all descendant taxa are VALID for term
"" - Taxon and only some descendant taxa are VALID for term
0 - Taxon and all descendant taxa are INVALID for term

In our example, Vertebrata and its descendants would get 1, all ancestors of Vertebrata would get "", and all other taxa would get 0. The PAINT software would then only accept the 1's as valid.

Another solution could be to still have only 1 or 0 values but add a switch/arg to the gaferencer taxa command that changes the criteria for 1 to exclude the "partially valid" taxa (the "" category) and set those to 0 instead.

@balhoff Apologies if I just ruined our earlier discussion by adding to the confusion here.
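To make the two proposals concrete, here is a rough sketch of how a single cell could be computed for one only_in_taxon constraint. It is not gaferencer code: the `PARENT` map is an abbreviated, hand-written fragment of the taxonomy, `lineage` and `cell_value` are hypothetical names, and the `strict` parameter merely stands in for the proposed switch (no such flag is known to exist). Other constraint types, such as never_in_taxon, are not handled.

```python
# Abbreviated child -> parent fragment of the taxonomy (assumption: intermediate
# ranks omitted; labels used in place of NCBITaxon IDs).
PARENT = {
    "Homo sapiens": "Eutheria",
    "Eutheria": "Euteleostomi",
    "Euteleostomi": "Vertebrata",
    "Vertebrata": "Bilateria",
    "Drosophila melanogaster": "Bilateria",
}


def lineage(taxon):
    """Return the taxon followed by all of its ancestors, nearest first."""
    path = []
    current = taxon
    while current is not None:
        path.append(current)
        current = PARENT.get(current)
    return path


def cell_value(taxon, constraint_taxon, strict=False):
    """Cell for a term carrying `only_in_taxon constraint_taxon`.

    "1": taxon is the constraint taxon or beneath it (all descendants valid)
    "" : taxon is an ancestor of the constraint taxon (only some descendants valid)
    "0": taxon is unrelated to the constraint taxon (no descendants valid)
    With strict=True (the hypothetical switch), "" is collapsed to "0".
    """
    if constraint_taxon in lineage(taxon):
        return "1"
    if taxon in lineage(constraint_taxon):
        return "0" if strict else ""
    return "0"


for name in ["Homo sapiens", "Vertebrata", "Bilateria", "Drosophila melanogaster"]:
    print(repr(name), repr(cell_value(name, "Vertebrata")),
          repr(cell_value(name, "Vertebrata", strict=True)))
```

With the constraint GO:0006954 only_in_taxon Vertebrata, the Bilateria cell comes out as "" in the three-valued scheme and as 0 when the switch is on, which is the behaviour PAINT needs.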