Get SCITE Most-Likely Tree #141
The function to compare trees is already implemented; see the linked implementation.
@pawel-czyz I just realized we may want to complete SCITE at some point :)
To get the maximum likelihood (ML) tree from the samples, the simplest solution would be to take the sample with the highest log-probability.
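A minimal sketch of this approach (the names `samples`, `log_prob`, and `most_likely_sample` are hypothetical, not from the codebase):

```python
# Minimal sketch: pick the sample with the highest log-probability.
# `samples` is a list of trees and `log_prob` scores a tree under the model.

def most_likely_sample(samples, log_prob):
    """Return the sample maximising log_prob, together with its score."""
    best = max(samples, key=log_prob)
    return best, log_prob(best)
```

For illustration, `most_likely_sample([1, 2, 3], lambda x: -abs(x - 2))` returns `(2, 0)`.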
However, taking the ML tree from the samples can be problematic: generally, the samples don't contain the ML solution. This is especially evident in high-dimensional Euclidean spaces; for example, for a standard normal distribution in n dimensions, the samples concentrate on a thin shell of radius about √n and essentially never fall near the mode at the origin. For discrete spaces this behavior may be somewhat different, and I remember that Jack had some encouraging results comparing MCMC to optimization methods like hill climbing, but I don't remember the details... We can discuss it further, but my guess is that it's not that high priority right now: we have seen evidence that the posterior can be multimodal, so an SMC sampler should be a better choice than a single ML tree. (Another option would be to try to find different modes by plotting local ML maxima.)
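The concentration effect above is easy to demonstrate numerically; this is just a toy illustration, not code from the repository:

```python
import math
import random

# Toy illustration: for a standard normal in n dimensions the density
# peaks at the origin, yet samples concentrate on a thin shell of
# radius ~ sqrt(n), so no sample ever lands near the mode.

rng = random.Random(0)
n = 1000
radii = [
    math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)))
    for _ in range(200)
]
mean_radius = sum(radii) / len(radii)
print(mean_radius)   # close to sqrt(1000) ≈ 31.6
print(min(radii))    # still far from 0: nothing near the mode
```

Even the sample closest to the origin sits dozens of standard deviations away from the mode, which is why reading the MAP/MLE off the samples fails in high dimensions.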
Yes, I think I understand the concern. How is this done in the current version of SCITE then? I am interested to understand this. Absolutely, low priority.
If I recall a conversation with Jack properly, they use simulated annealing for MAP estimation, but I'm not sure...
Why would simulated annealing be better here? Isn't that used when finding the optimal solution would take too long?
The primary point here is that MCMC may never visit the MLE point (at least in continuous domains; for discrete ones the situation may be different), so optimization is preferable for finding it. One way of doing optimization is simulated annealing; others are hill climbing, hill climbing with tempering, etc. I like simulated annealing mostly because it works quite well even on multimodal problems. However, as we don't seem to have these issues, hill climbing (with multiple restarts) may be a viable strategy (or it's possible that MCMC will find the MLE tree as well).
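To make the optimization option concrete, here is a simulated annealing sketch on a toy discrete space (integers 0–99 with a multimodal objective); in our setting the state would be a tree and the neighbour function a random tree move. All names here are hypothetical:

```python
import math
import random

def objective(x):
    # Toy multimodal objective: a local mode near 20 and the global mode near 75.
    return 5 * math.exp(-((x - 20) / 6) ** 2) + 8 * math.exp(-((x - 75) / 6) ** 2)

def neighbour(x, rng):
    # Random local move on the line, clipped to [0, 99].
    return min(99, max(0, x + rng.choice([-1, 1])))

def simulated_annealing(rng, steps=20_000, t0=2.0):
    x = rng.randrange(100)
    best = x
    for i in range(steps):
        t = t0 * (1 - i / steps) + 1e-3          # linear cooling schedule
        y = neighbour(x, rng)
        delta = objective(y) - objective(x)
        # Always accept uphill moves; accept downhill ones with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            x = y
        if objective(x) > objective(best):
            best = x
    return best

best_x = simulated_annealing(random.Random(0))
print(best_x, objective(best_x))
```

Hill climbing with multiple restarts is the same loop with the acceptance probability set to zero and the outer loop repeated from different random starts, keeping the best result over runs.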
Thank you for the context, but I still don't fully understand why the fact that the MLE point may not be visited is a direct argument for optimization vs. sampling.
I remember a paper, but I can't find it. The first 15 minutes of this talk explain why the typical set (explored by MCMC) may not contain the MAP. Also, there's a simple experiment: try sampling from a high-dimensional normal distribution and check how close the samples get to the mode. (All of the above holds in continuous spaces; in tree spaces the behaviour may be different.)
Watched it :) Thank you. Yes, I understand the argument that the typical set may not include the MAP. [Figure: a typical set in 2D.] Is the idea then that ...
SCITE is not complete yet. Currently, the final step, selecting the most likely (ML) tree, i.e. the most frequent tree in the MCMC samples, is missing.
To my understanding, this requires comparing all N trees in the MCMC samples with all N trees, i.e. $N^2$ comparisons. This should return a symmetric N×N boolean matrix. Here, one can simply calculate the row or column sums; the index with the highest sum corresponds to the most frequent, i.e. highest-likelihood, tree.
Is there a smarter way?
Ideally, this is encapsulated in a function that takes the MCMC samples and returns the ML tree along with its log-probability.
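A sketch of such a function, following the pairwise-comparison idea above (all names are hypothetical; `same_tree` stands in for the already-implemented tree comparison):

```python
# Sketch: return the most frequent tree among the MCMC samples, plus its
# log-probability, using an N x N pairwise comparison as described above.

def most_frequent_tree(samples, log_probs, same_tree):
    n = len(samples)
    # Row sums of the N x N boolean comparison matrix count each
    # tree's multiplicity among the samples.
    counts = [
        sum(same_tree(samples[i], samples[j]) for j in range(n))
        for i in range(n)
    ]
    best = max(range(n), key=lambda i: counts[i])
    return samples[best], log_probs[best]
```

For example, `most_frequent_tree([1, 2, 2, 3], [0.1, 0.5, 0.5, 0.9], lambda a, b: a == b)` returns `(2, 0.5)`. As for a smarter way: if each tree can be mapped to a canonical hashable representation, a `collections.Counter` over those keys gives the same answer in O(N) instead of O(N²).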
Later, one may devise a Snakemake rule to add this to the MCMC workflows, and also calculate and plot distances.
This is not required for any of the currently devised experiments, but should be added at some point for a full version of SCITE!