Nonprojective dependency entropy #103
base: master
Conversation
Thanks! Mind fixing the small lint error? I think you need to document the return statement. Also, I am a bit worried about links to Colab. Could you just put the derivation inline (as a .. math:: block)? It's not that long, and I like the idea of having it. I also realize we don't have any non-projective tests; I will try to add some just to verify. Finally, I wonder if Ran knows it is this easy? Although maybe his case is more general.
Sure thing, I've replaced the Colab link with the derivation and fixed the lint error (at least when I run it locally and drop the "tests" dir from the command). I'm not sure if he knows -- I can reach out. Generally, I think this method works for expectations of any additively factorable function (what they call first-order expectations). For arc-factored CRFs this also applies to entropy, since log p factors additively. By the way, I just tried to test whether this was also faster for projective dependencies, but I'm running into a torch error when computing the partition function. I'll open a separate issue for it.
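A minimal sketch of the derivation referred to above, in notation that is assumed here rather than taken from the PR's docstring: for an arc-factored distribution over trees, the entropy reduces to the log-partition minus an arc-marginal-weighted sum of the potentials.

```latex
% Arc-factored model: p(y) = exp( \sum_{e \in y} \theta_e ) / Z,
% with Z = \sum_{y} \exp( \sum_{e \in y} \theta_e ).
\begin{align*}
H(p) &= -\sum_y p(y) \log p(y)
      = -\sum_y p(y) \Big( \sum_{e \in y} \theta_e - \log Z \Big) \\
     &= \log Z - \sum_e \Big( \underbrace{\textstyle\sum_{y \ni e} p(y)}_{\mu_e} \Big) \theta_e
      = \log Z - \sum_e \mu_e \, \theta_e .
\end{align*}
% \mu_e = \partial \log Z / \partial \theta_e is the arc marginal; for
% non-projective trees both \log Z and \mu are given by the Matrix-Tree theorem.
```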
Gotcha, I'll try summoning @timvieira. Curious why people avoid creating the marginals explicitly.
I just replied to the issue in our repo. As far as avoiding materialization goes, the main reason to avoid materializing marginals (especially higher-order marginals!) is that you might be materializing a quantity that you don't actually need, which can increase your space and time complexity in some cases. In the case of second-order expectations, we got a huge benefit from not materializing the second-order marginals. We also benefited from additional folding (the distributive rule). In the first-order case, the benefit is less clear, because the time and space complexity is already dominated by other things.
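To make the first-order case concrete, here is a rough plain-PyTorch sketch (not Torch-Struct's API; log_partition is a hypothetical stand-in for, e.g., a matrix-tree computation of log Z from arc log-potentials of shape (batch, N, N)). It also shows why the first-order benefit is limited: the marginals produced by the backward pass are the same size as the potentials.

```python
import torch

def first_order_expectation(log_partition, theta, f):
    """Sketch: E_p[ sum_{e in y} f_e ] for an arc-factored distribution.

    `log_partition` (hypothetical) maps arc log-potentials `theta` of shape
    (batch, N, N) to log Z of shape (batch,); `f` assigns a value to each arc.
    """
    theta = theta.detach().requires_grad_(True)
    logZ = log_partition(theta)
    # Arc marginals are d logZ / d theta -- same shape (and memory cost)
    # as the potentials, so first-order materialization is already cheap.
    (mu,) = torch.autograd.grad(logZ.sum(), theta)
    # Additively decomposable expectation: sum over arcs of mu_e * f_e.
    return (mu * f).sum(dim=(-2, -1))

# Entropy is the special case f = theta:  H = logZ - sum_e mu_e * theta_e.
```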
Ah, that's a great answer, thanks. I kind of suspected that was what was going on. Torch-Struct always assumes full materialization (except for CKY). Maybe we should have another version. CC @justinchiu as well.
There are also some special cases where the number of hyperedges is much larger than the number of features, which can mean that materialization is really inefficient for memory and, to some extent, time because of constant factors (looking up a value in memory can be slower than computing it from scratch). Most hypergraph algorithms only require storing values at the nodes and an iterator over edges (which can delete them). Even storing all the nodes isn't really necessary (e.g., https://timvieira.github.io/blog/post/2016/10/01/reversing-a-sequence-with-sublinear-space/).
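As a rough illustration of the idea in the linked post (hypothetical names, not the post's code): keep about sqrt(n) checkpoints on the forward sweep, then replay each block from its checkpoint to visit the states in reverse, trading one extra forward pass for sublinear memory.

```python
import math

def reversed_states(step, x0, n):
    """Yield x_n, ..., x_1 of the iteration x_t = step(x_{t-1}) using
    O(sqrt(n)) memory instead of storing all n states (sketch of the
    checkpointing idea in the linked blog post)."""
    k = max(1, math.isqrt(n))
    # Forward sweep: keep a checkpoint every k steps.
    checkpoints, x = [x0], x0
    for t in range(1, n + 1):
        x = step(x)
        if t % k == 0 and t < n:
            checkpoints.append(x)
    # Backward sweep: recompute each block from its checkpoint, emit it reversed.
    for b in reversed(range(len(checkpoints))):
        start, end = b * k, min(n, (b + 1) * k)
        x, block = checkpoints[b], []
        for _ in range(start, end):
            x = step(x)
            block.append(x)
        yield from reversed(block)
```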
Right... I know you could do it this way, but most of the time we materialize all the edges anyway for the potentials. However, for approaches like HMMs this is clearly unnecessary. Hmm, maybe I will think a bit about GPU versions of the iterator approach; a lot of my problems are memory-constrained these days.
torch_struct/distributions.py (Outdated)

        Returns:
            entropy (*batch_shape)
        """
        logZ = self.partition
@justinchiu points out that instead of grouping this with non-proj, we should just have materializing the marginals be the default entropy function for all of the models (nothing non-projective specific).
Yes, it's a general property of all exponential family / sum-product models. See also this Twitter discussion https://twitter.com/RanZmigrod/status/1300832956701970434?s=20
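Stated in general exponential-family form (a standard identity, not code or notation from this PR):

```latex
% p_\theta(x) \propto \exp(\theta^\top \phi(x)), with log-partition A(\theta):
H(p_\theta) = A(\theta) - \theta^\top \nabla A(\theta)
            = A(\theta) - \theta^\top \mu,
\qquad \mu = \mathbb{E}_{p_\theta}[\phi(x)] .
```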
@teffland with https://github.com/harvardnlp/pytorch-struct/pull/103/commits things should now be fixed in torch 1.9. Can you update this PR to just change the base Entropy (and possibly cross-entropy and KL) to use your pairwise version? While the semiring implementations are cool, I think my forward implementations are just not going to utilize them properly, and this PR is the better way to compute decomposable expectations in practice. Independently, we're going to try to think about how to implement more memory-efficient forward functions for applications where even pairwise memory is too much. Small thing: please update your editor so it doesn't make whitespace changes. The codebase follows
@srush Awesome, thanks for the fix. Sure, I can make these changes for the entropy, xent, and KL calculations. A quick question: is there a reason we call
Sorry for the whitespacing issue -- my editor is also using
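For concreteness, a hedged sketch of how the three quantities reduce to the same pairwise-marginal pattern (the function names, argument names, and (batch, N, N) arc shapes are assumptions, not Torch-Struct's API; theta_* are arc log-potentials, mu_p the arc marginals of p, logZ_* the log-partitions, all torch tensors):

```python
def entropy(logZ_p, theta_p, mu_p):
    # H(p) = logZ_p - sum_e mu_p[e] * theta_p[e]
    return logZ_p - (mu_p * theta_p).sum(dim=(-2, -1))

def cross_entropy(logZ_q, theta_q, mu_p):
    # H(p, q) = logZ_q - sum_e mu_p[e] * theta_q[e]
    return logZ_q - (mu_p * theta_q).sum(dim=(-2, -1))

def kl(logZ_p, logZ_q, theta_p, theta_q, mu_p):
    # KL(p || q) = H(p, q) - H(p), assuming p and q share the same support.
    return cross_entropy(logZ_q, theta_q, mu_p) - entropy(logZ_p, theta_p, mu_p)
```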
Hi Sasha,
I wanted to contribute an efficient implementation of entropy for non-projective trees that I'm using in a current project.
Please let me know of any changes you'd like to see.
Thanks,
Tom