
Nonprojective dependency entropy #103

Open · wants to merge 9 commits into master

Conversation

teffland

Hi Sasha,

Wanted to contribute an efficient implementation of entropy for non-projective trees I'm using for a current project.

Please let me know of any changes you'd like to see.

Thanks,
Tom

@srush (Collaborator) commented Sep 1, 2021

Thanks! Mind fixing the small lint error? Think you need to document the return statement.

Also, I am a bit worried about links to Colab. Could you just put the derivation inline? It's not that long, and I like the idea of having it.

.. math::
    {{derivation}}

I also realize we don't have any non-projective tests. I will try to add some just to verify.

Finally, I wonder if Ran knows it is this easy? Although maybe his case is more general.
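
For reference, the identity the derivation boils down to is the standard exponential-family one; a sketch (a reconstruction, so the notation may not match what the PR ended up inlining):

```latex
% Arc-factored model: p(y) = exp(\sum_{a \in y} \theta_a) / Z, with
% \mu_a the marginal probability of arc a. Standard entropy identity:
\begin{aligned}
H(p) &= -\sum_y p(y) \log p(y)
      = -\sum_y p(y) \Big( \sum_{a \in y} \theta_a - \log Z \Big) \\
     &= \log Z - \sum_a \mu_a \theta_a ,
\qquad \mu_a = \sum_{y \ni a} p(y) = \frac{\partial \log Z}{\partial \theta_a} .
\end{aligned}
```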

@teffland (Author) commented Sep 1, 2021

Sure thing, I've replaced the Colab link with the derivation and fixed the lint error (at least it passes when I run it locally and drop the "tests" dir from the command).

I'm not sure if he knows -- I can reach out. Generally, any expectation of an additively factorable function (what they call first-order expectations) will work with this method, I think. For arc-factored CRFs this also applies to entropy, since log p factors additively over arcs.
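
In code, the method above only needs the marginals and the log-potentials. A minimal sketch, with a flat categorical standing in for the structured case (hypothetical names, not the PR's actual implementation; for non-projective trees the marginals would come from the Matrix-Tree theorem):

```python
import torch

theta = torch.randn(5)                   # part (arc) log-potentials
log_Z = torch.logsumexp(theta, dim=0)    # partition function
marginals = torch.softmax(theta, dim=0)  # first-order expectations dlogZ/dtheta

# H = log Z - E[score], since log p(y) = score(y) - log Z factors additively.
entropy = log_Z - (marginals * theta).sum()

# Sanity check against the direct definition of entropy.
p = marginals
assert torch.allclose(entropy, -(p * p.log()).sum())
```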

Btw, I just tried to test out if this was faster for projective dependencies also but am running into a torch error when computing the partition function. I'll open a separate issue for it.

@srush (Collaborator) commented Sep 3, 2021

Gotcha, I'll try summoning @timvieira. Curious why people avoid creating the marginals explicitly.

@timvieira

I just replied to the issue in our repo.

As far as avoiding materialization goes, the main reason for avoiding materializing marginals (especially higher-order marginals!) is that you might be materializing a quantity that you don't actually need. That can increase your space and time complexity in some cases.

In the case of second-order expectations, we got a huge benefit from not materializing the second-order marginals. We also benefited from additional folding (the distributive rule).

In the first-order case, the benefit is less clear, because the time and space complexity is already dominated by other things.
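
A toy picture of the "folding" timvieira mentions, i.e. the distributive rule, not the second-order machinery itself:

```python
import torch

x, y = torch.randn(512), torch.randn(512)

# Materializing: build all n*n pairwise terms, then reduce.
pairwise = torch.outer(x, y).sum()

# Folded: the distributive rule collapses the double sum, so the
# n-by-n table never exists -- O(n) time and space instead of O(n^2).
folded = x.sum() * y.sum()

assert torch.allclose(pairwise, folded, atol=1e-3)
```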

@srush (Collaborator) commented Sep 3, 2021

Ah, that's a great answer, thanks. I kind of suspected that was what was going on. Torch-Struct always assumes full materialization (except for CKY). Maybe we should have another version.

CC @justinchiu as well.

@timvieira

There are also some special cases where the number of hyperedges is much larger than the number of features, which can mean that materialization is really inefficient for memory and, to some extent, time because of constant factors (looking up a value in memory can be slower than computing it from scratch). Most hypergraph algorithms only require storing values at the nodes and an iterator over edges (which can delete them). Even storing all the nodes isn't really necessary (e.g., https://timvieira.github.io/blog/post/2016/10/01/reversing-a-sequence-with-sublinear-space/).
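
The trick in that post, roughly: checkpoint every k-th state on a forward pass, then replay each block forward to emit the sequence in reverse, for O(sqrt(n)) memory at the cost of a second forward pass. A sketch (my code, not the post's):

```python
import math

def reversed_states(step, s0, n):
    """Yield s_{n-1}, ..., s_0 where s_{t+1} = step(s_t), in O(sqrt(n)) space."""
    k = max(1, math.isqrt(n))
    # Forward pass: keep only every k-th state as a checkpoint.
    checkpoints, s = [s0], s0
    for t in range(1, n):
        s = step(s)
        if t % k == 0:
            checkpoints.append(s)
    # Backward pass: recompute each block from its checkpoint, emit reversed.
    for b in range(len(checkpoints) - 1, -1, -1):
        block, s = [checkpoints[b]], checkpoints[b]
        end = min(n, (b + 1) * k)  # block covers state indices b*k .. end-1
        for _ in range(b * k + 1, end):
            s = step(s)
            block.append(s)
        yield from reversed(block)

# Example: a counter; states 0..9 come back as 9..0.
assert list(reversed_states(lambda s: s + 1, 0, 10)) == list(range(9, -1, -1))
```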

@srush (Collaborator) commented Sep 3, 2021

Right... I know you could do it this way, but most of the time we materialize all the edges anyway for the potentials. However, for approaches like HMMs this is clearly unnecessary. Hmm, maybe I will think a bit about GPU versions of the iterator approach. A lot of my problems are memory constrained these days.

Review comments on this hunk of the diff:

```python
    Returns:
        entropy (*batch_shape)
    """
    logZ = self.partition
```

@justinchiu points out that instead of grouping this with non-projective models, we should just have it be the default entropy function for all of the models, materializing the marginals (nothing here is non-projective specific).


Yes, it's a general property of all exponential family / sum-product models. See also this Twitter discussion https://twitter.com/RanZmigrod/status/1300832956701970434?s=20

@srush (Collaborator) commented Sep 12, 2021

@teffland with https://github.com/harvardnlp/pytorch-struct/pull/103/commits things should now be fixed in torch 1.9.

Can you update this PR to just change the base Entropy (and possibly cross-ent and KL) to use your pairwise version. While the semiring implementations are cool, I think my forward implementations are just not going to utilize them properly, and this PR is the better way to compute decomposable expectations in practice. Independently, we're going to try to think about how to implement more memory efficient forward functions for applications where even pairwise memory is too much.

Small thing: please update your editor so it doesn't make whitespace changes. The codebase follows black.

@teffland (Author)

@srush Awesome, thanks for the fix. Sure, I can make these changes for the entropy, xent, and kl calculations.

A quick question: is there a reason we call self.logpartition() when calculating the marginals in the _Struct class instead of using the potentially cached self.partition in StructDistribution? Could be a place to save a forward pass...

Sorry for the whitespace issue -- my editor also uses black, but with --line-length 119 for my other projects. I'll make sure to fix that for commits to this repo.
