
Nonprojective dependency entropy #103

Open · wants to merge 9 commits into master

Conversation

teffland

Hi Sasha,

Wanted to contribute an efficient implementation of entropy for non-projective trees I'm using for a current project.

Please let me know of any changes you'd like to see.

Thanks,
Tom

@srush (Collaborator) commented Sep 1, 2021

Thanks! Mind fixing the small lint error? Think you need to document the return statement.

Also, I am a bit worried about links to Colab. Could you just put the derivation inline? It's not that long, and I like the idea of having it.

.. math::
    {{derivation}}

I also realize we don't have any non-projective tests. I will try to add some just to verify.

Finally, I wonder if Ran knows it is this easy? Although maybe his case is more general.
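
For reference, the identity the derivation boils down to is the standard exponential-family one; a sketch (a reconstruction, so the notation may not match what the PR ended up inlining):

```latex
% Arc-factored model: p(y) = exp(\sum_{a \in y} \theta_a) / Z, with
% \mu_a the marginal probability of arc a. Standard entropy identity:
\begin{aligned}
H(p) &= -\sum_y p(y) \log p(y)
      = -\sum_y p(y) \Big( \sum_{a \in y} \theta_a - \log Z \Big) \\
     &= \log Z - \sum_a \mu_a \theta_a ,
\qquad \mu_a = \sum_{y \ni a} p(y) = \frac{\partial \log Z}{\partial \theta_a} .
\end{aligned}
```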

@teffland (Author) commented Sep 1, 2021

Sure thing, I've replaced the Colab link with the derivation and fixed the lint error (at least it passes when I run it locally and drop the "tests" dir from the command).

I'm not sure if he knows -- I can reach out. Generally, any expectation of an additively factorable function (what they call first-order expectations) will work with this method, I think. For arc-factored CRFs this also applies to entropy, since log p factors additively over arcs.
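
In code, the method above only needs the marginals and the log-potentials. A minimal sketch, with a flat categorical standing in for the structured case (hypothetical names, not the PR's actual implementation; for non-projective trees the marginals would come from the Matrix-Tree theorem):

```python
import torch

theta = torch.randn(5)                   # part (arc) log-potentials
log_Z = torch.logsumexp(theta, dim=0)    # partition function
marginals = torch.softmax(theta, dim=0)  # first-order expectations dlogZ/dtheta

# H = log Z - E[score], since log p(y) = score(y) - log Z factors additively.
entropy = log_Z - (marginals * theta).sum()

# Sanity check against the direct definition of entropy.
p = marginals
assert torch.allclose(entropy, -(p * p.log()).sum())
```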

Btw, I just tried to test out if this was faster for projective dependencies also but am running into a torch error when computing the partition function. I'll open a separate issue for it.

@srush (Collaborator) commented Sep 3, 2021

Gotcha, I'll try summoning @timvieira. Curious why people avoid creating the marginals explicitly.

@timvieira

I just replied to the issue in our repo.

As far as avoiding materialization goes, the main reason for avoiding materializing marginals (especially higher-order marginals!) is that you might be materializing a quantity that you don't actually need. That can increase your space and time complexity in some cases.

In the case of second-order expectations, we got a huge benefit from not materializing the second-order marginals. We also benefited from additional folding (the distributive rule).

In the first-order case, the benefit is less clear, because the time and space complexity is already dominated by other things.
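
A toy picture of the "folding" timvieira mentions, i.e. the distributive rule, not the second-order machinery itself:

```python
import torch

x, y = torch.randn(512), torch.randn(512)

# Materializing: build all n*n pairwise terms, then reduce.
pairwise = torch.outer(x, y).sum()

# Folded: the distributive rule collapses the double sum, so the
# n-by-n table never exists -- O(n) time and space instead of O(n^2).
folded = x.sum() * y.sum()

assert torch.allclose(pairwise, folded, atol=1e-3)
```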

@srush (Collaborator) commented Sep 3, 2021

Ah, that's a great answer, thanks. I kind of suspected that was what was going on. Torch-Struct always assumes full materialization (except for CKY). Maybe we should have another version.

CC @justinchiu as well.

@timvieira

There are also some special cases where the number of hyperedges is much larger than the number of features, which can mean that materialization is really inefficient for memory and, to some extent, time because of constant factors (looking up a value in memory can be slower than computing it from scratch). Most hypergraph algorithms only require storing values at the nodes and an iterator over edges (which can delete them). Even storing all the nodes isn't really necessary (e.g., https://timvieira.github.io/blog/post/2016/10/01/reversing-a-sequence-with-sublinear-space/).
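
The trick in that post, roughly: checkpoint every k-th state on a forward pass, then replay each block forward to emit the sequence in reverse, for O(sqrt(n)) memory at the cost of a second forward pass. A sketch (my code, not the post's):

```python
import math

def reversed_states(step, s0, n):
    """Yield s_{n-1}, ..., s_0 where s_{t+1} = step(s_t), in O(sqrt(n)) space."""
    k = max(1, math.isqrt(n))
    # Forward pass: keep only every k-th state as a checkpoint.
    checkpoints, s = [s0], s0
    for t in range(1, n):
        s = step(s)
        if t % k == 0:
            checkpoints.append(s)
    # Backward pass: recompute each block from its checkpoint, emit reversed.
    for b in range(len(checkpoints) - 1, -1, -1):
        block, s = [checkpoints[b]], checkpoints[b]
        end = min(n, (b + 1) * k)  # block covers state indices b*k .. end-1
        for _ in range(b * k + 1, end):
            s = step(s)
            block.append(s)
        yield from reversed(block)

# Example: a counter; states 0..9 come back as 9..0.
assert list(reversed_states(lambda s: s + 1, 0, 10)) == list(range(9, -1, -1))
```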

@srush (Collaborator) commented Sep 3, 2021

Right... I know you could do it this way, but most of the time we materialize all the edges anyway for the potentials. However, for approaches like HMMs this is clearly unnecessary. Hmm, maybe I will think a bit about GPU versions of the iterator approach. A lot of my problems are memory constrained these days.

Review comments on this hunk of the diff:

```python
    Returns:
        entropy (*batch_shape)
    """
    logZ = self.partition
```

@justinchiu points out that instead of grouping this with non-projective models, we should just have it be the default entropy function for all of the models, materializing the marginals (nothing here is non-projective specific).


Yes, it's a general property of all exponential family / sum-product models. See also this Twitter discussion https://twitter.com/RanZmigrod/status/1300832956701970434?s=20

@srush (Collaborator) commented Sep 12, 2021

@teffland with https://github.com/harvardnlp/pytorch-struct/pull/103/commits things should now be fixed in torch 1.9.

Can you update this PR to just change the base Entropy (and possibly cross-ent and KL) to use your pairwise version. While the semiring implementations are cool, I think my forward implementations are just not going to utilize them properly, and this PR is the better way to compute decomposable expectations in practice. Independently, we're going to try to think about how to implement more memory efficient forward functions for applications where even pairwise memory is too much.

Small thing: please update your editor so it doesn't make whitespace changes. The codebase follows black.

@teffland (Author)

@srush Awesome, thanks for the fix. Sure, I can make these changes for the entropy, xent, and kl calculations.

A quick question: is there a reason we call self.logpartition() when calculating the marginals in the _Struct class instead of using the potentially cached self.partition in StructDistribution? Could be a place to save a forward pass...

Sorry for the whitespace issue -- my editor also uses black, but with --line-length 119 for my other projects. I'll make sure to fix that for commits to this repo.
