Several features are not present in the DictVectorizer after training #16

Open
MackieBlackburn opened this issue Aug 16, 2017 · 2 comments

Comments

@MackieBlackburn (Collaborator) commented Aug 16, 2017

I found that after training on 98 freki files, the DictVectorizer used by models.py doesn't contain the following features:

L-LMm
G-overlap
W-prevclass

This means these features are not used in training or testing, which may partly account for the low performance. I'm currently not sure why they are not making it into the vectors.
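For reference, one quick way to see which features a fitted DictVectorizer actually learned is to inspect its feature_names_ / vocabulary_ attributes. A minimal sketch, with made-up feature dicts standing in for the per-line dicts that models.py builds:

```python
from sklearn.feature_extraction import DictVectorizer

# Toy feature dicts standing in for the per-line dicts built during training;
# the point is only how to inspect what a fitted vectorizer learned.
instances = [
    {"L-CR-LMc": 0.7, "W-prev": 1.0},
    {"L-LMc": 0.4, "W-prev": 1.0},
]

vec = DictVectorizer()
vec.fit(instances)

# Any feature that never appears in the training dicts will be absent here.
print(sorted(vec.feature_names_))
for name in ("L-LMm", "G-overlap", "W-prevclass"):
    print(name, "present" if name in vec.vocabulary_ else "missing")
```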

@elirnm (Contributor) commented Aug 16, 2017

The L-CR-LMc and L-LMc features are making it into the vectors on my machine.

L-CR-LMw isn't, but that might be because nothing reaches the threshold. This is just from running it on the sample file in GitHub, not on a big set.

G-overlap doesn't work because the g_features method isn't implemented yet. I could do that, but looking at the description of G-overlap, I'm not really sure what it's supposed to be measuring.

L-LMm isn't implemented yet. I can do that.

I don't think W-prevclass is implemented.

@goodmami (Member) commented:

As @elirnm said, G-overlap was not implemented. It should be the same as L-LMw (as I understand it), but with the model (and features) taken from the gloss line rather than the language line.
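A rough sketch of that reading (the names and the overlap measure here are hypothetical, not the repo's actual feature functions):

```python
# Hypothetical sketch only; the real feature extraction in this repo may differ.
def vocab_overlap(tokens, model_vocab):
    """Fraction of a line's tokens found in a word model's vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in model_vocab) / len(tokens)

def l_lmw_feature(lang_tokens, lang_vocab):
    # L-LMw: the language line scored against the language-line word model.
    return {"L-LMw": vocab_overlap(lang_tokens, lang_vocab)}

def g_overlap_feature(gloss_tokens, gloss_vocab):
    # G-overlap: the same computation, but both the tokens and the model
    # come from the gloss line instead of the language line.
    return {"G-overlap": vocab_overlap(gloss_tokens, gloss_vocab)}
```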

L-LMm isn't implemented yet (see #9).

Neither is W-prevclass, but it's a different case: it uses the predicted class of the previous instance, so it's an online feature (i.e., you can't determine it until you've classified the instance before it). During training, though, I think you can use the gold label of the previous instance.
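A minimal sketch of that train/decode asymmetry (hypothetical helper names, not the project's actual code):

```python
# Hypothetical sketch of the W-prevclass idea; not this repo's actual code.
def with_prevclass(feats, prev_label):
    """Add the previous instance's class as a feature, when it is known."""
    if prev_label is not None:
        feats["W-prevclass=" + prev_label] = 1.0
    return feats

def featurize_for_training(lines, gold_labels, base_features):
    prev = None
    for line, gold in zip(lines, gold_labels):
        yield with_prevclass(base_features(line), prev)
        prev = gold  # training: use the gold label of the previous line

def featurize_for_decoding(lines, base_features, classify):
    prev = None
    for line in lines:
        feats = with_prevclass(base_features(line), prev)
        prev = classify(feats)  # online: use the model's own prediction
        yield feats, prev
```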
