Several features are not present in the DictVectorizer after training #16

Open
MackieBlackburn opened this issue Aug 16, 2017 · 2 comments

Comments

@MackieBlackburn (Collaborator) commented Aug 16, 2017

I found that after training on 98 freki files, the DictVectorizer used by models.py doesn't contain the following features:

L-LMm
G-overlap
W-prevclass

This means these features are not used in training or testing, which may partly account for the low performance. I'm currently not sure why they are not making it into the vectors.
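For reference, one quick way to see which features a fitted DictVectorizer actually learned is to inspect its feature_names_ / vocabulary_ attributes. A minimal sketch, with made-up feature dicts standing in for the per-line dicts that models.py builds:

```python
from sklearn.feature_extraction import DictVectorizer

# Toy feature dicts standing in for the per-line dicts built during training;
# the point is only how to inspect what a fitted vectorizer learned.
instances = [
    {"L-CR-LMc": 0.7, "W-prev": 1.0},
    {"L-LMc": 0.4, "W-prev": 1.0},
]

vec = DictVectorizer()
vec.fit(instances)

# Any feature that never appears in the training dicts will be absent here.
print(sorted(vec.feature_names_))
for name in ("L-LMm", "G-overlap", "W-prevclass"):
    print(name, "present" if name in vec.vocabulary_ else "missing")
```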

@elirnm (Contributor) commented Aug 16, 2017

The L-CR-LMc and L-LMc features are making it into the vectors on my machine.

L-CR-LMw isn't, but that might be because nothing reaches the threshold. This is just from running it on the sample file in GitHub, not on a big set.

G-overlap doesn't work because the g_features method isn't implemented yet. I could do that, but looking at the description of G-overlap, I'm not really sure what it's supposed to be measuring.

L-LMm isn't implemented yet. I can do that.

I don't think W-prevclass is implemented.

@goodmami (Member) commented:

As @elirnm said, G-overlap was not implemented. It should be the same as L-LMw (as I understand it), but with the model (and features) taken from the gloss line rather than the language line.
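A rough sketch of that reading (the names and the overlap measure here are hypothetical, not the repo's actual feature functions):

```python
# Hypothetical sketch only; the real feature extraction in this repo may differ.
def vocab_overlap(tokens, model_vocab):
    """Fraction of a line's tokens found in a word model's vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in model_vocab) / len(tokens)

def l_lmw_feature(lang_tokens, lang_vocab):
    # L-LMw: the language line scored against the language-line word model.
    return {"L-LMw": vocab_overlap(lang_tokens, lang_vocab)}

def g_overlap_feature(gloss_tokens, gloss_vocab):
    # G-overlap: the same computation, but both the tokens and the model
    # come from the gloss line instead of the language line.
    return {"G-overlap": vocab_overlap(gloss_tokens, gloss_vocab)}
```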

L-LMm isn't implemented yet (see #9).

Neither is W-prevclass, but it's a different case: it uses the predicted class of the previous instance, so it's an online feature (i.e., you can't determine it until you've classified the instance before it). During training, though, I think you can use the gold label of the previous instance.
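A minimal sketch of that train/decode asymmetry (hypothetical helper names, not the project's actual code):

```python
# Hypothetical sketch of the W-prevclass idea; not this repo's actual code.
def with_prevclass(feats, prev_label):
    """Add the previous instance's class as a feature, when it is known."""
    if prev_label is not None:
        feats["W-prevclass=" + prev_label] = 1.0
    return feats

def featurize_for_training(lines, gold_labels, base_features):
    prev = None
    for line, gold in zip(lines, gold_labels):
        yield with_prevclass(base_features(line), prev)
        prev = gold  # training: use the gold label of the previous line

def featurize_for_decoding(lines, base_features, classify):
    prev = None
    for line in lines:
        feats = with_prevclass(base_features(line), prev)
        prev = classify(feats)  # online: use the model's own prediction
        yield feats, prev
```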
