I found that after training on 98 freki files, the DictVectorizer used by models.py doesn't contain the following features:
L-LMm
G-overlap
W-prevclass
This means that these features are not used in training or testing, which may partly account for the low performance. I'm currently not sure why these features aren't making it into the vectors.
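One quick way to confirm which features survive vectorization is to inspect the fitted DictVectorizer's vocabulary. A minimal diagnostic sketch (the feature dicts below are hypothetical stand-ins, not the project's real instances):

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical per-instance feature dicts standing in for the real ones.
instances = [
    {"W-len": 5, "L-LMw": 0.3},
    {"W-len": 2, "G-overlap": 0.7},
]

vec = DictVectorizer()
vec.fit(instances)

# A feature key that never appears in any training dict
# will be absent from the learned vocabulary.
for feat in ("L-LMm", "G-overlap", "W-prevclass"):
    if feat not in vec.vocabulary_:
        print(f"{feat} missing from vectorizer")
```

Running something like this over the real training instances would show whether the features are dropped before vectorization or never emitted at all.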
The L-CR-LMc and L-LMc features are making it into the vectors on my machine.
L-CR-LMw isn't, but that might be because nothing reaches the threshold. This is just from running it on the sample file on GitHub, not on a big set.
G-overlap doesn't work because the g_features method isn't implemented yet. I could do that, but looking at the description of G-overlap I'm not really sure what it's supposed to be measuring.
As @elirnm said, g-overlap was not implemented. It should be the same as the L-LMw (as I understand it), but with the model (and features) taken from the gloss line rather than the language line.
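Under that reading, G-overlap would measure how much of a line's content is covered by the gloss-line word model. A rough sketch (the function name and vocabulary set are hypothetical, not the project's actual API):

```python
def g_overlap(line_tokens, gloss_model_vocab):
    """Fraction of tokens on the line that appear in the gloss-line
    word model's vocabulary (one possible reading of G-overlap)."""
    if not line_tokens:
        return 0.0
    hits = sum(1 for tok in line_tokens if tok.lower() in gloss_model_vocab)
    return hits / len(line_tokens)

# Toy vocabulary built from gloss lines in training data (hypothetical).
vocab = {"1sg", "acc", "eat", "pst"}
score = g_overlap(["eat", "PST", "banana"], vocab)  # 2 of 3 tokens match
```

Whether the real feature should be count-based like L-LMw rather than a simple fraction is exactly the open question here.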
W-prevclass isn't either, but it's a different case. W-prevclass uses the predicted class of the previous instance, so it's an online feature (i.e., you can't compute it until you've classified the line before it), but during training you can use the gold label of the previous instance, I think.
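That train/test asymmetry could be handled along these lines (a hedged sketch; the function, sentinel value, and label names are all hypothetical, not the project's code):

```python
def add_prevclass(instances, labels=None, predict=None):
    """Attach a W-prevclass feature to each feature dict (sketch).

    During training, use the gold label of the previous instance;
    at test time, use the classifier's prediction for it instead.
    """
    prev = "<s>"  # sentinel value for the first line
    for i, feats in enumerate(instances):
        feats["W-prevclass"] = prev
        if labels is not None:        # training: gold labels available
            prev = labels[i]
        else:                         # testing: online prediction
            prev = predict(feats)
    return instances

# Training-time usage with gold labels:
insts = [{"W-len": 3}, {"W-len": 5}, {"W-len": 2}]
add_prevclass(insts, labels=["O", "L", "G"])
```

After this, the second instance carries the first one's gold label, and so on; at test time the same loop would thread predictions through instead.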