This recent change really threw a wrench into my pipeline: `MultivariateStats.jl/src/lda.jl`, line 522 (commit `bf15ed0`).

I am training LDAs on one set of trials and testing the decoding performance on a separate set of trials. All of a sudden my performance dropped to chance, and after about a day of digging around I realised that `toindices` actually mutates the label names. In other words, when I was decoding the test set by finding the projected mean that each sample was closest to, I was using the original labels for my testing, and so the class assignments were all essentially random.
As a stopgap measure for my pipeline, I defined

```julia
MultivariateStats.toindices(label::AbstractVector{T}) where T <: Integer = label
```
which fixed my issue, but I realise that this is not a general solution. In particular, if there are gaps in `label`, such that `maximum(label) !== length(unique(label))`, this could also cause problems.

Is there currently an array type that fulfils that criterion?
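To make the gap concern concrete, here is a minimal illustration (hypothetical labels and centroids, not my actual pipeline): with gappy integer labels, passing the labels straight through as indices into the per-class statistics goes out of bounds.

```julia
labels = [2, 2, 5, 5, 9, 9]              # three classes, but maximum(labels) == 9
k = length(unique(labels))               # k == 3
class_means = [randn(4) for _ in 1:k]    # one centroid per class, stored at indices 1..3

idx = labels                             # the identity "toindices" stopgap passes labels through
# class_means[idx[end]]                  # would throw BoundsError: index 9 into a length-3 vector
```

So the stopgap only works when the integer labels already happen to be exactly `1:k`.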
I see. That was a problem with the previous implementation: labels and indices were conflated, which caused bounds errors if the labels weren't properly defined, #187. It looks like a design problem, because the LDA model doesn't carry any explicit information about the labels. The class centroids relate to the index of a label rather than the label itself. You can use `toindices` to get a map from labels to indices, and use that map to look up the correct class centroid and weight data.
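A minimal sketch of what that could look like (assuming `toindices` returns one integer index per element of the label vector; `train_labels` and the projected-centroid decode are placeholders from your own pipeline):

```julia
using MultivariateStats

# Build an explicit label <-> index map from the same labels used for fitting,
# so test-time decoding can translate between original labels and class indices.
idx = MultivariateStats.toindices(train_labels)      # assumed: one Int index per label
label_to_index = Dict(zip(train_labels, idx))        # e.g. 5 => 1, 9 => 2, ...
index_to_label = Dict(v => k for (k, v) in label_to_index)

# After projecting a test sample z, decode by nearest projected class mean and
# translate the winning *index* back to the original label, e.g.:
# best = argmin(j -> norm(z .- projected_means[:, j]), 1:length(index_to_label))
# predicted_label = index_to_label[best]
```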