Unclear interaction between MixedModels & CategoricalArrays #606
-
I have a dataset (in the attachment) with a few categorical variables, and I am interested in assessing how they affect the response variable (simple and interaction effects). For visualization purposes, I started using CategoricalArrays so that the output shows the levels of each variable in the order most relevant to my design. To my surprise, however, the estimated coefficients, as well as the significance of the factors, change in a way that I can't explain as a simple change in display order. I imagine there must be an explanation, but I couldn't find any direct mention in the documentation of how a CategoricalArray would change the contrast matrix or other aspects of the fitting. Here is the code I used, where ex_v1 is the dataset in the attachment:

```julia
using CSV, Downloads, DataFrames, MixedModels, CategoricalArrays

ex_v1 = CSV.read(Downloads.download("https://github.com/JuliaStats/MixedModels.jl/files/8521187/Example.csv"), DataFrame)

# second version: convert the experimental factors to categorical columns
ex_v2 = copy(ex_v1)
for x in [:Protocol, :Wall, :Gen, :Stim]
    ex_v2[!, x] = categorical(ex_v2[:, x])
end
levels!(ex_v2.Gen, ["WT", "HET"]);

# third version: additionally reorder the Protocol levels
ex_v3 = copy(ex_v2);
levels!(ex_v3.Protocol, ["90/90", "90/30", "30/30"]);
isordered(ex_v3.Protocol)

form = @formula(Response ~ 1 + Protocol * Wall * Stim * Gen + (1 + Protocol + Wall | SubjectID));
m_v1 = fit(MixedModel, form, ex_v1);
m_v2 = fit(MixedModel, form, ex_v2);
m_v3 = fit(MixedModel, form, ex_v3);
```

I would expect the three models to give the same results, since this is only a change in display ordering (with the coefficient signs flipping where appropriate). On the contrary, they are substantially different from each other, and I can't find an explanation for it in the docs.
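As a quick sanity check (a minimal sketch, assuming the three fits above completed), the fixed-effect names show which level each model silently picked as its reference:

```julia
# Print the first few fixed-effect names of each fit. With the default
# DummyCoding, the reference level follows the column's level ordering,
# so reordering the levels changes which coefficients are estimated.
for (name, m) in [("m_v1", m_v1), ("m_v2", m_v2), ("m_v3", m_v3)]
    println(name, ": ", join(first(coefnames(m), 4), ", "))
end
```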
-
Thanks for providing an MWE. 😄 I suspect that this is mostly an issue of the weird ways that contrast coding plays out (especially since there are multiple interactions and a three-level variable), because all the deviances are quite similar and well within numerical tolerance:

```julia
julia> deviance(m_v1)
136481.16714620416

julia> deviance(m_v2)
136481.16714408842

julia> deviance(m_v3)
136481.16714728964
```

Setting your contrasts (including their reference levels) explicitly is a good idea so that you remember to report them:

```julia
julia> contrasts = Dict(:Protocol => EffectsCoding(; base="90/90"),
                        :Gen => EffectsCoding(; base="WT"),
                        :Stim => DummyCoding(; base=false),
                        :Wall => DummyCoding(; base=false),
                        :SubjectID => Grouping()) # pseudo-contrast for grouping variables
Dict{Symbol, StatsModels.AbstractContrasts} with 5 entries:
  :SubjectID => Grouping()
  :Stim      => DummyCoding(false, nothing)
  :Gen       => EffectsCoding("WT", nothing)
  :Wall      => DummyCoding(false, nothing)
  :Protocol  => EffectsCoding("90/90", nothing)

julia> m_v1c = fit(MixedModel, form, ex_v1; contrasts);

julia> m_v2c = fit(MixedModel, form, ex_v2; contrasts);

julia> m_v3c = fit(MixedModel, form, ex_v3; contrasts);
```

These models are all identical within numerical precision, with the final one differing only in its summary because the ordering of the corresponding categorical column affects the display ordering. You can also see this by plotting the corresponding model effects/predictions using the Effects.jl package.
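One way to confirm the equivalence numerically is to align the coefficients by name before comparing; a minimal sketch, assuming the refit models `m_v1c` and `m_v3c` from above (the optimizer stops at slightly different points on each run, so expect agreement to several digits rather than exact equality):

```julia
# Align the fixed effects of two fits by coefficient name, then report
# the largest absolute difference; display order no longer matters.
p1 = sortperm(coefnames(m_v1c))
p3 = sortperm(coefnames(m_v3c))
@assert coefnames(m_v1c)[p1] == coefnames(m_v3c)[p3]  # same terms, different order
maximum(abs, coef(m_v1c)[p1] .- coef(m_v3c)[p3])
```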
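For the Effects.jl route, something along these lines should work (a sketch, assuming the package's `effects` function with the fits above; the predicted values should coincide across the three fits even though their coefficient tables are ordered differently):

```julia
using Effects

# Predicted response at each Protocol level, with the other predictors
# held at typical values; the rows come back in the column's level
# order, but the predictions themselves match across the fits.
design = Dict(:Protocol => ["90/90", "90/30", "30/30"])
eff_v1 = effects(design, m_v1c)
eff_v3 = effects(design, m_v3c)
```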