Unclear interaction between MixedModels & CategoricalArrays #606
-
I have a dataset (in the attachment) with a few categorical variables, and I am interested in assessing how they affect the response variable (simple and interaction effects). For visualization purposes, I started using CategoricalArrays so that the output shows the levels of each variable in the order most relevant to my design. To my surprise, however, the estimated coefficients, as well as the significance of the factors, change in a way that I can't explain as a simple change in display order. I imagine there must be an explanation, but I couldn't find any direct mention in the documentation of how a CategoricalArray would change the contrast matrix or other aspects of the fitting. Here is the code I used, where ex_v1 is the dataset in the attachment:

```julia
using CSV, Downloads, DataFrames, MixedModels, CategoricalArrays

ex_v1 = CSV.read(Downloads.download("https://github.com/JuliaStats/MixedModels.jl/files/8521187/Example.csv"), DataFrame)

# second version: convert the experimental factors to categorical columns
ex_v2 = copy(ex_v1)
for x in [:Protocol, :Wall, :Gen, :Stim]
    ex_v2[!, x] = categorical(ex_v2[:, x])
end
levels!(ex_v2.Gen, ["WT", "HET"]);

# third version: additionally reorder the Protocol levels
ex_v3 = copy(ex_v2);
levels!(ex_v3.Protocol, ["90/90", "90/30", "30/30"]);
isordered(ex_v3.Protocol)

form = @formula(Response ~ 1 + Protocol * Wall * Stim * Gen + (1 + Protocol + Wall | SubjectID));
m_v1 = fit(MixedModel, form, ex_v1);
m_v2 = fit(MixedModel, form, ex_v2);
m_v3 = fit(MixedModel, form, ex_v3);
```

I would expect the three models to give the same results, since this is only a change in display ordering (with the coefficient signs flipping where appropriate). On the contrary, they are substantially different from each other, and I can't find an explanation for it in the docs.
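As a quick sanity check (a minimal sketch, assuming the three fits above completed), the fixed-effect names show which level each model silently picked as its reference:

```julia
# Print the first few fixed-effect names of each fit. With the default
# DummyCoding, the reference level follows the column's level ordering,
# so reordering the levels changes which coefficients are estimated.
for (name, m) in [("m_v1", m_v1), ("m_v2", m_v2), ("m_v3", m_v3)]
    println(name, ": ", join(first(coefnames(m), 4), ", "))
end
```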
-
Thanks for providing an MWE. 😄 I suspect that this is mostly an issue of the weird ways that contrast coding plays out (especially since there are multiple interactions and a three-level variable), because all the deviances are quite similar and well within numerical tolerance:

```julia
julia> deviance(m_v1)
136481.16714620416

julia> deviance(m_v2)
136481.16714408842

julia> deviance(m_v3)
136481.16714728964
```

Setting your contrasts (including their reference levels) explicitly is a good idea so that you remember to report them:

```julia
julia> contrasts = Dict(:Protocol => EffectsCoding(; base="90/90"),
                        :Gen => EffectsCoding(; base="WT"),
                        :Stim => DummyCoding(; base=false),
                        :Wall => DummyCoding(; base=false),
                        :SubjectID => Grouping()) # pseudo-contrast for grouping variables
Dict{Symbol, StatsModels.AbstractContrasts} with 5 entries:
  :SubjectID => Grouping()
  :Stim      => DummyCoding(false, nothing)
  :Gen       => EffectsCoding("WT", nothing)
  :Wall      => DummyCoding(false, nothing)
  :Protocol  => EffectsCoding("90/90", nothing)

julia> m_v1c = fit(MixedModel, form, ex_v1; contrasts);

julia> m_v2c = fit(MixedModel, form, ex_v2; contrasts);

julia> m_v3c = fit(MixedModel, form, ex_v3; contrasts);
```

These models are all identical within numerical precision, with the final one differing only in its summary because the ordering of the corresponding categorical column affects the display ordering. You can also see this by plotting the corresponding model effects/predictions using the Effects.jl package.
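One way to confirm the equivalence numerically is to align the coefficients by name before comparing; a minimal sketch, assuming the refit models `m_v1c` and `m_v3c` from above (the optimizer stops at slightly different points on each run, so expect agreement to several digits rather than exact equality):

```julia
# Align the fixed effects of two fits by coefficient name, then report
# the largest absolute difference; display order no longer matters.
p1 = sortperm(coefnames(m_v1c))
p3 = sortperm(coefnames(m_v3c))
@assert coefnames(m_v1c)[p1] == coefnames(m_v3c)[p3]  # same terms, different order
maximum(abs, coef(m_v1c)[p1] .- coef(m_v3c)[p3])
```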
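For the Effects.jl route, something along these lines should work (a sketch, assuming the package's `effects` function with the fits above; the predicted values should coincide across the three fits even though their coefficient tables are ordered differently):

```julia
using Effects

# Predicted response at each Protocol level, with the other predictors
# held at typical values; the rows come back in the column's level
# order, but the predictions themselves match across the fits.
design = Dict(:Protocol => ["90/90", "90/30", "30/30"])
eff_v1 = effects(design, m_v1c)
eff_v3 = effects(design, m_v3c)
```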