Universal effect sizes for regression parameters #127
Replies: 4 comments
-
This is only true for simple regressions; in multiple regressions it is possible, and in some cases common, to have standardized coefficients outside the [-1, 1] range.
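For example, a small simulated illustration (hypothetical data, just to make the point): two highly correlated predictors with opposite-signed effects push the standardized slopes well past 1.

```r
set.seed(1)
n  <- 1000
x1 <- rnorm(n)
x2 <- 0.9 * x1 + sqrt(1 - 0.9^2) * rnorm(n)  # correlated ~.9 with x1
y  <- x1 - x2 + rnorm(n, sd = 0.1)           # classic suppression setup

# standardized slopes come out around +2 and -2, well outside [-1, 1]
coef(lm(scale(y) ~ scale(x1) + scale(x2)))
```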
This depends on the standardization method. For interactions, I agree that when they are standardized the "dumb" way, the interpretation is muddier.
However, even with "dumb" standardization, we can still interpret the "weight" of an interaction parameter or factor parameter in the model. I think that in the realm of standardized parameters there are two questions we can ask, with all other options somewhere in between. When the model is simple (no interactions, no factors), the solutions to both are identical; but as the model becomes more and more complex, they drift apart, and users must ask themselves what exactly they want to know (I usually want to know 2).
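As a concrete sketch of that drift (the model is purely illustrative): with an interaction, `method = "basic"` and `method = "refit"` in `effectsize::standardize_parameters()` no longer agree.

```r
library(effectsize)

m <- lm(mpg ~ hp * wt, data = mtcars)

# "basic": post-hoc scaling of each model-matrix column, including the hp:wt product
standardize_parameters(m, method = "basic")

# "refit": standardize the data first, then refit -- the interaction is now
# the product of two standardized variables, so the estimates differ
standardize_parameters(m, method = "refit")
```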
As you mention, these conversions are very crude - and as I mentioned above, beta has different properties than r. Another avenue to think about is to look at partial correlations (which, unlike beta, are in the [-1, 1] range). In linear non-mixed models, these get as close as you can to (3) from above (but the presence of mixed effects breaks this too, limiting the interpretability of terms on different levels... :/). Now that I understand what you were trying to do, I will not change it.
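A sketch of that avenue, using `effectsize::t_to_r()` to convert each parameter's t statistic into a partial correlation (the model here is just an illustration):

```r
library(effectsize)

m  <- lm(mpg ~ hp + wt, data = mtcars)
tt <- coef(summary(m))[-1, "t value"]  # t statistics, intercept dropped

# partial correlation for each parameter, bounded in [-1, 1]
t_to_r(tt, df_error = df.residual(m))
```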
-
Also, how is this related to #6?
-
True - this is also why I really enjoyed the discussions we had a few months ago about how to retrieve partial correlations from models' statistics (and the issue with Bayesian methods), because I saw it as a potential way of solving this issue.
Well, not really: using `method = "basic"`, it literally divides the difference between a given level and the intercept (i.e., the effect of that parameter) by the crude SD of the dummy-coded variable. But the SD of a binary variable doesn't make much intuitive sense. Basically, it seems to me like this is mixing apples and pears, and the resulting index is weird. It's much more meaningful in this case to use `method = "refit"` (which doesn't touch factors), which is the same as dividing the parameter only by the SD of the dependent variable. In that case it's a kind of standardized difference (issues related to the denominator, pooled SD, etc. aside - although in theory we should be able to retrieve and recompute the appropriate denominator and get legit Cohen's ds or such).

I don't think the "basic" and "refit" methods are inherently different; the "basic" method is simply not appropriate in more complex cases (especially for interactions involving factors, but also for terms related to polynomials, splines, and pretty much everything beyond simple parameters). The "refit" method tackles this issue by transforming the data beforehand, so that all parameters are naturally computed from standardized data. But then it doesn't solve the above-mentioned issue of effect size meaning, as far as I can see.

The problem with the "refit" method is that it requires refitting the model, which is computationally heavy. The "SMART" method in #6 aimed at finding a way of reconstructing the same parameters as given by the "refit" method (which is the safest) using a posteriori information, which also requires information about the types, to know what to divide by. For instance, for parameters of continuous variables, you multiply the coefficient by the SD of the independent variable and divide by the SD of the dependent variable; for parameters related to differences, you divide only by the SD of the dependent variable. So IMO, the standardization method and the "standardization of standardized parameters" are two conceptually different issues, albeit underpinned by some common issues...
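A minimal sketch of the contrast described above (model chosen only for illustration): with a factor predictor, `method = "basic"` uses the SD of the 0/1 dummy columns, while `method = "refit"` leaves the factor alone and scales only the outcome.

```r
library(effectsize)

m <- lm(Sepal.Length ~ Species, data = iris)

# multiplies each coefficient by the SD of its 0/1 dummy column
# and divides by sd(Sepal.Length) -- the odd "SD of a binary" index
standardize_parameters(m, method = "basic")

# standardizes only Sepal.Length; coefficients become differences in SDs of y
standardize_parameters(m, method = "refit")
```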
-
That doesn't actually matter - reverse coding still gives the same var(x):

```r
x <- sample(c(TRUE, FALSE), size = 100, replace = TRUE)
mean(x)
#> [1] 0.61
var(x)
#> [1] 0.240303

mean(!x)
#> [1] 0.39
var(!x)
#> [1] 0.240303
```

Anyway, this is only with the default 0/1 dummy coding; with effects coding you just need to rescale:

```r
f <- factor(x)
contrasts(f) <- contr.sum
contrasts(f)
#>       [,1]
#> FALSE    1
#> TRUE    -1

# need to divide by 2^2, just to rescale the range [-1, 1] to the same as [0, 1]
var(model.matrix(~f)[, 2]) / (2^2)
#> [1] 0.240303
```

I don't agree, though.
This again depends on the model complexity, but if you think of Cohen's d as the distance between two populations' means as a function of the variance around those means (which is assumed to be equal), we can get it perfectly for t-test-like linear regressions:

```r
m <- lm(mpg ~ am, mtcars)
coef(m)[-1] / sigma(m)
#>       am
#> 1.477947

# compare to:
effectsize::cohens_d(mpg ~ am, data = mtcars)
#> Cohen's d |         95% CI
#> --------------------------
#>     -1.48 | [-2.27, -0.67]
```

And in general the idea should hold for any (not generalized) linear model, as long as we keep in mind what the residual `sigma` represents.

[* this is actually how ...]
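To illustrate the "any linear model" point, a hypothetical extension with a covariate - `sigma` is then the residual SD after adjusting for `hp`, so this is an adjusted standardized difference rather than a classic Cohen's d:

```r
m2 <- lm(mpg ~ am + hp, data = mtcars)

# the am difference in units of the residual SD, now adjusted for hp
coef(m2)["am"] / sigma(m2)
```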
-
Follow-up of #104
First, the following is based on the assumption that reporting and computing "standardized" (unitless/comparable) indices for parameters is useful, and that users will use them to help interpret their effects (and thus, people will consciously or unconsciously apply some rules of thumb, such as interpretation guidelines for r or d).
Now, the problem I see with "standardized" parameters (i.e., parameters expressed in standard deviations of the dependent variable) is that they are still not fully comparable. The reason is that the parameters do not all correspond to the same "type".
For instance, in the case of a linear model (where y is a continuous variable): if x is continuous, then the corresponding parameter represents the strength of the association, and its standardized version (where x and y are standardized) is akin to a correlation r (defined between -1 and 1). However, if x is a factor, then the parameters express differences between levels, and the standardized version is more akin to a standardized difference (in which case it can be bigger than 1). Finally, if the parameter refers to an interaction, then its type depends on the type of its basis (i.e., the parameter to which the interaction coefficient is added) - and naturally, for double or triple interactions, it gets complicated.
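As a quick check of the first case (simple model with one continuous predictor, chosen only for illustration), the standardized slope is exactly Pearson's r:

```r
m <- lm(scale(mpg) ~ scale(wt), data = mtcars)

coef(m)[-1]                 # standardized slope
cor(mtcars$mpg, mtcars$wt)  # identical in the simple-regression case
```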
So, IMO, understanding and interpreting effect sizes for regression parameters first requires knowing/finding the type of each parameter.
These "types" is what I try to find with
parameters_type
:Created on 2020-09-16 by the reprex package (v0.3.0)
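A minimal sketch of the kind of call meant (the model is illustrative, and the exact output columns may differ across versions):

```r
library(parameters)

m <- lm(Sepal.Length ~ Petal.Width * Species, data = iris)

# returns a data frame tagging each parameter with a type,
# e.g. "intercept", "numeric", "factor", "interaction"
parameters_type(m)
```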
Note that it gets complicated with nested effects etc. Moreover, there is currently a limit on how "deep" it can go (it is limited to triple interactions). The implementation of this type-finder could surely be improved by a brighter mind.
Now, coming back to effect sizes: this divergence is a problem because most people seem unaware of it, and simply run `standardize_parameters(model)` and then interpret all the parameters in the same way, disregarding their meaning (I'm guilty of that myself, having in the past suggested that "standardized parameters can be interpreted as standardized differences" without insisting on the type).

So my goal with `interpret_parameters()` was basically to put all the standardized parameters "on the same scale", so that they are comparable with each other. Because there are converters from r to d, for instance, my goal was to convert them all to "r"-like coefficients (I think its bounded range from -1 to 1 makes it the most intuitive of all the indices). Note that another problem is that the converters (especially from d to r) are not perfect, and are loaded with assumptions... but still, it seems like an interesting avenue to explore.

Any thoughts/ideas? @mattansb does that clarify the other issue 😁?
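A sketch of the kind of conversion meant, using `effectsize::d_to_r()` / `effectsize::r_to_d()` (which, among other assumptions, presume equal group sizes):

```r
library(effectsize)

# r = d / sqrt(d^2 + 4) under the equal-group-sizes assumption
d_to_r(0.8)

# and back again
r_to_d(0.37)
```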