Universal effect sizes for regression parameters #127
Replies: 4 comments
-
This is only true for simple regressions; in multiple regressions it is possible, and in some cases common, to have standardized coefficients outside the [-1, 1] range.
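For example, a small simulated illustration (hypothetical data, just to make the point): two highly correlated predictors with opposite-signed effects push the standardized slopes well past 1.

```r
set.seed(1)
n  <- 1000
x1 <- rnorm(n)
x2 <- 0.9 * x1 + sqrt(1 - 0.9^2) * rnorm(n)  # correlated ~.9 with x1
y  <- x1 - x2 + rnorm(n, sd = 0.1)           # classic suppression setup

# standardized slopes come out around +2 and -2, well outside [-1, 1]
coef(lm(scale(y) ~ scale(x1) + scale(x2)))
```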
This depends on the standardization method. For interactions, I agree that when they are standardized the "dumb" way, the interpretation is muddier.
However, even with "dumb" standardization, we can still interpret the "weight" of an interaction parameter or factor parameter in the model. I think that in the realm of standardized parameters there are two questions we can ask, with all other options somewhere in between. When the model is simple (no interactions, no factors), the solutions to both are identical; but as the model becomes more and more complex, they drift apart, and users must ask themselves what exactly they want to know (I usually want to know 2).
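As a concrete sketch of that drift (the model is purely illustrative): with an interaction, `method = "basic"` and `method = "refit"` in `effectsize::standardize_parameters()` no longer agree.

```r
library(effectsize)

m <- lm(mpg ~ hp * wt, data = mtcars)

# "basic": post-hoc scaling of each model-matrix column, including the hp:wt product
standardize_parameters(m, method = "basic")

# "refit": standardize the data first, then refit -- the interaction is now
# the product of two standardized variables, so the estimates differ
standardize_parameters(m, method = "refit")
```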
As you mention, these conversions are very crude - and as I mentioned above, beta has different properties than r. Another avenue to think about is to look at partial correlations (which, unlike beta, are in the [-1, 1] range). In linear non-mixed models, these get as close as you can to (3) from above (but the presence of mixed effects breaks this too, limiting the interpretability of terms on different levels... :/). Now that I understand what you were trying to do, I will not change it.
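A sketch of that avenue, using `effectsize::t_to_r()` to convert each parameter's t statistic into a partial correlation (the model here is just an illustration):

```r
library(effectsize)

m  <- lm(mpg ~ hp + wt, data = mtcars)
tt <- coef(summary(m))[-1, "t value"]  # t statistics, intercept dropped

# partial correlation for each parameter, bounded in [-1, 1]
t_to_r(tt, df_error = df.residual(m))
```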
-
Also, how is this related to #6?
-
True - this is also why I really enjoyed the discussions we had a few months ago about how to retrieve partial correlations from models' statistics (and the issue with Bayesian methods), because I saw it as a potential way of solving this issue.
Well, not really: using `method = "basic"`, it literally divides the difference between a given level and the intercept (i.e., the effect of that parameter) by the crude SD of the dummy-coded variable. But the SD of a binary variable doesn't make much intuitive sense. Basically, it seems to me like this is mixing apples and pears, and the resulting index is weird. It's much more meaningful in this case to use `method = "refit"` (which doesn't touch factors), which is the same as dividing the parameter only by the SD of the dependent variable. In that case it's a kind of standardized difference (issues related to the denominator, pooled SD, etc. aside - although in theory we should be able to retrieve and recompute the appropriate denominator and get legit Cohen's ds or such).

I don't think the "basic" and "refit" methods are inherently different; the "basic" method is simply not appropriate in more complex cases (especially for interactions involving factors, but also for terms related to polynomials, splines, and pretty much everything beyond simple parameters). The "refit" method tackles this issue by transforming the data beforehand, so that all parameters are naturally computed from standardized data. But then it doesn't solve the above-mentioned issue of effect size meaning, as far as I can see.

The problem with the "refit" method is that it requires refitting the model, which is computationally heavy. The "SMART" method in #6 aimed at finding a way of reconstructing the same parameters as given by the "refit" method (which is the safest) using a posteriori information, which also requires information about the types, to know what to divide by. For instance, for parameters of continuous variables, you multiply the coefficient by the SD of the independent variable and divide by the SD of the dependent variable; for parameters related to differences, you divide only by the SD of the dependent variable. So IMO, the standardization method and the "standardization of standardized parameters" are two conceptually different issues, albeit underpinned by some common issues...
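A minimal sketch of the contrast described above (model chosen only for illustration): with a factor predictor, `method = "basic"` uses the SD of the 0/1 dummy columns, while `method = "refit"` leaves the factor alone and scales only the outcome.

```r
library(effectsize)

m <- lm(Sepal.Length ~ Species, data = iris)

# multiplies each coefficient by the SD of its 0/1 dummy column
# and divides by sd(Sepal.Length) -- the odd "SD of a binary" index
standardize_parameters(m, method = "basic")

# standardizes only Sepal.Length; coefficients become differences in SDs of y
standardize_parameters(m, method = "refit")
```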
-
That doesn't actually matter - reverse coding still gives the same var(x):

```r
x <- sample(c(TRUE, FALSE), size = 100, replace = TRUE)
mean(x)
#> [1] 0.61
var(x)
#> [1] 0.240303

mean(!x)
#> [1] 0.39
var(!x)
#> [1] 0.240303
```

Anyway, this is only with the default 0/1 dummy coding; with effects coding you just need to rescale:

```r
f <- factor(x)
contrasts(f) <- contr.sum
contrasts(f)
#>       [,1]
#> FALSE    1
#> TRUE    -1

# need to divide by 2^2, just to rescale the range [-1, 1] to the same as [0, 1]
var(model.matrix(~f)[, 2]) / (2^2)
#> [1] 0.240303
```

I don't agree, though.
This again depends on the model complexity, but if you think of Cohen's d as the distance between two populations' means as a function of the variance around those means (which is assumed to be equal), we can get it perfectly for t-test-like linear regressions:

```r
m <- lm(mpg ~ am, mtcars)
coef(m)[-1] / sigma(m)
#>       am
#> 1.477947

# compare to:
effectsize::cohens_d(mpg ~ am, data = mtcars)
#> Cohen's d |         95% CI
#> --------------------------
#>     -1.48 | [-2.27, -0.67]
```

And in general the idea should hold for any (not generalized) linear model, as long as we keep in mind what the residual `sigma` represents.

[* this is actually how ...]
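To illustrate the "any linear model" point, a hypothetical extension with a covariate - `sigma` is then the residual SD after adjusting for `hp`, so this is an adjusted standardized difference rather than a classic Cohen's d:

```r
m2 <- lm(mpg ~ am + hp, data = mtcars)

# the am difference in units of the residual SD, now adjusted for hp
coef(m2)["am"] / sigma(m2)
```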
-
Follow-up of #104
First, the following is based on the assumption that reporting and computing "standardized" (unitless/comparable) indices for parameters is useful, and that users will use them to help interpret their effects (and thus, people will consciously or unconsciously apply some rules of thumb, such as interpretation guidelines for r or d).
Now, the problem I see with "standardized" parameters (i.e., parameters expressed in standard deviations of the dependent variable) is that they are still not fully comparable. The reason is that the parameters do not all correspond to the same "type".
For instance, in the case of a linear model (where y is a continuous variable): if x is continuous, then the corresponding parameter represents the strength of the association, and its standardized version (where x and y are standardized) is akin to a correlation r (defined between -1 and 1). However, if x is a factor, then the parameters express differences between levels, and the standardized version is more akin to a standardized difference (in which case it can be bigger than 1). Finally, if the parameter refers to an interaction, then its type depends on the type of its basis (i.e., the parameter to which the interaction coefficient is added) - and naturally, for double or triple interactions, it gets complicated.
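As a quick check of the first case (simple model with one continuous predictor, chosen only for illustration), the standardized slope is exactly Pearson's r:

```r
m <- lm(scale(mpg) ~ scale(wt), data = mtcars)

coef(m)[-1]                 # standardized slope
cor(mtcars$mpg, mtcars$wt)  # identical in the simple-regression case
```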
So, IMO, understanding and interpreting effect sizes for regression parameters first requires knowing/finding the type of each parameter.
These "types" is what I try to find with
parameters_type
:Created on 2020-09-16 by the reprex package (v0.3.0)
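A minimal sketch of the kind of call meant (the model is illustrative, and the exact output columns may differ across versions):

```r
library(parameters)

m <- lm(Sepal.Length ~ Petal.Width * Species, data = iris)

# returns a data frame tagging each parameter with a type,
# e.g. "intercept", "numeric", "factor", "interaction"
parameters_type(m)
```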
Note that it gets complicated with nested effects etc. Moreover, there is currently a limit on how "deep" it can go (it is limited to triple interactions). The implementation of this type-finder could surely be improved by a brighter mind.
Now, coming back to effect sizes: this divergence is a problem because most people seem unaware of it, and simply run `standardize_parameters(model)` and then interpret all the parameters in the same way, disregarding their meaning (I'm guilty of that myself, having in the past suggested that "standardized parameters can be interpreted as standardized differences" without insisting on the type).

So my goal with `interpret_parameters()` was basically to put all the standardized parameters "on the same scale", so that they are comparable with each other. Because there are converters from r to d, for instance, my goal was to convert them all to "r"-like coefficients (I think its bounded range from -1 to 1 makes it the most intuitive of all the indices). Note that another problem is that the converters (especially from d to r) are not perfect, and are loaded with assumptions... but still, it seems like an interesting avenue to explore.

Any thoughts/ideas? @mattansb does that clarify the other issue 😁?
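A sketch of the kind of conversion meant, using `effectsize::d_to_r()` / `effectsize::r_to_d()` (which, among other assumptions, presume equal group sizes):

```r
library(effectsize)

# r = d / sqrt(d^2 + 4) under the equal-group-sizes assumption
d_to_r(0.8)

# and back again
r_to_d(0.37)
```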