Generating a population data set. #4

Open
Hadsga opened this issue Aug 14, 2019 · 1 comment

@Hadsga

Hadsga commented Aug 14, 2019

I am trying to evaluate feature importance measures for a regression. My data set has highly correlated features, so the estimated betas have high variance. This makes it difficult to estimate the "true betas", i.e. the true feature importance. One idea was to simulate the data and compare measures such as permutation importance or Shapley values. For that I need the "true importance", i.e. the population data. I used your package for this task. However, I can only simulate the training data or the test data, not the population data. Is there a solution for this?

@therimalaya
Collaborator

Hi Hadsga, sorry for the late response. Variable importance for the population would definitely be very useful. I figured out that the importance of a given variable is just the change in model error when you remove that variable, i.e. the increase in the minimum (population) prediction error of the linear model fitted without it. Here is a function you can use to calculate it. I have compared it with the variable importance you get from the caret package, and with a large number of observations the values are quite similar.

get_imp <- function(sobj) {
  ## Complete model: rotate sobj$Sigma into the covariance of (y, x)
  rotation <- as.matrix(Matrix::bdiag(1, sobj$Rotation))
  sigma    <- rotation %*% sobj$Sigma %*% t(rotation)
  sigma_xy <- sigma[-1, 1]
  sigma_yx <- sigma[1, -1]
  sigma_xx <- sigma[-1, -1]
  sigma_yy <- sigma[1, 1]
  ## Minimum prediction error using all p predictors
  minerr <- drop(sigma_yy - sigma_yx %*% solve(sigma_xx, sigma_xy))

  ## Reduced models: leave out one predictor at a time
  imp <- numeric(sobj$p)
  for (idx in seq_len(sobj$p)) {
    sigma_xiy  <- sigma_xy[-idx]
    sigma_yxi  <- sigma_yx[-idx]
    sigma_xixi <- sigma_xx[-idx, -idx, drop = FALSE]
    minerr_i   <- drop(sigma_yy - sigma_yxi %*% solve(sigma_xixi, sigma_xiy))
    ## Importance = increase in error when predictor idx is removed
    imp[idx]   <- minerr_i - minerr
  }

  return(imp)
}
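The function above needs a simrel object. As a self-contained sanity check of the same idea, here is a minimal base-R sketch built on assumptions of my own, not on the package: a hypothetical helper `pop_imp()` that takes any covariance matrix of (y, x) with y in the first row and column, a made-up population with three predictors where x1 and x2 are strongly correlated, and plain `lm()` fits in place of caret for the sample estimate. For a linear model the population value should also match the standard closed form beta_j^2 / [solve(Sigma_xx)]_jj, which is beta_j^2 times the conditional variance of x_j given the other predictors.

```r
## Hypothetical helper (not part of simrel): population drop-one importance
## from a covariance matrix `sigma` of (y, x_1, ..., x_p), y first.
pop_imp <- function(sigma) {
  s_yy <- sigma[1, 1]
  s_xy <- sigma[-1, 1]
  s_xx <- sigma[-1, -1, drop = FALSE]
  ## Minimum prediction error when only the predictors in `keep` are used
  err <- function(keep) {
    drop(s_yy - s_xy[keep] %*% solve(s_xx[keep, keep, drop = FALSE], s_xy[keep]))
  }
  p    <- length(s_xy)
  full <- err(seq_len(p))
  vapply(seq_len(p), function(j) err(setdiff(seq_len(p), j)) - full, numeric(1))
}

## Assumed population: y = x %*% beta + eps, x1 and x2 correlated at 0.9
beta  <- c(1, 0.5, 0)
s_xx  <- matrix(c(1,   0.9, 0.2,
                  0.9, 1,   0.2,
                  0.2, 0.2, 1), 3, 3)
s_eps <- 1                                    # assumed noise variance
s_xy  <- drop(s_xx %*% beta)                  # Cov(x, y)
s_yy  <- drop(t(beta) %*% s_xx %*% beta) + s_eps
sigma <- unname(rbind(c(s_yy, s_xy), cbind(s_xy, s_xx)))

imp_pop    <- pop_imp(sigma)
imp_closed <- beta^2 / diag(solve(s_xx))      # closed form for linear models

## Sample version from one large simulated data set
set.seed(1)
n <- 1e5
x <- matrix(rnorm(n * 3), n, 3) %*% chol(s_xx) # rows have covariance s_xx
y <- drop(x %*% beta) + rnorm(n, sd = sqrt(s_eps))
res_var <- function(cols) mean(residuals(lm(y ~ x[, cols, drop = FALSE]))^2)
full_err <- res_var(1:3)
imp_samp <- sapply(1:3, function(j) res_var(setdiff(1:3, j)) - full_err)

print(rbind(population = imp_pop, closed_form = imp_closed, sample = imp_samp))
```

With x1 and x2 correlated at 0.9, their importances come out much smaller than beta_j^2, because each can largely stand in for the other when it is removed, while x3 has zero population importance since its beta is zero.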
