Add utility function for handling missing values in SAS data #158

Open
nanxstats opened this issue Feb 9, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@nanxstats
Collaborator

Our everyday workflow involves reading SAS data into R data frames using haven::read_sas(). In most cases, however, an additional step is needed to convert SAS missing character values (read in as empty strings) to NA in R. The read_sas() call does not handle this. It would be great to standardize this step and reduce friction by implementing a reusable utility function in metalite.

This issue was discussed in this pharmaverse blog post. In theory, it could be handled by adding admiral as a dependency. However, admiral currently brings in a non-trivial number of dependencies of its own, which we might want to avoid. So, implementing this in the zero-dependency metalite package could be ideal.

cc @julianschmocker @BrianLang

nanxstats added the "enhancement" (New feature or request) label Feb 9, 2024
nanxstats changed the title from "Add utility function for handling missing values after reading SAS data" to "Add utility function for handling missing values in SAS data" Feb 9, 2024
@elong0527
Collaborator

I'd hesitate to guess the user's intention. What if people intentionally use "" to represent a blank cell?

Should we either wait for haven to support this behavior properly, or write an example that shows people how to handle it explicitly?

e.g. 'mutate(across(where(is.character), function(x) ifelse(x == "", NA_character_, x)))'
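
A self-contained version of the snippet above might look like the following sketch. The data frame `adsl` and its columns are made up for illustration and stand in for the result of haven::read_sas(). `dplyr::na_if()` is swapped in for `ifelse()` as a shorter equivalent, and dplyr >= 1.0.0 is assumed for `across()` with `where()`.

```r
library(dplyr)

# Hypothetical data standing in for the output of haven::read_sas()
adsl <- data.frame(
  USUBJID = c("01", "02", "03"),
  SEX = c("F", "", "M"),
  AGE = c(34, 51, NA),
  stringsAsFactors = FALSE
)

# Explicitly convert empty strings to NA, in character columns only
adsl_na <- adsl %>%
  mutate(across(where(is.character), ~ na_if(.x, "")))
```

Because the conversion is written out in user code, a reader can see exactly which columns are affected, and anyone who uses "" on purpose can simply leave the step out.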

@nanxstats
Collaborator Author

I agree. The behavior should be transparent and flexible, with a sensible default so that user-land code is not tedious.

I would say instead of metalite, this could be a good fit for the "msdtools" package if that ever gets priority next year.
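
To make the "transparent and flexible, with a sensible default" idea concrete, here is a minimal base R sketch with no package dependencies. The function name `blank_to_na` and its arguments `cols` and `trim` are hypothetical, chosen only for illustration; nothing like this exists in metalite or "msdtools" as far as this thread shows.

```r
# Hypothetical helper: convert blank strings in character columns to NA.
# `blank_to_na`, `cols`, and `trim` are illustrative names, not an existing API.
blank_to_na <- function(data, cols = NULL, trim = FALSE) {
  stopifnot(is.data.frame(data))
  is_chr <- vapply(data, is.character, logical(1))
  # Default: all character columns; otherwise only the requested ones
  target <- if (is.null(cols)) names(data)[is_chr] else cols
  unknown <- setdiff(target, names(data))
  if (length(unknown) > 0) {
    stop("Unknown column(s): ", paste(unknown, collapse = ", "))
  }
  for (nm in target) {
    x <- data[[nm]]
    if (!is.character(x)) next  # never touch non-character columns
    blank <- if (trim) trimws(x) == "" else x == ""
    x[!is.na(blank) & blank] <- NA_character_
    data[[nm]] <- x
  }
  data
}

# Made-up data standing in for the output of haven::read_sas()
adsl <- data.frame(
  SEX = c("F", "", "M"),
  RACE = c("", "ASIAN", "  "),
  AGE = c(34, 51, NA),
  stringsAsFactors = FALSE
)

out_all <- blank_to_na(adsl)                # default: every character column
out_sex <- blank_to_na(adsl, cols = "SEX")  # opt in column by column
out_trim <- blank_to_na(adsl, trim = TRUE)  # also treat whitespace-only as blank
```

The default covers the common case in one call, while `cols` lets users who intentionally keep "" in some columns exclude them, which addresses the concern about guessing the user's intention.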
