Currently the variable types are inferred from the underlying data types:
Binary = Categorical with 2 categories
Categorical = Categorical with > 2 categories
Continuous = Anything numeric that hasn't been made into a 'category' type
Unknown = 'object' type which occurs when two different types are combined, or when there are strings.
Right now converting to categorical actually converts to binary if there are only two unique values.
Ideally the starting type would always be "unknown".
The custom datatypes mentioned above could be a good solution (and would allow for more types in the future as well).
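For reference, the inference rules listed above could be sketched roughly as follows. This is only an illustration of the described behavior, not the project's actual code: the function name `infer_variable_type` and the lowercase labels are hypothetical, and treating a categorical with fewer than two categories as "unknown" is an assumption.

```python
import pandas as pd


def infer_variable_type(series: pd.Series) -> str:
    """Hypothetical helper: map a column's pandas dtype to a variable type."""
    if isinstance(series.dtype, pd.CategoricalDtype):
        n_categories = len(series.cat.categories)
        if n_categories == 2:
            # Two categories is reported as binary, even if the user asked
            # for "categorical" (the behavior described above).
            return "binary"
        if n_categories > 2:
            return "categorical"
        # Assumption: fewer than two categories is not covered by the rules.
        return "unknown"
    if pd.api.types.is_numeric_dtype(series.dtype):
        # Numeric and not converted to a 'category' dtype.
        return "continuous"
    # Object dtype: strings or a mix of types.
    return "unknown"
```

Because the type is derived from the dtype on every call, there is nowhere to record that a two-valued column was deliberately marked as categorical, which is the limitation a custom datatype would address.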
jrm5100 changed the title from "Handle Genotype Data" to "Better Handle Variable Types" on Aug 5, 2019
I think a good test-case for this will be creating a datatype for a variant based on some PLINK files:
I think the BIM information would be part of a pandas.api.extensions.ExtensionArray type (since the information would apply to an array/column) and the actual genotype information would be part of a pandas.api.extensions.ExtensionDtype type.
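As a starting point for that test case, here is a minimal sketch of the per-variant information one line of a PLINK .bim file carries (six whitespace-separated fields, no header). The names `BimRecord` and `parse_bim_line` and the sample values in the usage line are hypothetical, and the sketch deliberately leaves open how this metadata would be attached to a pandas extension type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BimRecord:
    """Hypothetical container for one line of a PLINK .bim file."""
    chromosome: str       # chromosome code
    variant_id: str       # variant identifier
    position_cm: float    # genetic position (morgans or centimorgans)
    coordinate_bp: int    # base-pair coordinate
    allele_1: str
    allele_2: str


def parse_bim_line(line: str) -> BimRecord:
    """Parse one whitespace-delimited .bim line into a BimRecord."""
    chrom, variant_id, cm, bp, a1, a2 = line.split()
    return BimRecord(chrom, variant_id, float(cm), int(bp), a1, a2)


# Illustrative values only, not taken from a real dataset.
record = parse_bim_line("1\trs123\t0\t1000\tA\tG")
```

Making the record frozen keeps it hashable and comparable, which would be convenient if it ended up as column-level metadata.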