Better Handle Variable Types #16

jrm5100 · 2019-07-18T19:01:57Z

I think a good test-case for this will be creating a datatype for a variant based on some PLINK files:

BED (a 2-bit genotype code, 00/01/10/11, for each sample at each variant)
BIM (chromosome, variant ID, position in morgans or centimorgans, base-pair coordinate, allele 1, allele 2)
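To make the BED encoding concrete, here is a minimal sketch of unpacking one variant's block of bytes. It assumes the PLINK 1 variant-major layout (four samples per byte, lowest-order bit pair first; 00 = homozygous allele 1, 01 = missing, 10 = heterozygous, 11 = homozygous allele 2) and that the 3-byte file header has already been skipped. The function name and the use of -1 for missing are illustrative choices, not anything CLARITE defines.

```python
import numpy as np

# 2-bit code -> count of allele 2; -1 marks a missing genotype (an arbitrary choice)
CODE_TO_A2_COUNT = np.array([0, -1, 1, 2], dtype=np.int8)


def decode_variant_block(block: bytes, n_samples: int) -> np.ndarray:
    """Hypothetical helper: unpack one variant's bytes into per-sample allele counts."""
    raw = np.frombuffer(block, dtype=np.uint8)
    # Each byte holds four samples; the lowest-order bit pair is the first sample
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (raw[:, None] >> shifts) & 0b11
    # Drop the padding pairs at the end of the last byte
    return CODE_TO_A2_COUNT[codes.ravel()[:n_samples]]
```

For example, the single byte 0xdc (binary 11011100) read from the low bits upward gives the codes 00, 11, 01, 11 for four samples.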

I think the BIM information would be part of a pandas.api.extensions.ExtensionDtype type (since the information would apply to an entire array/column) and the actual genotype information would be stored in a pandas.api.extensions.ExtensionArray type.
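As a rough, dtype-only sketch of how variant metadata could travel with a column: the hypothetical VariantDtype below stores a subset of the BIM fields and lists them in _metadata, which pandas uses when comparing and hashing extension dtypes. This arrangement puts the metadata on the dtype; the matching ExtensionArray that would hold the genotypes is deliberately left out, and all names here are assumptions for illustration.

```python
from pandas.api.extensions import ExtensionDtype


class VariantDtype(ExtensionDtype):
    """Hypothetical dtype carrying one BIM row as column-level metadata."""

    type = int  # scalar type of each element (e.g. an allele count)
    # Attributes pandas uses for equality and hashing of this dtype
    _metadata = ("chromosome", "variant_id", "coordinate", "allele1", "allele2")

    def __init__(self, chromosome, variant_id, coordinate, allele1, allele2):
        self.chromosome = chromosome
        self.variant_id = variant_id
        self.coordinate = coordinate
        self.allele1 = allele1
        self.allele2 = allele2

    @property
    def name(self) -> str:
        return f"variant[{self.variant_id}]"

    @classmethod
    def construct_array_type(cls):
        # A real implementation would return the ExtensionArray subclass here
        raise NotImplementedError("the matching ExtensionArray is omitted in this sketch")
```

Two dtypes built from the same BIM row compare equal, so columns for the same variant would share a dtype while columns for different variants would not.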


jrm5100 · 2019-08-05T21:13:47Z

Currently, the variable types are derived from the underlying pandas dtypes:

Binary = Categorical with 2 categories
Categorical = Categorical with > 2 categories
Continuous = Anything numeric that hasn't been made into a 'category' type
Unknown = 'object' type which occurs when two different types are combined, or when there are strings.
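The rules above can be sketched as a small function. This is an illustration of the mapping as listed, not CLARITE's actual code; the function name is hypothetical, and treating a categorical with fewer than two categories as "unknown" is an assumption, since the list does not cover that case.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def infer_variable_type(series: pd.Series) -> str:
    """Hypothetical helper mirroring the dtype-to-variable-type rules listed above."""
    if isinstance(series.dtype, pd.CategoricalDtype):
        n_categories = len(series.cat.categories)
        if n_categories == 2:
            return "binary"
        if n_categories > 2:
            return "categorical"
        return "unknown"  # assumption: fewer than two categories is not covered above
    if is_numeric_dtype(series.dtype):
        return "continuous"
    # 'object' dtype: mixed types or strings
    return "unknown"
```

For instance, a numeric column stays "continuous" until it is explicitly cast with astype("category"), at which point the number of categories decides between "binary" and "categorical".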

Right now, converting a variable to categorical actually makes it binary if it has only two unique values.
Ideally, the starting type would always be "unknown".

The custom datatypes mentioned above could be a good solution (and would allow for more types in the future as well).

jrm5100 · 2020-08-03T21:23:12Z

As of pandas v1.1, values of any dtype can be converted to the "string" dtype (pandas.StringDtype). This could be treated as "Unknown" in CLARITE and used as the default when loading data.
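A minimal sketch of that conversion, assuming pandas 1.1 or later. The variable names are illustrative, and how CLARITE would actually load data as "Unknown" is not shown here.

```python
import pandas as pd

# A column with mixed values ends up as 'object' dtype
mixed = pd.Series([1, "a", 2.5])

# Convert every value to the dedicated string dtype (requires pandas >= 1.1)
as_string = mixed.astype("string")
```

After the conversion each element is the string form of the original value, so the column no longer depends on whatever dtype pandas happened to infer.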

jrm5100 changed the title ~~Handle Genotype Data~~ Better Handle Variable Types Aug 5, 2019

jrm5100 self-assigned this Jun 4, 2020

jrm5100 added the enhancement New feature or request label Jun 4, 2020
