CLI tool for aggregating single cell data #1264
Comments
This is done, I'm just doing a little bit more testing at this point.
I'm almost done with deleting aggregated data. There are a few caveats to consider, such as whether to remove the dimension and reset the single-cell metrics (see #1273), and whether to also delete the generated data files for that QT.
Good, now we can reliably aggregate and delete aggregated vectors! I'm looking into some -Infinity values slipping through the aggregation process and causing the processed vectors to be filled with NaNs... I've also made some improvements to which files get deleted when regenerating platform annotations and pre-processing an experiment.
Ok, got the NaN situation figured out. We need to adjust the data to the library size and add a pseudocount, just like we do for log2cpm of RNA-Seq data.
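A minimal sketch of that adjustment, for illustration only. The function name `log2cpm`, the pseudocount of 0.5, and the `+ 1` on the library size are assumptions borrowed from the common voom-style formula, not necessarily what the tool uses. The point is that a zero count no longer turns into -Infinity once the pseudocount is added before taking the logarithm.

```python
import math

def log2cpm(counts, pseudocount=0.5):
    """Hypothetical helper: log2 counts-per-million for one aggregated vector.

    The pseudocount keeps zero counts finite; without it, the log of zero
    is -Infinity, which later turns into NaN in downstream arithmetic.
    """
    library_size = sum(counts)
    return [
        math.log2((c + pseudocount) / (library_size + 1.0) * 1e6)
        for c in counts
    ]

# A vector with a zero count stays finite after the transformation.
values = log2cpm([0, 10, 90])
```

For reference, `float("-inf") - float("-inf")` evaluates to NaN, which is one way a single -Infinity can contaminate a whole processed vector.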
Counting data would become linear after library size normalization. Linear data would technically not be CPM, but I don't think that is important. We also need to look into allowing count data to use a logarithmic scale type. We might find counting data out there that is unfortunately already log-transformed.
Another thing to include in the tests is non-integer counting data. This happens with some methods that regress out ambient RNA or other contaminants from the data. For these we would get a general type COUNT and a scale type LINEAR, or something similar. We can add a way to generate such vectors by adding a little bit of multiplicative Gaussian noise.
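One possible way to generate such test vectors, as a rough sketch. The helper name `add_multiplicative_noise`, the default standard deviation, and the clamping at zero are all assumptions made for this example. Each count is multiplied by a factor drawn from a Gaussian centered on 1, so zeros stay zero and the other values become non-integer.

```python
import random

def add_multiplicative_noise(counts, sd=0.05, seed=None):
    """Hypothetical test helper: perturb counts with multiplicative Gaussian noise.

    Each value is scaled by a factor drawn from N(1, sd). Results are clamped
    at zero so the vector still looks like (non-integer) count data.
    """
    rng = random.Random(seed)  # a seeded generator keeps test data reproducible
    return [max(0.0, c * rng.gauss(1.0, sd)) for c in counts]

noisy = add_multiplicative_noise([0, 3, 10, 250], sd=0.05, seed=42)
```

Seeding the generator keeps the fixture stable across test runs.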
TODO