We have agreed that Matrix-API should be a "canonical intermediate format" for omics data, as opposed to a format that can also absorb all potential downstream and upstream representations [1].
There is interest in capturing and linking the underlying information that is used to create the aggregated (observation, feature) matrices. These data types are out of scope for storing in Matrix-API, but it would be valuable to identify the use cases for transformation of these data types into Matrix-API representations.
Use cases, with the underlying molecular information in bold:
- In scRNA-seq, a raw data matrix describes the number of **RNA molecules** observed for each gene in each cell [2].
- In scATAC-seq, **genomic alignments** are counted or analyzed to create "peak", "genomic bin" or "gene activity score" features. The underlying data can be stored in WIG, BigWIG, or BedGraph formats [3].
- In spatial transcriptomics studies, **RNA molecules** are spatially localized in Euclidean space and assigned to cells by a segmentation algorithm. OME/ngff is exploring how to represent these data (Table spec proposal ome/ngff#64), and NanoString is developing the CosMx assay, which will generate this kind of information at large scale.
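To make the transformation in these use cases concrete, here is a minimal sketch of how per-molecule records could be aggregated into an (observation, feature) count matrix. The record layout (one `(cell, gene)` pair per observed molecule), the function name `count_matrix`, and the example identifiers are assumptions made for illustration; they are not part of Matrix-API or of any assay's output format.

```python
from collections import Counter


def count_matrix(molecules):
    """Aggregate (cell, gene) molecule records into a dense count matrix.

    `molecules` is a list with one (cell_id, gene_id) tuple per observed
    molecule. This layout is hypothetical and chosen only for illustration.
    """
    counts = Counter(molecules)                  # molecules per (cell, gene)
    cells = sorted({cell for cell, _ in molecules})
    genes = sorted({gene for _, gene in molecules})
    # Rows are observations (cells), columns are features (genes).
    matrix = [[counts[(cell, gene)] for gene in genes] for cell in cells]
    return cells, genes, matrix


# Hypothetical per-molecule records, e.g. after cell segmentation or
# barcode assignment.
molecules = [
    ("cell_A", "GeneX"),
    ("cell_A", "GeneX"),
    ("cell_A", "GeneY"),
    ("cell_B", "GeneY"),
]
cells, genes, matrix = count_matrix(molecules)
```

Only `matrix`, with `cells` and `genes` as its axis labels, would fall within the scope described above; the `molecules` list stands in for the underlying information that stays outside it.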
The scope of SOMA 1.0 is not explicitly defined to include any intermediary information. However, SOMA is flexible enough that users can port this information to SOMA objects where the data are amenable. For example, a WIG file can be represented as a SOMADenseArray.
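As a rough illustration of the WIG example, the sketch below expands a single-chromosome `fixedStep` WIG track into a dense per-base list, which is the kind of array that could then be written to a dense SOMA object. The function name `parse_fixed_step_wig`, the `length` parameter, and the sample track are assumptions for illustration; the write to SOMA itself is deliberately left out, since the exact API is not specified here.

```python
def parse_fixed_step_wig(text, length):
    """Expand a single-chromosome fixedStep WIG track into a dense list.

    Assumptions (illustrative only): the track covers one chromosome of
    `length` bases, and positions without data default to 0.0.
    """
    dense = [0.0] * length
    pos = step = span = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue  # skip blank lines, track/browser lines, and comments
        if line.startswith("fixedStep"):
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            pos = int(fields["start"]) - 1      # WIG positions are 1-based
            step = int(fields["step"])
            span = int(fields.get("span", 1))   # span defaults to 1
            continue
        if pos is None:
            raise ValueError("data line before fixedStep declaration")
        value = float(line)
        # Each value covers `span` bases starting at the current position.
        for i in range(pos, min(pos + span, length)):
            dense[i] = value
        pos += step
    return dense


# Hypothetical track: two values, each spanning two bases.
wig = "fixedStep chrom=chr1 start=3 step=2 span=2\n1.5\n2.0\n"
dense = parse_fixed_step_wig(wig, length=8)
```

A dense layout like this is simple but can be wasteful for sparse tracks; whether it is a good fit depends on how much of the genome the track actually covers.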
Footnotes
1. #11, see comment
2. Example 10x experiment, Direct download link for per-molecule information.
3. UCSC Wig format description