Linking to lower-level data structures (molecule information) #24

Closed
ambrosejcarr opened this issue Mar 28, 2022 · 2 comments

@ambrosejcarr

ambrosejcarr commented Mar 28, 2022

We have agreed that Matrix-API should be a "canonical intermediate format" for omics data, as opposed to a format that can also absorb all potential downstream and upstream representations [1].

There is interest in capturing and linking the underlying information that is used to create the aggregated (observation, feature) matrices. These data types are out of scope for storing in Matrix-API, but it would be valuable to identify the use cases for transformation of these data types into Matrix-API representations.

Use cases, with the underlying molecular information in bold:

  • In scRNA-seq, a raw data matrix describes the number of RNA molecules observed for each gene in each cell [2].
  • In scATAC-seq, genomic alignments are counted or analyzed to create "peak", "genomic bin" or "gene activity score" features. The underlying data can be stored in WIG, BigWIG, or BedGraph formats [3].
  • In spatial transcriptomics studies, RNA molecules are spatially localized in Euclidean space and assigned to cells by a segmentation algorithm. OME/ngff are exploring how to represent these data (see the table spec proposal, ome/ngff#64), and NanoString is developing the CosMx assay, which will generate this kind of information at large scale.
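
To make the scRNA-seq case concrete, here is a minimal sketch of how per-molecule records could be collapsed into an (observation, feature) matrix. The record layout (cell barcode, gene, UMI), the toy values, and the function name `count_matrix` are all assumptions made for illustration; they are not part of Matrix-API or of any 10x file format.

```python
from collections import Counter

# Hypothetical per-molecule records: (cell barcode, gene, UMI).
molecules = [
    ("AAAC", "GeneA", "u1"),
    ("AAAC", "GeneA", "u2"),
    ("AAAC", "GeneB", "u3"),
    ("TTTG", "GeneB", "u4"),
    ("TTTG", "GeneB", "u4"),  # duplicate read of the same molecule
]


def count_matrix(records):
    """Collapse per-molecule records into a dense (cell x gene) count matrix."""
    unique = set(records)  # count each (cell, gene, UMI) combination once
    counts = Counter((cell, gene) for cell, gene, _ in unique)
    cells = sorted({cell for cell, _, _ in unique})
    genes = sorted({gene for _, gene, _ in unique})
    # Counter returns 0 for (cell, gene) pairs that were never observed.
    matrix = [[counts[(cell, gene)] for gene in genes] for cell in cells]
    return cells, genes, matrix


cells, genes, matrix = count_matrix(molecules)
```

The row and column labels are returned alongside the matrix so that each count can still be traced back to its cell and gene. The link back to the individual molecules is lost in the aggregation, which is the gap this issue is about.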

Footnotes

  1. #11, see comment

  2. Example 10x experiment, Direct download link for per-molecule information.

  3. UCSC Wig format description

@pablo-gar

@ambrosejcarr I'm having trouble interpreting this issue. What kind of underlying molecular information are you referring to?

@pablo-gar

The scope of SOMA 1.0 is not explicitly defined to include any intermediary information. However, SOMA is flexible enough that users can port this information to SOMA objects where the data are amenable. For example, a WIG file can be represented as a SOMADenseArray.
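
As a rough illustration of the WIG example, the sketch below expands a fixedStep WIG track for a single chromosome into a dense per-base vector, which is the shape of data a dense array could hold. The function name `wig_fixed_step_to_dense` and the sample track are hypothetical, and the step of writing the result into a SOMADenseArray is deliberately left out because the SOMA API is not described in this thread.

```python
def wig_fixed_step_to_dense(lines, length, fill=0.0):
    """Expand fixedStep WIG lines for one chromosome into a dense list.

    Assumes a single chromosome of known `length`; the chrom field is ignored.
    """
    dense = [fill] * length
    pos = step = span = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue  # skip header and comment lines
        if line.startswith("fixedStep"):
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            pos = int(fields["start"]) - 1  # WIG positions are 1-based
            step = int(fields.get("step", 1))
            span = int(fields.get("span", 1))
            continue
        if pos is None:
            raise ValueError("data line before fixedStep declaration")
        value = float(line)
        # Each value covers `span` bases starting at the current position.
        for i in range(pos, min(pos + span, length)):
            dense[i] = value
        pos += step
    return dense


wig = [
    "track type=wiggle_0",
    "fixedStep chrom=chr1 start=3 step=2 span=2",
    "1.5",
    "2.0",
]
dense = wig_fixed_step_to_dense(wig, length=8)
```

A dense layout suits coverage-style tracks where most positions carry a value. For sparse signals, a sparse representation would likely be a better fit.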
