Extra AnnData in Table? #270

Closed
AdemSaglamRB opened this issue May 18, 2023 · 4 comments

@AdemSaglamRB

Hello,

For multi-omics data, is it possible to add more tables to Table in a spatialdata object?

@LucaMarconato
Member

Hi @AdemSaglamRB 😊

When we designed the package we had extensive discussions on having one versus multiple tables (see, for example, the notes in #43). One disadvantage of not having multiple tables in the same object is that separate objects are needed to represent multimodal data; one advantage is that the APIs for data manipulation are generally simpler to implement.

After balancing the pros and cons, we decided to support a single table in our first release, collect feedback from users, and then see how to relax this limitation.

To represent multiple modalities you can go for one of the following:

  • use multiple SpatialData objects
  • use multiple layers in AnnData
  • use .obsm / .obs in the table
  • add extra columns to the GeoDataFrame representing shapes
  • use multiple AnnData tables and replace the sdata.table slot

Our preferred ways are the first two, but we recognize that none of these is optimal and we are exploring alternatives. In this regard, we would appreciate it if you could share an outline of your use cases (which multi-modal data you want to store and which types of cross-modality operations/manipulations you want to perform).

@AdemSaglamRB
Author

Hi @LucaMarconato ,

Thank you for your answer.
Using several SpatialData objects is definitely the easiest way to go.

The multi-omics data I have been dealing with encompass RNA, protein and ATAC-seq information.
These always have the same number of cells but a different number of features. The count matrix for each looks like this:

  • RNA: N cells x K genes
  • protein: N cells x L proteins
  • ATAC: N cells x M peaks

Because the count matrices have different sizes, using different layers in AnnData won't work, as every layer is required to have the same shape as the main matrix.

On the other hand, the solution I was thinking about was that sdata.table could hold a MuData object rather than an AnnData object. I don't know whether this is easy or complicated to implement, though.

Out of curiosity, how do you use multiple AnnData tables to replace the sdata.table slot?

Thank you for your help.

@LucaMarconato
Member

Thank you for the explanation of your use case; we will consider this feedback when discussing the new table designs. Having a MuData object instead of an AnnData object would work in this case, and the implementation is feasible, but there would be some drawbacks:

  • MuData would not be ideal when different sets of shapes (e.g. cell segmentation masks, the same cells with a different segmentation mask, anatomical annotations) are annotated with the same MuData, because in that case multiple individual tables would be handier for index manipulation, especially after filtering rows and adding or removing sets of shapes
  • it creates a challenge for storage; the table specification that we proposed to NGFF (Table spec proposal, ome/ngff#64) would not suffice for storing a MuData table. Note that we could still store multiple AnnData tables and load them as a MuData, and when saving a MuData object we could discard the information that is not contained in the individual tables.

@AdemSaglamRB
Author

Thank you for your answer @LucaMarconato . Then we can close this thread.
