Minimal proposal #3

Open
wants to merge 36 commits into
base: add-tables-points
36 commits
c761555
First attempt at draft alternative proposal for the table spec
minnerbe May 26, 2023
27d1d2c
Fix a few things after reviewing the spec
minnerbe May 26, 2023
d61a9e1
First draft of minimal table spec
minnerbe Jun 1, 2023
3d38a15
Include suggestions by @d-v-b
minnerbe Jun 1, 2023
ea990e8
overhaul tables section introduction text
virginiascarlett Jun 2, 2023
0b3b652
start messing around with tables tree diagram
virginiascarlett Jun 2, 2023
0f58543
augment and clarify tables introduction
virginiascarlett Jun 2, 2023
eef3b3e
Homogenize dimension requirement for columns
minnerbe Jun 5, 2023
e864782
Homogenize appearance of zarr structure
minnerbe Jun 5, 2023
2d6f153
Add improvements based on discussion with @virginiascarlett
minnerbe Jun 5, 2023
4462547
clarify purpose of annotations
virginiascarlett Jun 5, 2023
d794f22
update first tables paragraph
virginiascarlett Jun 5, 2023
dd9ece8
change tables section title
virginiascarlett Jun 5, 2023
5d6cf99
add backticks around the word tables/
virginiascarlett Jun 5, 2023
2639d9e
update index.html
virginiascarlett Jun 5, 2023
f46d53e
change the second sentence of Tables section a tiny bit
virginiascarlett Jun 6, 2023
332e44d
process index.bs to index.html
virginiascarlett Jun 6, 2023
e717f4b
change one word: AnnData table to array, and process html
virginiascarlett Jun 6, 2023
b8190ba
Fix correction and clarify dimensions of annotated-data property
minnerbe Jun 6, 2023
5a26ad7
Make dimensionality of columns more concrete
minnerbe Jun 6, 2023
56560ac
Add statement about rows
minnerbe Jun 6, 2023
768cf1b
Add html file changes
minnerbe Jun 6, 2023
f6dbaa5
separate the cases of tables annotating images vs. annotating other t…
virginiascarlett Jun 6, 2023
77dada0
update html
virginiascarlett Jun 6, 2023
b65ebcd
no one will ever want to annotate a column
virginiascarlett Jun 6, 2023
8a45a5b
remove a newline
virginiascarlett Jun 6, 2023
9b08825
Add a file showing how anndata could be stored
minnerbe Jun 25, 2023
15c9a20
List tables in metadata (as suggested by @will-moore)
minnerbe Jun 30, 2023
75e8b01
Change row name representation
minnerbe Jun 30, 2023
550cfe7
clarify language in first paragraph of tables section
virginiascarlett Jul 12, 2023
997b930
explain the three properties in tables/table1/.zattrs
virginiascarlett Jul 12, 2023
03034b3
remove row_names from tables diagram
virginiascarlett Jul 12, 2023
09e1a5c
Fix spurious bracket in some link
minnerbe Aug 28, 2023
126be88
Fix `<img/>` element for newer bikeshed versions
joshmoore Mar 14, 2023
f3295b7
Fix JSON comments in tables
mkitti Aug 28, 2023
d63c147
Merge pull request #1 from mkitti/mkitti/minimal-proposal-fixes
minnerbe Aug 28, 2023
Binary file added .DS_Store
Binary file not shown.
3 changes: 1 addition & 2 deletions 0.1/index.bs
@@ -501,8 +501,7 @@ Projects which support reading and/or writing OME-NGFF data include:

</dl>

-<img src="https://downloads.openmicroscopy.org/presentations/2020/Dundee/Workshops/NGFF/zarr_diagram/images/zarr-ome-diagram.png"
-alt="Diagram of related projects"/>
+<img src="https://downloads.openmicroscopy.org/presentations/2020/Dundee/Workshops/NGFF/zarr_diagram/images/zarr-ome-diagram.png" alt="Diagram of related projects"></img>

All implementations prevent an equivalent representation of a dataset which can be downloaded or uploaded freely. An interactive
version of this diagram is available from the [OME2020 Workshop](https://downloads.openmicroscopy.org/presentations/2020/Dundee/Workshops/NGFF/zarr_diagram/).
3 changes: 1 addition & 2 deletions 0.2/index.bs
@@ -559,8 +559,7 @@ Projects which support reading and/or writing OME-NGFF data include:

</dl>

-<img src="https://downloads.openmicroscopy.org/presentations/2020/Dundee/Workshops/NGFF/zarr_diagram/images/zarr-ome-diagram.png"
-alt="Diagram of related projects"/>
+<img src="https://downloads.openmicroscopy.org/presentations/2020/Dundee/Workshops/NGFF/zarr_diagram/images/zarr-ome-diagram.png" alt="Diagram of related projects"></img>

All implementations prevent an equivalent representation of a dataset which can be downloaded or uploaded freely. An interactive
version of this diagram is available from the [OME2020 Workshop](https://downloads.openmicroscopy.org/presentations/2020/Dundee/Workshops/NGFF/zarr_diagram/).
3 changes: 1 addition & 2 deletions 0.3/index.bs
@@ -569,8 +569,7 @@ Projects which support reading and/or writing OME-NGFF data include:

</dl>

-<img src="https://downloads.openmicroscopy.org/presentations/2020/Dundee/Workshops/NGFF/zarr_diagram/images/zarr-ome-diagram.png"
-alt="Diagram of related projects"/>
+<img src="https://downloads.openmicroscopy.org/presentations/2020/Dundee/Workshops/NGFF/zarr_diagram/images/zarr-ome-diagram.png" alt="Diagram of related projects"></img>

All implementations prevent an equivalent representation of a dataset which can be downloaded or uploaded freely. An interactive
version of this diagram is available from the [OME2020 Workshop](https://downloads.openmicroscopy.org/presentations/2020/Dundee/Workshops/NGFF/zarr_diagram/).
3 changes: 1 addition & 2 deletions 0.4/index.bs
@@ -578,8 +578,7 @@ Projects which support reading and/or writing OME-NGFF data include:

</dl>

-<img src="https://downloads.openmicroscopy.org/presentations/2020/Dundee/Workshops/NGFF/zarr_diagram/images/zarr-ome-diagram.png"
-alt="Diagram of related projects"/>
+<img src="https://downloads.openmicroscopy.org/presentations/2020/Dundee/Workshops/NGFF/zarr_diagram/images/zarr-ome-diagram.png" alt="Diagram of related projects"></img>

All implementations prevent an equivalent representation of a dataset which can be downloaded or uploaded freely. An interactive
version of this diagram is available from the [OME2020 Workshop](https://downloads.openmicroscopy.org/presentations/2020/Dundee/Workshops/NGFF/zarr_diagram/).
6 changes: 0 additions & 6 deletions index.html
@@ -1,6 +0,0 @@
<head>
<meta http-equiv="refresh" content="5; URL=https://ngff.openmicroscopy.org/latest/" />
</head>
<body>
<p>If you are not redirected in five seconds, <a href="https://ngff.openmicroscopy.org/latest/">click here</a>.</p>
</body>
149 changes: 149 additions & 0 deletions latest/generate_anndata_example.py
@@ -0,0 +1,149 @@
import numpy as np
import pandas as pd
import anndata as ad
import zarr
import numcodecs
from scipy.sparse import csr_matrix, csc_matrix

def generate_example(n_obs, n_var):
    # X and layers
    adata = ad.AnnData(csr_matrix(np.random.poisson(1, size=(n_obs, n_var)), dtype=np.float32))
    adata.layers["log_transformed"] = np.log1p(adata.X)
    adata.layers["other_data"] = np.random.poisson(1, size=(n_obs, n_var)) + 1.0

    # obs and var names
    adata.obs_names = [f"Cell_{i:d}" for i in range(adata.n_obs)]
    adata.var_names = [f"Gene_{i:d}" for i in range(adata.n_vars)]

    # annotations
    # the tutorial mentions that string arrays are automatically converted to categoricals if convenient
    adata.obs["cell_type"] = pd.Categorical(np.random.choice(["B", "T", "Monocyte"], size=(adata.n_obs,)))
    adata.obsm["X_umap"] = np.random.normal(0, 1, size=(adata.n_obs, 2))
    adata.varm["gene_stuff"] = np.random.normal(0, 1, size=(adata.n_vars, 5))
    adata.obsp["pairwise_data"] = csc_matrix(np.random.poisson(1, size=(n_obs, n_obs)), dtype=np.int32)

    # uns
    adata.uns["random"] = [1, 2, 3]

    return adata


def write_anndata(adata, filename, chunks):
    # root and tables
    root = zarr.open(filename, mode="w")
    root.array("some_image", np.array([0]), chunks=(1,))
    tables = root.create_group("tables")
    tables.attrs["tables"] = ["/anndata/obs", "/anndata/var", "/anndata/obsm", "/anndata/varm", "/anndata/obsp", "/anndata/varp"]
    adgroup = tables.create_group("anndata")
    adgroup.attrs["anndata"] = "0.9.1"
    adgroup.attrs["other-metadata"] = "metadata describing how the anndata annotates some_image"

    # X/layers
    X = adgroup.array('X', adata.X.todense(), chunks=chunks)
    layers = adgroup.create_group("layers")
    layers.array("log_transformed", adata.layers["log_transformed"].todense(), chunks=chunks)
    layers.array("other_data", adata.layers["other_data"], chunks=chunks)

    # obs
    localChunks = (chunks[0],)
    obs = adgroup.create_group("obs")
    obs.create_dataset("row_names", data=np.array(adata.obs_names), dtype=object, object_codec=numcodecs.VLenUTF8())
    obs.create_dataset("cell_type", data=np.array(adata.obs["cell_type"]), chunks=localChunks, object_codec=numcodecs.VLenUTF8())
    obs.attrs["annotated-data"] = get_annotated_data_map(dimension=0)
    obs.attrs["column-order"] = ["row_names", "cell_type"]
    obs.attrs["row-names"] = "row_names"

    # obsm
    obsm = adgroup.create_group("obsm")
    obsm.create_dataset("X_umap", data=adata.obsm["X_umap"], chunks=(chunks[0], 2))
    obsm.attrs["annotated-data"] = get_annotated_data_map(dimension=0)
    obsm.attrs["column-order"] = ["X_umap"]

    # obsp
    obsp = adgroup.create_group("obsp")
    obsp.create_dataset("pairwise_data", data=adata.obsp["pairwise_data"].todense(), chunks=(chunks[0], chunks[0]))
    obsp.attrs["annotated-data"] = get_annotated_data_map(dimension=0)
    obsp.attrs["column-order"] = ["pairwise_data"]

    # var
    var = adgroup.create_group("var")
    var.create_dataset("row_names", data=np.array(adata.var_names), chunks=(chunks[1],), object_codec=numcodecs.VLenUTF8())
    var.attrs["annotated-data"] = get_annotated_data_map(dimension=1)
    var.attrs["column-order"] = ["row_names"]
    var.attrs["row-names"] = "row_names"

    # varm
    varm = adgroup.create_group("varm")
    varm.create_dataset("gene_stuff", data=adata.varm["gene_stuff"], chunks=(chunks[1], 5))
    varm.attrs["annotated-data"] = get_annotated_data_map(dimension=1)
    varm.attrs["column-order"] = ["gene_stuff"]

    # varp (empty in this example, so the group has no columns)
    varp = adgroup.create_group("varp")
    varp.attrs["annotated-data"] = get_annotated_data_map(dimension=1)
    varp.attrs["column-order"] = []

    # uns
    uns = adgroup.create_group("uns")
    uns.create_dataset("random", data=adata.uns["random"], chunks=(3,))


def get_annotated_data_map(*, dimension):
    return [{"array": "/tables/anndata/X", "dimension": str(dimension)},
            {"array": "/tables/anndata/layers/log_transformed", "dimension": str(dimension)},
            {"array": "/tables/anndata/layers/other_data", "dimension": str(dimension)}]


def write_anndata_suggestion(adata, filename, chunks):
    # root and tables
    root = zarr.open(filename, mode="w")
    root.array("some_image", np.array([0]), chunks=(1,))
    tables = root.create_group("tables")
    tables.attrs["tables"] = ["/anndata/obs", "/anndata/var"]
    adgroup = tables.create_group("anndata")
    adgroup.attrs["anndata"] = "0.9.1"
    adgroup.attrs["other-metadata"] = "metadata describing how the anndata annotates some_image"

    # X and layers are combined into one array, a table is used to name the layers
    all_together = np.stack([np.array(adata.X.todense()), np.array(adata.layers["log_transformed"].todense()), adata.layers["other_data"]], axis=2)
    X = adgroup.array('X', all_together, chunks=(*chunks, 1))
    layers = adgroup.create_group("layers")
    row_names = np.array(["X", "log_transformed", "other_data"])
    layers.create_dataset("row_names", data=row_names, dtype=object, object_codec=numcodecs.VLenUTF8())
    layers.attrs["annotated-data"] = [{"array": "/tables/anndata/X", "dimension": "2"}]
    layers.attrs["row-names"] = "row_names"

    # obs (combines obs, obsm, obsp)
    localChunks = (chunks[0],)
    obs = adgroup.create_group("obs")
    obs.create_dataset("row_names", data=np.array(adata.obs_names), dtype=object, object_codec=numcodecs.VLenUTF8())
    obs.create_dataset("cell_type", data=np.array(adata.obs["cell_type"]), chunks=localChunks, object_codec=numcodecs.VLenUTF8())
    obs.create_dataset("X_umap", data=adata.obsm["X_umap"], chunks=(chunks[0], 2))
    obs.create_dataset("pairwise_data", data=adata.obsp["pairwise_data"].todense(), chunks=(chunks[0], chunks[0]))
    obs.attrs["annotated-data"] = [{"array": "/tables/anndata/X", "dimension": "0"}]
    obs.attrs["column-order"] = ["row_names", "cell_type", "X_umap", "pairwise_data"]
    obs.attrs["row-names"] = "row_names"

    # var (combines var, varm, varp)
    var = adgroup.create_group("var")
    var.create_dataset("row_names", data=np.array(adata.var_names), chunks=(chunks[1],), object_codec=numcodecs.VLenUTF8())
    var.create_dataset("gene_stuff", data=adata.varm["gene_stuff"], chunks=(chunks[1], 5))
    var.attrs["annotated-data"] = [{"array": "/tables/anndata/X", "dimension": "1"}]
    var.attrs["column-order"] = ["row_names", "gene_stuff"]
    var.attrs["row-names"] = "row_names"

    # uns
    uns = adgroup.create_group("uns")
    uns.create_dataset("random", data=adata.uns["random"], chunks=(3,))


# generate example and store as zarr using the minimal table spec proposal
n_obs = 10
n_var = 200
chunks = (10, 40)

adata = generate_example(n_obs, n_var)
write_anndata(adata, "example.zarr", chunks)

# store example in an alternative way, exploiting the properties of the suggested minimal table spec a bit more
write_anndata_suggestion(adata, "example_suggestion.zarr", chunks)
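
The three attributes the script writes on each table group (`annotated-data`, `column-order`, `row-names`) lend themselves to a small consistency check. The sketch below is not part of this PR: `check_table_attrs` and its arguments are hypothetical names, plain dicts and tuples stand in for zarr attributes and array shapes, and the rule that a table has one row per element of the annotated dimension is an assumption read off the example script, not a quotation from the spec text.

```python
# Hypothetical helper, not part of the PR. Plain Python stand-ins are used
# instead of zarr groups so the check can run without third-party packages.

def check_table_attrs(attrs, column_names, array_shapes, n_rows):
    """Return a list of human-readable problems for one table's attributes.

    attrs        -- dict with the keys the script writes on a table group
    column_names -- names of the arrays stored inside the table group
    array_shapes -- mapping from array path to shape tuple
    n_rows       -- number of rows in the table
    """
    problems = []

    for entry in attrs.get("annotated-data", []):
        path = entry["array"]
        dim = int(entry["dimension"])  # the script stores dimensions as strings
        shape = array_shapes.get(path)
        if shape is None:
            problems.append(f"annotated array {path} not found")
        elif dim >= len(shape):
            problems.append(f"{path} has no dimension {dim}")
        elif shape[dim] != n_rows:
            # Assumption: one table row per element along the annotated dimension.
            problems.append(
                f"{path}: dimension {dim} has length {shape[dim]}, table has {n_rows} rows"
            )

    for name in attrs.get("column-order", []):
        if name not in column_names:
            problems.append(f"column-order lists unknown column {name}")

    row_names = attrs.get("row-names")
    if row_names is not None and row_names not in column_names:
        problems.append(f"row-names points at unknown column {row_names}")

    return problems


# Shapes matching n_obs = 10 and n_var = 200 from the script above.
shapes = {"/tables/anndata/X": (10, 200)}

obs_attrs = {
    "annotated-data": [{"array": "/tables/anndata/X", "dimension": "0"}],
    "column-order": ["row_names", "cell_type"],
    "row-names": "row_names",
}
obs_problems = check_table_attrs(obs_attrs, {"row_names", "cell_type"}, shapes, n_rows=10)

# A deliberately wrong table: 200 rows claimed to annotate dimension 0 (length 10).
bad_var_attrs = {
    "annotated-data": [{"array": "/tables/anndata/X", "dimension": "0"}],
    "column-order": ["row_names"],
    "row-names": "row_names",
}
bad_problems = check_table_attrs(bad_var_attrs, {"row_names"}, shapes, n_rows=200)

print(obs_problems)
print(bad_problems)
```

With real data, the same function could be fed `dict(group.attrs)`, the group's array names, and the shapes of the referenced arrays; how those are collected depends on the zarr version in use and is left out here.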