[Bug] Cannot run example data from novae: #5

Open
kenxie7 opened this issue Sep 21, 2024 · 1 comment

@kenxie7

kenxie7 commented Sep 21, 2024

Description

I tried to run the example code from the tutorial on the provided datasets, but it fails with the following error when calling compute_representations.

Code

import novae
model = novae.Novae.from_pretrained("novae-mouse-0")

model

Loading weights from local directory
Novae model
├── Known genes: 60697
├── Parameters: 32.0M
└── Model name: novae-mouse-0

# Option 1: zero-shot
adata = novae.utils.load_dataset(tissue="brain", species="mouse", pattern=".*5_7.*")[0]
model.compute_representations(adata, zero_shot=True,)

ERROR:

KeyError: 'count'

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)
Cell In[36], line 2
1 # Option 1: zero-shot
----> 2 model.compute_representations(adata, zero_shot=True,)# accelerator='cuda', num_workers = 20) #slide_key='sample',
4 # Option 2: fine-tuning
5 #model.fine_tune(adata)
6 #model.compute_representations(adata)

File ~/miniforge-pypy3/envs/py39_gpu_mrvi/lib/python3.9/site-packages/novae/utils/_utils.py:76, in requires_fit.<locals>.wrapper(model, *args, **kwargs)
73 @wraps(f)
74 def wrapper(model, *args, **kwargs):
75 assert model.mode.trained, "Novae must be trained first, so consider running model.fit()"
---> 76 return f(model, *args, **kwargs)

File ~/miniforge-pypy3/envs/py39_gpu_mrvi/lib/python3.9/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)

File ~/miniforge-pypy3/envs/py39_gpu_mrvi/lib/python3.9/site-packages/novae/model.py:336, in Novae.compute_representations(self, adata, slide_key, zero_shot, accelerator, num_workers)
334 adatas = self._prepare_adatas(adata, slide_key=slide_key)
335 for adata in adatas:
--> 336 datamodule = self._init_datamodule(adata)
337 self._compute_representations_datamodule(adata, datamodule)
339 if self.mode.zero_shot:

File ~/miniforge-pypy3/envs/py39_gpu_mrvi/lib/python3.9/site-packages/novae/model.py:175, in Novae._init_datamodule(self, adata, sample_cells, **kwargs)
172 def _init_datamodule(
173 self, adata: AnnData | list[AnnData] | None = None, sample_cells: int | None = None, **kwargs: int
174 ):
--> 175 return NovaeDatamodule(
176 self._to_anndata_list(adata),
177 cell_embedder=self.cell_embedder,
178 batch_size=self.hparams.batch_size,
179 n_hops_local=self.hparams.n_hops_local,
180 n_hops_view=self.hparams.n_hops_view,
181 num_workers=self._num_workers,
182 sample_cells=sample_cells,
183 **kwargs,
184 )

File ~/miniforge-pypy3/envs/py39_gpu_mrvi/lib/python3.9/site-packages/novae/data/datamodule.py:27, in NovaeDatamodule.__init__(self, adatas, cell_embedder, batch_size, n_hops_local, n_hops_view, num_workers, sample_cells)
16 def __init__(
17 self,
18 adatas: list[AnnData],
(...)
24 sample_cells: int | None = None,
25 ) -> None:
26 super().__init__()
---> 27 self.dataset = NovaeDataset(
28 adatas,
29 cell_embedder=cell_embedder,
30 batch_size=batch_size,
31 n_hops_local=n_hops_local,
32 n_hops_view=n_hops_view,
33 sample_cells=sample_cells,
34 )
35 self.batch_size = batch_size
36 self.num_workers = num_workers

File ~/miniforge-pypy3/envs/py39_gpu_mrvi/lib/python3.9/site-packages/novae/data/dataset.py:69, in NovaeDataset.__init__(self, adatas, cell_embedder, batch_size, n_hops_local, n_hops_view, sample_cells)
66 self.single_adata = len(self.adatas) == 1
67 self.single_slide_mode = self.single_adata and len(np.unique(self.adatas[0].obs[Keys.SLIDE_ID])) == 1
---> 69 self._init_dataset()

File ~/miniforge-pypy3/envs/py39_gpu_mrvi/lib/python3.9/site-packages/novae/data/dataset.py:94, in NovaeDataset._init_dataset(self)
90 if self.single_adata:
91 self.obs_ilocs = np.array([(0, obs_index) for obs_index in self.valid_indices[0]])
93 self.slides_metadata: pd.DataFrame = pd.concat(
---> 94 [
95 self._adata_slides_metadata(adata_index, obs_indices)
96 for adata_index, obs_indices in enumerate(self.valid_indices)
97 ],
98 axis=0,
99 )
101 self.shuffle_obs_ilocs()

File ~/miniforge-pypy3/envs/py39_gpu_mrvi/lib/python3.9/site-packages/novae/data/dataset.py:95, in <listcomp>(.0)
90 if self.single_adata:
91 self.obs_ilocs = np.array([(0, obs_index) for obs_index in self.valid_indices[0]])
93 self.slides_metadata: pd.DataFrame = pd.concat(
94 [
---> 95 self._adata_slides_metadata(adata_index, obs_indices)
96 for adata_index, obs_indices in enumerate(self.valid_indices)
97 ],
98 axis=0,
99 )
101 self.shuffle_obs_ilocs()

File ~/miniforge-pypy3/envs/py39_gpu_mrvi/lib/python3.9/site-packages/novae/data/dataset.py:203, in NovaeDataset._adata_slides_metadata(self, adata_index, obs_indices)
201 slides_metadata = obs_counts.to_frame()
202 slides_metadata[Keys.ADATA_INDEX] = adata_index
--> 203 slides_metadata[Keys.N_BATCHES] = (slides_metadata["count"] // self.batch_size).clip(1)
204 return slides_metadata

File ~/miniforge-pypy3/envs/py39_gpu_mrvi/lib/python3.9/site-packages/pandas/core/frame.py:3807, in DataFrame.__getitem__(self, key)
3805 if self.columns.nlevels > 1:
3806 return self._getitem_multilevel(key)
-> 3807 indexer = self.columns.get_loc(key)
3808 if is_integer(indexer):
3809 indexer = [indexer]

File ~/miniforge-pypy3/envs/py39_gpu_mrvi/lib/python3.9/site-packages/pandas/core/indexes/base.py:3804, in Index.get_loc(self, key, method, tolerance)
3802 return self._engine.get_loc(casted_key)
3803 except KeyError as err:
-> 3804 raise KeyError(key) from err
3805 except TypeError:
3806 # If we have a listlike key, _check_indexing_error will raise
3807 # InvalidIndexError. Otherwise we fall through and re-raise
3808 # the TypeError.
3809 self._check_indexing_error(key)

KeyError: 'count'
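
A possible lead, offered as a guess rather than a diagnosis: the last novae frame (dataset.py:203) reads slides_metadata["count"] right after obs_counts.to_frame(), and the get_loc(self, key, method, tolerance) signature in the final pandas frame suggests a pandas release older than 2.0. Starting with pandas 2.0, Series.value_counts() names its result "count"; earlier releases reuse the original column name, so no "count" column would exist after to_frame(). The traceback does not show how obs_counts is built, so the use of value_counts() is an assumption here. The sketch below uses a hypothetical helper (slide_counts) and hypothetical column names ("slide_id", "n_batches") to show a way of getting the same column name on either pandas version:

```python
import pandas as pd


def slide_counts(obs: pd.DataFrame, slide_key: str = "slide_id") -> pd.DataFrame:
    """Count observations per slide and always expose them as a 'count' column.

    Hypothetical helper for illustration only; it is not part of novae.
    """
    counts = obs[slide_key].value_counts()
    # pandas >= 2.0 already names this Series "count"; older releases name it
    # after the source column, so the name is set explicitly to be safe.
    return counts.rename("count").to_frame()


# Toy observations: two cells on slide "a", one cell on slide "b"
obs = pd.DataFrame({"slide_id": ["a", "a", "b"]})
meta = slide_counts(obs)

# Same arithmetic as in the traceback, with an assumed batch size of 2
meta["n_batches"] = (meta["count"] // 2).clip(lower=1)
print(meta)
```

If this is indeed the cause, upgrading pandas to 2.0 or later in the environment may be enough to make the tutorial run.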

@quentinblampey
Contributor

quentinblampey commented Oct 21, 2024

Hi @kenxie7, thanks for reporting!

I'm really sorry for the delay; it seems I didn't receive a notification for this issue...

According to your code, you provide only one AnnData object, yet Novae tries to run in "multi-adata" mode, which is unexpected in that case.

Unfortunately, I was not able to reproduce the issue. Have you run other lines of code apart from this? What happens if you start a new script and run the commands below?

import novae

adata = novae.utils.load_dataset(tissue="brain", species="mouse", pattern=".*5_7.*")[0]

model = novae.Novae.from_pretrained("novae-mouse-0")

model.compute_representations(adata, zero_shot=True)
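
When an issue cannot be reproduced, it can also help to compare the package versions of the two environments. A minimal sketch using only the standard library (Python 3.8+); the installed_versions helper and the list of packages are illustrative choices, not something novae provides:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment
            versions[name] = None
    return versions


# Packages that appear in the traceback, plus anndata (assumed relevant)
print(installed_versions(["novae", "pandas", "anndata", "torch"]))
```

Pasting the printed dictionary into the issue makes it easier to spot a version mismatch between the reporter's environment and the maintainer's.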
