Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge_clusters error: max() arg is an empty sequence #4

Open
omarOMF opened this issue May 6, 2024 · 4 comments
Open

merge_clusters error: max() arg is an empty sequence #4

omarOMF opened this issue May 6, 2024 · 4 comments

Comments

@omarOMF
Copy link

omarOMF commented May 6, 2024

Hi,
Thank you for developing this helpful package! I am currently trying to optimize the number of clusters for a cell population using your package, but I've encountered an error that I'm having trouble interpreting. Here's the sequence of functions I ran:

cc.tl.get_markers(adata, 'leiden_05')
cc.tl.code_enrich(adata, 'leiden_05')
cc.pl.enrich_heatmap(adata, 'leiden_05')

image

However, I encounter the following error when I attempt to merge clusters. Could you please help me understand what might be going wrong? Here's the error message I received:

cc.tl.merge_clusters(adata, 'leiden_05')

Initial merge.
WARNING: Default of the method has been changed to 't-test' from 't-test_overestim_var'
/home/FCAM/xxx/miniconda3/envs/scv/lib/python3.9/site-packages/scanpy/tools/_rank_genes_groups.py:420: RuntimeWarning: invalid value encountered in log2
self.stats[group_name, 'logfoldchanges'] = np.log2(
/home/FCAM/xxx/miniconda3/envs/scv/lib/python3.9/site-packages/scanpy/tools/_rank_genes_groups.py:420: RuntimeWarning: invalid value encountered in log2
self.stats[group_name, 'logfoldchanges'] = np.log2(
/home/FCAM/xxx/miniconda3/envs/scv/lib/python3.9/site-packages/scanpy/tools/_rank_genes_groups.py:420: RuntimeWarning: invalid value encountered in log2
self.stats[group_name, 'logfoldchanges'] = np.log2(
/home/FCAM/xxx/miniconda3/envs/scv/lib/python3.9/site-packages/scanpy/tools/_rank_genes_groups.py:420: RuntimeWarning: invalid value encountered in log2
self.stats[group_name, 'logfoldchanges'] = np.log2(
/home/FCAM/xxx/miniconda3/envs/scv/lib/python3.9/site-packages/scanpy/tools/_rank_genes_groups.py:420: RuntimeWarning: invalid value encountered in log2
self.stats[group_name, 'logfoldchanges'] = np.log2(
/home/FCAM/xxx/miniconda3/envs/scv/lib/python3.9/site-packages/scanpy/tools/_rank_genes_groups.py:420: RuntimeWarning: invalid value encountered in log2
self.stats[group_name, 'logfoldchanges'] = np.log2(
WARNING: Default of the method has been changed to 't-test' from 't-test_overestim_var'
/home/FCAM/xxx/miniconda3/envs/scv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3474: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/home/FCAM/xxx/miniconda3/envs/scv/lib/python3.9/site-packages/numpy/core/_methods.py:181: RuntimeWarning: invalid value encountered in true_divide
ret = um.true_divide(
/home/FCAM/xxxx/miniconda3/envs/scv/lib/python3.9/site-packages/scanpy/preprocessing/_utils.py:11: RuntimeWarning: Mean of empty slice.
mean_sq = np.multiply(X, X).mean(axis=axis, dtype=np.float64)

ValueError Traceback (most recent call last)
Cell In[15], line 1
----> 1 cc.tl.merge_clusters(adata, 'leiden_05')

File ~/miniconda3/envs/scv/lib/python3.9/site-packages/cytocipher/score_and_merge/cluster_merge.py:580, in merge_clusters(data, groupby, var_groups, n_top_genes, t_cutoff, marker_padj_cutoff, gene_order, min_de, enrich_method, p_cut, max_iter, mnn_frac_cutoff, k, random_state, n_cpus, score_group_method, p_adjust, p_adjust_method, squash_exception, verbose)
574 get_markers(data, f'{groupby}_merged', n_top=n_top_genes, verbose=False,
575 var_groups=var_groups, t_cutoff=t_cutoff,
576 padj_cutoff=marker_padj_cutoff,
577 gene_order=gene_order, min_de=min_de)
579 # Running the enrichment scoring #
--> 580 run_enrich(data, f'{groupby}_merged', enrich_method, n_cpus,
581 squash_exception=squash_exception)
583 if verbose:
584 print(f"Added data.obs[f'{groupby}_merged']")

File ~/miniconda3/envs/scv/lib/python3.9/site-packages/cytocipher/score_and_merge/cluster_merge.py:419, in run_enrich(data, groupby, enrich_method, n_cpus, squash_exception)
415 raise Exception(
416 f"Got enrich_method={enrich_method}; expected one of : {enrich_options}")
418 if enrich_method == 'code':
--> 419 code_enrich(data, groupby, n_cpus=n_cpus, verbose=False,
420 squash_exception=squash_exception)
421 elif enrich_method == 'coexpr':
422 coexpr_enrich(data, groupby, n_cpus=n_cpus, verbose=False)

File ~/miniconda3/envs/scv/lib/python3.9/site-packages/cytocipher/score_and_merge/cluster_score.py:593, in code_enrich(data, groupby, cluster_marker_key, n_cpus, min_counts, squash_exception, verbose)
590 [all_genes.extend(cluster_genes_dict[cluster])
591 for cluster in cluster_genes_dict]
592 # Getting correct typing
--> 593 str_dtype = f"<U{max([len(gene_name) for gene_name in all_genes])}"
594 all_genes = np.unique( all_genes ).astype(str_dtype)
596 str_dtype_clust = f"<U{max([len(clust) for clust in cluster_genes_dict])}"

ValueError: max() arg is an empty sequence

@BradBalderson
Copy link
Owner

Hey Omar,

If you check adata.uns['leiden_05_markers'], one or more of the cluster will have no marker genes, which I think is causing this issue.

Perhaps try recalling the markers, using:
cc.tl.get_markers(adata, 'leiden_05', min_markers=1)

Hopefully there is this option, otherwise I may need to update the pypi package!

@omarOMF
Copy link
Author

omarOMF commented May 7, 2024

Thank you for your response.
I tried cc.tl.get_markers(adata, 'leiden_05', min_de=1) and still running into the same error.
I didn't find min_markers in the function unless I am using a different version. This is what get
` cc.tl.get_markers?

Signature:
cc.tl.get_markers(
    data: anndata._core.anndata.AnnData,
    groupby: str,
    var_groups: str = None,
    logfc_cutoff: float = 0,
    padj_cutoff: float = 0.05,
    t_cutoff: float = 3,
    n_top: int = 5,
    rerun_de: bool = True,
    gene_order=None,
    pts: bool = False,
    min_de: int = 0,
    verbose: bool = True,
)
Docstring:
Gets marker genes per cluster.

Parameters
    ----------
    data: sc.AnnData
        Single cell RNA-seq anndata, QC'd a preprocessed to log-cpm in
                                                                     data.X.
    groupby: str
        Specifies the clusters to perform one-versus-rest Welch's t-test
        comparison of genes for.
        Must specify defined column in data.obs[groupby].
        Must be categorical type.
    var_groups: str
        Specifies a column in data.var of type boolean, with True indicating
        the candidate genes to use when determining marker genes per cluster.
        Useful to, for example, remove ribosomal and mitochondrial genes.
        None indicates use all genes in data.var_names as candidates.
    logfc_cutoff: float
        Minimum logfc for a gene to be a considered a marker gene for a
        given cluster.
    marker_padj_cutoff: float
        Adjusted p-value (Benjamini-Hochberg correction) below which a gene
        can be considered a marker gene.
    t_cutoff: float
        The minimum t-value a gene must have to be considered a marker gene
        (Welch's t-statistic with one-versus-rest comparison).
    n_top: int
        The maximimum no. of marker genes per cluster.
    rerun_de: bool
        Whether to rerun the DE analysis, or using existing results in
        data.uns['rank_genes_groups']. Useful if have ran get_markers()
        with the same 'groupby' as input, but want to adjust the other
        parameters to determine marker genes.
    gene_order: str
        By default, gets n_top qualifying genes ranked by t-value.
        Specifying logfc here will rank by log-FC, instead.
    pts: bool
        Whether to calculate percentage cells expressing gene within/without
        of each cluster. Only relevant if rerun_de=True.
    min_de: int
        Minimum number of genes to use as markers, if not criteria met.
    verbose: bool
        Print statements during computation (True) or silent run (False).
    Returns
    --------
        data.uns[f'{groupby}_markers']
            Dictionary with cluster names as keys, and list of marker
            genes as values.
File:      ~/miniconda3/envs/scv/lib/python3.9/site-packages/cytocipher/score_and_merge/cluster_score.py
Type:      function
```

@omarOMF
Copy link
Author

omarOMF commented May 7, 2024

Also I can see there are markers selected per cluster.

caps.uns['leiden_05_markers']
{'0': array(['MT2A', 'TMSB10', 'CLDN5', 'NEAT1', 'IFITM3', 'MT1E'], dtype=object),
 '1': array(['AC083867.2', 'AC138647.2', 'AL359380.1', 'KCND3-AS1',
        'AC012555.2', 'GNG14'], dtype=object),
 '2': array(['GPCPD1', 'SPOCK3', 'SLCO1A2', 'SLC39A10', 'ABCG2', 'CA4'],
       dtype=object),
 '3': array(['OMD', 'TMEM45B', 'ATP10A', 'SLC39A10', 'ABCB1', 'THSD4'],
       dtype=object),
 '4': array(['ANO2', 'GALNT18', 'VWF', 'SCARB1', 'TNS1', 'HIF1A-AS3'],
       dtype=object),
 '5': array(['FAM43A', 'AC005083.1', 'AL928596.1', 'AC091078.2', 'BX664730.1',
        'AC083867.2'], dtype=object)}

@BradBalderson
Copy link
Owner

Hmm cluster markers dict looks correct, and I am looking at the source code and cannot see how the code would be throwing that error, given that there are genes in that dictionary.

Looks like the anndata object is called 'caps' in the example above? Only way I can see it would throw that error is if the marker dictionary was empty when running cc.tl.code_enrich(adata, 'leiden_05')

Are you running:

cc.tl.code_enrich(caps, 'leiden_05')

If you still get the same error, could you try running this snippet and let me know what it looks like?

    cluster_genes_dict = caps.uns['leiden_05_markers']

    # Putting all genes into array for speed.
    all_genes = []
    [all_genes.extend(cluster_genes_dict[cluster])
                                              for cluster in cluster_genes_dict]
    # Getting correct typing
    str_dtype = f"<U{max([len(gene_name) for gene_name in all_genes])}"
    all_genes = np.unique( all_genes ).astype(str_dtype)

    str_dtype_clust = f"<U{max([len(clust) for clust in cluster_genes_dict])}"

In particular, could you send the result of all_genes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants