GitHub - brain-bican/BICAN_bmark

Summary

Here we present benchmark datasets used by the BICAN Mapping Task Force to evaluate cell type annotation methods. This repository extends previous work by the Allen Institute in the bmark GitHub repo.

In particular we:

Provide reproducible and transparent generation of benchmark datasets covering diverse cell type annotation scenarios.
Provide an easy-to-understand summary of each benchmark dataset through data cards.
Define a supervised classification task and performance metrics to evaluate cell type annotation methods.
Report evaluation results for each cell type annotation method in a way that permits fair comparison across methods.

This effort attempts to formulate a decentralized approach to continuous benchmarking.

Individual computational groups are only responsible for tuning and submitting results for their own methods.
New methods can be added at any time, and their results can be compared fairly with those of previously submitted methods.
All reporting (data + model cards) is intended to be succinct yet accessible to a non-expert.

We hope that this benchmarking format can provide the basis for informed decision making regarding choice of cell type annotation methods for specific tasks.

BICAN Mapping Task Force benchmark datasets

Priority	Data set	Characteristics	Status	Download
1.1	HMBA Basal Ganglia (Macaque)	Donor effects, multi-species	Ready	S3 Link
1.2	HMBA Basal Ganglia (Human)	Donor effects	Ready	S3 Link
1.3	Siletti et al. Human Brain	Donor effects	Ready	S3 Link

Benchmark format and expectations

The BICAN benchmark files extend the Allen Institute Taxonomy schema by including a benchmark key in the uns of each anndata file, which specifies the train and validation splits. We have prespecified 10-fold cross-validation splits to compare mapping methods on the various benchmark tasks. Users can access the k-fold indices as follows:

benchmark_anndata.uns["benchmark"]["k_fold"]["fold_1"]["train_ind"]
benchmark_anndata.uns["benchmark"]["k_fold"]["fold_1"]["val_ind"]
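
The fold lookup above can be extended into a loop over all folds. The following is a minimal sketch, assuming the benchmark entry behaves like a nested dictionary once the file is loaded in Python and that the indices are list-like. The `benchmark` stand-in, the fold contents, and the `iter_folds` helper are hypothetical and only illustrate the layout; they are not part of this repository.

```python
# Hypothetical stand-in for the benchmark entry stored in `uns`.
# In practice it would come from a loaded benchmark file; the nested-dict
# layout and the small integer indices here are assumptions.
benchmark = {
    "k_fold": {
        "fold_1": {"train_ind": [0, 1, 2, 3], "val_ind": [4, 5]},
        "fold_2": {"train_ind": [2, 3, 4, 5], "val_ind": [0, 1]},
        "fold_3": {"train_ind": [0, 1, 4, 5], "val_ind": [2, 3]},
    }
}


def iter_folds(benchmark):
    """Yield (fold_name, train_ind, val_ind), refusing folds whose splits overlap."""
    for name, fold in benchmark["k_fold"].items():
        train_ind = list(fold["train_ind"])
        val_ind = list(fold["val_ind"])
        overlap = set(train_ind) & set(val_ind)
        if overlap:
            raise ValueError(f"{name}: {len(overlap)} cells appear in both splits")
        yield name, train_ind, val_ind


# Collect the size of each split as a quick sanity check before training.
fold_sizes = {name: (len(tr), len(va)) for name, tr, va in iter_folds(benchmark)}
```

Checking for overlap before training is a cheap guard against leaking validation cells into the training set.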

It is expected that each mapping method trains only on the samples in train_ind and then validates on the samples specified in val_ind. Mapping method results should conform to the following standard:

cell_id	[Annotation_level]_[MAPPING_METHOD_NAME]_label	[Annotation_level]_[MAPPING_METHOD_NAME]_score
adata.obs.index[val_ind]	Annotation name from taxonomy	Numeric confidence of the label assignment, ranging from 0 to 1
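
As an illustration of the results standard above, the sketch below writes the three columns as tab-separated text and checks that every score lies between 0 and 1. The `format_results` helper, the annotation level `subclass`, the method name `my_method`, and the cell IDs and labels are all hypothetical placeholders; the tab-separated output is an assumption, since the required file format is not stated here.

```python
import csv
import io


def format_results(cell_ids, labels, scores, level, method):
    """Return tab-separated text with cell_id, label and score columns."""
    if not (len(cell_ids) == len(labels) == len(scores)):
        raise ValueError("cell_ids, labels and scores must have equal length")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must lie in [0, 1]")
    # Column names follow the [Annotation_level]_[MAPPING_METHOD_NAME] pattern.
    header = ["cell_id", f"{level}_{method}_label", f"{level}_{method}_score"]
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(zip(cell_ids, labels, scores))
    return buf.getvalue()


# Placeholder values; real cell IDs would be the index entries of the validation cells.
table = format_results(
    cell_ids=["cell_a", "cell_b"],
    labels=["Astrocyte", "Microglia"],
    scores=[0.93, 0.71],
    level="subclass",
    method="my_method",
)
```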

Contributors

TBD
