# _viash.yaml (repository generated from openproblems-bio/task_template)
viash_version: 0.9.0
name: task_batch_integration
organization: openproblems-bio
version: dev
license: MIT
keywords: [ "batch integration", "scRNA-seq" ]
links:
  issue_tracker: https://github.com/openproblems-bio/task_batch_integration/issues
  repository: https://github.com/openproblems-bio/task_batch_integration
  docker_registry: ghcr.io
label: Batch Integration
summary: Remove unwanted batch effects from scRNA-seq data while retaining biologically meaningful variation.
description: |
  As single-cell technologies advance, single-cell datasets are growing both in size and complexity.
  Especially in consortia such as the Human Cell Atlas, individual studies combine data from multiple labs, each sequencing multiple individuals possibly with different technologies.
  This gives rise to complex batch effects in the data that must be computationally removed to perform a joint analysis.
  Batch integration methods must remove the batch effect without removing relevant biological information.
  Currently, over 200 tools exist that aim to remove batch effects from scRNA-seq datasets [@zappia2018exploring].
  These methods balance the removal of batch effects with the conservation of nuanced biological information in different ways.
  This abundance of tools has complicated batch integration method choice, leading to several benchmarks on this topic [@luecken2020benchmarking; @tran2020benchmark; @chazarragil2021flexible; @mereu2020benchmarking].
  Yet, benchmarks use different metrics, method implementations and datasets. Here we build a living benchmarking task for batch integration methods with the vision of improving the consistency of method evaluation.
  In this task we evaluate batch integration methods on their ability to remove batch effects in the data while conserving variation attributed to biological effects.
  As input, methods require either normalised or unnormalised data with multiple batches and consistent cell type labels.
  The batch integrated output can be a feature matrix, a low dimensional embedding and/or a neighbourhood graph.
  The respective batch-integrated representation is then evaluated using sets of metrics that capture how well batch effects are removed and whether biological variance is conserved.
  We have based this particular task on the latest and most extensive benchmark of single-cell data integration methods.
references:
  doi:
    # Luecken, M.D., Büttner, M., Chaichoompu, K. et al.
    # Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41–50 (2022).
    - 10.1038/s41592-021-01336-8
info:
  image: thumbnail.svg
  test_resources:
    - type: s3
      path: s3://openproblems-data/resources_test/common/cxg_immune_cell_atlas/
      dest: resources_test/common/cxg_immune_cell_atlas
    - type: s3
      path: s3://openproblems-data/resources_test/task_batch_integration/
      dest: resources_test/task_batch_integration
authors:
  - name: Michaela Mueller
    roles: [ maintainer, author ]
    info:
      github: mumichae
      orcid: 0000-0002-1401-1785
  - name: Malte Luecken
    roles: [ author ]
    info:
      github: LuckyMD
      orcid: 0000-0001-7464-7921
  - name: Daniel Strobl
    roles: [ author ]
    info:
      github: danielStrobl
      orcid: 0000-0002-5516-7057
  - name: Robrecht Cannoodt
    roles: [ contributor ]
    info:
      github: rcannood
      orcid: "0000-0003-3641-729X"
  - name: "Scott Gigante"
    roles: [ contributor ]
    info:
      github: scottgigante
      orcid: "0000-0002-4544-2764"
  - name: Kai Waldrant
    roles: [ contributor ]
    info:
      github: KaiWaldrant
      orcid: "0009-0003-8555-1361"
  - name: Martin Kim
    roles: [ contributor ]
    info:
      github: martinkim0
      orcid: "0009-0003-8555-1361"
config_mods: |
  .runners[.type == "nextflow"].config.labels := { lowmem : "memory = 20.Gb", midmem : "memory = 50.Gb", highmem : "memory = 100.Gb", lowcpu : "cpus = 5", midcpu : "cpus = 15", highcpu : "cpus = 30", lowtime : "time = 1.h", midtime : "time = 4.h", hightime : "time = 8.h", veryhightime : "time = 24.h" }
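# The resource labels above are meant to be referenced from individual component configs.
# A minimal sketch, assuming a component config (hypothetical, not part of this file)
# that uses the nextflow runner; only label names defined above are used:
#
#   runners:
#     - type: nextflow
#       directives:
#         label: [ midtime, midmem, lowcpu ]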
repositories:
  - name: openproblems
    type: github
    repo: openproblems-bio/openproblems
    tag: build/main