DCP spreadsheet to Tier 1 metadata

This project converts metadata from the Human Cell Atlas Data Platform schema (formerly known as DCP, the Data Coordination Platform) to the Human Cell Atlas Tier 1 schema.

We first convert the DCP metadata spreadsheet to an intermediate flat CSV file (stored in flat_dcp), and then convert that flat file to Tier 1 using the field mapping specified in metadata_dict.py.
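As an illustration of the second step, the sketch below renames the columns of one flattened row according to a DCP-to-Tier 1 field mapping. The FIELD_MAP contents, the Tier 1 names in it, and the flat_row_to_tier1 helper are hypothetical; the actual structure of metadata_dict.py and the logic in flat_dcp_to_tier1.ipynb may differ (some fields, for example, need conditional rather than one-to-one mapping).

```python
# Hypothetical excerpt of a DCP -> Tier 1 field mapping. The real keys and
# structure of metadata_dict.py may differ.
FIELD_MAP: dict[str, str] = {
    "donor_organism.biomaterial_core.biomaterial_id": "donor_id",
    "donor_organism.sex": "sex",
    "library_preparation_protocol.library_construction_method.ontology_label": "assay",
}


def flat_row_to_tier1(row: dict[str, str], field_map: dict[str, str]) -> dict[str, str]:
    """Rename the DCP columns of one flattened row to Tier 1 field names.

    DCP columns without an entry in field_map are dropped, and mapped
    columns missing from the row are skipped.
    """
    return {tier1: row[dcp] for dcp, tier1 in field_map.items() if dcp in row}


if __name__ == "__main__":
    flat_row = {
        "donor_organism.biomaterial_core.biomaterial_id": "donor_1",
        "donor_organism.sex": "female",
        "cell_suspension.cell_morphology.percent_cell_viability": "92",
    }
    print(flat_row_to_tier1(flat_row, FIELD_MAP))
```

A plain dictionary keeps the mapping declarative, so adding a field means editing data rather than conversion code.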

Usage

To convert metadata, run the two notebooks in this sequence:

1. flatten_dcp_metadata.ipynb: creates a flattened CSV version of the DCP spreadsheet at the cell suspension / library level.
2. flat_dcp_to_tier1.ipynb: converts the DCP fields in the flattened CSV file to Tier 1 metadata fields (using the field mapping in metadata_dict.py) and produces an Excel file with the _Tier1.xlsx suffix and two CSV files with the _tier1_uns.csv and _tier1_obs.csv suffixes.
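The _tier1_uns.csv and _tier1_obs.csv suffixes appear to follow AnnData naming, where uns holds dataset-level metadata and obs holds per-observation metadata. The sketch below shows one way such a split could be done. The rule that a field with the same value in every row is dataset-level, the split_uns_obs name, and the field names in the example are assumptions, not the notebook's actual logic.

```python
def split_uns_obs(
    rows: list[dict[str, str]],
) -> tuple[dict[str, str], list[dict[str, str]]]:
    """Split Tier 1 rows into dataset-level (uns) and per-row (obs) fields.

    Assumption: a field is treated as dataset-level when every row has the
    same value for it; all other fields stay per-row in obs.
    """
    if not rows:
        return {}, []
    first = rows[0]
    uns = {
        field: value
        for field, value in first.items()
        if all(row.get(field) == value for row in rows)
    }
    obs = [{f: v for f, v in row.items() if f not in uns} for row in rows]
    return uns, obs


if __name__ == "__main__":
    # Illustrative field names only.
    rows = [
        {"study_pi": "Smith", "sample_id": "s1", "tissue": "lung"},
        {"study_pi": "Smith", "sample_id": "s2", "tissue": "heart"},
    ]
    uns, obs = split_uns_obs(rows)
    print(uns)
    print(obs)
```

With this heuristic, a per-sample field that happens to be identical across all rows would be moved to uns, so an explicit list of dataset-level fields is the safer choice in practice.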

In both notebooks, set file_name to the name of the DCP spreadsheet located in the dcp_spreadsheets folder.
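A minimal sketch of how file_name might be turned into input and output paths. The helper name project_paths, the example value of file_name, the .xlsx input extension, the _flat.csv name for the intermediate file, and writing the outputs to tier1_output are assumptions; check the notebooks for the exact conventions.

```python
from pathlib import Path

# Hypothetical value: replace with the spreadsheet placed in dcp_spreadsheets/.
file_name = "my_project_metadata"


def project_paths(file_name: str, root: Path = Path(".")) -> dict[str, Path]:
    """Build the input, intermediate and output paths derived from file_name.

    Accepts the name with or without the .xlsx extension. All suffixes other
    than _Tier1.xlsx, _tier1_uns.csv and _tier1_obs.csv are assumptions.
    """
    stem = Path(file_name).stem
    return {
        "dcp_spreadsheet": root / "dcp_spreadsheets" / f"{stem}.xlsx",
        "flat_dcp": root / "flat_dcp" / f"{stem}_flat.csv",
        "tier1_xlsx": root / "tier1_output" / f"{stem}_Tier1.xlsx",
        "tier1_uns": root / "tier1_output" / f"{stem}_tier1_uns.csv",
        "tier1_obs": root / "tier1_output" / f"{stem}_tier1_obs.csv",
    }


if __name__ == "__main__":
    for label, path in project_paths(file_name).items():
        print(f"{label}: {path}")
```

Deriving every path from one name keeps the two notebooks in sync, since they must point at the same spreadsheet.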

Requirements

The packages needed for these notebooks are listed in requirements.txt. To install them with pip, run:

pip install -r requirements.txt

Known limitations

flatten_dcp_metadata.ipynb
- Tested only on simple experimental designs (Donor organism -> Specimen from organism / Sample -> Cell suspension / Library -> Analysis file & Sequence file)
- No support for "Spatial transcriptomics" data
flat_dcp_to_tier1.ipynb
- Does not populate Tier 1 fields at the cell level (cell type related fields)
- Some automations that conditionally map DCP values to Tier 1 are not yet implemented:
  - institute
  - sample_collection_relative_time_point
  - cell_enrichment (values are not ontologised)
  - sample_collection_year (values are not generalised to a year)
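The supported design is a linear chain in which each cell suspension derives from one specimen, which in turn derives from one donor. Under that assumption, flattening amounts to following the links upward and prefixing each field with its entity type, as in the sketch below. The record layout (id and derived_from keys) and the flatten_chain helper are hypothetical simplifications of the spreadsheet tabs; the library and file levels would be joined the same way. Pooled or branching designs break the one-parent assumption, which is consistent with the limitation above.

```python
Record = dict[str, str]


def flatten_chain(
    donors: list[Record], specimens: list[Record], suspensions: list[Record]
) -> list[Record]:
    """Join linked entities into one flat row per cell suspension.

    Assumes each suspension has exactly one parent specimen and each
    specimen exactly one parent donor, linked through "derived_from" ids.
    """
    donors_by_id = {d["id"]: d for d in donors}
    specimens_by_id = {s["id"]: s for s in specimens}
    flat: list[Record] = []
    for suspension in suspensions:
        specimen = specimens_by_id[suspension["derived_from"]]
        donor = donors_by_id[specimen["derived_from"]]
        row: Record = {}
        # Prefix each field with its entity type so columns stay unambiguous.
        for prefix, record in (
            ("donor_organism", donor),
            ("specimen_from_organism", specimen),
            ("cell_suspension", suspension),
        ):
            row.update({f"{prefix}.{key}": value for key, value in record.items()})
        flat.append(row)
    return flat


if __name__ == "__main__":
    donors = [{"id": "D1", "sex": "female"}]
    specimens = [{"id": "S1", "derived_from": "D1", "organ": "lung"}]
    suspensions = [
        {"id": "CS1", "derived_from": "S1"},
        {"id": "CS2", "derived_from": "S1"},
    ]
    for row in flatten_chain(donors, specimens, suspensions):
        print(row)
```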
