This project converts metadata from the schema of the Human Cell Atlas Data Platform (formerly known as DCP, Data Coordination Platform) to Human Cell Atlas Tier 1.
We first convert the DCP metadata spreadsheet to an intermediate flat CSV file (in flat_dcp), and then convert that flat file to Tier 1 using the mapping specified in metadata_dict.py.
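
As a rough illustration of the mapping-based step, the sketch below renames the fields of one flattened row using a dictionary. The dictionary name `DCP_TO_TIER1`, the helper `to_tier1`, and the field names are assumptions chosen for illustration; they are not the actual contents of metadata_dict.py.

```python
# Hypothetical mapping from flat DCP column names to Tier 1 field names.
# These entries are illustrative and are NOT taken from metadata_dict.py.
DCP_TO_TIER1 = {
    "donor_organism.biomaterial_core.biomaterial_id": "donor_id",
    "specimen_from_organism.biomaterial_core.biomaterial_id": "sample_id",
    "cell_suspension.biomaterial_core.biomaterial_id": "library_id",
}


def to_tier1(flat_row, mapping=DCP_TO_TIER1):
    """Return a new row that keeps only mapped fields, renamed to Tier 1 names."""
    return {tier1: flat_row[dcp] for dcp, tier1 in mapping.items() if dcp in flat_row}


# One row of a flattened DCP table (made-up values).
flat_row = {
    "donor_organism.biomaterial_core.biomaterial_id": "donor_1",
    "specimen_from_organism.biomaterial_core.biomaterial_id": "specimen_1",
    "cell_suspension.biomaterial_core.biomaterial_id": "suspension_1",
    "unmapped.field": "ignored",
}

tier1_row = to_tier1(flat_row)
print(tier1_row)
```

Fields without an entry in the mapping are dropped in this sketch; the notebooks may handle unmapped fields differently.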
To convert metadata, run the two notebooks in this sequence:
- flatten_dcp_metadata.ipynb creates a flattened CSV version of the DCP spreadsheet at the cell_suspension/library level
- flat_dcp_to_tier1.ipynb converts the DCP fields in the flattened CSV file to the Tier 1 metadata fields (based on the field mapping specified in metadata_dict.py), and produces an Excel file with the _Tier1.xlsx extension and two CSV files with the _tier1_uns.csv and _tier1_obs.csv extensions
In both notebooks, set file_name to the name of the DCP spreadsheet found in the dcp_spreadsheets folder.
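
To make "flattened at the cell_suspension/library level" more concrete, here is a minimal sketch of the idea: each cell suspension row is joined with its upstream specimen and donor, giving one wide row per suspension. The toy tables, column names, and the `flatten` helper are hypothetical and much simpler than a real DCP spreadsheet; this is not the notebook's implementation.

```python
# Toy stand-ins for three spreadsheet tabs; all column names are hypothetical.
donors = [{"donor_id": "D1", "sex": "female"}]
specimens = [{"specimen_id": "S1", "donor_id": "D1", "organ": "lung"}]
suspensions = [
    {"suspension_id": "C1", "specimen_id": "S1"},
    {"suspension_id": "C2", "specimen_id": "S1"},
]


def flatten(suspensions, specimens, donors):
    """Build one flat row per cell suspension by following links upstream."""
    specimen_by_id = {s["specimen_id"]: s for s in specimens}
    donor_by_id = {d["donor_id"]: d for d in donors}
    flat_rows = []
    for suspension in suspensions:
        specimen = specimen_by_id[suspension["specimen_id"]]
        donor = donor_by_id[specimen["donor_id"]]
        # Merge the three records; the only shared keys are the link IDs,
        # which hold the same value in each record.
        flat_rows.append({**donor, **specimen, **suspension})
    return flat_rows


flat_rows = flatten(suspensions, specimens, donors)
for row in flat_rows:
    print(row)
```

Both suspensions come from the same specimen, so the donor and specimen values are repeated on each of the two output rows.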
The packages needed for these notebooks are listed in the requirements.txt file. To install them with pip, run:
pip install -r requirements.txt
Known limitations:
- flatten_dcp_metadata.ipynb
  - Tested only on simple experimental designs (Donor organism -> Specimen from organism/Sample -> Cell suspension/Library -> Analysis File & Sequence file)
  - No support for "Spatial transcriptomics" data
- flat_dcp_to_tier1.ipynb
  - Will not populate Tier 1 fields at the cell level (cell type related fields)
  - Some automations that conditionally map DCP values to Tier 1 are not yet implemented:
    - institute
    - sample_collection_relative_time_point
    - cell_enrichment is not ontologised
    - sample_collection_year is not generalised to year