Data issue: incorrect variant type #1896

Open
rmadupuri opened this issue Aug 23, 2023 · 0 comments

Variant type must be one of SNP, INS, DEL, DNP, TNP, MNP, or ONP. 3097 cases either have no variant type or are labeled NA or UNK. The GN JSON contains the correct variant types, so reannotation can resolve this.

select cs.cancer_study_identifier, me.*
from mutation_event as me
join mutation as m on me.mutation_event_id = m.mutation_event_id
join genetic_profile as gp on m.genetic_profile_id = gp.genetic_profile_id
join cancer_study as cs on gp.cancer_study_id = cs.cancer_study_id
where me.variant_type in ('NA', 'UNK');

And below is the count per study:

cancer_study_identifier count
ucec_ccr_msk_2022 1666
pan_origimed_2020 507
mds_iwg_2022 229
mixed_msk_tcga_2021 206
ucec_ccr_cfdna_msk_2022 176
acc_2019 60
rbl_cfdna_msk_2020 46
coad_silu_2022 28
pancan_pcawg_2020 18
pptc_2019 12
paired_bladder_2022 11
sclc_jhu 10
nsclc_tracerx_2017 9
bowel_colitis_msk_2022 7
hcc_inserm_fr_2015 7
prad_p1000 6
nepc_wcm_2016 5
pediatric_dkfz_2017 5
pog570_bcgsc_2020 5
rectal_msk_2022 5
skcm_broad_brafresist_2012 5
nhl_bcgsc_2013 4
paad_cptac_2021 4
prad_su2c_2019 4
prostate_pcbm_swiss_2019 4
skcm_broad 4
utuc_msk_2019 3
utuc_mskcc_2015 3
cervix_msk_2023 2
cll_iuopa_2015 2
crc_eo_2020 2
mbl_dkfz_2017 2
mtnn_msk_2022 2
npc_nusingapore 2
nsclc_mskcc_2018 2
prad_cpcg_2017 2
sarcoma_mskcc_2022 2
skcm_dfci_2015 2
tmb_mskcc_2018 2
ucs_jhu_2014 2
utuc_cornell_baylor_mdacc_2019 2
utuc_pdx_msk_2019 2
brca_bccrc 1
hcc_jcopo_msk_2023 1
mds_mskcc_2020 1
nbl_msk_2023 1
nccrcc_genentech_2014 1
nsclc_ctdx_msk_2022 1
nsclc_tcga_broad_2016 1
odg_msk_2017 1
paad_icgc 1
paad_qcmg_uq_2016 1
prad_broad 1
prad_fhcrc 1
prad_pik3r1_msk_2021 1
prostate_dkfz_2018 1
sarcoma_msk_2023 1
sclc_ucologne_2015 1
stad_oncosg_2018 1
stad_pfizer_uhongkong 1
stad_tcga_pub 1
ucec_cptac_2020 1
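
The per-study counts above can be reproduced by grouping the same join on `cancer_study_identifier`. The following is only a minimal sketch, not the query that was actually run for this issue: it uses an in-memory SQLite database so that it is runnable on its own, the toy tables contain just the columns the join needs, and the study names `study_a` and `study_b`, the alias `case_count`, and the helper names `build_toy_db` and `count_invalid_per_study` are all hypothetical.

```python
import sqlite3

# Grouped form of the query from the issue: one row per study,
# largest count first. The alias `case_count` is an assumption.
COUNT_PER_STUDY_SQL = """
select cs.cancer_study_identifier, count(*) as case_count
from mutation_event as me
join mutation as m on me.mutation_event_id = m.mutation_event_id
join genetic_profile as gp on m.genetic_profile_id = gp.genetic_profile_id
join cancer_study as cs on gp.cancer_study_id = cs.cancer_study_id
where me.variant_type in ('NA', 'UNK')
group by cs.cancer_study_identifier
order by case_count desc, cs.cancer_study_identifier
"""


def build_toy_db():
    """Create a tiny in-memory stand-in for the four tables used in the join.

    Only the columns referenced by the query are modeled; the real schema
    has many more. All rows below are made-up illustration data.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table cancer_study (
            cancer_study_id integer primary key,
            cancer_study_identifier text
        );
        create table genetic_profile (
            genetic_profile_id integer primary key,
            cancer_study_id integer
        );
        create table mutation_event (
            mutation_event_id integer primary key,
            variant_type text
        );
        create table mutation (
            mutation_event_id integer,
            genetic_profile_id integer
        );
        """
    )
    conn.executemany(
        "insert into cancer_study values (?, ?)",
        [(1, "study_a"), (2, "study_b")],
    )
    conn.executemany(
        "insert into genetic_profile values (?, ?)",
        [(10, 1), (20, 2)],
    )
    conn.executemany(
        "insert into mutation_event values (?, ?)",
        [(100, "SNP"), (101, "NA"), (102, "UNK"), (103, "NA")],
    )
    conn.executemany(
        "insert into mutation values (?, ?)",
        [(100, 10), (101, 10), (102, 10), (103, 20)],
    )
    return conn


def count_invalid_per_study(conn):
    """Return (cancer_study_identifier, case_count) rows for NA/UNK variant types."""
    return conn.execute(COUNT_PER_STUDY_SQL).fetchall()


if __name__ == "__main__":
    for study, case_count in count_invalid_per_study(build_toy_db()):
        print(study, case_count)
```

Like the original query, this counts one row per `mutation` record joined to an affected `mutation_event`, and it matches only the literal values `NA` and `UNK`; rows with a NULL or empty `variant_type` would need an additional condition.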
rmadupuri changed the title from "Variant Type data issue" to "Data issue: incorrect variant type" on Aug 23, 2023