cload bug #234

Closed
nservant opened this issue Jan 26, 2021 · 4 comments

@nservant

Hi,
I'm trying to build cool files from yeast Hi-C data, but something seems to go wrong with the chromosome names.
The output cool file is empty because cooler fails to find the chromosome names in the list of valid pairs.
The same code works without any issue on human/mouse data.
Any advice on how to fix this?
Many thanks
Nicolas

head SRR4292758_00_20000.bed
gi|696449538|gb|JRIU01000001.1|	0	9053
gi|696449534|gb|JRIU01000002.1|	0	20000
gi|696449534|gb|JRIU01000002.1|	20000	40000
gi|696449534|gb|JRIU01000002.1|	40000	60000
gi|696449534|gb|JRIU01000002.1|	60000	80000
gi|696449534|gb|JRIU01000002.1|	80000	100000
gi|696449534|gb|JRIU01000002.1|	100000	114563
gi|696449531|gb|JRIU01000003.1|	0	20000
gi|696449531|gb|JRIU01000003.1|	20000	40000
gi|696449531|gb|JRIU01000003.1|	40000	60000

zcat contacts.sorted.txt.gz | head
gi|696447993|gb|JRIU01000414.1|	496	1	gi|696447993|gb|JRIU01000414.1|	2084	16	1
gi|696447993|gb|JRIU01000414.1|	510	1	gi|696447993|gb|JRIU01000414.1|	2118	16	1
gi|696447993|gb|JRIU01000414.1|	535	1	gi|696447993|gb|JRIU01000414.1|	2043	16	1
gi|696447993|gb|JRIU01000414.1|	536	1	gi|696447993|gb|JRIU01000414.1|	2068	16	1
gi|696447993|gb|JRIU01000414.1|	546	1	gi|696447993|gb|JRIU01000414.1|	2083	16	1
gi|696447993|gb|JRIU01000414.1|	561	1	gi|696447993|gb|JRIU01000414.1|	2050	16	1
gi|696447993|gb|JRIU01000414.1|	563	1	gi|696447993|gb|JRIU01000414.1|	2074	16	1
gi|696448040|gb|JRIU01000400.1|	5855	1	gi|696448022|gb|JRIU01000406.1|	4903	1	1
gi|696448040|gb|JRIU01000400.1|	5889	1	gi|696448022|gb|JRIU01000406.1|	4884	1	1
gi|696448040|gb|JRIU01000400.1|	5892	1	gi|696448022|gb|JRIU01000406.1|	4859	1	1

cooler cload pairix --nproc 2 SRR4292758_00_20000.bed contacts.sorted.txt.gz SRR4292758_00_20000.cool

WARNING:py.warnings:/home/nservant/.local/lib/python3.7/site-packages/cooler/create/_ingest.py:834: UserWarning: Did not find contig  'gi|696448178|gb|JRIU01000363.1|' in contact list file.
  "Did not find contig " + " '{}' in contact list file.".format(chrom)
@nvictus
Member

nvictus commented Feb 1, 2021

It's a pairix issue: | is used internally as a separator character, which conflicts with the FASTA sequence names.
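
To illustrate the clash, here is a minimal sketch. It is not pairix's or cooler's actual code: it only assumes that a pair of contigs is keyed by joining the two names with the separator, and `make_key` and `split_key` are hypothetical helpers invented for the example.

```python
SEP = "|"  # the default separator mentioned in the comment above


def make_key(chrom1, chrom2, sep=SEP):
    """Join two contig names into a single pair key (hypothetical helper)."""
    return f"{chrom1}{sep}{chrom2}"


def split_key(key, sep=SEP):
    """Recover the two contig names, or raise if the key is ambiguous."""
    parts = key.split(sep)
    if len(parts) != 2:
        raise ValueError(f"ambiguous pair key: {len(parts)} fields, expected 2")
    return parts[0], parts[1]


# Simple names round-trip cleanly.
simple = split_key(make_key("chr1", "chr2"))

# NCBI-style names already contain the separator, so the key cannot be split back.
name = "gi|696447993|gb|JRIU01000414.1|"
try:
    split_key(make_key(name, name))
    ambiguous = False
except ValueError:
    ambiguous = True

# A separator that never occurs in the names removes the ambiguity.
roundtrip = split_key(make_key(name, name, sep="^"), sep="^")
```

The `^` character is only an example of a separator absent from these names; any such character would do in this sketch.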

There is a way to override the separator character with something else when creating the pairix index. You can try recreating the index with pairix -f -s1 -d4 -b2 -e2 -u5 -v5 -w {some_other_char} contacts.sorted.txt.gz.

Btw, you don't need to use pairix. cooler cload pairs (not pairix) can ingest even unsorted pairs data in two passes via mergesort.
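
As a rough sketch of that alternative, the snippet below assembles a `cooler cload pairs` call for the files from the original post. The one-based column numbers (1, 2, 4, 5) are an assumption read off the `head` output and the pairix flags above, and `cload_pairs_argv` is a hypothetical helper; the thread does not confirm that this command succeeds with these contig names.

```python
import shlex


def cload_pairs_argv(bins, pairs, out, chrom1=1, pos1=2, chrom2=4, pos2=5):
    """Build the argument list for `cooler cload pairs` (one-based column numbers)."""
    return [
        "cooler", "cload", "pairs",
        "-c1", str(chrom1), "-p1", str(pos1),
        "-c2", str(chrom2), "-p2", str(pos2),
        bins, pairs, out,
    ]


argv = cload_pairs_argv(
    "SRR4292758_00_20000.bed",
    "contacts.sorted.txt.gz",
    "SRR4292758_00_20000.cool",
)

# Shell-quoted form, handy for logging or pasting into a terminal.
command = " ".join(shlex.quote(arg) for arg in argv)
print(command)

# To actually run it (requires cooler on PATH):
#   import subprocess; subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names, since nothing is interpreted by a shell when the list is handed to `subprocess.run`.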

If you try the first solution, let me know if it works.

@nservant
Author

nservant commented Feb 1, 2021

Hi @nvictus

Unfortunately, the first solution did not work. I still get the same errors.
Thanks for the tip about using cooler cload pairs instead of pairix.
So, in the end, is there any reason to still use cooler csort rather than directly ingesting the pairs file?

Btw, it seems that --one-based-ids does not work in the latest version, but I may open a new issue for that.
Thanks

@nvictus
Member

nvictus commented Feb 1, 2021

> Unfortunately, the first solution did not work

Yes, I just noticed that we hard-code | for querying inside cooler, so that should be fixed to detect which separator the index is using.

> Thanks for the tip about using cooler cload pairs instead of pairix.

Yeah, I'm realizing that it hasn't been advertised well, but it's been around for quite a while!

> is there any reason to still use cooler csort rather than directly ingesting the pairs file?

Unless you want the benefits of the pairix index or you are having performance issues with the two-pass method, no. :)

The two-pass method will create a bunch of temporary partial coolers and then merge them.
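
The idea can be sketched in memory as follows. This is an analogy under stated assumptions, not cooler's implementation: pairs are taken to be already mapped to `(bin1, bin2)` indices, the partial tables are kept as Python lists rather than temporary cooler files, and `aggregate_chunks` and `merge_partials` are hypothetical names.

```python
import heapq
from collections import Counter
from itertools import islice


def aggregate_chunks(pairs, chunksize):
    """First pass: turn each chunk of (bin1, bin2) pairs into a sorted partial count table."""
    it = iter(pairs)
    while True:
        chunk = list(islice(it, chunksize))
        if not chunk:
            return
        # Store each contact in the upper triangle so (a, b) and (b, a) share a pixel.
        counts = Counter((min(a, b), max(a, b)) for a, b in chunk)
        yield sorted(counts.items())


def merge_partials(partials):
    """Second pass: k-way merge of the sorted partials, summing counts of equal pixels."""
    merged = []
    for key, count in heapq.merge(*partials):
        if merged and merged[-1][0] == key:
            merged[-1] = (key, merged[-1][1] + count)
        else:
            merged.append((key, count))
    return merged


# Unsorted input is fine: ordering is only established within each chunk.
pairs = [(3, 1), (0, 0), (1, 3), (0, 2), (0, 0)]
partials = list(aggregate_chunks(pairs, chunksize=2))
pixels = merge_partials(partials)
```

Because each partial is sorted on its own, the input never has to be sorted as a whole, which is why unsorted pairs can be ingested this way.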

> --one-based-ids does not work in the latest version, but I may open a new issue for that

Thank you!

@nvictus
Member

nvictus commented Mar 8, 2024

Alternative separators will be supported in the next minor version with #398, so I will finally close this.

nvictus closed this as completed Mar 8, 2024