You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi,
A part of this bug has already been reported here, I have the same issue on cooltools 0.5.4 version.
Also while writing this issue, I realized I had maybe too many questions for one post, sorry about that.
I have some cool files I want to downsample to a given cis contact count for all my replicates and different conditions to have the same number of cis contacts, so I can compare them (first of all, am I right on the usage of downsampling as a way to normalize sample coverage between conditions?)
cooltools.sample(clr = some_cool_file, out_clr_path = dnsmpl_cool_file, cis_count = 250000000)
Traceback (most recent call last):
File "/fs/gpfs06/lv05/fileset02/pool/pool-totipotency-software/hpcl67/cooltools/cooltools_0.5.4_venv/lib/python3.9/site-packages/cooltools/api/coverage.py", line 117, in coverage
else clr._load_attrs(clr.root.rstrip("/") + "/bins/weight")["ignore_diags"]
File "/fs/gpfs06/lv05/fileset02/pool/pool-totipotency-software/hpcl67/cooltools/cooltools_0.5.4_venv/lib/python3.9/site-packages/cooler/api.py", line 116, in _load_attrs
return dict(grp[path].attrs)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/fs/gpfs06/lv05/fileset02/pool/pool-totipotency-software/hpcl67/cooltools/cooltools_0.5.4_venv/lib/python3.9/site-packages/h5py/_hl/group.py", line 357, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'weight' doesn't exist)"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/fs/gpfs06/lv05/fileset02/pool/pool-totipotency-software/hpcl67/cooltools/cooltools_0.5.4_venv/lib/python3.9/site-packages/cooltools/api/sample.py", line 102, in sample
cis_total = clr.info.get("cis", np.sum(coverage(clr)[0] // 2, dtype=int))
File "/fs/gpfs06/lv05/fileset02/pool/pool-totipotency-software/hpcl67/cooltools/cooltools_0.5.4_venv/lib/python3.9/site-packages/cooltools/api/coverage.py", line 120, in coverage
raise ValueError(
ValueError: Please, specify ignore_diags and/or IC balance this cooler! Cannot access the value used in IC balancing.
- Is it possible to sample contact on raw matrices? What is the point of doing downsampling after balancing?
Also
ValueError: Please, specify ignore_diags and/or IC balance this cooler! Cannot access the value used in IC balancing. '
But ignore_diags is not an argument from sample()
cooltools.sample(clr = cooler_list[ndx], out_clr_path = dnsmpl_cool_file, cis_count = cis_contact_trgt, ignore_diags=0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sample() got an unexpected keyword argument 'ignore_diags'
- To bypass the balancing in sample(), can I duplicate 'cov_cis_raw' columns in .bins() and call it 'weight' so it fakes balance cool file and downsample on raw cis contacts? And then I'll need to balance it anyway after the downsampling.
- Usually, is downsampling done on full matrix or do you ignore first(s) diagonal for some reason?
- additional question:
store (bool, optional) – If True, store the results in the input cooler file when finished.
Does it mean that the result is stored in the python variable or directly modified in the cool file? I am not sure to understand whether the matrices are loaded and only loaded version is modified or if the original file it-self is modified.
ddepierre
changed the title
Downsampling and cis counts - ValueError: The number of contacts in a sample cannot exceed that in the original dataset.
Downsampling and cis counts
Aug 8, 2023
These are good questions, although there is a lot to unpack here. I think the most important idea I can give you is to calculate coverage with ignore_diags=0, I think it should solve all your problems...
The values are indeed stored in the file, and .sample() uses the cis count when it is available in the file. When not, it calculates the coverage with default arguments (! perhaps not ideal and should be changed to ignore_diags=0 !) and uses that.
discussion 2023/10/9:
perhaps we should deprecate the usage of stored cis counts in sample(), as it is not obvious how to link this stored value with the number of diagonals ignored in previous coverage() calculation.
Hi,
A part of this bug has already been reported here, I have the same issue on cooltools 0.5.4 version.
Also while writing this issue, I realized I had maybe too many questions for one post, sorry about that.
I have some cool files I want to downsample to a given cis contact count for all my replicates and different conditions to have the same number of cis contacts, so I can compare them (first of all, am I right on the usage of downsampling as a way to normalize sample coverage between conditions?)
1/ get the cis contact count
With cooltools.coverage() or with expected_cis()
Counts doubled in cov_cis_raw as already reported here
2 / Downsampling
I have unbalanced cool file as
That I want to downsample:
- Is it possible to sample contact on raw matrices? What is the point of doing downsampling after balancing?
Also
But ignore_diags is not an argument from sample()
- To bypass the balancing in sample(), can I duplicate 'cov_cis_raw' columns in .bins() and call it 'weight' so it fakes balance cool file and downsample on raw cis contacts? And then I'll need to balance it anyway after the downsampling.
- Usually, is downsampling done on full matrix or do you ignore first(s) diagonal for some reason?
- additional question:
store (bool, optional) – If True, store the results in the input cooler file when finished.
Does it mean that the result is stored in the python variable or directly modified in the cool file? I am not sure to understand whether the matrices are loaded and only loaded version is modified or if the original file it-self is modified.
Thanks for the support,
David
The text was updated successfully, but these errors were encountered: