BigBed interval problem #122

Open
jtd032 opened this issue Dec 3, 2021 · 7 comments

@jtd032

jtd032 commented Dec 3, 2021

I am creating a list of histograms, one for each file, using the code below:

Imports

import numpy as np
import pyBigWig as bw
import matplotlib.pyplot as plt
import os

For Loop

directory = 'listed file path'
for filename in os.listdir(directory):
    f = os.path.join(directory, filename)
    if os.path.isfile(f) and filename.endswith('.bb'):
        fp = bw.open(f,'r')
        chr = filename.replace('.bb','')
        max = fp.header()['maxVal']
        #print(fp.header())
        a = np.array(fp.entries(chr, 1, max),dtype=np.int64)
        plt.hist(a[:,2], bins='auto') # arguments are passed to np.histogram
        plt.title("Histogram with 'auto' bins")
        #Text(0.5, 1.0, "Histogram with 'auto' bins")
        print(chr)
        plt.show()

The problem I am running into is retrieving maxVal from the header. It works for the first few graphs but fails with an error on later files: (int() argument must be a string, a bytes-like object or a number, not 'NoneType'). Am I right in understanding that maxVal is the top end of the range of values for that file?

@dpryan79
Collaborator

dpryan79 commented Dec 6, 2021

The maxVal is stored in the bigBed header. Could it be that it simply wasn't set for one of the files?
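
One possible cause of the NoneType error, offered as a guess rather than a confirmed diagnosis: maxVal in the header appears to be a summary statistic rather than a chromosome length, and entries() appears to return None (not an empty list) when no entries overlap the requested range. In that case np.array(None, dtype=np.int64) would fail with this kind of message. A minimal sketch that takes the end coordinate from chroms() and guards against None is shown here. The helper name entries_for_chrom is hypothetical, and bb is assumed to be an object returned by pyBigWig.open().

```python
def entries_for_chrom(bb, chrom):
    """Return every entry on `chrom` as a list; [] if nothing is found.

    `bb` is assumed to behave like a file opened with pyBigWig.open().
    """
    # chroms(name) gives the chromosome length, or None if the name is absent.
    length = bb.chroms(chrom)
    if length is None:
        return []
    # Coordinates are 0-based and half-open, so (0, length) spans the chromosome.
    found = bb.entries(chrom, 0, length)
    # Assumption: entries() yields None when no entry overlaps the range.
    return [] if found is None else list(found)
```

With a helper like this, the loop could call entries_for_chrom(fp, chr) and skip the histogram whenever the result is empty.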

@jtd032
Author

jtd032 commented Dec 8, 2021

All files return a maxVal when tested.
chr10 was successful but chr11 was not:
image
error msg:
image

@dpryan79
Collaborator

Can you make the file available to me? I can have a look then.

@YunfengLUMC

Hi,
I'd like to know how to save all the entries to a file.
Here is my code:
bb = pyBigWig.open('./PBMCs_HistoneMarks_Blueprint/Males_UMCG00025_H3K4me1.peak_calls.bigBed')
bb.entries('chrX', 16426, 156000962, withString=False)
How can I write out the result of bb.entries()? Also, for a bigBed file, how can I output the intervals for all chromosomes at once? I found that I need to specify start and end positions for each chromosome.
And if I use a bigWig file, are the intervals I extract the same as for bigBed? Based on your description, the start and end positions are not required for bigWig files.
Many thanks!

@dpryan79
Collaborator

I don't think I ever put logic into the .entries() function to fill in the chromosome bounds if nothing was supplied. I suppose that could be done, though the Python function is really just a thin wrapper over a C function, and C is less flexible about such things.

For outputting the results of bb.entries(), it's just a list of tuples, so something like the following would work:

for res in bb.entries('chr1', 10000000, 10020000):
    o.write("chr1\t{}\t{}\t{}\n".format(res[0], res[1], res[2]))
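
To cover every chromosome without typing the bounds by hand, one option is to loop over chroms(), which in pyBigWig returns a dict of chromosome names to lengths when called without arguments. The sketch below is an illustration under that assumption, not a built-in feature. write_all_entries is a hypothetical helper, bb is assumed to come from pyBigWig.open(), and out is any writable text handle.

```python
def write_all_entries(bb, out, with_string=True):
    """Write every entry of every chromosome to `out` as tab-separated lines.

    Returns the number of lines written.
    """
    written = 0
    for chrom, length in bb.chroms().items():
        found = bb.entries(chrom, 0, length, withString=with_string)
        if found is None:  # assumed return value when a chromosome has no entries
            continue
        for entry in found:
            # entry is (start, end) or (start, end, rest), depending on withString
            out.write("\t".join([chrom] + [str(field) for field in entry]) + "\n")
            written += 1
    return written
```

It could be called inside a with open(...) block, passing the open handle as out. Passing with_string=False writes only the chromosome, start, and end columns.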

@YunfengLUMC

Thanks for your detailed reply.
Could I try this? I don't need the strings:
for res in bb.entries('chr1', 10000000, 10020000, withString=False):
    o.write("chr1".format(res[0], res[1], res[2]))
Best wishes!

@dpryan79
Collaborator

dpryan79 commented Jan 24, 2022

o.write("chr1\t{}\t{}\n".format(res[0], res[1])) in that case as an example.
