SDSSDR10_htm_044300.hdf5 is broken for h5py #3

Open
hombit opened this issue Feb 3, 2021 · 7 comments
@hombit

hombit commented Feb 3, 2021

Hello and thank you for the project.

I'm trying to use the catalogs and have found that SDSSDR10_htm_044300.hdf5 appears to be broken when read with the h5py module:

import h5py

for dataset in h5py.File('SDSSDR10_htm_044300.hdf5'):
    print(dataset)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-ce06a7d9c9b1> in <module>
----> 1 for dataset in h5py.File('SDSSDR10_htm_044300.hdf5'):
      2     print(dataset)
      3

~/.local/lib/python3.7/site-packages/h5py/_hl/group.py in __iter__(self)
    431     def __iter__(self):
    432         """ Iterate over member names """
--> 433         for x in self.id.__iter__():
    434             yield self._d(x)
    435

h5py/h5g.pyx in h5py.h5g.GroupID.__iter__()

h5py/h5g.pyx in h5py.h5g.GroupID.__iter__()

h5py/h5g.pyx in h5py.h5g.GroupIter.__init__()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5g.pyx in h5py.h5g.GroupID.get_num_objs()

RuntimeError: Unable to get group info (bad symbol table node signature)

I have checked the MD5 checksum, and it matches.
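
For readers who want to repeat the checksum comparison, here is a minimal sketch using only the Python standard library. The function name `md5sum` and the chunk size are illustrative choices, not part of catsHTM; the reference checksum to compare against has to come from wherever the catalog files were downloaded.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    `md5sum` is a hypothetical helper name; chunked reading keeps memory
    use low for large catalog files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the digest returned for `SDSSDR10_htm_044300.hdf5` equals the published one, the local copy is byte-identical to the distributed file, which suggests the problem is not a download error.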

@hombit
Author

hombit commented Feb 12, 2021

The h5stat utility also fails on this file:

Filename: SDSSDR10_htm_044300.hdf5
h5stat warning: Unable to traverse objects/links in file "SDSSDR10_htm_044300.hdf5"

@maayane
Owner

maayane commented Feb 12, 2021 via email

@hombit
Author

hombit commented Feb 12, 2021

Hello Maayane,

Thank you for your answer!

I cannot remember the exact catsHTM.cone_search arguments I used, but I've gone through the trace and found that the problem occurs when h5py tries to read this file. I don't think there is anything wrong with h5py itself, because HDF5 utilities such as h5stat and h5dump cannot open this file either.

I have problems with some other files too. I'll prepare a full list and post it here.

I use catsHTM 0.1.32, h5py 3.1.0, h5stat 1.8.12

@maayane
Owner

maayane commented Feb 12, 2021 via email

@hombit
Author

hombit commented Feb 12, 2021

I've run the following script on my catsHTM directory:

for FILE in $(find . -name '*.hdf5'); do
    h5stat "$FILE" > /dev/null
done

It gave me this output:

h5stat warning: Unable to traverse objects/links in file "SDSS/DR10/SDSSDR10_htm_044300.hdf5"
h5stat warning: Unable to traverse objects/links in file "NED/20180502/NEDz_htm_041800.hdf5"
h5stat error: unable to open file "NED/20180502/NEDz_htm_042500.hdf5"
h5stat error: unable to open file "NED/20180502/NEDz_htm_018800.hdf5"
h5stat warning: Unable to traverse objects/links in file "NED/20180502/NEDz_htm_018900.hdf5"
h5stat warning: Unable to traverse objects/links in file "VST/KiDS/DR3/VSTkids_htm_310000.hdf5"
h5stat warning: Unable to traverse objects/links in file "VST/KiDS/DR3/VSTkids_htm_258700.hdf5"
h5stat error: unable to open file "VST/ATLAS/DR3/VSTatlas_htm_438200.hdf5"
h5stat warning: Unable to traverse objects/links in file "VST/ATLAS/DR3/VSTatlas_htm_452200.hdf5"
h5stat warning: Unable to traverse objects/links in file "VST/ATLAS/DR3/VSTatlas_htm_449300.hdf5"
h5stat warning: Unable to traverse objects/links in file "VST/ATLAS/DR3/VSTatlas_htm_452000.hdf5"
h5stat warning: Unable to traverse objects/links in file "VST/ATLAS/DR3/VSTatlas_htm_438300.hdf5"
h5stat warning: Unable to traverse objects/links in file "VST/ATLAS/DR3/VSTatlas_htm_444400.hdf5"
h5stat warning: Unable to traverse objects/links in file "VST/ATLAS/DR3/VSTatlas_htm_442300.hdf5"
h5stat warning: Unable to traverse objects/links in file "UKIDSS/DR10/UKIDSS_htm_043900.hdf5"
h5stat warning: Unable to traverse objects/links in file "Spitzer/SAGE/SAGE_htm_533100.hdf5"
h5stat warning: Unable to traverse objects/links in file "VISTA/Viking/DR2/VISTAviking_htm_244600.hdf5"
h5stat warning: Unable to traverse objects/links in file "VISTA/Viking/DR2/VISTAviking_htm_265700.hdf5"

These files appear to be broken as far as the HDF5 utilities are concerned.
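
A similar scan can be sketched in Python with h5py, which is what catsHTM itself uses to read the files. This is only a sketch under stated assumptions: the names `try_traverse` and `find_broken` are hypothetical, and the caught exception types (`OSError`, `RuntimeError`, `KeyError`) are chosen to cover the kinds of errors reported in this thread rather than everything h5py can raise.

```python
import os


def try_traverse(path):
    """Return None if every object in the file can be visited, else the error text.

    Hypothetical helper; h5py is imported lazily so that find_broken()
    below can also be used with a different checker.
    """
    import h5py

    try:
        with h5py.File(path, "r") as f:
            # visit() walks all groups and datasets; returning None continues
            f.visit(lambda name: None)
    except (OSError, RuntimeError, KeyError) as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def find_broken(root, check=try_traverse):
    """Walk `root` and return (path, error) pairs for *.hdf5 files that fail `check`."""
    broken = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".hdf5"):
                path = os.path.join(dirpath, name)
                error = check(path)
                if error is not None:
                    broken.append((path, error))
    return broken
```

Calling `find_broken(".")` from the catsHTM directory would give a list comparable to the h5stat output above, with the advantage that file names containing spaces are handled and the h5py error message is recorded next to each path.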

@hombit
Author

hombit commented Feb 17, 2021

I've found a cone search example.

Here is the output of my script for the NEDz catalog with ra=228.80143 deg, dec=0.48434 deg, radius=60 arcsec:

...
    data, names, units = cone_search(cat, ra_rad, dec_rad, radius_arcsec, catalogs_dir=path)
  File "script.py", line 134, in cone_search
    cat = class_HDF5.HDF5(root_to_data + CatDir + '/' + FileName_0).load(DataName_0, numpy_array=True).T
  File "/home/kostya/.local/lib/python3.7/site-packages/catsHTM/class_HDF5.py", line 59, in load
    f = h5py.File(filename, 'r')
  File "/home/kostya/.local/lib/python3.7/site-packages/h5py/_hl/files.py", line 427, in __init__
    swmr=swmr)
  File "/home/kostya/.local/lib/python3.7/site-packages/h5py/_hl/files.py", line 190, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 96, in h5py.h5f.open
OSError: Unable to open file (file signature not found)

I've run it with strace. This part of its output shows that the problem is with the NEDz_htm_018800.hdf5 file, which is also in the list from my previous message:

$ strace -f -t -e trace=file ./my_script.py
...
[pid 19970] 00:34:47 stat(".../NED/20180502/NEDz_htmColCell.mat", {st_mode=S_IFREG|0640, st_size=1164, ...}) = 0
[pid 19970] 00:34:47 open(".../NED/20180502/NEDz_htmColCell.mat", O_RDONLY|O_CLOEXEC) = 4
[pid 19970] 00:34:47 stat(".../NED/20180502/NEDz_htm.hdf5", {st_mode=S_IFREG|0640, st_size=2274192, ...}) = 0
[pid 19970] 00:34:47 open(".../NED/20180502/NEDz_htm.hdf5", O_RDONLY) = 4
[pid 19970] 00:34:47 lstat(".../NED/20180502/NEDz_htm.hdf5", {st_mode=S_IFREG|0640, st_size=2274192, ...}) = 0
[pid 19970] 00:34:47 stat(".../NED/20180502/NEDz_htm_018800.hdf5", {st_mode=S_IFREG|0640, st_size=5902144, ...}) = 0
[pid 19970] 00:34:47 open(".../NED/20180502/NEDz_htm_018800.hdf5", O_RDONLY) = 4
...
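
The "file signature not found" error means the HDF5 library could not locate the 8-byte format signature that marks the start of the superblock. As a rough, standard-library-only sketch, the check below looks for that signature at offset 0 and then at 512, 1024, 2048, and so on, which are the positions the HDF5 file format allows. The function name `has_hdf5_signature` and the `max_offset` cut-off are assumptions made for this illustration; `h5py.is_hdf5(path)` offers a similar check through the library itself.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte signature from the HDF5 file format


def has_hdf5_signature(path, max_offset=1 << 20):
    """Return True if the HDF5 signature is found at an allowed offset.

    Hypothetical helper: `max_offset` is an arbitrary limit chosen for
    this sketch, not a value taken from the HDF5 library.
    """
    with open(path, "rb") as f:
        offset = 0
        while offset <= max_offset:
            f.seek(offset)
            # read() past the end of the file returns b"", which never matches
            if f.read(8) == HDF5_SIGNATURE:
                return True
            # allowed superblock offsets: 0, 512, 1024, 2048, ...
            offset = 512 if offset == 0 else offset * 2
    return False
```

A file such as NEDz_htm_018800.hdf5 that has a plausible size but returns `False` here most likely has a damaged or overwritten header. Files that fail with "bad symbol table node signature" would still pass this check, because their damage lies deeper in the file.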

@gnarayan

gnarayan commented Nov 5, 2021

Hi @maayane, is there any update on this issue and on adding object IDs to the catsHTM files? I'm trying to set up a reliable multi-catalog crossmatch service for LSST DESC, and in particular for our activities in the Time Domain and Photo-Z Working Group, and would like to use catsHTM. Without object IDs, and with these broken files across different surveys, we're a bit stuck.


3 participants