
Support multiple file formats for the raw data #5

Open
rossant opened this issue Jun 14, 2019 · 11 comments

@rossant (Contributor) commented Jun 14, 2019

No description provided.

@yger (Contributor) commented Jun 19, 2019

I don't really know what the best way to proceed is here. On one hand, there is neo, a Python package meant to read/write various file formats quickly and efficiently. On the other hand, we have also written numerous wrappers on our side, close/similar to the ones you'll find in neo but lighter, for the internal needs of SpyKING CIRCUS. Since neo is more structured, maybe that is the way to go? It would be amazing if phy could display several native/proprietary file formats, as numerous users struggle simply to export their data to raw binary...

@rossant (Contributor, Author) commented Jun 19, 2019

Could you point me to the code of your wrappers?

@yger (Contributor) commented Jun 19, 2019

The code is here: https://github.com/spyking-circus/spyking-circus/tree/master/circus/files
I am not saying this is optimal; I am not as good a coder as you, so the system should clearly be refactored. Basically, there is a DataFile object exposing read/write methods, implemented by subclasses for each proprietary file format. It gets a little more complicated because we can virtually concatenate files, or recordings within the same file (HDF5). Neo does the same thing, but with a slightly better-documented structure. The two overlap heavily, and for quite a while I have told myself that maybe SC should take neo as a dependency, in order to centralize all the wrappers once and for all. I would be very interested in your opinion too.
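The wrapper pattern described here (a common DataFile base class with per-format subclasses) can be sketched roughly as follows. This is an illustration only; the class names and signatures are hypothetical, not the actual SpyKING CIRCUS code:

```python
import numpy as np


class DataFile:
    """Format-agnostic reader: subclasses implement the I/O for one
    file format (hypothetical names, not the real SC classes)."""

    def read(self, t_start, t_stop):
        """Return samples t_start..t_stop as an (n, n_channels) array."""
        raise NotImplementedError

    def write(self, t_start, chunk):
        raise NotImplementedError


class RawBinaryFile(DataFile):
    """Raw binary layout is a flat (n_samples, n_channels) block, so
    parameters like dtype and channel count must be given externally
    (they are not stored in the file)."""

    def __init__(self, path, n_channels, dtype='int16'):
        # memmap the whole file, then view it as (n_samples, n_channels).
        self._data = np.memmap(path, dtype=dtype, mode='r').reshape(-1, n_channels)

    def read(self, t_start, t_stop):
        return np.asarray(self._data[t_start:t_stop])
```

A subclass for, say, an HDF5-based format would read the channel count and dtype from the file's own metadata instead of taking them as constructor arguments.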

@samuelgarcia

@rossant:
Hi Cyril,
you should seriously have a look at the neo readers.

There are two levels of reading in neo:

  • neo.io: the legacy layer, which returns neo objects (AnalogSignal, SpikeTrain, Event, ...)
  • neo.rawio: the low-level layer, which lazily accesses buffers directly

https://github.com/NeuralEnsemble/python-neo

https://neo.readthedocs.io/en/latest/rawio.html

Reading ephys formats has been reimplemented in many places (circus, neo, spikeextractors, and many individual wrappers for particular formats). It is a total dispersion of energy.

Neo has a strong two-level API that supports multiple blocks, multiple segments, signals with different sample rates, events, epochs, spikes, and waveforms.
Neo includes many formats.

I really think Pierre should move all the wrappers into neo, and Cyril, you should use neo.rawio.
I have been telling Pierre this for some years now.
Maybe one day it will happen. :)

Also note that lazy reading has recently been incorporated into neo.io as well, so that is also a solution you could use.

@yger (Contributor) commented Jun 21, 2019

I think moving to the Neo wrappers would, for us, be the solution. I have just never managed to find the time, but this is an open issue with SC :-) One day, it will happen.

@rossant (Contributor, Author) commented Jun 21, 2019

Thanks @yger and @samuelgarcia, I'll have a look at this soon. I agree that we should reuse the same code as much as possible. For phy, what I'll need is a function with the following signature:

`read_raw_data(data_files, n_channels_dat=None, dtype=None, offset=None, sample_rate=None)`

which returns a single memmapped NumPy array with (virtual) shape (n_samples, n_channels) (or an object polymorphic to it) that can be sliced efficiently in time, where n_samples is the total number of time samples across the entire recording, and n_channels is the total number of channels (across all shanks if there are multiple probes). I say "virtual" because phy supports multiple raw data files that have the same characteristics and are simply split in time.

I already have this function for raw binary files. Parameters like n_channels, dtype, offset, and sample_rate cannot be obtained from raw binary files unless a specific format with a header is used; that's why I need them as optional parameters to this read_raw_data() function. For more complex file formats, I suppose the readers can extract these values from the file header.

Can neo be used to write such a wrapper function?
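For the raw binary case, such a function could be sketched as below. This is only an illustration of the idea, not phy's actual implementation: it returns a list of per-file memmaps plus the total sample count rather than the single virtually-concatenated array described above.

```python
import numpy as np


def read_raw_data(data_files, n_channels_dat=None, dtype=None,
                  offset=None, sample_rate=None):
    """Sketch: memmap each raw binary file as (n_samples, n_channels).

    For raw binary, dtype/n_channels_dat/offset must be supplied by
    the caller; a reader for a richer format would parse them from
    the file header instead. Returns (list of arrays, total samples).
    """
    dtype = np.dtype(dtype or np.int16)
    arrays = []
    for path in data_files:
        # offset is a byte offset into the file (e.g. to skip a header).
        a = np.memmap(path, dtype=dtype, mode='r', offset=offset or 0)
        arrays.append(a.reshape(-1, n_channels_dat))
    n_samples = sum(a.shape[0] for a in arrays)
    return arrays, n_samples
```

Wrapping the returned list in an object that presents one `(n_samples, n_channels)` time axis would give the "virtual" concatenation mentioned above.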

@yger (Contributor) commented Jun 21, 2019

Yes. So I think we should all converge on neo as a dependency and centralize all the individual wrappers there. Neo can do what you want, I think, and expose such functions. The problem is that different file formats require different inputs (sampling rate, number of channels, ...): some have everything in the header, some only partial information. Not a big deal; in phy you just need to know how to handle this in params.py, I guess.

@samuelgarcia

I think it should be easy to make an object proxy between neo.rawio and this function.
This object would virtually concatenate all the signals.

But why don't you use the neo.rawio API directly, which explicitly supports multiple segments and lazy reading, instead of this virtual object inside phy?

Note that in your case the offset can change from one file to another, so this function could lead to problems.
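The object proxy suggested here could look roughly like the sketch below. The reader interface is a hypothetical stand-in loosely modelled on neo.rawio's chunk-based reading (not the real API): the proxy exposes a flat `(n_samples, n_channels)` shape and time slicing over several underlying segments.

```python
import numpy as np


class RawIOProxy:
    """Virtually concatenate a reader's segments along the time axis
    and expose an array-like interface (shape + time slicing).
    `reader.get_chunk(seg, i_start, i_stop)` is a hypothetical method,
    not neo's actual signature."""

    def __init__(self, reader, segment_sizes, n_channels):
        self._reader = reader
        self._sizes = list(segment_sizes)
        # Cumulative start sample of each segment in the virtual recording.
        self._starts = np.cumsum([0] + self._sizes)
        self.shape = (int(self._starts[-1]), n_channels)

    def __getitem__(self, sl):
        # Only plain time slices are supported in this sketch.
        start, stop, _ = sl.indices(self.shape[0])
        parts = []
        for seg, (o, size) in enumerate(zip(self._starts[:-1], self._sizes)):
            lo, hi = max(start - o, 0), min(stop - o, size)
            if lo < hi:
                parts.append(self._reader.get_chunk(seg, lo, hi))
        return np.concatenate(parts, axis=0)
```

A slice that crosses a segment boundary transparently gathers chunks from both segments, which is exactly the "virtual concatenation" phy needs.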

@samuelgarcia

Few formats need parameters as input, except raw binary.

@rossant (Contributor, Author) commented Jun 21, 2019

These parameters would only be used for the raw binary format, which is what we use at the moment. For other file formats, these parameters would be None and simply discarded, since the values would be parsed from the files themselves.

The course of action I described is the least-effort path for me, since I wouldn't have to change anything in phy. The virtual concatenation object we already have works very well for us, and ideally we'd use it uniformly for all file formats.

What's the difference between lazy reading and memmap?

@samuelgarcia

Some files do not store the signal as one contiguous block, so we construct the blocks on the fly.
It's like what memmap does, but with many, many more lines of code.
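A toy illustration of the difference: when the samples are split into non-contiguous blocks inside the file, a plain memmap slice no longer works, and the reader must gather the right byte ranges on the fly. The file layout below is invented for the example; real formats store the block table in their headers.

```python
import numpy as np


def lazy_read(path, block_offsets, block_samples, n_channels,
              i_start, i_stop, dtype=np.int16):
    """Read samples [i_start, i_stop) from a file whose data is split
    into non-contiguous blocks (block_offsets are byte offsets,
    block_samples are per-block sample counts). Only the blocks that
    overlap the requested range are touched, like a memmap slice,
    but with explicit bookkeeping."""
    dtype = np.dtype(dtype)
    starts = np.cumsum([0] + list(block_samples))  # start sample of each block
    out = []
    with open(path, 'rb') as f:
        for off, size, s0 in zip(block_offsets, block_samples, starts[:-1]):
            lo, hi = max(i_start - s0, 0), min(i_stop - s0, size)
            if lo < hi:
                f.seek(off + lo * n_channels * dtype.itemsize)
                buf = f.read((hi - lo) * n_channels * dtype.itemsize)
                out.append(np.frombuffer(buf, dtype=dtype).reshape(-1, n_channels))
    return np.concatenate(out, axis=0)
```

With a contiguous file this collapses to a single seek-and-read, which is why a memmap suffices for raw binary but not for chunked formats.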


3 participants