Data streamer 1 #313

Open
wants to merge 14 commits into main
Conversation

@artmeln (Contributor) commented Jan 13, 2023

This is a proposal to add a new kind of device called DataStreamer to MM (discussed previously here). I’ve been slowly making improvements to this branch, but I think this project is now nearing completion.
The purpose of this new device is to simplify retrieving data from continuously running hardware for processing and inspection. Examples of such hardware are oscilloscopes, photon counters, time-tagged time-resolved (TTTR) fluorescence data collectors, etc. The DataStreamer base class takes care of multithreaded data acquisition and processing while letting the user implement, in the device adapter, the specific hardware operations involved in retrieving and processing the data.

The DataStreamer device has the following methods (as defined in MMDevice.h):

int StartStream();
int StopStream();
int GetBufferSize(unsigned& dataBufferSize);
std::unique_ptr<char[]> GetBuffer(unsigned expectedDataBufferSize, unsigned& actualDataBufferSize, int& exitCode);
int ProcessBuffer(std::unique_ptr<char[]>& pDataBuffer, unsigned actualDataBufferSize);
int SetStreamParameters(bool stopOnOverflow, bool pauseAcquisitionBeforeOverflow, int numberOfBuffers, int durationMs, int updatePeriodMs);
int GetStreamParameters(bool& stopOnOverflow, bool& pauseAcquisitionBeforeOverflow, int& numberOfBuffers, int& durationMs, int& updatePeriodMs);
int SetAcquisitionPause(bool pause);
bool GetAcquisitionPause();
bool GetOverflowStatus();
int ResetOverflowStatus();
bool IsStreaming();
int GetAcquisitionExitStatus();
int GetProcessingExitStatus();
int SetCircularAcquisitionBufferCapacity(int capacity);
int GetCircularAcquisitionBufferCapacity();

The first five methods (StartStream, StopStream, GetBufferSize, GetBuffer, ProcessBuffer) should be implemented by the user at the level of the device adapter; the rest have default implementations in DeviceBase.h. In addition to the definition of the CDataStreamerBase class, this file contains auxiliary classes:

  • AcquisitionBufferCollection for handling the circular acquisition buffer
  • AcquisitionThread
  • ProcessingThread

The following methods are exposed to the Java layer:

void startStream(const char* dataStreamerLabel) throw (CMMError);
void stopStream(const char* dataStreamerLabel) throw (CMMError);
bool getOverflowStatus(const char* dataStreamerLabel) throw (CMMError);
void resetOverflowStatus(const char* dataStreamerLabel) throw (CMMError);
bool getAcquisitionPause(const char* dataStreamerLabel) throw (CMMError);
void setAcquisitionPause(const char* dataStreamerLabel, bool pause) throw (CMMError);
bool isStreaming(const char* dataStreamerLabel) throw (CMMError);
int getAcquisitionExitStatus(const char* dataStreamerLabel) throw (CMMError);
int getProcessingExitStatus(const char* dataStreamerLabel) throw (CMMError);
void setStreamParameters(const char* dataStreamerLabel, bool stopOnOverflow, bool pauseAcquisitionBeforeOverflow, int numberOfBlocks, int durationMs, int updatePeriodMs) throw (CMMError);
void getStreamParameters(const char* dataStreamerLabel, bool& stopOnOverflow, bool& pauseAcquisitionBeforeOverflow, int& numberOfBlocks, int& durationMs, int& updatePeriodMs) throw (CMMError);
void setCircularAcquisitionBufferCapacity(const char* dataStreamerLabel, int capacity) throw (CMMError);
long getCircularAcquisitionBufferCapacity(const char* dataStreamerLabel) throw (CMMError);

The logic of DataStreamer operation is as follows. When StartStream is called, two threads are started, one for data acquisition and one for data processing. The acquisition thread calls GetBufferSize with the period specified by the updatePeriodMs stream parameter. When GetBufferSize reports dataBufferSize > 0, the acquisition thread calls GetBuffer, which returns a pointer to the data along with the actual size of the data received from the hardware. The acquisition thread puts this information into a circular buffer, then GetBufferSize is called again, and so on. This cycle continues until one of the following conditions is met:

  • the specified number of data buffers has been collected (the numberOfBlocks stream parameter),
  • the measurement time elapses (the durationMs stream parameter),
  • the circular buffer overflows (its capacity is set through setCircularAcquisitionBufferCapacity),
  • the user calls StopStream.

In order to facilitate error handling and interaction with real hardware, the acquisition thread also terminates if

  • GetBufferSize returns an error (a value other than DEVICE_OK),
  • GetBuffer reports an exitCode other than DEVICE_OK,
  • GetBuffer returns a null pointer instead of a data buffer.
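
Condensed into C++-flavored pseudocode (names such as stopRequested, circularBuffer, and sleepMs are illustrative stand-ins, not the actual members of AcquisitionThread), the acquisition loop is roughly:

while (!stopRequested && buffersCollected < numberOfBuffers && elapsedMs() < durationMs)
{
    unsigned dataBufferSize = 0;
    if (device->GetBufferSize(dataBufferSize) != DEVICE_OK)
        break;                                  // an error from GetBufferSize ends the thread
    if (dataBufferSize == 0)
    {
        sleepMs(updatePeriodMs);                // nothing available yet; poll again later
        continue;
    }
    int exitCode = DEVICE_OK;
    unsigned actualSize = 0;
    std::unique_ptr<char[]> buf = device->GetBuffer(dataBufferSize, actualSize, exitCode);
    if (!buf || exitCode != DEVICE_OK)
        break;                                  // a null pointer or an error also ends the thread
    circularBuffer.Add(std::move(buf), actualSize);  // overflow is handled per the stream parameters
    ++buffersCollected;
}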

The acquisition thread only adds data to the circular buffer but never retrieves it. Retrieval is done by the processing thread, which repeatedly calls ProcessBuffer on the first available entry in the circular buffer. The purpose of the processing thread is to store acquired data to the hard drive and reduce it to a manageable and easily displayed form. [Details of how to display the results of processing will be the subject of a separate discussion.] The processing thread stops running after the acquisition thread has stopped and all entries from the circular buffer have been processed, or when ProcessBuffer returns a value other than DEVICE_OK.

I’ve added an example DataStreamer device to DemoCamera.h. In this example every call to GetBuffer increments a counter and generates a buffer filled with the counter value, while ProcessBuffer calculates the average of this buffer and writes the value to a property for display. After StartStream is called, the “Average data value” property should take on values 1, 2, 3, and so on as long as the stream is running. In order to simulate real-life scenarios where data acquisition and processing take finite time, the “Acquisition period” and “Processing period” properties can be modified. For example, setting acqPeriod = 0 and procPeriod = 1000 ms will eventually result in buffer overflow because processing cannot keep up with acquisition. Finally, there are properties that allow simulating errors in GetBufferSize, GetBuffer, and ProcessBuffer calls at specified average data values. The exit status of both threads can be determined at any time with getAcquisitionExitStatus() and getProcessingExitStatus().

Here is one test script that illustrates DemoDataStreamer:

// Record the starting value of the "Average data value" property.
s = mmc.getProperty("DDataStreamer","Average data value");
// stopOnOverflow=true, pauseAcquisitionBeforeOverflow=false,
// numberOfBlocks=14, durationMs=10000000, updatePeriodMs=10
mmc.setStreamParameters("DDataStreamer",true,false,14,10000000,10);
mmc.startStream("DDataStreamer");
// Poll the property while the stream runs and print each new value.
while (mmc.isStreaming("DDataStreamer")) {
	snew = mmc.getProperty("DDataStreamer","Average data value");
	if (!s.equals(snew)) {
		print(snew);
		s = snew;
	}
	mmc.sleep(50);
}
// Report how each thread exited.
acqStatus = mmc.getAcquisitionExitStatus("DDataStreamer");
print("Acquisition thread: "+mmc.getErrorMessage("DDataStreamer",acqStatus)+" ("+acqStatus+")");
procStatus = mmc.getProcessingExitStatus("DDataStreamer");
print("Processing thread: "+mmc.getErrorMessage("DDataStreamer",procStatus)+" ("+procStatus+")");

Here are a few additional remarks. The user may need to tell the hardware to start acquiring data. This call should be implemented in the StartStream method and must be followed by a StartDataStreamerThreads() call, which gets the threads going. Similarly, the implementation of the StopStream method in the adapter may contain instructions to hardware but must also call StopDataStreamerThreads().
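
As a minimal sketch of that wiring (MyStreamer and the Hardware* calls are hypothetical, and CDataStreamerBase is assumed to follow the usual template pattern of the other DeviceBase.h base classes):

class MyStreamer : public CDataStreamerBase<MyStreamer>
{
public:
    int StartStream()
    {
        int ret = HardwareStartAcquisition();   // hypothetical vendor call
        if (ret != DEVICE_OK)
            return ret;
        return StartDataStreamerThreads();      // required: launches the acquisition and processing threads
    }
    int StopStream()
    {
        int ret = HardwareStopAcquisition();    // hypothetical vendor call
        if (ret != DEVICE_OK)
            return ret;
        return StopDataStreamerThreads();       // required: winds the threads down
    }
    // GetBufferSize, GetBuffer, and ProcessBuffer are implemented as described above.
};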

Finally, there is a stream parameter called pauseAcquisitionBeforeOverflow. It should be used in those cases where acquisition is faster than processing and acquisition timing is not critical. This might be the case with a data streamer device that “acquires” data from the hard drive faster than it can be processed. In this case the acquisition thread will pause before the overflow occurs to give the processing thread time to catch up.

I hope my explanation of the changes makes sense, and the data streamer functionality can become a part of Micro-Manager. Needless to say, I’m happy to make whatever changes or improvements are needed to make this happen.

@marktsuchida (Member)

I'm excited about this. Thanks for the detailed writeup. I will review this in detail next week.

@marktsuchida (Member)

Or early next week -- sorry, I haven't forgotten.

@henrypinkard (Member)

@artmeln A couple quick comments on this:

  1. Have you seen New circular buffer/memory model #244? Once a more generic data buffer object has been implemented in the Core, it might be worth considering using that rather than creating your own buffer in the device adapter.

  2. Am I understanding correctly that the data processing must take place in the device adapter itself? Most users are far more comfortable writing Python/Java, and so I think it would be valuable to consider outsourcing the processing to a higher level.

@artmeln (Contributor) commented Feb 15, 2023

Thank you for your comments, @henrypinkard

Have you seen #244? Once a more generic data buffer object has been implemented in the Core, it might be worth considering using that rather than creating your own buffer in the device adapter

I have seen the thread about the new circular buffer model but I’m not sure it’s the way to go for the streaming device. Looking at the circular camera buffer as it is currently implemented in MM, it looks tailored to describing images. For the streaming device, I think it would be best to have a circular buffer that is as generic as possible so that no assumptions need to be made about what kind of data is being streamed. But I’m open to the idea of using the future circular buffer model for the streaming device if there is a way to use it as a simple container for arbitrary data.

Am I understanding correctly that the data processing must take place in the device adapter itself? Most users are far more comfortable writing Python/Java, and so I think it would be valuable to consider outsourcing the processing to a higher level

As far as data processing goes, I was trying to avoid the situation where the acquisition is running in the C++ layer and the circular buffer is being filled but the data is not being retrieved from it because the user has not started a data processing thread in Python/Java. My solution to this is to implement both acquisition and processing threads in C++, and for now the only way to process data is in the C++ device adapter. However, in the next round of development I’d like to add another circular buffer to the Core that is accessible to Python/Java. This buffer would get populated in the C++ processing thread which could simply pass the data through if real processing is to be done in Python. Alternatively the C++ processing thread could rearrange the data into a format ready for plotting in Python.

@henrypinkard (Member)

Have you seen #244? Once a more generic data buffer object has been implemented in the Core, it might be worth considering using that rather than creating your own buffer in the device adapter

I have seen the thread about the new circular buffer model but I’m not sure it’s the way to go for the streaming device. Looking at the circular camera buffer as it is currently implemented in MM, it looks tailored to describing images. For the streaming device, I think it would be best to have a circular buffer that is as generic as possible so that no assumptions need to be made about what kind of data is being streamed. But I’m open to the idea of using the future circular buffer model for the streaming device if there is a way to use it as a simple container for arbitrary data.

Part of the rationale for #244 is that the current circular buffer assumptions are overly restrictive, and MM would benefit from a more generic type of buffer that is potentially somewhat future-proof for new types of cameras that acquire different types of data. This seems like it could be a good use case (apologies that I didn't catch this earlier before you'd made a full implementation)

As far as data processing goes, I was trying to avoid the situation where the acquisition is running in the C++ layer and the circular buffer is being filled but the data is not being retrieved from it because the user has not started a data processing thread in Python/Java. My solution to this is to implement both acquisition and processing threads in C++, and for now the only way to process data is in the C++ device adapter.

The reason I bring this up is partly based on the experience with the ImageProcessor device type. This has been a part of MM since the beginning, and as far as I know no one has ever implemented one, because of the inconvenience involved in creating it in C++. My concern is that this device type might end up similar: there is an implementation that you've created, but no one else creates other implementations because of the preference for higher-level languages, and then it becomes something that needs to continue being supported, but is not often used.

Do you see any issue with simply saying in the documentation that the user should start a processing thread in Java/Python before starting the device, so that data gets pulled out? And if they fail to and it overflows, it throws an error? This is essentially how the current circular buffer works for images and it doesn't seem to cause any major problems.

However, in the next round of development I’d like to add another circular buffer to the Core that is accessible to Python/Java. This buffer would get populated in the C++ processing thread which could simply pass the data through if real processing is to be done in Python. Alternatively the C++ processing thread could rearrange the data into a format ready for plotting in Python.

I think adding multiple new buffers in the C++ layer comes with a significant cost in terms of extra code that will need to be maintained. And it doesn't sound to me like any of this is so specialized that we need to create all of it from scratch.

Would you be open to trying to outsource some of the memory parts of this to #244? If so, maybe you can comment on there a bit more about what essential metadata attributes would be needed aside from the buffer data itself?

@artmeln (Contributor) commented Feb 17, 2023

Part of the rationale for #244 is that the current circular buffer assumptions are overly restrictive, and MM would benefit from a more generic type of buffer that is potentially somewhat future-proof for new types of cameras that acquire different types of data. This seems like it could be a good use case (apologies that I didn't catch this earlier before you'd made a full implementation)

I don’t mind switching to the more general version of the circular buffer; my implementation is extremely basic and would be easy to replace. I’ll post my thoughts on what the device streamer would expect from a circular buffer soon.

I’m still not entirely convinced that processing for the data streamer shouldn’t be done in the adapter. A lot of the time such processing is rather calculation intensive, and C++ seems well suited for this purpose. I still like the idea of having the option of doing the processing in the device adapter or passing the data pointer through. Another concern I have, though, is how seamless the passing of pointers is going to be from C++ to Python/Java, so that there isn’t unnecessary copying of memory or a possibility of memory leaks.

@henrypinkard (Member)

I don’t mind switching to the more general version of the circular buffer, my implementation is extremely basic and would be easy to replace. I’ll post my thoughts on what the device streamer would expect from a circular buffer soon.

Great!

I’m still not entirely convinced that processing for the data streamer shouldn’t be done in the adapter. A lot of the time such processing is rather calculation intensive, and C seems well suited for this purpose. I still like the idea of having the option of doing the processing in the device adapter or passing the data pointer through.

Yes, that makes sense. I am a big fan of Python packages like jax and numba that give C-level performance without having to code in C. But certainly if one is willing and able to write it in C, that is a good option.

Another concern I have though is how seamless the passing of pointers is going to be from c to python/java so that there isn’t unnecessary copying of memory or a possibility of memory leaks.

Yes, this will be important to get right on the new buffer. Right now everything uses copies. Your thoughts on that issue about the best way to do this would be much appreciated.

@artmeln (Contributor) commented Mar 26, 2023

I am wondering about the status of this PR. It seemed like there was interest in this development, and as the author I definitely think it would be valuable to other users, so I’d like to know what’s holding this up and whether there is anything I can do to speed up the process.
We’ve discussed using the future circular camera buffer model for the data streamer’s needs. However, as far as I understand, the new buffer model has not yet been developed and is likely to require other changes to the Core in how cameras are implemented. The proposed DataStreamer addition, on the other hand, is (again, as far as I can tell) fully self-contained and does not interfere with any existing functionality in MM. The sooner DataStreamer is integrated into MM, the sooner further development of device adapters by myself (and hopefully by other people) can start.
If this PR is problematic in some way, I’d like to know about that so that I can work on fixing the problems. If it needs to be better documented throughout the code or elsewhere, I’m happy to work on that too.
In short, it would be nice to know what the plan is. I’d like to see the data streamer functionality come to fruition, and I’m happy to do the work necessary to make this a reality sooner.

@nicost (Member) commented Mar 27, 2023

Sorry for the long delay! I am very supportive, but API design is a bit outside of my comfort zone. @henrypinkard and @marktsuchida, is it OK to merge or are there specific changes you would like to see?

@marktsuchida (Member) commented Mar 28, 2023

I'm in favor of going forward with this separately from any future camera buffer redesign, but would like to propose a few changes before merging. Will get back to you by Wednesday. Sorry to let this sit for so long.

@henrypinkard (Member)

Hi @artmeln, thanks for your patience on this. I think this will be an excellent addition to MM eventually.

My one concern here is that, in general, I think it is a good idea to keep the C++ layer of MM as lean as possible. That is, we implement only what is necessary in that layer, because it takes more work to maintain and a much smaller fraction of the community is comfortable programming in it.

Your implementation makes perfect sense given the current capabilities of MM, but as mentioned before, this is poised to shift in a major way with the introduction of the new buffer model. To me it seems the best end result is that your data streamer device uses the new buffer model, and a result that we want to avoid is having a different buffer model in this device type than what will eventually be used with all the cameras.

To give a concrete example of why I would like to avoid this: the addition of fast C++-layer file saving will likely require the use of buffers with specific characteristics (#323 (comment)). If the buffer system in your data streamer device does not implement these, it may not be compatible with that feature, which seems like it might be a source of user confusion and frustration (not to mention the larger code base to support).

That being said, I understand that it is probably frustrating to have this sitting in a PR waiting on MM to change.

I propose the following: what if we merge this (pending @marktsuchida's suggestions), but add in an "experimental_" prefix (i.e. experimental_DataStreamer). Essentially this means that we're not (yet) guaranteeing stability of its API, since it would be a better long term outcome if this works together with the new buffer model. So you and others can move forward with this device type, just with the understanding that any implementations may require changes once the new buffer model is ready.

What do you all think?

@artmeln (Contributor) commented Mar 29, 2023

Thank you everyone for the prompt replies! I think it's a good idea to label the device as experimental and move forward while acknowledging that it will change in the near future. If everyone agrees, I'll plan to rename it accordingly when I make the other necessary changes.

@marktsuchida (Member)

Sorry for the delay, can I have a day or two more on this? It's a large amount of new API and I'd like to properly go through it, even if it is to be labeled experimental. Appreciate it.

@artmeln (Contributor) commented Mar 31, 2023

No problem, take your time, and thank you for working on this. I agree that this should be done well even if the changes are experimental.

@marktsuchida (Member)

@artmeln Sorry for taking so long to get back to you.

API design is hard, and I'm sure what I wrote below doesn't address every potential issue. Also I may have made different assumptions from you based on the device APIs I know well, in which case please by all means provide counterexamples. My thinking is largely based on DAQ devices (NI DAQmx) and TCSPC devices (Becker-Hickl, PicoQuant, Swabian), whose APIs I know quite well. I have not worked with oscilloscope APIs.

Also most of my comments are based on your description above, not (yet) a detailed review of your actual code. Please correct me if I misinterpreted anything.

In general I still like the idea of a device type that can stream user-defined data but is otherwise kept simple. Although it has the drawback that the application needs to know something about the device in order to interpret the data, it's a big step forward from not having any way to do this.

Name

I like the name DataStreamer but it might not be clear that it is for acquisition (input) only and not for generation (output). Maybe InputStreamer or InputDataStreamer (so that there could conceivably be a future OutputStreamer)?

Processing

Perhaps I'm not grasping your intended real-world use cases for the "processing" (other than saving to disk; see below under "Passing data to the application" on that). In general, I think it is best if processing is decoupled from acquisition as much as possible -- so requiring the same device object to do the acquisition and processing would seem to limit the possible applications quite a bit (among other things, it forces any processing code to be written in C++ and configured solely through device properties).

Maybe we can discuss further if you could offer some concrete (even device-specific) examples of what data processing you might do in ProcessBuffer().

Buffering

I am not exactly sure why buffering is needed, because for the devices I'm familiar with, the device driver already has built-in buffering, often with configurable size.

If, on the other hand, for some reason we need a common buffering scheme akin to the sequence buffer for cameras (but separate buffer for each device), then it would make sense to put it into MMCore (perhaps in the DeviceInstance object) rather than DeviceBase, so that it is cleanly separated from the device.

But, again, I think it is best to let the driver take care of buffering, and avoid re-buffering the already buffered data. Perhaps a reusable buffer implementation would be useful for any devices whose driver doesn't offer buffering (of enough capacity), but that could be done completely independently of the interfaces, as a small utility library.

Pausing

Can you explain what scenarios would be helped by pausing? I don't think many devices support pausing as a hardware feature (but correct me if I'm wrong), so I'm wondering if this is something that should be handled on the application side (by stopping and restarting the acquisition).

Memory allocation

The MMDevice interface is designed to work across binaries built with different C++ runtimes. The prime example of this is that device adapters built with the MSVC Debug runtime work with a Release build of MM. For this to be true, we cannot pass C++ objects across this boundary (because their memory layout may differ). Furthermore, we cannot allocate memory in one DLL and release it in another.

Using std::unique_ptr in the interface therefore needs to be avoided. For reading data from the device, there are a couple of options:

  • Device calls a CoreCallback function to push data, passing the pointer and size of a buffer managed by the device. MMCore then copies the data to the destination (or an intermediate buffer). This might require an extra copy that could otherwise be avoided. Cameras currently work this way.

  • Device calls a CoreCallback function to allocate a buffer. It then fills said buffer with data and hands it back to MMCore (together with buffer capacity and data size). MMCore exposes the buffer to the application and later deallocates it.

    • Devices need to be careful not to leak such buffers, but this can be diagnosed pretty easily if necessary. (You also need a CoreCallback function to deallocate any buffers that end up not being used.)
    • Many device driver APIs work by filling user-passed buffers, so this fits nicely: for example, the device could pass the buffer provided by MMCore straight to NI DAQmx's DAQmxReadAnalogF64().
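
A sketch of what the second option might look like from the device side; AllocateDataBuffer, SubmitDataBuffer, and ReleaseDataBuffer are placeholder names for CoreCallback functions that do not exist yet:

char* buf = nullptr;
int ret = GetCoreCallback()->AllocateDataBuffer(this, capacity, buf);   // hypothetical
if (ret != DEVICE_OK)
    return ret;
unsigned bytesRead = 0;
ret = ReadFromDriver(buf, capacity, bytesRead);   // e.g. a DAQmxReadAnalogF64-style fill of a user-passed buffer
if (ret != DEVICE_OK)
{
    GetCoreCallback()->ReleaseDataBuffer(this, buf);   // hypothetical; returns the unused buffer to avoid a leak
    return ret;
}
return GetCoreCallback()->SubmitDataBuffer(this, buf, capacity, bytesRead);   // hypothetical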

Passing data to the application

If I understand correctly, the final destination of data is the ProcessBuffer() function of the device itself.

All existing MM devices work by passing data through MMCore to the Java or Python application. I pretty strongly think that this pattern should be followed by any new devices, at least as the primary scheme. This would make sense if you have any kind of live display (which most of these devices would benefit from).

I'm not opposed to having mechanisms to save directly to files without leaving C++ (see also @henrypinkard's proposals), but the only case where it makes sense (that I'm aware of) is when the data rate is too high to handle it in any other way. I don't think most applications of DataStreamer would have such high data rates -- but if they do, we might have other bottlenecks to also worry about.

I'm especially worried about tying together the acquisition, processing, and saving code in device adapters. This would mean, for example, that if multiple TCSPC device adapters want to save to the same file format (say, Photon-HDF5), they would need to contain duplicate implementations.

At least for a first version of this kind of device, I would strongly encourage you to consider offering a straightforward way to transfer data to the application, even if it requires one additional copy at the language boundary.

If saving to disk directly is important and cannot be done by the application (I would love to hear the use cases), we can add that in a second round, but it would be good to keep that as decoupled from the acquisition and device handling as possible.

Also note that not having the acquisition and saving be in an inaccessible closed loop buried in the device adapter should open up many more use cases for this device type.

Miscellaneous

  • It is best to avoid functions that take many parameters of the same type (Set/GetStreamParameters). There are even tools to detect these. It's better to have separate functions.
    • In the case of these parameters, having separate functions also helps with interpreting any error codes returned.

Once again, apologies for taking so long to review this; I'll make sure to respond faster in the future.

@artmeln (Contributor) commented May 15, 2023

Hi Mark,
Thank you for the review.

Name

I contemplated adding output streaming functionality to this device but haven’t implemented it yet. It’s probably best to delegate input and output streaming to separate devices anyway, so I’m in favor of renaming this device InputStreamer.

Processing and passing data to the application

Here are a few examples of processing that I had in mind: calculation of histograms from TTTR data for lifetime analysis, calculation of correlation functions from TTTR data, and using TTTR data to make an image in lifetime microscopy. In general, it can be anything that prepares the data for user-friendly display. My intent was not to limit data processing to the device adapter but rather to provide the user with this option. Since ProcessBuffer is for the writer of the device adapter to implement, it can do nothing at all and simply pass the data through for processing in the application. I do think, though, that there is an advantage to processing data in the device adapter: the device becomes self-sufficient as an object that communicates with hardware and produces data that’s ready for visualization by either the Java GUI or pymmcore.

When it comes to passing data to the application, unfortunately I have no idea how this could be done. I poked around in what SWIG does during the MM build, but I only learned that it’s complicated, and that cameras are a unique case, passing void* for images, which no other device is allowed to do. This sort of development is very far outside of my comfort zone.
For the input data streamer, it would definitely be good to avoid any additional data copying, or at least to copy only small amounts of data (another reason to process raw data in the adapter and reduce it to small arrays ready for plotting).

Buffering

I still like the idea of buffering, especially since it is tied to the acquisition thread. As implemented now, DataStreamer starts an acquisition thread as soon as StartStream is called and will keep asking for data and putting it in the circular acquisition buffer. Perhaps some drivers will do exactly this, and in that case the extra thread and the buffering aren’t necessary. But I wanted to make the implementation of acquisition transparent to someone who is writing a device adapter: all one needs to do is implement GetBuffer, and the rest is taken care of behind the scenes (in my experience, one difficulty with implementing Live in camera adapters is having to deal with starting a new thread in the adapter).

To the extent of my understanding, the streamer device needs to have an acquisition thread and a processing thread associated with it, and my preference is to put both in the Core code. An alternative solution (the one you are advocating, if I understand correctly) is to have the application create and monitor these threads. I’m not opposed to this in general, but this solution is counterintuitive to me: the streamer device itself would not do any streaming then, just communicate with the hardware.

Pausing

The reason pausing is there is for the case where processing has difficulty keeping up with acquisition. In that case acquisition can be temporarily paused to take advantage of other available buffering (in the hardware or in the device driver).

Memory allocation

If unique_ptr presents a problem, I can get rid of it. The idea was to eliminate the possibility of leaking memory and to make it clear that pointers are passed around but only one thread at a time can possess a chunk of data.

@marktsuchida (Member)

Thanks for the detailed comments!

Regarding buffering: the point I was trying to get across is that we don't need a separate buffer of our own, even for processing. For most DAQ and TCSPC devices, the driver already buffers data, and provides you with a function to read buffered data. To use the TTTR example (and for now disregarding whether processing/saving is done in C++ or Java/Python), you can have a single thread simply doing:

  • Start acquisition
  • In a loop (until stopped or having seen the expected amount of data): read, process, save
  • Stop acquisition (if it didn't stop itself)

This whole sequence can be run in a background thread, similarly to MDA, on the application side. Even if this thread runs in the Core, as you propose, you wouldn't need a separate buffer.
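
In C++-flavored pseudocode (every function name here is a placeholder, not a proposed API), that sequence is just:

StartAcquisition();                                  // tell the device to start streaming
while (!stopRequested && totalRead < expectedTotal)
{
    unsigned n = ReadBufferedData(buf, capacity);    // read from the driver's own buffer
    Process(buf, n);                                 // histogramming, correlation, etc.
    Save(buf, n);                                    // or hand the raw data to the app
    totalRead += n;
}
StopAcquisition();                                   // if it didn't stop itself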

The only case where this would not work is if the driver has a small, fixed buffer only, such that it overflows under normal usage unless a dedicated thread is pulling data from it very frequently. As far as I know the types of devices we're considering typically do not have such a limitation, so it does not make sense to me to force all devices to re-buffer data. But I'm happy to look into particular device APIs if you think they would require additional buffering.

Regarding processing in device adapters:

the device becomes self-sufficient as an object that communicates with hardware and produces data that’s ready for visualization

I find this to be the very problem, I'm afraid. Software modules should do one thing and do it well -- and this is two things. The issue in this case is that there are many ways to visualize time tag data (histograms, FCS, FLIM, phasors), each with many configuration parameters (I know this because I have written a full BH SPC to FLIM stack), and it is better if they are not tied to a particular device. It would make more sense, in my view, to have a completely separate library (not even tied to MM) that accepts time tag data streams (from different devices) and transforms them to interpretable data. (tttrlib is an example of such a library that already exists. As it happens, I'm also working on a C++/Python library that focuses on real-time TCSPC stream processing, but it's not ready for prime time.)

Another problem is that close coupling of device control with data processing usually ends up in a situation where it is hard to properly test one or the other independently, or to reuse the processing part on stored data. (I know that you have separate functions in the device API for acquisition and processing, which is great. But before we know it, somebody will write a device adapter where the processing algorithm depends on the current device settings.) We've made this mistake in various parts of MM already, and I would really like to avoid repeating it.

I think I would have had a slightly different opinion if we had a vendor-neutral streaming format for time tags: then it may have made sense for each device to convert its proprietary data to the common format. But as far as I know we don't have such a format (Photon-HDF5 can't be streamed), and inventing one would be a nontrivial project in itself, if done right.

If a significant part of the motivation behind this is wanting to avoid dealing with SWIG (which is absolutely understandable), would you be willing to work with me on that part?

Basically what I'm envisioning is that the Core does very little other than relaying function calls to the device adapter, and there would be an MMCore API function to read the buffered data, which calls into the device adapter. From the application programmer's point of view, the interface would be conceptually pretty similar to reading a file or network stream, where the OS is buffering and you simply call a "read" function (possibly with the maximum number of bytes to read).

I get that, while this would make both MMCore and the device adapters much simpler, it would require the application side (Python or Java) to deal with the raw data stream. I think this is in general a good thing: it allows app writers to mix and match devices and processing libraries, and to have a live view of data; it also makes it easier to test device I/O and data processing independently (I cannot emphasize the importance of this too much); and, finally, it gives more control to the languages (especially Python) where scientists tend to have the most expertise. However, I also see that it may be less convenient for your specific situation, given, perhaps, your expertise or existing code. Yet it is important that MMCore be designed to serve a general audience and promote the separation of concerns, so I'm hoping that we can come up with a way to make this work well for you, too, without losing sight of those goals.

@artmeln (Contributor) commented May 22, 2023

I’ll try to summarize the approach to implementing the data streamer that you described in your last post in my own words as I understand it. If this is not accurate, please correct me.

  • The device adapter for a data streamer serves the function of communicating with the hardware and does nothing else; device-specific calls are implemented here by the user.
  • MMCore exposes some of the streamer adapter methods to the application but does not have any new functionality related to the streamer.
  • The application implements the stream and exposes stream controls (Start, Stop, etc.) to the user. The stream is implemented in the application as a single thread that repeatedly calls GetData, Process and/or Save, Visualize and does not buffer received or processed data.

If this is what you have in mind, I’m not opposed to this architecture (except for buffering, more on this later). I do have two questions though:

  • it is not clear to me how the user will be able to implement device-specific processing steps, when normally any customization happens through the device adapter or through a Java plugin;
  • since different devices have different requirements for visualization, should the visualization step be abstracted from the streamer device?

If a significant part of the motivation behind this is wanting to avoid dealing with SWIG (which is absolutely understandable), would you be willing to work with me on that part?

Yes, I would very much appreciate your help with this aspect.

On the subject of buffering, here are two examples of devices for which, I believe, buffering is necessary:

  1. PicoQuant HydraHarp (as well as PicoHarp and, I believe, other devices). According to the manual (section 6.2 of HydraHarp400_SW_and_DLL_v3_0_0_4/HydraHarp%20HHLib%20v3.0.0.4/Windows/manual/manual_HHLibv3.0.0.4.pdf), the caller allocates a data buffer and the function HH_ReadFiFo fills it with data. This is the step that would be implemented in the device adapter. My understanding is that the device driver facilitates USB transfers to the provided destination but does not do any additional buffering.
  2. When working with libusb as a library for USB transfers (in conjunction with WinUSB as a generic driver on Windows) from a piece of custom FPGA-based hardware, the application (in this case, the device adapter) allocates the buffer to fill and specifies the timeout for the transfer to complete. This situation is very similar to example 1 with the PicoQuant hardware.

The PicoQuant manual also suggests an acquisition scheme where acquisition, processing, and visualization all happen on separate threads. This architecture is also adopted by qudi, a Python package that has support for data streamer devices. In my opinion, passing data from one thread to another benefits from buffering, so that it is always clear where the bottleneck is in this pipeline.

@marktsuchida (Member)

  • it is not clear to me how the user will be able to implement device-specific processing steps, when normally any customization happens through the device adapter or through a Java plugin;

I agree: the app needs to know something about the device. But I don't think processing and/or saving to a file on the device adapter side helps solve this problem, if there is no standard for what the outcome of the processing should be. As far as I could tell, in the version you propose there is nothing to guarantee anything about what the data looks like or how to access it.

With an API that streams (device-specific) data to the app, we can talk about later adding a standard way to indicate the data format, for example "16-bit unsigned integers with 2 channels interleaved" or "TTTR data in PicoQuant HydraHarp V1 T3 format" (the format produced may depend on the current device settings). And an app can then choose to support only devices that produce a format it knows how to handle.

  • since different devices have different requirements for visualization, should the visualization step be abstracted from the streamer device?

There are categories of devices (analog DAQ, TCSPC, perhaps software-defined radio) that can acquire data that (after an appropriate transformation) can be visualized in ways that are common across different products, and yet there are multiple ways to do the visualization for the same type of raw data. For example, an analog DAQ could be used to visualize 1D waveforms like an oscilloscope, but its data could also be visualized as a confocal image. For TCSPC, the initial processing of the photon timestamp data does indeed depend on the device, but the method to assemble the data into a FLIM image, or FCS trace, etc., can be common.

At least that is why I don't think it makes sense to couple data processing to the device. (Plus the point about unit testing.)

If the visualization is tied to the device, we are talking about a very different architecture from Micro-Manager, and it is not clear to me what the benefit of using MMCore for this is -- as opposed to a separate library per-device -- except for superficially taking advantage of the device property system to plug into the MM GUI.

For things like FLIM, there is perhaps an alternative approach: we could have a "FLIM" device type, that works like a camera, except that it produces a series of 3D (x-y-tau) FLIM images (or we could talk about a more generic 3DCamera device type or adding proper 3D image support to Camera). Then I would have no problem with the device adapter containing the code to assemble FLIM images from the time tag stream. But I would still consider it unfortunate if the PicoQuant, BH, and Swabian device adapters had to duplicate nontrivial parts of the FLIM histogramming code. But this is perhaps off-topic if we are talking about a data streamer device type.

  1. PicoQuant HydraHarp (as well as PicoHarp and, I believe, other devices). According to the manual (section 6.2 of HydraHarp400_SW_and_DLL_v3_0_0_4/HydraHarp%20HHLib%20v3.0.0.4/Windows/manual/manual_HHLibv3.0.0.4.pdf), the caller allocates a data buffer and the function HH_ReadFiFo fills it with data. This is the step that would be implemented in the device adapter. My understanding is that the device driver facilitates USB transfers to the provided destination but does not do any additional buffering.

Thanks for pointing this out. I think you are right -- there seems to be no buffering by the driver, according to their description. So buffering in software is needed if the processing is jittery enough that it can temporarily fall behind the data acquisition and cause the device's FIFO to fill up (if the processing is slower than acquisition on average, buffering cannot save us). The HydraHarp 400 FIFO holds ~2M photons, which could fill up in a fraction of a second at highest throughput, so having a larger buffer on the PC side is probably a good idea.
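
(For a rough sense of scale, using illustrative rates rather than datasheet figures: at a sustained 40 Mcps, a 2M-photon FIFO fills in about 2×10^6 / 4×10^7 s ≈ 50 ms; even at 4 Mcps it is only half a second.)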

So I stand corrected and we do need buffering, at least for some devices. And I do like your proposal of managing the acquisition thread (the thread that transfers data from device to the buffer) from MMCore. But to me it seems much more natural to also have the buffer itself live in MMCore, as opposed to DeviceBase (which is compiled into each device adapter DLL). These buffers and the associated threads are not that easy to get right, so it's good to have a single implementation that we can perfect. The only reason to put it in DeviceBase (that I can think of) would be if we wanted to let devices override the buffer implementation, but it is not clear to me that this is necessary. For the devices that don't need buffering (or need to do their own buffering for some reason that I cannot predict), the device adapter could provide this information and MMCore could disable buffering (without affecting the app-facing APIs) -- but I would be okay if this optimization were not part of the first version of this feature.

Another way to look at this is the often repeated software engineering advice to "prefer composition over inheritance". Having functionality in DeviceBase means it is poorly isolated from concrete device adapters, and nothing guarantees that the latter overrides the correct combination of methods in the correct way (in CameraBase there is some unfortunate code that simulates sequence acquisition using snaps, and this has led to lots of buggy code in concrete camera adapters over the years; it would have been much cleaner to put such emulation (if at all necessary) into MMCore so that it is invisible from the device).

(Yes, I do think that DeviceBase itself is a design mistake, but we are stuck with it for now.)

The PicoQuant manual also suggests an acquisition scheme where acquisition, processing, and visualization all happen on separate threads. This architecture is also adopted by qudi, a Python package that has support for data streamer devices. In my opinion, passing data from one thread to another benefits from buffering, so that it is always clear where the bottleneck is in this pipeline.

I don't have anything against this. My comments are about where these threads and buffers should be located, and which entity should have control over them. Some devices already buffer in the background (the "acquisition" thread is implicit) and we should not duplicate the buffering in those cases. When an acquisition thread and buffer is needed (as with PicoQuant), I like your idea of running the thread in the Core, but I don't see why the buffer should not also be managed by the Core. And I think processing and visualization (and file saving) is out of scope for MMCore/MMDevice and should be handled by the application.

@henrypinkard (Member)

Lots of great discussion and ideas here!

So I stand corrected and we do need buffering, at least for some devices. And I do like your proposal of managing the acquisition thread (the thread that transfers data from device to the buffer) from MMCore. But to me it seems much more natural to also have the buffer itself live in MMCore, as opposed to DeviceBase (which is compiled into each device adapter DLL). These buffers and the associated threads are not that easy to get right, so it's good to have a single implementation that we can perfect.

I strongly agree that setting up buffering in a correct, performant way is sufficiently complicated that we should make a single consensus version that lives in the Core (or a small library thereof) and is reused across current and future device types. Working on this (#244) is the next major project I plan to undertake, though it will not begin for a few months. So it may make sense to have a temporary stopgap solution if this device type merges before that is ready, with the understanding that it will eventually come to rely on this component.

@marktsuchida (Member)

I would perhaps not characterize it as stopgap -- the buffer for data streamer should be designed to do its job well. When a new, more universal buffer becomes available, at that point we can decide if unifying the implementation (ideally without breaking the APIs) is a win. I think getting one feature (like this one) working at a time is the only reasonable way to evolve MMCore, and this is not a bad thing.

@henrypinkard (Member) commented May 25, 2023

To be clear, I'm not advocating that it should be designed poorly, just reiterating my previous suggestion:

I propose the following: what if we merge this (pending @marktsuchida's suggestions), but add in an "experimental_" prefix (i.e. experimental_DataStreamer). Essentially this means that we're not (yet) guaranteeing stability of its API, since it would be a better long term outcome if this works together with the new buffer model. So you and others can move forward with this device type, just with the understanding that any implementations may require changes once the new buffer model is ready.

It seems to me that it gives us more flexibility if things can be merged into main without full commitment to the shape of the API. More information and insight often become available after things have been beta tested to some degree, and it is nice to have the ability to try things out without the pressure of a finalized perfect form right away. Also, since I have a very concrete plan and timeline to implement this buffer, this is not the type of thing that will hang for an indeterminate time.

I don't think there is "one reasonable way" to do this type of thing. I have seen the approach I propose taken on many large open-source projects, and it seems to work well.

@marktsuchida (Member)

No, I'm okay with marking it experimental, and I agree that things should be merged to main sooner than later once we agree that it is something we want. I'm not entirely sure how this relates to my previous comment, or what part of what I said suggested that I think there is "one reasonable way" to do things.

@henrypinkard (Member)

is the only reasonable way

This part. But I think we have just had a miscommunication, and we are actually on the same page.

@marktsuchida (Member)

Ah! The "only reasonable way" being gradual evolution by getting things into main bit by bit (as opposed to having a mega-branch/fork with multiple new features being perfected first) -- so yes, I don't think there was a disagreement, we were talking about slightly different but related things.

@artmeln (Contributor) commented Jun 8, 2023

I apologize for the delayed response. It looks like we are getting close to an agreement on how the data streamer should be implemented. If I understand correctly, this is the proposed architecture:

  1. Acquisition happens in the device adapter. The following methods will be implemented by the user (see the sketch after this list):
    StartStream – tells the hardware to start streaming; if an acquisition thread is necessary, it can be set up here.
    StopStream – stops the stream.
    GetAcquiredDataSize, GetAcquiredData – provide the application with access to the acquired data.
    GetStreamStatus – running, exited without error, exited due to overflow, etc.

    All these methods will be exposed to the application. In addition, MMCore should have a helper class for setting up the acquisition thread and its buffer. This class can be used by the device adapter if need be.

  2. Processing takes place in the application, which acts as the consumer of the acquired data. A user could implement their own processing via a Java plugin. The details don’t have to be specified immediately, but I’m thinking of writing an example Java MM plugin that creates a processing thread, calls the device adapter, and saves the acquired data.

  3. Visualization can be discussed separately since it is not specific to data streaming. Perhaps a mechanism for plotting any standardized data should be added to MM.
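
In rough outline, the slimmed-down device interface from item 1 might look like the following (the signatures are guesses extrapolated from the existing method declarations, not an agreed API):

int StartStream();                                   // tell the hardware to start streaming
int StopStream();                                    // stop the stream
int GetAcquiredDataSize(unsigned& dataSize);         // bytes currently available to read
int GetAcquiredData(char* buf, unsigned bufSize, unsigned& bytesRead);  // copy data out
int GetStreamStatus(int& status);                    // running, exited without error, overflowed, etc.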

@artmeln (Contributor) commented Jun 27, 2023

@marktsuchida could you confirm that my understanding of the desired architecture from the last post is accurate? If yes, I'll start working on implementing item 1 and the helper class in MMCore.
Thanks
