The current data synchronisation implementation, in particular the search for overlapping contiguous chunks across data sources, might ultimately require a lot of memory if the time series is long enough or the sampling rate is high enough.
P. Fluxa mentions:
A colleague of mine and I figured out a "compressed" way of synchronising chunks, which only requires knowing the start and end times of every interval. That is very cheap to obtain and scales as O(n). Then, the operation of finding all relevant intervals (the ones where there is data in all "channels") scales even better, as it only depends on the number of intervals found.
This is a quick-and-dirty implementation showing how it works:
"""Sample script showing the solution of the following problem:"given N channels of data with R continous ranges each, find all theranges where there is data for all N channels""""importrandomimportpandasimportnumpyimportmatplotlib.pyplotaspltfrommatplotlib.patchesimportRectangle# create a set of random ranges. this is just formalitynumChan=5nRanges=10data=list()
fornchinrange(numChan):
ms=random.randint(0, 5)
fornrinrange(nRanges):
jitter1=0jitter2=1#random.randint(2, 6)width=7start=ms+jitter1end=start+widthentry=dict()
entry['start'] =startentry['sflag'] =1entry['end'] =endentry['eflag'] =-1entry['channel'] =nchentry['rangeidx'] =nrdata.append(entry)
ms=end+jitter2rangesdf=pandas.DataFrame(data)
# extract all timestamps from ranges, keeping track of whether they# correspond to start or end of rangestimest=rangesdf['start'].values.tolist()
flags=rangesdf['sflag'].values.tolist()
flags+=rangesdf['eflag'].values.tolist()
timest+=rangesdf['end'].values.tolist()
# build intermediate dataframesdf=pandas.DataFrame(dict(st=timest, flag=flags))
sdf.sort_values(by='st', inplace=True)
cumsum=sdf.flag.cumsum()
print(cumsum)
cr=numpy.where(cumsum==numChan)
crlist=cr[0].tolist()
crarr=list()
foreincrlist:
crarr.append(e)
crarr.append(e+1)
crarr=numpy.asarray(crarr)
crmask=tuple((crarr,))
cmnRanges=sdf.iloc[crmask].st.values.reshape((-1, 2))
# make a figure showing the resultfig, ax=plt.subplots()
# plot all rangesforidx, entryinrangesdf.iterrows():
xs=entry['start']
xe=entry['end']
ys=entry['channel']
ax.hlines(ys, xs, xe)
# plot commmon rangesforcrincmnRanges:
# avoid drawing ranges with no widthifcr[1] ==cr[0]:
continueax.vlines(cr[0], 0, numChan,
color='red', alpha=0.5, linestyle='--', linewidth=0.5)
ax.vlines(cr[1], 0, numChan,
color='red', alpha=0.5, linestyle='--', linewidth=0.5)
plt.savefig('ranges.pdf')
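The core of the trick also works without the pandas and matplotlib scaffolding. Here is a minimal sketch of the same sweep as a standalone function; the `common_ranges` name and the representation of each channel as a list of `(start, end)` tuples are assumptions for illustration, not part of the script above:

```python
def common_ranges(channels):
    """Return the (start, end) spans covered by every channel.

    channels: one list of (start, end) tuples per channel.
    """
    events = []
    for intervals in channels:
        for start, end in intervals:
            events.append((start, 1))   # entering a range: +1
            events.append((end, -1))    # leaving a range: -1
    # sort by time, processing -1 before +1 on ties, so that zero-width
    # "overlaps" (a range ending exactly where another starts) vanish
    events.sort(key=lambda e: (e[0], e[1]))
    active = 0
    result = []
    for i, (t, flag) in enumerate(events):
        active += flag
        # all channels have data from this boundary to the next one
        if active == len(channels) and i + 1 < len(events):
            t_next = events[i + 1][0]
            if t_next > t:
                result.append((t, t_next))
    return result

# e.g. two channels with one gap each:
chans = [[(0, 7), (8, 15)], [(2, 9), (10, 20)]]
print(common_ranges(chans))  # [(2, 7), (8, 9), (10, 15)]
```

In this sketch the sort is the dominant cost, so with M intervals in total the search runs in O(M log M) time and O(M) memory, independent of how many samples each chunk contains.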
And this is the kind of result you get: