This is not a problem with the current experimental design we have been discussing; nonetheless, I'd like to raise the following question:
We currently rely on Pandas for holding data, which is not efficient for extremely large datasets (i.e., those larger than available memory). Is it within our scope to additionally support other backends that would allow for easier distribution / parallelization?
I know we can scale on disk as long as the individual partitions fit in memory, but this is not computationally efficient, hence opening the discussion.
Alternatives would be, for example, Dask or Ray, possibly behind Modin (though that would grow our dependency tree); see the sketch below.
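For concreteness, here is a minimal sketch of what the two routes would look like. The file names (`data-0.csv`, `data-*.csv`) and the column name (`value`) are hypothetical placeholders, not anything from our codebase:

```python
# Option 1: Modin -- a drop-in replacement for the Pandas API,
# backed by Ray or Dask. In principle only the import changes.
import modin.pandas as pd  # instead of: import pandas as pd

df = pd.read_csv("data-0.csv")     # same API surface as Pandas
print(df.groupby("value").size())  # executed on the Ray/Dask backend

# Option 2: Dask DataFrame -- lazy and explicitly partitioned,
# but not 100% API-compatible with Pandas.
import dask.dataframe as dd

ddf = dd.read_csv("data-*.csv")        # one partition per matched file
result = ddf.groupby("value").size()   # builds a task graph, no work yet
print(result.compute())                # triggers parallel, out-of-core execution
```

The trade-off in a nutshell: Modin keeps our code unchanged but pulls in a heavier dependency tree (Modin plus an engine), while Dask is a lighter direct dependency but would require auditing our Pandas usage for API gaps.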