Integrate with Kartothek #30

dharhas · 2020-02-27T16:06:49Z

Is your feature request related to a problem? Please describe.

SpatialPandas helps spatially sort data, but we are seeing the need for higher-level arbitrary indexing. Two example use cases:

Geospatial. We have spatially sorted daily GPS data for the US covering many days. Getting a small region for a 60-90 day process can get bogged down by the need to read 60-90 separate metadata files and construct the task graph.
Astronomy. We have spatial data for multiple filters (HSC-Y, HSC-G, etc.). Again, we would have to read multiple metadata files.

Describe the solution you'd like

The above could be fixed by building higher-level indexes. I think we could benefit from integrating with Kartothek. It enables an O(1) index and creates the task graphs needed to read just the required partitions. It could also be used to store the extra metadata that SpatialPandas currently stores in its own format (if I'm understanding SpatialPandas correctly).
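
To make the idea concrete, here is a rough sketch of what a higher-level index buys us. This is plain Python, not the Kartothek or SpatialPandas API: `Partition`, `build_index`, `select_partitions`, the file paths, and the bounding boxes are all made up for illustration. It assumes each partition records the day it covers plus a bounding box, so a dictionary lookup on the day followed by a bounds check picks out the files to read without opening any per-day metadata.

```python
from collections import defaultdict
from typing import NamedTuple


class Partition(NamedTuple):
    """Hypothetical record for one stored partition (not a real library type)."""
    path: str
    day: str
    bounds: tuple  # (xmin, ymin, xmax, ymax)


def build_index(partitions):
    """Map each day to its partitions so a per-day lookup is a dict access."""
    index = defaultdict(list)
    for part in partitions:
        index[part.day].append(part)
    return index


def overlaps(a, b):
    """Return True if two (xmin, ymin, xmax, ymax) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def select_partitions(index, days, query_bounds):
    """Collect paths of partitions on the given days that touch the query box."""
    selected = []
    for day in days:
        for part in index.get(day, []):
            if overlaps(part.bounds, query_bounds):
                selected.append(part.path)
    return selected


# Made-up layout: two spatial partitions (west/east) per day, three days.
partitions = [
    Partition(f"gps/{day}/part.{i}.parquet", day, bounds)
    for day in ("2020-01-01", "2020-01-02", "2020-01-03")
    for i, bounds in enumerate(
        [(-125.0, 24.0, -100.0, 49.0), (-100.0, 24.0, -66.0, 49.0)]
    )
]

index = build_index(partitions)
selected = select_partitions(
    index,
    days=["2020-01-01", "2020-01-02"],
    query_bounds=(-98.0, 29.0, -97.0, 31.0),
)
print(selected)
```

Only the selected paths would then need to go into the task graph. A real integration would presumably let Kartothek own the index and the graph construction rather than hand-rolling it like this.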

I'm at the Dask Dev Conference with some of the Kartothek devs and based on conversations with fjetter this integration should be possible.

jbednar · 2020-07-16T22:48:01Z

Sounds cool to me!

jbednar added this to the Wishlist milestone Jul 16, 2020

jbednar added the "enhancement" (New feature or request) label Jul 16, 2020

jbednar added the "help wanted" (Extra attention is needed) label Jul 27, 2020

jorisvandenbossche mentioned this issue Aug 20, 2020

Non-index-based partitioning of Dask DataFrames dask/dask#6246

Open
