Processing Algorithm Bugs and Enhancements
Dumping ground for collating issues and enhancements for just the CORE processing algorithms (e.g. dissolve/intersection/overlay type algorithms... not "point in polygon distributed by statistical t-test with assumption of integer weighted principal components" type algorithms)
There are a lot of related bug reports, and it's hard to tell which are current or outdated, and which apply to 2.x or to 3.x only. But we want to make processing rock solid for 3.0, so let's use this space to collaborate and fix the mess for good :)
- dissolve
Compared to ArcGIS, the dissolve algorithm in QGIS needs some love and optimization to make it more robust when facing invalid or borderline geometries (topologically speaking). Here is the list of working optimizations found so far (a checked box means it has been ported into the algorithm):
- [ ] Explode multi-part geometries, then collect the parts before running UnaryUnion. This avoids many topology errors along the internal boundaries of the polygons being dissolved; a raw Union is much less smart. Using PostGIS, the following approach proved much more robust on very large datasets:
```sql
select "tableA", "codeTableA",
  st_area(ST_UnaryUnion(st_collect(geom_intersection))) geomdissolvedarea, -- unary union of the collected pieces
  st_area(ST_UnaryUnion(st_collect(geom_intersection))) / "areaObjectA" as ratio_couv
from (
  select
    'public.communes'::character varying "tableA",
    a.code::character varying "codeTableA",
    b.code::character varying "codeTableB",
    'public.couv4g'::character varying "tableB",
    st_area(a.geom) "areaObjectA", -- assumed definition: used below but missing from the original snippet
    st_intersection(st_snaptogrid(a.geom, 0.0001), st_snaptogrid((CASE WHEN st_isvalid(b.geom) THEN b.geom ELSE st_makevalid(b.geom) END), 0.0001)) as geom_intersection -- snaptogrid to avoid some precision issues
  from
    -- >>>>> dump all objects in tableA >>>
    ( select code_com::character varying code, (st_dump(geom)).geom geom from public.communes ) a
  join
    ( -- >>>>> dump all objects in tableB >>>
      select code_bds as code, (st_dump(geom)).geom geom from public.couv4g
    ) b
  ON (st_intersects(a.geom, b.geom)) -- spatial relation and index clause
) as crosselem
GROUP BY "tableA", "codeTableA", "areaObjectA"
```
- [ ] Raise the GEOS version to 3.6.2, where an issue was fixed: https://trac.osgeo.org/geos/ticket/837
- [ ] Check geometries for OGC validity before running the union, and run makeValid on any invalid ones.
Together, these three optimizations give a robust algorithm, but it still appears to be slower than the ESxxx equivalent.
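The bookkeeping around the checklist above (explode multi-part geometries, snap to a grid, then collect the single parts per dissolve key so that each key gets one unary union) can be sketched in plain Python. This is a minimal sketch under stated assumptions: geometries are plain coordinate lists, `snap_ring`, `is_degenerate` and `explode_and_collect` are hypothetical helpers rather than QGIS or GEOS functions, and the union itself (and a real makeValid) is left to GEOS, e.g. `ST_UnaryUnion` in PostGIS.

```python
from collections import defaultdict

# Hypothetical plain-Python geometry model (not the QGIS or GEOS API):
# a ring is a closed list of (x, y) tuples, a polygon is a list of rings
# (exterior first, then holes), a multipolygon is a list of polygons.
Ring = list[tuple[float, float]]
Polygon = list[Ring]
MultiPolygon = list[Polygon]


def snap_ring(ring: Ring, grid: float) -> Ring:
    """Snap vertices to a regular grid (in the spirit of ST_SnapToGrid) and
    drop consecutive duplicate vertices created by the snapping.
    Note: Python's round() rounds exact halves to the nearest even integer."""
    snapped: Ring = []
    for x, y in ring:
        p = (round(x / grid) * grid, round(y / grid) * grid)
        if not snapped or snapped[-1] != p:
            snapped.append(p)
    return snapped


def is_degenerate(ring: Ring) -> bool:
    """Crude check, NOT a full OGC validity test: a ring that collapsed to
    fewer than 3 distinct vertices cannot enclose any area."""
    return len(set(ring)) < 3


def explode_and_collect(features: list[tuple[str, MultiPolygon]],
                        grid: float) -> dict[str, list[Polygon]]:
    """Explode every multipolygon into single-part polygons, snap them,
    drop parts whose exterior collapsed, and collect the parts per dissolve
    key. Each list is what ONE unary union per key would then consume."""
    collected: dict[str, list[Polygon]] = defaultdict(list)
    for key, multipolygon in features:
        for polygon in multipolygon:            # explode multi-part geometry
            rings = [snap_ring(r, grid) for r in polygon]
            if is_degenerate(rings[0]):         # exterior collapsed: skip part
                continue
            holes = [r for r in rings[1:] if not is_degenerate(r)]
            collected[key].append([rings[0]] + holes)
    return dict(collected)


square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
sliver = [(5, 5), (5.2, 5), (5.2, 5.1), (5, 5)]   # collapses on a 1.0 grid
features = [("A", [[square], [sliver]]), ("A", [[square]]), ("B", [[square]])]
parts = explode_and_collect(features, grid=1.0)
print({key: len(polys) for key, polys in parts.items()})  # {'A': 2, 'B': 1}
```

Handing each per-key list of single parts to one unary union mirrors the `ST_UnaryUnion(st_collect(...))` pattern in the PostGIS query, instead of unioning geometries pairwise.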
- intersection
New ticket (replaces older ones): https://issues.qgis.org/issues/17131. Sample project and datasets: https://issues.qgis.org/attachments/download/11399/union_intersection_test_datasets.zip
Results seem OK if there are no overlaps within the input layers (first test project/dataset). If there are overlaps within an input layer (second test project/dataset), the results are wrong (missing features and attributes).
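To pin down the expected result when one input layer contains overlapping features, a reference answer can be computed on a toy model where each feature covers a set of unit grid cells, so intersections are exact set operations. This is a hypothetical test oracle, not the QGIS implementation; it assumes the intersection overlay produces one output feature for every overlapping pair of input/overlay features, carrying the attributes of both. Both overlapping input features must show up in the output, which is exactly where features and attributes go missing in the bug report.

```python
# Toy geometry model for a test oracle (hypothetical, not the QGIS API):
# each feature covers a set of (column, row) unit cells, so intersection
# is an exact set operation with no floating-point or topology issues.
Cells = frozenset[tuple[int, int]]


def box(x0: int, y0: int, x1: int, y1: int) -> Cells:
    """Cells covered by the axis-aligned box [x0, x1) x [y0, y1)."""
    return frozenset((x, y) for x in range(x0, x1) for y in range(y0, y1))


def intersection_overlay(input_layer: dict[str, Cells],
                         overlay_layer: dict[str, Cells]) -> dict[tuple[str, str], Cells]:
    """One output piece per overlapping (input, overlay) pair, keyed by both
    feature ids so that the attributes of both features can be joined on.
    Overlaps between features of the SAME layer do not merge or drop pieces."""
    result = {}
    for a_id, a_cells in input_layer.items():
        for b_id, b_cells in overlay_layer.items():
            common = a_cells & b_cells
            if common:
                result[(a_id, b_id)] = common
    return result


# a1 and a2 overlap each other (columns 2 and 3): the problematic case.
input_layer = {"a1": box(0, 0, 4, 2), "a2": box(2, 0, 6, 2)}
overlay_layer = {"b1": box(1, 0, 5, 1)}
out = intersection_overlay(input_layer, overlay_layer)
for key, cells in sorted(out.items()):
    print(key, len(cells))
# ('a1', 'b1') 3
# ('a2', 'b1') 3
```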
- union
New ticket (replaces older ones): https://issues.qgis.org/issues/17131. Sample project and datasets: https://issues.qgis.org/attachments/download/11399/union_intersection_test_datasets.zip
Results are wrong (both geometries and attributes).
Interesting new observation: https://issues.qgis.org/issues/17131#note-2
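A toy test oracle can spell out the expected union pieces, assuming one common definition of the union overlay (an assumption, not taken from the tickets): the pairwise intersections, plus the parts of each input feature not covered by any overlay feature, plus the parts of each overlay feature not covered by any input feature. Features are hypothetical sets of unit grid cells so that every operation is exact; feature ids stand in for attributes, with `None` marking the side whose attributes should be NULL.

```python
# Toy geometry model for a test oracle (hypothetical, not the QGIS API):
# each feature covers a set of (column, row) unit cells.
Cells = frozenset[tuple[int, int]]


def box(x0: int, y0: int, x1: int, y1: int) -> Cells:
    """Cells covered by the axis-aligned box [x0, x1) x [y0, y1)."""
    return frozenset((x, y) for x in range(x0, x1) for y in range(y0, y1))


def union_overlay(input_layer: dict[str, Cells], overlay_layer: dict[str, Cells]):
    """Expected union pieces keyed by (input id, overlay id); None marks the
    side whose attributes should be NULL on that piece."""
    result = {}
    # 1. Pairwise intersections carry attributes from both layers.
    for a_id, a_cells in input_layer.items():
        for b_id, b_cells in overlay_layer.items():
            common = a_cells & b_cells
            if common:
                result[(a_id, b_id)] = common
    covered_by_input = frozenset().union(*input_layer.values())
    covered_by_overlay = frozenset().union(*overlay_layer.values())
    # 2. Parts of input features not covered by any overlay feature.
    for a_id, a_cells in input_layer.items():
        rest = a_cells - covered_by_overlay
        if rest:
            result[(a_id, None)] = rest
    # 3. Parts of overlay features not covered by any input feature.
    for b_id, b_cells in overlay_layer.items():
        rest = b_cells - covered_by_input
        if rest:
            result[(None, b_id)] = rest
    return result


input_layer = {"a1": box(0, 0, 3, 3)}
overlay_layer = {"b1": box(2, 0, 5, 3)}
pieces = union_overlay(input_layer, overlay_layer)
print({key: len(cells) for key, cells in pieces.items()})
# {('a1', 'b1'): 3, ('a1', None): 6, (None, 'b1'): 6}
```

When neither layer contains internal overlaps, the pieces tile the combined footprint exactly, so their summed area must equal the area of that footprint (here 3 + 6 + 6 = 15 cells = 9 + 9 - 3). That gives a cheap regression check for the wrong-geometry results.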
- symmetrical difference
- select/extract by location and join by location
These algorithms are currently optimised for the most common use case: joining many features against a few features (e.g. joining millions of points to a localities table). Heuristics should be added to detect few-to-many joins (e.g. finding the localities that contain points from a million-point table) and many-to-many joins (e.g. joining the centroid points of parcel boundaries to the parcel polygons themselves), and to run optimised logic for those cases.
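The heuristics described above could start as a simple dispatcher on feature counts. The sketch below is hypothetical throughout: the `JoinStrategy` names, `choose_strategy`, the n log n cost model and the one-million threshold are illustrative assumptions, not existing QGIS code. "Input" means the layer being selected from or receiving attributes; "join" means the layer it is compared against.

```python
import math
from enum import Enum


class JoinStrategy(Enum):
    """Hypothetical strategies; names are illustrative, not QGIS code."""
    INDEX_JOIN_LAYER = "index the join layer, iterate the input features"
    INDEX_INPUT_LAYER = "index the input layer, iterate the join features"
    PARTITIONED = "split both layers into spatial tiles, join tile by tile"


def index_then_probe_cost(n_indexed: int, n_probing: int) -> float:
    """Rough cost model (an assumption): building a tree index on n features
    costs ~n log n, and each probe against it costs ~log n."""
    log_n = math.log2(max(n_indexed, 2))
    return n_indexed * log_n + n_probing * log_n


def choose_strategy(n_input: int, n_join: int,
                    many_threshold: int = 1_000_000) -> JoinStrategy:
    # Many-to-many: neither side is small enough to index cheaply.
    if n_input >= many_threshold and n_join >= many_threshold:
        return JoinStrategy.PARTITIONED
    # Otherwise index whichever side gives the lower total cost; with this
    # model that is always the smaller layer.
    if index_then_probe_cost(n_join, n_input) <= index_then_probe_cost(n_input, n_join):
        return JoinStrategy.INDEX_JOIN_LAYER   # many-to-few (assumed current fast path)
    return JoinStrategy.INDEX_INPUT_LAYER      # few-to-many


print(choose_strategy(n_input=1_000_000, n_join=100).name)        # INDEX_JOIN_LAYER
print(choose_strategy(n_input=100, n_join=1_000_000).name)        # INDEX_INPUT_LAYER
print(choose_strategy(n_input=2_000_000, n_join=2_000_000).name)  # PARTITIONED
```

Under this cost model the many-to-few and few-to-many cases collapse into one rule (index the smaller layer, stream the larger one). What the partitioned path should look like for many-to-many joins (tiles, a sort-based sweep, etc.) is left open here.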
- Port PointsLayerFromTable to a feature-based algorithm, and allow in-place editing when the input table is a point layer
- Keep N biggest parts: port to in-place mode, and fix
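Keep N biggest parts only ever needs one feature at a time, which is what makes it a candidate for a feature-based, in-place version. A minimal plain-Python sketch of the per-feature logic, assuming parts are lists of rings and measured with the shoelace formula (holes subtracted); `keep_n_biggest_parts` and the geometry model are hypothetical, not the QGIS implementation.

```python
# Hypothetical plain-Python geometry model (not the QGIS API).
Ring = list[tuple[float, float]]
Polygon = list[Ring]  # exterior ring first, then holes


def ring_area(ring: Ring) -> float:
    """Absolute area of a closed ring via the shoelace formula."""
    s = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0


def polygon_area(polygon: Polygon) -> float:
    """Exterior area minus the area of every hole."""
    exterior, *holes = polygon
    return ring_area(exterior) - sum(ring_area(h) for h in holes)


def keep_n_biggest_parts(parts: list[Polygon], n: int) -> list[Polygon]:
    """Return the n largest parts of ONE feature, biggest first."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # sorted() stays stable with reverse=True, so equal-area parts keep
    # their original relative order.
    return sorted(parts, key=polygon_area, reverse=True)[:n]


def square(x: float, y: float, size: float) -> Polygon:
    return [[(x, y), (x + size, y), (x + size, y + size), (x, y + size), (x, y)]]


parts = [square(0, 0, 1), square(5, 5, 3), square(10, 0, 2)]
kept = keep_n_biggest_parts(parts, 2)
print([polygon_area(p) for p in kept])  # [9.0, 4.0]
```

Returning the kept parts biggest-first is a design choice of this sketch; keeping them in their original order would work just as well for in-place editing.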