Processing Algorithm Bugs and Enhancements
Dumping ground for collating issues and enhancements for just the CORE processing algorithms (e.g. dissolve/intersection/overlay type algorithms... not "point in polygon distributed by statistical t-test with assumption of integer weighted principal components" type algorithms)
There are a lot of related bug reports, and it's hard to tell which are current, which are outdated, and which apply to 2.x or to 3.x only. But we want to make processing rock solid for 3.0, so let's use this space to collaborate and fix the mess for good :)
- dissolve
The dissolve algorithm in QGIS, compared to the ArcGIS one, needs some love and optimization to make it more robust when facing invalid or on-the-edge geometries (topologically speaking). Here is the list of working optimizations found (a checked box means it has been ported into the alg):
- [] Explode multi-part geometries, then collect them before running UnaryUnion. This avoids many topology errors for all boundaries inside the polygons to dissolve; a raw Union is a lot less smart. The same approach in PostGIS proved a lot more robust for very large datasets:
select "tableA", "codeTableA",
st_area(ST_UnaryUnion(st_collect(geom_intersection))) as geomdissolvedarea, -- unary union of the collected parts
st_area(ST_UnaryUnion(st_collect(geom_intersection))) / "areaObjectA" as ratio_couv
FROM
(
Select
'public.communes'::character varying "tableA",
a.code::character varying "codeTableA",
b.code::character varying "codeTableB",
'public.couv4g'::character varying "tableB",
a.obj_area as "areaObjectA", -- assumption: area of the whole tableA object; the original query used this column without defining it
st_intersection(st_snaptogrid(a.geom, 0.0001), st_snaptogrid((CASE WHEN st_isvalid(b.geom) then b.geom ELSE st_makevalid(b.geom) END), 0.0001)) as geom_intersection -- snaptogrid to avoid some precision issues
FROM
-- >>>>> dump all objects in tableA >>>
( select code_com::character varying code, st_area(geom) as obj_area, (st_dump(geom)).geom geom from public.communes ) a
JOIN
( -- >>>>> dump all objects in tableB >>>
select code_bds as code, (st_dump(geom)).geom as geom from public.couv4g
) b
ON (st_intersects(a.geom, b.geom)) -- spatial relation and index clause
) as crosselem
GROUP BY "tableA", "codeTableA", "areaObjectA"
- [] Raise the GEOS version to 3.6.2, where an issue was fixed: https://trac.osgeo.org/geos/ticket/837
- [] Check for OGC validity before doing the UNION, and run makeValid on the geometries that fail the check.
Together, those three optimizations give a robust algorithm, but it still appears to be slower than the ESxxx equivalent.
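To make the "explode, snap, collect" order from the first item more concrete, here is a minimal pure-Python sketch of the pre-processing only. It is not the QGIS implementation: the feature layout (`(code, [ring, ...])` tuples), the function names `snap`, `explode` and `prepare_for_union`, and the rule for dropping collapsed rings are all assumptions made for illustration. The actual union and the validity repair would still be left to GEOS.

```python
from collections import defaultdict

GRID = 0.0001  # same tolerance as st_snaptogrid in the query above


def snap(coord, size=GRID):
    """Snap one (x, y) pair to the nearest node of a regular grid."""
    x, y = coord
    return (round(x / size) * size, round(y / size) * size)


def explode(features):
    """Yield one (code, ring) pair per part of each multi-part feature.

    Hypothetical layout: a feature is (code, [ring, ring, ...]) and a
    ring is a closed list of (x, y) tuples.
    """
    for code, parts in features:
        for ring in parts:
            yield code, ring


def prepare_for_union(features, size=GRID):
    """Explode, snap and collect parts per dissolve key.

    Returns {code: [ring, ...]}; each list is what would be handed to a
    unary union routine afterwards.
    """
    collected = defaultdict(list)
    for code, ring in explode(features):
        snapped = [snap(c, size) for c in ring]
        # Snapping can create runs of identical vertices; keep one of each run.
        cleaned = [c for i, c in enumerate(snapped)
                   if i == 0 or c != snapped[i - 1]]
        # A closed ring needs at least 4 vertices; smaller ones collapsed.
        if len(cleaned) >= 4:
            collected[code].append(cleaned)
    return dict(collected)
```

Collapsed slivers are dropped here before the union step, which is one way tiny precision artefacts could be kept away from the topology code; whether that is acceptable depends on the dataset.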
- intersection
New ticket (replaces older ones): https://issues.qgis.org/issues/17131. Sample project and datasets: https://issues.qgis.org/attachments/download/11399/union_intersection_test_datasets.zip
Results seem OK if there are no overlaps within the input layers (first test project/dataset). If there are overlaps within an input layer (second test project/dataset), the results are wrong (missing features and attributes).
- union
New ticket (replaces older ones): https://issues.qgis.org/issues/17131. Sample project and datasets: https://issues.qgis.org/attachments/download/11399/union_intersection_test_datasets.zip
Results are wrong (geometries and attributes).
Interesting new observation: https://issues.qgis.org/issues/17131#note-2
- difference
- symmetrical difference
- buffer
- select/extract by location and join by location: These algorithms are currently optimised for the most common use case of joining many features against a few features (e.g. joining millions of points to a localities table). Heuristics should be added to detect when a user is performing a few-to-many join (e.g. finding localities which contain points from a million-point table) or a many-to-many join (e.g. joining centroid points of parcel boundaries to the polygon parcel boundaries themselves), and to run optimised logic for these cases.
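As a rough illustration of what such a heuristic could look like, the sketch below classifies a join from the two feature counts and then scans the smaller layer against an index built on the larger one. Everything here is an assumption for discussion: the names `classify_join`, `build_grid_index` and `bbox_join`, the threshold `ratio=100`, the simple grid index, and the choice of which side to index. It works on bounding boxes only and is not how the QGIS algorithms are implemented.

```python
from collections import defaultdict


def classify_join(n_left, n_right, ratio=100):
    """Label a join from the feature counts (ratio is an arbitrary threshold).

    Layers of comparable size fall into "many-to-many".
    """
    if n_left >= ratio * n_right:
        return "many-to-few"
    if n_right >= ratio * n_left:
        return "few-to-many"
    return "many-to-many"


def _cells(box, cell):
    """Yield the grid cells covered by a box (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    for ix in range(int(xmin // cell), int(xmax // cell) + 1):
        for iy in range(int(ymin // cell), int(ymax // cell) + 1):
            yield ix, iy


def build_grid_index(boxes, cell):
    """Map each grid cell to the ids of the boxes touching it."""
    index = defaultdict(set)
    for fid, box in boxes.items():
        for key in _cells(box, cell):
            index[key].add(fid)
    return index


def boxes_intersect(a, b):
    """True if two boxes (xmin, ymin, xmax, ymax) share at least a point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def bbox_join(left, right, cell=1.0):
    """Return sorted (left_id, right_id) pairs whose bounding boxes intersect.

    The larger layer is indexed and the smaller one is scanned, so the
    result does not depend on the order in which the layers are passed.
    """
    swap = len(left) > len(right)
    scanned, indexed = (right, left) if swap else (left, right)
    index = build_grid_index(indexed, cell)
    pairs = set()
    for sid, box in scanned.items():
        candidates = set()
        for key in _cells(box, cell):
            candidates |= index.get(key, set())
        for iid in candidates:
            if boxes_intersect(box, indexed[iid]):
                pairs.add((iid, sid) if swap else (sid, iid))
    return sorted(pairs)
```

For example, `classify_join(1_000_000, 50)` returns `"many-to-few"`, while swapping the arguments returns `"few-to-many"`; a real implementation would use that label to pick a strategy and then run exact geometry predicates on the candidate pairs.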
IN PLACE
- Port PointsLayerFromTable to a feature-based algorithm, allow for in-place with point input tables
- Keep N biggest parts - port to in place, fix