
Processing Algorithm Bugs and Enhancements


Dumping ground for collating issues and enhancements for just the CORE processing algorithms (e.g. dissolve/intersection/overlay type algorithms... not "point in polygon distributed by statistical t-test with assumption of integer weighted principal components" type algorithms)

There are a lot of related bug reports and it's hard to tell which are current, which are outdated, and which apply to 2.x or to 3.x only. But we want to make Processing rock solid for 3.0, so let's use this space to collaborate and fix the mess for good :)

  • dissolve

Compared to ArcGIS, the dissolve algorithm in QGIS needs some love and optimization to make it more robust against invalid or borderline geometries (topologically speaking). Here is the list of optimizations found to work (a checked box means it's been ported into the algorithm):

  • [ ] Exploding multi-part geometries, then collecting them before running UnaryUnion, avoids many topology errors on the boundaries inside the polygons to dissolve; a raw Union is a lot less forgiving. The following PostGIS query proved a lot more robust for very large datasets (a PyQGIS sketch of the same pattern follows the summary below):
select "tableA", "codeTableA", 
        st_area(ST_UnaryUnion(st_collect(geom_intersection))) geomdissolvedarea, --Unary 
        st_area(ST_UnaryUnion(st_collect(geom_intersection)))  / "areaObjectA" as ratio_couv

        FROM

        (
        Select
                'public.communes'::character varying "tableA",
                a.code::character varying  "codeTableA",
                b.code::character varying  "codeTableB",
                'public.couv4g'::character varying "tableB",
                st_intersection(st_snaptogrid(a.geom, 0.0001) , st_snaptogrid( ( CASE WHEN st_isvalid(b.geom) then b.geom ELSE st_makevalid(b.geom) END ), 0.0001))  as geom_intersection, -- snaptogrid to avoid some precision issues
        FROM

                    -- >>>>> dump all objects in tableA >>>
                    (  select code_com::character varying code, (st_dump(geom)).geom geom  from public.communes   ) a

            JOIN
                (  -- >>>>> dump all objects in tableB >>>

                select code_bds as code, st_dump(geom) from  public.couv4g 

                 ) b
            ON  (st_intersects(a.geom, b.geom)) -- spatial relation and index clause

        ) as crosselem
GROUP BY  "tableA",  "codeTableA", "areaObjectA" 

Those three optimizations (explode/collect/unary union, snapping to a grid, and repairing invalid geometries) give a robust algorithm, but it still appears to be slower than the ArcGIS equivalent.
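
For reference, a minimal PyQGIS sketch of just the explode/collect/unary-union part of that pattern (QGIS 3.x API assumed; the function name is illustrative, and the grid snapping and validity repair from the SQL above are left out):

    from qgis.core import QgsGeometry

    def dissolve_parts(geometries):
        # Explode every (multi-part) geometry into its single parts first.
        parts = []
        for geom in geometries:
            parts.extend(geom.asGeometryCollection())
        # Union all parts in one pass; unaryUnion() copes with shared internal
        # boundaries far better than repeated pairwise unions.
        return QgsGeometry.unaryUnion(parts)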

  • intersection

New ticket (replaces older ones): https://issues.qgis.org/issues/17131
Sample project and datasets: https://issues.qgis.org/attachments/download/11399/union_intersection_test_datasets.zip

Results seem OK if there are no overlaps within the input layers (first test project/dataset). If there are overlaps within an input layer (second test project/dataset), then the results are wrong (missing features and attributes).

  • union

New ticket (replaces older ones): https://issues.qgis.org/issues/17131
Sample project and datasets: https://issues.qgis.org/attachments/download/11399/union_intersection_test_datasets.zip

Results are wrong (both geometries and attributes).

Interesting new observation: https://issues.qgis.org/issues/17131#note-2

  • difference

  • symmetrical difference

  • buffer

  • select/extract by location and join by location: These algorithms are currently optimised for the most common use case of joining many features against a few features (e.g. joining millions of points to a localities table). Heuristics should be added to detect when a user is performing a few-to-many join (e.g. finding the localities which contain points from a million-point table) or a many-to-many join (e.g. joining centroid points of parcel boundaries to the parcel boundary polygons themselves) and run optimised logic for those cases (see the sketch below).
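
One way such a heuristic could look, as a hedged PyQGIS sketch (QGIS 3.x API assumed; the function name, the count-based rule and the intersects predicate are illustrative assumptions, not the current implementation of these algorithms):

    from qgis.core import QgsSpatialIndex

    def join_by_location(layer_a, layer_b):
        # Index the layer with more features and iterate the smaller one,
        # so the expensive work is one index build plus a few probes,
        # regardless of which side of the join is the "big" one.
        if layer_a.featureCount() >= layer_b.featureCount():
            big, small = layer_a, layer_b
        else:
            big, small = layer_b, layer_a
        index = QgsSpatialIndex(big.getFeatures())
        pairs = []
        for feat in small.getFeatures():
            # Cheap bounding-box candidates first, exact predicate second.
            for fid in index.intersects(feat.geometry().boundingBox()):
                other = big.getFeature(fid)
                if feat.geometry().intersects(other.geometry()):
                    pairs.append((feat.id(), fid))
        return pairs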


IN PLACE

  • Port PointsLayerFromTable to feature based algorithm, allow for in-place with point input tables

  • Keep N biggest parts - port to in place, fix
