Duplicate division features in July release #191

Open
skmoore opened this issue Jul 26, 2024 · 2 comments
Labels: bug, divisions

Comments


skmoore commented Jul 26, 2024

I'm seeing duplicate division features in the July release. There are a few patterns, some of which may be expected.

The value for local_type is different, so perhaps this is expected? In this example the capital_of_divisions column is identical for both features, but the values are too long to include here.

| id | subtype | local_type | name |
| --- | --- | --- | --- |
| 085e1b033fffffff0143e1f3681c0468 | locality | suburb | Bratislava |
| 085e1b033fffffff018e7dd51bb7f7c2 | locality | city | Bratislava |


Another example, where local_type and capital_of_divisions have different values for each duplicate:

| id | subtype | local_type | name | capital_of_divisions |
| --- | --- | --- | --- | --- |
| 085cf0a87fffffff01899c602517d124 | locality | city | Kingston | [{division_id=085d2436ffffffff018dac84a99372bd, subtype=country}, {division_id=085cf0acbfffffff0132b9b77708bb1e, subtype=region}] |
| 08516260bfffffff010a9b6d0cbde49e | locality | town | Kingston | [{division_id=08516260bfffffff01d5cb63d73e2437, subtype=county}, {division_id=085a391c7fffffff01dcadc0d18a31d7, subtype=country}] |
| 08520ed77fffffff01152a636d6595da | locality | hamlet | Kingston | [{division_id=08520e87bfffffff01f0d8333eaebd81, subtype=country}] |


Others are essentially exact matches of each other:

| id | subtype | local_type | name | capital_of_divisions |
| --- | --- | --- | --- | --- |
| 085b2cd0ffffffff01ec1a70e2268d2a | locality | city | Lefkoşa | [{division_id=085b39333fffffff01f145d7f9110d70, subtype=country}] |
| 085b2cd0ffffffff012a65eb36fb45c1 | locality | city | Levkosia | [{division_id=085b39333fffffff01f145d7f9110d70, subtype=country}] |
| 085b2cd0ffffffff01861b0866ec3f98 | locality | city | Levkosia | [{division_id=085b39333fffffff01f145d7f9110d70, subtype=country}] |
| 085b2cd0ffffffff01aa2be6f778df73 | locality | city | Levkosia | [{division_id=085b39333fffffff01f145d7f9110d70, subtype=country}] |
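
For anyone who wants to reproduce the grouping, here is a minimal sketch of how the patterns above can be separated. It only uses the rows quoted in this issue (hard-coded, not read from the release), and the helper name `summarize_by_name` is made up for illustration. It groups rows by name and reports whether local_type varies within each group.

```python
from collections import defaultdict

# Sample rows copied from the tables in this issue: (id, subtype, local_type, name).
ROWS = [
    ("085e1b033fffffff0143e1f3681c0468", "locality", "suburb", "Bratislava"),
    ("085e1b033fffffff018e7dd51bb7f7c2", "locality", "city", "Bratislava"),
    ("085cf0a87fffffff01899c602517d124", "locality", "city", "Kingston"),
    ("08516260bfffffff010a9b6d0cbde49e", "locality", "town", "Kingston"),
    ("08520ed77fffffff01152a636d6595da", "locality", "hamlet", "Kingston"),
    ("085b2cd0ffffffff01ec1a70e2268d2a", "locality", "city", "Lefkoşa"),
    ("085b2cd0ffffffff012a65eb36fb45c1", "locality", "city", "Levkosia"),
    ("085b2cd0ffffffff01861b0866ec3f98", "locality", "city", "Levkosia"),
    ("085b2cd0ffffffff01aa2be6f778df73", "locality", "city", "Levkosia"),
]


def summarize_by_name(rows):
    """Map each name shared by 2+ rows to (row count, sorted distinct local_type values)."""
    groups = defaultdict(list)
    for _feature_id, _subtype, local_type, name in rows:
        groups[name].append(local_type)
    return {
        name: (len(types), sorted(set(types)))
        for name, types in groups.items()
        if len(types) > 1
    }


summary = summarize_by_name(ROWS)
for name, (count, types) in summary.items():
    kind = "same local_type" if len(types) == 1 else "differing local_type"
    print(f"{name}: {count} rows, {kind} {types}")
```

Grouping on the exact name string is a simplification: Lefkoşa and Levkosia look like the same place under different spellings, and a name-only key does not link them.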
skmoore added the bug and Admins labels on Jul 26, 2024
@stepps00

Thanks for the examples @skmoore.

Today, multiple place tags from OpenStreetMap (the local_type values you're seeing) are used to generate locality entities in the divisions theme. In the Bratislava example, the place is represented in OSM through multiple features with suburb and city place tags, so multiple entities are generated in Overture. This is not ideal and is causing the duplicate and overlap issues you're seeing, so changes to the ingestion pipeline are being planned as a fix.

The Kingston examples are actually legitimate entities - one in Jamaica, one in Tasmania, and one in Norfolk Island, hence they have unique capital_of_divisions values.
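
One rough way to separate same-name places in different parts of the world from likely duplicates, using only the ids quoted in this thread: in these samples the Bratislava rows share their leading 16 hex characters, as do the four Lefkoşa/Levkosia rows, while the three Kingston ids each start differently. The sketch below groups on that prefix. Treating the prefix as a location key is an assumption drawn from these rows, not documented behaviour, and `PREFIX_LEN` and `group_by_prefix` are hypothetical names.

```python
from collections import defaultdict

# (id, name) pairs copied from the examples in this thread.
ROWS = [
    ("085e1b033fffffff0143e1f3681c0468", "Bratislava"),
    ("085e1b033fffffff018e7dd51bb7f7c2", "Bratislava"),
    ("085cf0a87fffffff01899c602517d124", "Kingston"),
    ("08516260bfffffff010a9b6d0cbde49e", "Kingston"),
    ("08520ed77fffffff01152a636d6595da", "Kingston"),
    ("085b2cd0ffffffff01ec1a70e2268d2a", "Lefkoşa"),
    ("085b2cd0ffffffff012a65eb36fb45c1", "Levkosia"),
    ("085b2cd0ffffffff01861b0866ec3f98", "Levkosia"),
    ("085b2cd0ffffffff01aa2be6f778df73", "Levkosia"),
]

# Assumption: the first 16 hex characters act as a coarse location key.
PREFIX_LEN = 16


def group_by_prefix(rows, prefix_len=PREFIX_LEN):
    """Group feature names by the leading characters of their id."""
    groups = defaultdict(list)
    for feature_id, name in rows:
        groups[feature_id[:prefix_len]].append(name)
    return dict(groups)


groups = group_by_prefix(ROWS)
# Prefixes shared by more than one feature are duplicate candidates.
candidates = {prefix: names for prefix, names in groups.items() if len(names) > 1}
for prefix, names in candidates.items():
    print(prefix, names)
```

On these rows the Kingston features each land in their own group, which matches the explanation that they are distinct places. A shared prefix only flags candidates for review; it does not prove two features are the same entity.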

The issue with the Levkosia example is slightly different: multiple entities were generated even though they all share the same place tag / local_type value. Running this query in DuckDB

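-- Setup for reading the public S3 bucket (may be unnecessary if already configured)
INSTALL httpfs;
LOAD httpfs;
SET s3_region = 'us-west-2';

-- Look up the source dataset and source record behind each duplicate id
SELECT
	id,
	sources[1].dataset AS dataset,          -- DuckDB lists are 1-indexed
	sources[1].record_id AS concordance_id
FROM
	read_parquet('s3://overturemaps-us-west-2/release/2024-07-22.0/theme=divisions/type=*/*', filename=true, hive_partitioning=1)
WHERE
	id IN ('085b2cd0ffffffff01ec1a70e2268d2a','085b2cd0ffffffff01aa2be6f778df73','085b2cd0ffffffff01861b0866ec3f98','085b2cd0ffffffff012a65eb36fb45c1');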

you'll see four unique OSM features

┌──────────────────────────────────┬───────────────┬────────────────┐
│                id                │    dataset    │ concordance_id │
│             varchar              │    varchar    │    varchar     │
├──────────────────────────────────┼───────────────┼────────────────┤
│ 085b2cd0ffffffff01ec1a70e2268d2a │ OpenStreetMap │ R16283715      │
│ 085b2cd0ffffffff01861b0866ec3f98 │ OpenStreetMap │ R2628520       │
│ 085b2cd0ffffffff01aa2be6f778df73 │ OpenStreetMap │ N1893015330    │
│ 085b2cd0ffffffff012a65eb36fb45c1 │ OpenStreetMap │ R2628521       │
└──────────────────────────────────┴───────────────┴────────────────┘

Ideally, a single entity would be maintained on Overture's end for this locality.
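
To inspect the four source features, the concordance ids can be turned into OpenStreetMap links. This is a small sketch that assumes the leading letter of record_id denotes the OSM element type (N for node, W for way, R for relation); that reading is inferred from the values in the result above and is not confirmed elsewhere in this thread. The function names are hypothetical.

```python
# Assumed meaning of the record_id prefix letter.
OSM_TYPES = {"N": "node", "W": "way", "R": "relation"}

# concordance_id values from the query result above.
RECORD_IDS = ["R16283715", "R2628520", "N1893015330", "R2628521"]


def parse_record_id(record_id):
    """Split a record id such as 'R16283715' into ('relation', 16283715)."""
    kind = OSM_TYPES[record_id[0].upper()]
    return kind, int(record_id[1:])


def osm_url(record_id):
    """Build the openstreetmap.org page URL for the element."""
    kind, osm_id = parse_record_id(record_id)
    return f"https://www.openstreetmap.org/{kind}/{osm_id}"


for record_id in RECORD_IDS:
    print(record_id, "->", osm_url(record_id))
```

Under that assumption, three of the four entities come from relations and one from a node.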

Both of these issues are related and similar to a discussion around localities here. There is no firm timeline for a fix yet, but we're hoping to make some pipeline updates soon, so this should be corrected in one of the upcoming releases. Once some action is taken, we can share a progress update.

Feel free to add more examples; they're very helpful.

skmoore commented Jul 26, 2024

@stepps00 Thanks for the info
