Add the ability to rate limit edits #4319
Conversation
It would probably be quicker to handle all non-moderator changeset uploads on the Rails port for a while and get some experience with the approach first. Working on both rails and cgimap is likely slowing things down too much. |
Will it be possible to use the accumulated number of changes rather than the elapsed time as the measure by which the rate is increased? I can see a malicious account creating a benign changeset, waiting a week doing nothing suspicious, and then receiving a decent rate. |
I can see it would be better - my concern is the cost of computing that value as it means reading all the changesets for that user unless we do something to cache the value. |
Yes, it would be reading the changesets, not the change files. It would be useful to cache the open changesets. Still, perfect is the enemy of good. |
I don't understand what you mean? Nothing on the user page shows the number of changes - it shows the number of changesets, yes, but not the number of objects changed.
What does the number of open changesets have to do with anything? |
There's no way for apache to know if the user is a moderator or not so we can't make redirects conditional like this and in any case the performance impact of doing that is likely to be substantial. |
I'm deeply sorry I was not expressing myself clearly.
I meant to say that GET /api/0.6/changesets can provide the information needed to compute the total number of changes made by an account. This is in contrast to an alternative approach of reading and analyzing the associated osmChange data using GET /api/0.6/changeset/#id/download calls for each changeset. I was assuming the "Edit" link on the user's page is using the GET /api/0.6/changesets API endpoint.
I meant to say that if there is a cache for the total number of changes made by each user, then, at any given moment, the number of useful lines in the cache would be of the same order of magnitude as the number of open changesets in the system. |
Well sure the data is available (which is all the API being able to fetch it proved) but we put limits on that API call - we don't just allow people to fetch a user's entire edit history because of the cost of doing that. Talking about the API is irrelevant anyway because the server code doesn't use the API or parse XML to do things like this, it would just execute SELECT SUM("changesets"."num_changes") FROM "changesets" WHERE "changesets"."user_id" = $1 which forces the database to read all the changeset records for the user. I doubt the edit link uses that API call either - why would iD care about your previous changesets? |
Can we take advantage of |
We'd certainly use that index, yes. In fact it's probably better to only look at the last 100/200/1000 or whatever changesets. |
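To illustrate the capped-scan idea being discussed, here is a minimal Ruby sketch. The names and the cap of 200 are hypothetical, not from the actual codebase; it just shows summing num_changes over only a user's most recent changesets rather than their entire history:

```ruby
# Hypothetical sketch: estimate recent edit volume from only the latest
# N changesets instead of scanning a user's whole edit history.
RECENT_CHANGESET_LIMIT = 200 # illustrative; could be 100/1000/etc.

# changesets: array of hashes, newest first (the order an index on
# (user_id, created_at) would yield cheaply)
def recent_change_count(changesets, limit = RECENT_CHANGESET_LIMIT)
  changesets.first(limit).sum { |cs| cs[:num_changes] }
end

changesets = [{ num_changes: 500 }, { num_changes: 1200 }, { num_changes: 80 }]
recent_change_count(changesets, 2) # => 1700
```

In SQL terms this would correspond to ordering by created_at descending and limiting before summing, which keeps the query cost bounded no matter how many changesets the user has.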
So for our DWG members using osmtools to revert changesets, there won't be any change, since they're using the single object upload on Rails anyway. I'm thinking about whitelisting some IP addresses used by specialized revert tools, such as Thanos. This of course assumes that the tool (+server ip address) would be somehow restricted to mods only. (ping @Zaczero). For sure there will be a performance impact for everyone else. As long as our revert tools can't really cope with the data volume, and we're still trying to work out a good strategy to identify excessive uploads, this seems acceptable to me as a temporary measure. Are you concerned that spike-0[678] can't cope with the additional load? |
Well I'm more concerned about slowing down upload calls than about literally overloading the servers. |
not necessarily |
is such rapid growth needed? 100 000 per hour after just a week is still pretty high (though I have not analysed actual editing patterns, so maybe I am underestimating how quickly people scale up their editing after joining) |
Well there are three questions - what is an appropriate final target, how long should it take to get there, and what curve should we use to get to that target. I did consider a month (well, four weeks) rather than one week, but that then has an impact at the other end, in that it takes longer before the limit starts to grow, and I thought it might be keeping things too low for too long. Of course if we switch to basing it on edit counts instead of time then that goes away as an issue. |
I believe 1000 changes per hour is too low. New users should be able to create at least 1 full changeset (10_000 changes) to avoid confusion. However, they should not be able to create 20_000 changes. Perhaps make it a daily limit instead of hourly, starting from a maximum of 20_000 changes per day. |
I think allowing 10000 is far too much given what we're trying to combat here - no real user that hasn't dived straight into an import is going to be generating changesets that big. I have never done a changeset that big in 16 years of editing! |
I personally dislike the new importer role; everyone should be able to perform imports without much of an issue. In other words, the rate limit should be configured in a way that does not disrupt imports, assuming the account is in good condition. |
It's not hard to imagine a newbie who performs 10_000 changes in a day (and does a single, bulk upload). |
I see better control of import/bulk edits as a useful side effect of this to be honest, but other people may disagree. The main reason for the importer role though was that it was very difficult to come up with an algorithm that would block the abuse without disrupting them - there are importers that have got close to half a million edits in an hour, so you would need to increase the maximum by quite a bit to allow them all. Newbies doing lots of edits offline is a concern, though that's not good practice and most newbies probably start with iD these days rather than an offline editor like JOSM. |
One more argument: reverting 20,000 changes is not difficult, but when it's 2 million or more, it becomes challenging. Whether we save an additional 10,000 changes or not is insignificant and may potentially cause more harm than good. |
I request that the requirements continue to be discussed in the issue - #2342 (comment)
IMHO, the PR is the place to discuss the implementation of these requirements. I was mistaken to combine requirements and implementation aspects in the same comments above. |
Ah so that is what's going on with docker.... I couldn't see what the problem was but the error messages are weirdly in the middle of the output. So it seems that restriction was relaxed in postgres 12 which is why the main tests work as the GitHub ubuntu images use postgres 14 but our docker setup still uses postgres 11... |
Hi. I just want to say that I've managed to hit the rate limit with https://www.openstreetmap.org/changeset/144568191 and subsequent changesets. It kicked in after 100k changes. An hour later I was able to upload the remaining 102k changes of my huge changeset without hitting the rate limit. I'm happy that the rate limit exists, and the number is more than reasonable, but I was a bit surprised by it, and stressed out because I was afraid it would lead to strange editing issues/conflicts. Luckily that didn't happen and everything went smoothly. Thank you for this nice implementation of the rate limit that didn't mess up my oversized edit. |
Hi, at a mapathon in Brno, Czechia, lots of attendees are new mappers creating their OSM accounts for this mapathon. They are mapping https://tasks.hotosm.org/projects/15495/tasks which includes round buildings. The limit is absurd, since they are trying to upload 3k+ nodes (mapped in 10 minutes) and they will never be able to do that. |
Any first time user that is mapping 3000 nodes in ten minutes cannot possibly be doing high quality work - that is 5 nodes a second, every second. |
The building tools in JOSM create 10-12 nodes for each building, and it's only two clicks of the mouse. I can very well imagine that the result meets OSM standards. |
That would still mean 500 clicks of the mouse in 600 seconds - not counting any potential zoom operations. Frankly, this sounds like a recipe for problems, and HOT should be thankful for a rate limit that avoids cementing their bad rep for low-quality edits. |
It is relatively easy to make this many edits in a short time. It's not necessarily poor quality work, especially when there are validators. Unfortunately, the current restriction discourages new users who want to contribute to OpenStreetMap. |
Taking it a bit easier and learning the system before making these huge edits makes a lot of sense, IMHO. As I said before, if the lack of security (which is what we had before this limit was introduced) discouraged veteran mappers from editing (which it did), I think there's no real question here which users are more valuable; at least for me it's clear... |
@anthaas Can you provide a representative sample of the edits they are trying to send? JOSM can save not-yet-uploaded edits, and it is possible to open them elsewhere (in another JOSM instance). That would allow us to look into whether it was a good idea to block such edits. Also, you can use it to upload edits from another account (if these edits were worth uploading). I see some buildings mapped as round in this area, which maybe contributes to a larger volume. But a first-time user that is mapping 3000 nodes in ten minutes? I am really curious about the data quality. If that is a typical problem, can you make and post a screen capture of the editing activity of such a user? How do they edit to end up with that? (I tried to look at the edit quality in this area in edits done by HOT mappers, but the HOT tasking manager seems not to make it easy to review edits done with it; I ended up creating hotosm/tasking-manager#6147 hotosm/tasking-manager#6148 https://hotosm.slack.com/archives/C4GLC45PY/p1701330196012119 before running out of time) |
@matkoniecz : maybe try osmcha.org using changeset comment "#hotosm-project-15495". Some of the larger changesets have a fair amount of round buildings, like this one: https://overpass-api.de/achavi/?changeset=144010949&relations=true - it seems these buildings have been mostly created by copy & pasting an existing round building, which takes almost no time at all in JOSM. |
It is very unlikely that a new user will do their edits with JOSM. Like @matkoniecz, I would like to see a real-world new user that hits the current rate limit. Personally, I've spent more than 3 weeks of work reverting the latest vandalism in Israel as well as recovering from well-meaning attempts to re-map the affected areas. I did that knowing this PR is in the works. |
@matkoniecz I don't like the idea of uploading changes for other users and taking their credit. That is not the idea behind mapathons. I believe you are familiar with the workflow of humanitarian mapping using the Tasking Manager. So when a user locks a task and tries to upload mapped buildings, the next step is marking the task as mapped and locking another one. This workflow is broken by this limit, since the mapper shouldn't mark the task as mapped in the Tasking Manager while the changes aren't uploaded yet. The standard mapping workflow is to sweep the whole task, map buildings (or other features) and submit the work. In the JOSM training session, the building tools plugin and the mapathoner plugin are introduced to ensure buildings are squared or rounded and tagged properly. And yes, we are having a JOSM training even for the new mappers. A training session usually looks like a 1 hour live demo by an experienced mapper on how to identify buildings and how to map, followed by Q&A and individual mapping. mappingInOneMinute.osm.txt |
To address some of the points above: Where new users map their first changesets very quickly, and especially when they do so in JOSM, there is a clear correlation with poor-quality editing. A significant subset of DWG tickets created about new users has simply been about poor editing quality, and our advice is often "slow down, and make smaller but better quality edits". This has been more noticeable since the recent waves of vandalism where large numbers of objects were edited (or deleted) in a short period of time - we (the DWG) have been using various means to detect this, and in a few cases have found ourselves dealing with poor quality new editors who just needed to be encouraged to slow down a bit. To address @zstadler's point above - these are "real-world new users that hit the current rate limit", but even though they're not mapping in bad faith, it still makes sense for them to be asked to slow down a bit. As an example from the changeset linked above, https://www.openstreetmap.org/way/1223453025 has been mapped as "perfectly circular" with 30 nodes; the imagery suggests perhaps something closer to a hexagon. |
I suppose buildings in the area are really circular, like shown on this example: https://upload.wikimedia.org/wikipedia/commons/4/43/House_in_Toteil_002.jpg ... we just don't have an established tagging as a node with building=african_hut and diameter=xyz, so people end up creating ways with lots of nodes to approximate the shape of the building. |
This is not a question of which users are more valuable, @mmd-osm. It is not either/or. The open mapping community should enable both experienced users and new users to map. I think we should value people who join the community and want to contribute to humanitarian organizations on the ground, as in this case. Blocking the new ones when they are not doing anything wrong, just using the powerful plugins that experienced mappers train them in, is not particularly welcoming. |
As the current vandalism is mainly tag related (changing tags on existing objects) or moving objects around (moving nodes), not counting new nodes without tags may be a way to avoid this kind of problem (if it is possible and simple in the API code). |
@cquest : I don't see how this would help with vandalism. Instead of moving nodes around, you could simply create lots of new nodes without any tags, and then replace existing way nodes by your new nodes. It's even more difficult to manage now, since you "only" changed 100 ways maybe, but created havoc nonetheless. With today's limits and your proposed change, a newly created account could in a worst case scenario:
Some further remarks:
I think it's fairly easy to see that this might cause some issues down the road. |
I believe your idea is worth discussing with editor and map style maintainers. It's a bit off topic for this thread, except that it would reduce the node count needed for the many buildings that are truly round. |
This may be solved by relying on HOT event organizers to collect users' usernames before starting the mapathon and applying for a more relaxed rate limit for them, while ensuring that there are enough experienced mappers to review the corresponding edits. The current rate limit is definitely not perfect, and will be more of a headache than the first 50 edits on Wikipedia, but it is difficult to find a simple and effective way to deal with the vandalism while also reducing the impact on new people who edit in good faith. In addition, 3000 nodes in 10 minutes: as a mapper with 5 years of experience, I can't do that even with the assistance of AI. |
Edited in #4319 (comment); now it doesn't contain any user's reply, only my murmuring. |
I'm not sure discussion in a merged PR is the best place to hash this out. |
We started requiring postgresql 12 after the migrations in openstreetmap#4319
This aims to implement #2342 by rate limiting edits.
The limits work in a similar way to the changeset comment limits and are configurable with the defaults being a ramp up from 1000 changes per hour to 100000 per hour over the first week after a user starts editing. Users that are reported lose some allowance subject to a minimum that defaults to 100 edits per hour.
The increase is non-linear, growing as the square of the time elapsed, subject to the lower bound of the initial limit, so that it starts to climb after about 18 hours, as shown in this graph, which shows hours on the X axis versus allowed edits per hour on the Y axis.
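The ramp described above can be sketched in plain Ruby. The constant names are illustrative, not the actual configuration keys; the allowance grows as the square of the time since the user started editing, bounded below by the initial limit and above by the final limit:

```ruby
# Illustrative sketch of the described rate-limit ramp (defaults:
# 1000/hour initially, 100000/hour after one week, quadratic growth).
INITIAL_CHANGES_PER_HOUR = 1_000
MAX_CHANGES_PER_HOUR     = 100_000
RAMP_HOURS               = 7 * 24 # one week

def changes_per_hour(hours_since_first_edit)
  # Quadratic ramp, clamped between the initial and maximum limits.
  ramped = MAX_CHANGES_PER_HOUR * (hours_since_first_edit.to_f / RAMP_HOURS)**2
  ramped.clamp(INITIAL_CHANGES_PER_HOUR, MAX_CHANGES_PER_HOUR).round
end

changes_per_hour(0)   # => 1000   (floor applies)
changes_per_hour(24)  # => 2041
changes_per_hour(168) # => 100000
```

With these defaults the quadratic term crosses the 1000-change floor at about 16.8 hours, roughly matching the "about 18 hours" point where the graph starts to climb.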
Moderators have a separately configurable rate limit, and this also introduces a new "importer" role that has a higher rate limit and which can be granted to accounts doing large imports/bulk edits. Obviously, once this is agreed, cgimap will need work to respect the rate limits.