Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the ability to rate limit edits #4319

Merged
merged 5 commits into from
Nov 15, 2023

Conversation

tomhughes
Copy link
Member

This aims to implement #2342 by rate limiting edits.

The limits work in a similar way to the changeset comment limits and are configurable with the defaults being a ramp up from 1000 changes per hour to 100000 per hour over the first week after a user starts editing. Users that are reported lose some allowance subject to a minimum that defaults to 100 edits per hour.

The increase is non linear, increasing as the square of time elapsed by subject to the lower bound of the initial limit so that it starts to climb after about 18 hours as shown in this graph which shows hours on the X axis versus allowed edits per hour on the Y axis.

Screenshot from 2023-10-29 17-33-10

Moderators have a separately configurable rate limit and this also introduces a new importer role that has a higher rate limit and which can be granted to accounts doing large imports/bulk edits.

Obviously once this agreed cgimap will need work to respect the rate limits.

@mmd-osm
Copy link
Contributor

mmd-osm commented Oct 30, 2023

Obviously once this agreed cgimap will need work to respect the rate limits.

It would probably be quicker to handle all non-moderator changeset uploads on the Rails port for a while and get some experience with the approach first. Working on both rails and cgimap is likely slowing things down too much.

@zstadler
Copy link

zstadler commented Oct 30, 2023

Will it be possible to use the accumulated number of changes rather than the elapsed time as the measure by which the rate is increased?

I can see a malicious account creating a benign changeset, waiting a week doing nothing suspicious, and then receiving a decent rate.
Alternatively, providing an initial 10000 benign edits, working for a period of at least 10 hours, in order to get a rate increase is much harder to achieve. On the other hand, I do not believe that would be unacceptable for new inexperienced users.

@tomhughes
Copy link
Member Author

I can see it would be better - my concern is the cost of computing that value as it means reading all the changesets for that user unless we do something to cache the value.

@zstadler
Copy link

Yes, it would be reading the changesets, not the change files.
It is not fundementally different than what's done in https://www.openstreetmap.org/user/.

It would be useful to cache the open changesets. Still, Perfect is the enemy of good

@tomhughes
Copy link
Member Author

It is not fundementally different than what's done in https://www.openstreetmap.org/user/.

I don't understand what you mean? Nothing on the user page shows the number of changes - the number of changesets yet but not the number of objects changed.

It would be useful to cache the open changesets. Still, Perfect is the enemy of good

What does the number of open changesets have to do with anything?

@tomhughes
Copy link
Member Author

It would probably be quicker to handle all non-moderator changeset uploads on the Rails port for a while and get some experience with the approach first. Working on both rails and cgimap is likely slowing things down too much.

There's no way for apache to know if the user is a moderator or not so we can't make redirects conditional like this and in any case the performance impact of doing that is likely to be substantial.

@zstadler
Copy link

zstadler commented Oct 30, 2023

I'm deeply sorry I was not expressing myself clearly.

It is not fundementally different than what's done in https://www.openstreetmap.org/user/.

I don't understand what you mean? Nothing on the user page shows the number of changes - the number of changesets yet but not the number of objects changed.

I meant to say that GET /api/0.6/changesets can provide the information needed to compute the total number of changes made by an account.

This is in contrast to an alternative approach of reading and analyzing the associated osmChange data using GET /api/0.6/changeset/#id/download calls for each changeset.

I was assuming the "Edit" link on the users page is using the GET /api/0.6/changesets API point.

It would be useful to cache the open changesets. Still, Perfect is the enemy of good

What does the number of open changesets have to do with anything?

I meant to say that if there is a cache for the total number of changes made by each user, than at any given moment, the number of useful lines in the the cache would be at the same order of magnitude as the number of open changesets in the system.

@tomhughes
Copy link
Member Author

tomhughes commented Oct 30, 2023

Well sure the data is available (which is all the API being able to fetch it proved) but we put limits on that API call - we don't just allow people to fetch a user's entire edit history because of the cost of doing that.

Talking about the API is irrelevant anyway because the server code doesn't use the API or parse XML to do things like this, it would just execute current_user.changesets.sum(:num_changes) but that will turn into something like:

SELECT SUM("changesets"."num_changes") FROM "changesets" WHERE "changesets"."user_id" = $1

which forces the database to read all the changeset records for the user.

I doubt the edit link uses that API call either - why would iD care about your previous changesets?

@drolbr
Copy link
Contributor

drolbr commented Oct 30, 2023

Talking about the API is irrelevant anyway because the server code doesn't use the API or parse XML to do things like this, it would just execute current_user.changesets.sum(:num_changes) but that will turn into something like:

SELECT SUM("changesets"."num_changes") FROM "changesets" WHERE "changesets"."user_id" = $1

Can we take advantage of changesets_user_id_created_at_idx and look only into the last hour or last 24 hours or so?

@tomhughes
Copy link
Member Author

We'd certainly use that index yes (in fact current_user.changesets does but the sum causes the order to be lost) but limiting by date is probably not a good idea as it would penalise people with lots of infrequent edits.

Better is probably only to look at the last 100/200/1000 or whatever changesets.

@mmd-osm
Copy link
Contributor

mmd-osm commented Oct 30, 2023

There's no way for apache to know if the user is a moderator or not so we can't make redirects conditional like this and in any case the performance impact of doing that is likely to be substantial.

So for our DWG members using osmtools to revert changesets, there won't be any change, since they're using the single object upload on Rails anyway. I'm thinking about whitelisting some IP addresses used by specialized revert tools, such as Thanos. This of course assumes that the tool (+server ip address) would be somehow restricted to mods only. (ping @Zaczero).

For sure there will be a performance impact for everyone else. As long as our revert tools can't really cope with the data volume, and we're still trying to work out a good strategy to identify excessive uploads, this seems acceptable to me as a temporary measure. Are you concerned that spike-0[678] can't cope with the additional load?

@tomhughes
Copy link
Member Author

Well I'm more concerned about slowing down upload calls than about literally overloading the servers.

@AntonKhorev
Copy link
Collaborator

since they're using the single object upload on Rails anyway

not necessarily

app/models/user.rb Outdated Show resolved Hide resolved
@matkoniecz
Copy link
Contributor

to 100000 per hour over the first week after a user starts editing

is such rapid growth needed? 100 000 per hour after just week is still pretty high (though I have not analysed actual editing patterns, so maybe I am underestimating how fast people often scale up their editing after joining)

@tomhughes
Copy link
Member Author

Well there are three questions - what is an appropriate final target, how long should it take to get there, and what curve should we use to get to that target.

I did consider a month (well four weeks) rather that one week but that then has an impact at the end other end in that it takes longer before the limit starts to grow and I thought it might be keeping things too long for too long.

Of course if we switch to basing it on edit counts instead of time then that goes away as an issue.

@Zaczero
Copy link

Zaczero commented Oct 30, 2023

I believe a 1000 changes per hour is too low. New users should be able to create at least 1 full changeset (10_000 changes) to avoid confusion. However, they should not be able to create 20_000 changes. Perhaps make it a daily limit instead of hourly, starting from max 20_000 changes per day.

@tomhughes
Copy link
Member Author

tomhughes commented Oct 30, 2023

I think allowing 10000 is far too much given what we're trying to combat here - no real user that hasn't dived straight into an import is going to be generating changesets that big.

I have never done a changeset that big in 16 years of editing!

@Zaczero
Copy link

Zaczero commented Oct 30, 2023

I personally dislike the new importer role; everyone should be able to perform imports without much of an issue. In other words, the rate limit should be configured in a way that does not disrupt imports, assuming the account is in good condition.

@Zaczero
Copy link

Zaczero commented Oct 30, 2023

I have never done a changeset that big in 16 years of editing!

It's not hard to imagine a newbie who performs 10_000 changes in a day (and does a single, bulk upload).

@tomhughes
Copy link
Member Author

I see better control of import/bulk edits as a useful side effect of this to be honest but other people may disagree.

The main reason for the importer role though was that it was very difficult to come up with an algorithm that would block the abuse without disputing them - there are importers that have got close to half a million edits in an hour so you would need to increase the maximum by quite a bit to allow them all.

Newbies doing lots of edits offline is a concern though that's not good practice and most newbies probably start with iD these days rather than an offline editor like JOSM.

@Zaczero
Copy link

Zaczero commented Oct 30, 2023

One more argument: reverting 20,000 changes is not difficult, but when it's 2 million or more, it becomes challenging. Whether we save an additional 10,000 changes or not is insignificant and may potentially cause more harm than good.

@zstadler
Copy link

zstadler commented Oct 30, 2023

I request that the requirements will continue to be discussed at the issue - #2342 (comment)

  • Is it a limit on:
    • The number of changesets?
      *The changes_count of the changesets?
      *The size in bytes of the change data (rate limit)?
    • or something else?
  • What are the initial limits?
  • How should they be increased?

IMHO, the PR is a place to discuss the implementation of these requirements.

I was mistaken to combine requirements and implantation aspects in the same comments above.

@gravitystorm
Copy link
Collaborator

PG::ActiveSqlTransaction: ERROR:  ALTER TYPE ... ADD cannot run inside a transaction block
/app/config/initializers/migrate.rb:55:in `add_enumeration_value'
/app/db/migrate/20231029151516_add_importer_role.rb:3:in `up'

@tomhughes
Copy link
Member Author

Ah so that is what's going on with docker.... I couldn't see what the problem was but the error messages are weirdly in the middle of the output.

So it seems that restriction was relaxed in postgres 12 which is why the main tests work as the GitHub ubuntu images use postgres 14 but our docker setup still uses postgres 11...

@501Ghost
Copy link

Hi. I just want to say that I've managed to hit the rate limit with https://www.openstreetmap.org/changeset/144568191 and subsequent changesets. It kicked in after 100k changes. An hour later I was able to upload the remaining 102k changes of my huge changeset without hitting the rate limit.

I'm happy that the rate limit exists, and the number is more than reasonable, but I was a bit surprised by it, and stressed out because I was afraid it would lead to strange editing issues/conflicts. Luckily that didn't happen and everything went smoothly.

Thank you for this nice implementation of the rate limit that didn't mess up my oversized edit.

@anthaas
Copy link

anthaas commented Nov 29, 2023

Hi, in a mapathon in Brno, Czechia lots of attendees are new mappers creating their OSM accounts for this mapathon. They are mapping https://tasks.hotosm.org/projects/15495/tasks which includes round buildings. The limit is absurd since they are trying to upload 3k+ nodes (mapped in 10 minutes) and they never will be able to do that.
image (1)
image

@tomhughes
Copy link
Member Author

Any first time user that is mapping 3000 nodes in ten minutes cannot possibly be doing high quality work - that is 5 nodes a second, every second.

@anthaas
Copy link

anthaas commented Nov 29, 2023

Building tools in JOSM creates 10-12 nodes for each bulding and it's only two click of the mouse. I can imagine very lively that result meets OSM standards.

@woodpeck
Copy link
Contributor

That would still mean 500 clicks of the mouse in 600 seconds - not counting any potential zoom operations. Frankly, this sounds like a recipe for problems, and HOT should be thankful for a rate limit that avoids cementing their bad rep for low-quality edits.

@DwynDwe
Copy link

DwynDwe commented Nov 29, 2023

It is relatively easy to make this many edits in a short time. It's not necessarily poor quality work, especially when there are validators. Unfortunately, the current restriction discourages new users who want to contribute to openstreetmap.

@HarelM
Copy link

HarelM commented Nov 29, 2023

Taking it a bit more easy and learning the system before making these huge edits makes a lot of sense, IMHO.
I would consider investing more time in teaching these new users how to properly map before making such huge amount of editing.

As I said before, if the lack of security (which was before this limit was introduced) discouraged veteran mappers from editing (which it did) I think there's no real question here which users are more valuable, at least for me it's clear...

@matkoniecz
Copy link
Contributor

The limit is absurd since they are trying to upload 3k+ nodes (mapped in 10 minutes)

@anthaas Can you provide representative sample of edits they are trying to send? JOSM can save not uploaded edits, and it is possible to open them elsewhere (in another JOSM instance). It would allow to look into whether it was good idea to block such edits. Also, you can use it to upload edits from another account (if these edits were worth uploading)

I see some building mapped as round in this area, what maybe contributes to larger volume. But first time user that is mapping 3000 nodes in ten minutes? I am really curious about data quality. If that is typical problem, can you make and post screen capture of editing activity of such user? How they edit to end with that?

(I tried to look at edit quality in this area in edits done by HOT mappers but HOT tasking manager seems to not be making easy to review edits done with it, I ended creating hotosm/tasking-manager#6147 hotosm/tasking-manager#6148 https://hotosm.slack.com/archives/C4GLC45PY/p1701330196012119 before running out of time )

@mmd-osm
Copy link
Contributor

mmd-osm commented Nov 30, 2023

@matkoniecz : maybe try osmcha.org using changeset comment "#hotosm-project-15495". Some of the larger changesets have a fair amount of round buildings, like this one: https://overpass-api.de/achavi/?changeset=144010949&relations=true - it seems these buildings have been mostly created by copy & pasting an existing round building, which takes almost no time at all in JOSM.

@zstadler
Copy link

It is very unlikely that a new user will do their edits with JOSM.
The above example is the user's 50'th edit.
In his first hour of OSM editing he created about 700 elements. That was also for an hotosm project.

Like @matkoniecz,, I would like to see a real-world new user that hits the current rate limit.
Like @HarelM, I'd rather limit the rate of contribution for new users than have veteran mappers leave OSM. Not to mention the toll on the DWG members and the community members that fights vandalism.

Personally, I've spent more than 3 weeks of work in reverting the latest vandalism in Israel as well as recovering from well-meaning attempts to re-map the affected areas. I did that knowing this PR is in the works.

@anthaas
Copy link

anthaas commented Nov 30, 2023

@matkoniecz I don't like the idea of uploading changes for other users and taking their credits. That is not the idea behind mapathons. I believe you are familiar with the workflow of humanitarian mapping using the Tasking Manager. So when a user locks a task and tries to upload mapped buildings, the next step is marking the task as mapped and locking another one. This workflow is broken by this limit since the mapper shouldn't mark the task as mapped in the Tasking Manager because changes aren't uploaded yet.

Standard workflow of the mapping is to sweep the whole task, map buildings (or other features) and submit the work. In the JOSM training session, a buildings tools plugin and mapathoner plugin is introduced to ensure buildings are squared or rounded and tagged properly. And yes, we are having a JOSM training even for the new mappers. Training session usually looks like a 1 hour live demo by an experienced mapper on how to identify buildings and how to map followed by Q&A and individual mapping.

mappingInOneMinute.osm.txt
I’m attaching a file demonstrating a one minute test of the mapping.

@SomeoneElseOSM
Copy link

To address some of the points above:

Where new users map their first changesets very quickly, and especially when they do so in JOSM, there is a clear correlation with poor-quality editing. A significant subset of DWG tickets created about new users have simply been about poor editing quality, and our advice is often "slow down, and make smaller but better quality edits". This has been more noticeable since the recent waves of vandalism where large number of objects were edited (or deleted) in a short period of time - we (the DWG) have been using various means to detect this, and in a few cases have found ourselves dealing with poor quality new editors who just needed to be encouraged to slow down a bit.

To address @zstadler's point above - these are "real-world new users that hit the current rate limit", but even though they're not mapping in bad faith, it still makes sense for them to be asked to slow down a bit. As an example from the changeset linked above, https://www.openstreetmap.org/way/1223453025 has been mapped as "perfectly circular" with 30 nodes; the imagery suggests perhaps something closer to a hexagon.

@mmd-osm
Copy link
Contributor

mmd-osm commented Nov 30, 2023

I suppose buildings in the area are really circular, like shown on this example: https://upload.wikimedia.org/wikipedia/commons/4/43/House_in_Toteil_002.jpg ... we just don't have an established tagging as a node with building=african_hut and diameter=xyz, so people end up creating ways with lots of nodes to approximate the shape of the building.

@matkoniecz
Copy link
Contributor

https://community.openstreetmap.org/t/the-new-rate-limiting-prevents-participants-to-missing-maps-mapathons-from-saving-buildings/106610/ was created

@Janabau
Copy link

Janabau commented Dec 1, 2023

Taking it a bit more easy and learning the system before making these huge edits makes a lot of sense, IMHO. I would consider investing more time in teaching these new users how to properly map before making such huge amount of editing.

As I said before, if the lack of security (which was before this limit was introduced) discouraged veteran mappers from editing (which it did) I think there's no real question here which users are more valuable, at least for me it's clear...

This is not a question which users are more valuable @mmd-osm. It is not either, or. The open mapping community should enable to map both experienced users and new users. I think we should value people who join the community and want to contribute to humanitarian organizations on the ground, as in this case. Blocking the new ones if they are not doing anything wrong, just using the powerful plugins that experienced mappers train them in, is not particularly welcoming.

@mmd-osm
Copy link
Contributor

mmd-osm commented Dec 1, 2023

This is not a question which users are more valuable @mmd-osm**

@Janabau : Oops, you're misquoting me here. I never said anything like that.

@cquest
Copy link

cquest commented Dec 1, 2023

As current vandalism is mainly tag related (changing tags on existing objects), or moving objects around (moving nodes), not counting new nodes without tags may be a way avoid this kind of problem (if it is possible and simple in the API code).

@mmd-osm
Copy link
Contributor

mmd-osm commented Dec 1, 2023

@cquest : I don't see how this would help with vandalism. Instead of moving nodes around, you could simply create lots of new nodes without any tags, and then replace existing way nodes by your new nodes. It's even more difficult to manage now, since you "only" changed 100 ways maybe, but created havoc nonetheless.

With today's limits and your proposed change, a newly created account could in a worst case scenario:

  • Upload 2'000'000 new nodes without tags in 200 changesets
  • Change 1000 ways, each having 2000 nodes, and replace the exiting nodes by the newly created ones. -> large scale screw up of way geometry is possible!

Some further remarks:

  • The old nodes are not touched in any way, they're simply no longer referenced by a way.
  • The new user could even upload more untagged nodes, thereby creating more junk in the database. However, they're not able to cause further damage to existing data beyond the 1000 ways. At least not on the first day.

I think it's fairly easy to see that this might cause some issues down the road.

@fititnt
Copy link

fititnt commented Dec 1, 2023

@mmd-osm
I suppose buildings in the area are really circular, like shown on this example: https://upload.wikimedia.org/wikipedia/commons/4/43/House_in_Toteil_002.jpg ... we just don't have an established tagging as a node with building=african_hut and diameter=xyz, so people end up creating ways with lots of nodes to approximate the shape of the building.

I believe your idea worth discussion with editors and map styles. It's a bit off topic for this thread, except that it would reduce need of a large amount of buildings everywhere that are truly round

@LaoshuBaby
Copy link
Contributor

LaoshuBaby commented Dec 3, 2023

about new user been restricted in mapathon.

This may be solved by relying on HOT event organizers to collect users' usernames before starting the mapathon and apply for a more relaxed rate limit for them, and must ensure that there are enough experienced mappers to review the corresponding editors. The current rate limit is definitely not perfect. , and will be more of a headache than the first 50 edits to Wikipedia, but it is difficult to find a simple and effective way to deal with the vandalism while also reduces the impact on new people who edit in friendly attitude.

In addition, 3000 points in 10 minutes, as a mapper with 5 years of experience, I can't do it even with the assistance of AI.

@mmd-osm

This comment was marked as resolved.

@LaoshuBaby
Copy link
Contributor

LaoshuBaby commented Dec 3, 2023

Hey folks! I would really appreciate if you could stop attributing this quote from @HarelM to my user. Again, I didn't write anything like that.

Edited in #4319 (comment), now it don't contain any user's reply, only my murmur.

@pnorman
Copy link
Contributor

pnorman commented Dec 3, 2023

I'm not sure discussion in a merged PR is the best place to hash this out.

gravitystorm added a commit to gravitystorm/openstreetmap-website that referenced this pull request Dec 20, 2023
We started requiring postgresql 12 after the migrations in openstreetmap#4319
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.