Add the ability to rate limit edits #4319
Conversation
It would probably be quicker to handle all non-moderator changeset uploads on the Rails port for a while and get some experience with the approach first. Working on both rails and cgimap is likely slowing things down too much. |
Will it be possible to use the accumulated number of changes rather than the elapsed time as the measure by which the rate is increased? I can see a malicious account creating a benign changeset, waiting a week doing nothing suspicious, and then receiving a decent rate. |
I can see it would be better - my concern is the cost of computing that value as it means reading all the changesets for that user unless we do something to cache the value. |
Yes, it would be reading the changesets, not the change files. It would be useful to cache the open changesets. Still, perfect is the enemy of good. |
I don't understand what you mean? Nothing on the user page shows the number of changes - it shows the number of changesets, yes, but not the number of objects changed.
What does the number of open changesets have to do with anything? |
There's no way for apache to know if the user is a moderator or not so we can't make redirects conditional like this and in any case the performance impact of doing that is likely to be substantial. |
I'm deeply sorry I was not expressing myself clearly.
I meant to say that GET /api/0.6/changesets can provide the information needed to compute the total number of changes made by an account. This is in contrast to an alternative approach of reading and analyzing the associated osmChange data using GET /api/0.6/changeset/#id/download calls for each changeset. I was assuming the "Edit" link on the user's page is using the GET /api/0.6/changesets API endpoint.
I meant to say that if there is a cache for the total number of changes made by each user, then, at any given moment, the number of useful lines in the cache would be of the same order of magnitude as the number of open changesets in the system. |
Well sure the data is available (which is all the API being able to fetch it proved) but we put limits on that API call - we don't just allow people to fetch a user's entire edit history because of the cost of doing that. Talking about the API is irrelevant anyway because the server code doesn't use the API or parse XML to do things like this, it would just execute SELECT SUM("changesets"."num_changes") FROM "changesets" WHERE "changesets"."user_id" = $1 which forces the database to read all the changeset records for the user. I doubt the edit link uses that API call either - why would iD care about your previous changesets? |
Can we take advantage of |
We'd certainly use that index, yes. In fact it's probably better to only look at the last 100/200/1000 or whatever changesets. |
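To illustrate the capped-scan idea being discussed, here is a minimal Ruby sketch. The names and the cap of 200 are hypothetical, not from the actual codebase; it just shows summing num_changes over only a user's most recent changesets rather than their entire history:

```ruby
# Hypothetical sketch: estimate recent edit volume from only the latest
# N changesets instead of scanning a user's whole edit history.
RECENT_CHANGESET_LIMIT = 200 # illustrative; could be 100/1000/etc.

# changesets: array of hashes, newest first (the order an index on
# (user_id, created_at) would yield cheaply)
def recent_change_count(changesets, limit = RECENT_CHANGESET_LIMIT)
  changesets.first(limit).sum { |cs| cs[:num_changes] }
end

changesets = [{ num_changes: 500 }, { num_changes: 1200 }, { num_changes: 80 }]
recent_change_count(changesets, 2) # => 1700
```

In SQL terms this would correspond to ordering by created_at descending and limiting before summing, which keeps the query cost bounded no matter how many changesets the user has.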
So for our DWG members using osmtools to revert changesets, there won't be any change, since they're using the single object upload on Rails anyway. I'm thinking about whitelisting some IP addresses used by specialized revert tools, such as Thanos. This of course assumes that the tool (+server ip address) would be somehow restricted to mods only. (ping @Zaczero). For sure there will be a performance impact for everyone else. As long as our revert tools can't really cope with the data volume, and we're still trying to work out a good strategy to identify excessive uploads, this seems acceptable to me as a temporary measure. Are you concerned that spike-0[678] can't cope with the additional load? |
Well I'm more concerned about slowing down upload calls than about literally overloading the servers. |
not necessarily |
is such rapid growth needed? 100 000 per hour after just a week is still pretty high (though I have not analysed actual editing patterns, so maybe I am underestimating how quickly people scale up their editing after joining) |
Well there are three questions - what is an appropriate final target, how long should it take to get there, and what curve should we use to get to that target. I did consider a month (well, four weeks) rather than one week, but that then has an impact at the other end, in that it takes longer before the limit starts to grow, and I thought it might be keeping things too low for too long. Of course if we switch to basing it on edit counts instead of time then that goes away as an issue. |
I believe 1000 changes per hour is too low. New users should be able to create at least 1 full changeset (10_000 changes) to avoid confusion. However, they should not be able to create 20_000 changes. Perhaps make it a daily limit instead of hourly, starting from a maximum of 20_000 changes per day. |
I think allowing 10000 is far too much given what we're trying to combat here - no real user that hasn't dived straight into an import is going to be generating changesets that big. I have never done a changeset that big in 16 years of editing! |
I personally dislike the new importer role; everyone should be able to perform imports without much of an issue. In other words, the rate limit should be configured in a way that does not disrupt imports, assuming the account is in good condition. |
It's not hard to imagine a newbie who performs 10_000 changes in a day (and does a single, bulk upload). |
I see better control of import/bulk edits as a useful side effect of this to be honest, but other people may disagree. The main reason for the importer role though was that it was very difficult to come up with an algorithm that would block the abuse without disrupting them - there are importers that have got close to half a million edits in an hour, so you would need to increase the maximum by quite a bit to allow them all. Newbies doing lots of edits offline is a concern, though that's not good practice and most newbies probably start with iD these days rather than an offline editor like JOSM. |
One more argument: reverting 20,000 changes is not difficult, but when it's 2 million or more, it becomes challenging. Whether we save an additional 10,000 changes or not is insignificant and may potentially cause more harm than good. |
I request that the requirements continue to be discussed in the issue - #2342 (comment)
IMHO, the PR is the place to discuss the implementation of these requirements. I was mistaken to combine requirements and implementation aspects in the same comments above. |
Ah so that is what's going on with docker.... I couldn't see what the problem was but the error messages are weirdly in the middle of the output. So it seems that restriction was relaxed in postgres 12 which is why the main tests work as the GitHub ubuntu images use postgres 14 but our docker setup still uses postgres 11... |
Hi. I just want to say that I've managed to hit the rate limit with https://www.openstreetmap.org/changeset/144568191 and subsequent changesets. It kicked in after 100k changes. An hour later I was able to upload the remaining 102k changes of my huge changeset without hitting the rate limit. I'm happy that the rate limit exists, and the number is more than reasonable, but I was a bit surprised by it, and stressed out because I was afraid it would lead to strange editing issues/conflicts. Luckily that didn't happen and everything went smoothly. Thank you for this nice implementation of the rate limit that didn't mess up my oversized edit. |
Hi, at a mapathon in Brno, Czechia, lots of attendees are new mappers creating their OSM accounts for this mapathon. They are mapping https://tasks.hotosm.org/projects/15495/tasks which includes round buildings. The limit is absurd, since they are trying to upload 3k+ nodes (mapped in 10 minutes) and they will never be able to do that. |
Any first time user that is mapping 3000 nodes in ten minutes cannot possibly be doing high quality work - that is 5 nodes a second, every second. |
The building tools in JOSM create 10-12 nodes for each building, and it's only two clicks of the mouse. I can very well imagine that the result meets OSM standards. |
That would still mean 500 clicks of the mouse in 600 seconds - not counting any potential zoom operations. Frankly, this sounds like a recipe for problems, and HOT should be thankful for a rate limit that avoids cementing their bad rep for low-quality edits. |
It is relatively easy to make this many edits in a short time. It's not necessarily poor quality work, especially when there are validators. Unfortunately, the current restriction discourages new users who want to contribute to OpenStreetMap. |
Taking it a bit easier and learning the system before making these huge edits makes a lot of sense, IMHO. As I said before, if the lack of security (which is what we had before this limit was introduced) discouraged veteran mappers from editing (which it did), I think there's no real question here which users are more valuable; at least for me it's clear... |
@anthaas Can you provide a representative sample of the edits they are trying to send? JOSM can save not-yet-uploaded edits, and it is possible to open them elsewhere (in another JOSM instance). That would allow us to look into whether it was a good idea to block such edits. Also, you can use it to upload edits from another account (if these edits were worth uploading). I see some buildings mapped as round in this area, which maybe contributes to a larger volume. But a first-time user that is mapping 3000 nodes in ten minutes? I am really curious about the data quality. If that is a typical problem, can you make and post a screen capture of the editing activity of such a user? How do they edit to end up with that? (I tried to look at the edit quality in this area in edits done by HOT mappers, but the HOT tasking manager seems not to make it easy to review edits done with it; I ended up creating hotosm/tasking-manager#6147 hotosm/tasking-manager#6148 https://hotosm.slack.com/archives/C4GLC45PY/p1701330196012119 before running out of time) |
@matkoniecz : maybe try osmcha.org using changeset comment "#hotosm-project-15495". Some of the larger changesets have a fair amount of round buildings, like this one: https://overpass-api.de/achavi/?changeset=144010949&relations=true - it seems these buildings have been mostly created by copy & pasting an existing round building, which takes almost no time at all in JOSM. |
It is very unlikely that a new user will do their edits with JOSM. Like @matkoniecz, I would like to see a real-world new user that hits the current rate limit. Personally, I've spent more than 3 weeks of work reverting the latest vandalism in Israel as well as recovering from well-meaning attempts to re-map the affected areas. I did that knowing this PR is in the works. |
@matkoniecz I don't like the idea of uploading changes for other users and taking their credit. That is not the idea behind mapathons. I believe you are familiar with the workflow of humanitarian mapping using the Tasking Manager. So when a user locks a task and tries to upload mapped buildings, the next step is marking the task as mapped and locking another one. This workflow is broken by this limit, since the mapper shouldn't mark the task as mapped in the Tasking Manager while the changes aren't uploaded yet. The standard mapping workflow is to sweep the whole task, map buildings (or other features) and submit the work. In the JOSM training session, the building tools plugin and the mapathoner plugin are introduced to ensure buildings are squared or rounded and tagged properly. And yes, we are having a JOSM training even for the new mappers. A training session usually looks like a 1 hour live demo by an experienced mapper on how to identify buildings and how to map, followed by Q&A and individual mapping. mappingInOneMinute.osm.txt |
To address some of the points above: Where new users map their first changesets very quickly, and especially when they do so in JOSM, there is a clear correlation with poor-quality editing. A significant subset of DWG tickets created about new users has simply been about poor editing quality, and our advice is often "slow down, and make smaller but better quality edits". This has been more noticeable since the recent waves of vandalism where large numbers of objects were edited (or deleted) in a short period of time - we (the DWG) have been using various means to detect this, and in a few cases have found ourselves dealing with poor quality new editors who just needed to be encouraged to slow down a bit. To address @zstadler's point above - these are "real-world new users that hit the current rate limit", but even though they're not mapping in bad faith, it still makes sense for them to be asked to slow down a bit. As an example from the changeset linked above, https://www.openstreetmap.org/way/1223453025 has been mapped as "perfectly circular" with 30 nodes; the imagery suggests perhaps something closer to a hexagon. |
I suppose buildings in the area are really circular, like shown on this example: https://upload.wikimedia.org/wikipedia/commons/4/43/House_in_Toteil_002.jpg ... we just don't have an established tagging as a node with building=african_hut and diameter=xyz, so people end up creating ways with lots of nodes to approximate the shape of the building. |
This is not a question of which users are more valuable, @mmd-osm. It is not either/or. The open mapping community should enable both experienced users and new users to map. I think we should value people who join the community and want to contribute to humanitarian organizations on the ground, as in this case. Blocking the new ones when they are not doing anything wrong, just using the powerful plugins that experienced mappers train them in, is not particularly welcoming. |
As the current vandalism is mainly tag related (changing tags on existing objects) or moving objects around (moving nodes), not counting new nodes without tags may be a way to avoid this kind of problem (if it is possible and simple in the API code). |
@cquest : I don't see how this would help with vandalism. Instead of moving nodes around, you could simply create lots of new nodes without any tags, and then replace existing way nodes by your new nodes. It's even more difficult to manage now, since you "only" changed 100 ways maybe, but created havoc nonetheless. With today's limits and your proposed change, a newly created account could in a worst case scenario:
Some further remarks:
I think it's fairly easy to see that this might cause some issues down the road. |
I believe your idea is worth discussing with editor and map style maintainers. It's a bit off topic for this thread, except that it would reduce the node count needed for the many buildings that are truly round. |
This may be solved by relying on HOT event organizers to collect users' usernames before starting the mapathon and applying for a more relaxed rate limit for them, while ensuring that there are enough experienced mappers to review the corresponding edits. The current rate limit is definitely not perfect, and will be more of a headache than the first 50 edits on Wikipedia, but it is difficult to find a simple and effective way to deal with the vandalism while also reducing the impact on new people who edit in good faith. In addition, 3000 nodes in 10 minutes: as a mapper with 5 years of experience, I can't do that even with the assistance of AI. |
Edited in #4319 (comment); now it doesn't contain any user's reply, only my murmuring. |
I'm not sure discussion in a merged PR is the best place to hash this out. |
We started requiring postgresql 12 after the migrations in openstreetmap#4319
This aims to implement #2342 by rate limiting edits.
The limits work in a similar way to the changeset comment limits and are configurable with the defaults being a ramp up from 1000 changes per hour to 100000 per hour over the first week after a user starts editing. Users that are reported lose some allowance subject to a minimum that defaults to 100 edits per hour.
The increase is non-linear, growing as the square of the time elapsed, subject to the lower bound of the initial limit, so that it starts to climb after about 18 hours, as shown in this graph, which shows hours on the X axis versus allowed edits per hour on the Y axis.
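The ramp described above can be sketched in plain Ruby. The constant names are illustrative, not the actual configuration keys; the allowance grows as the square of the time since the user started editing, bounded below by the initial limit and above by the final limit:

```ruby
# Illustrative sketch of the described rate-limit ramp (defaults:
# 1000/hour initially, 100000/hour after one week, quadratic growth).
INITIAL_CHANGES_PER_HOUR = 1_000
MAX_CHANGES_PER_HOUR     = 100_000
RAMP_HOURS               = 7 * 24 # one week

def changes_per_hour(hours_since_first_edit)
  # Quadratic ramp, clamped between the initial and maximum limits.
  ramped = MAX_CHANGES_PER_HOUR * (hours_since_first_edit.to_f / RAMP_HOURS)**2
  ramped.clamp(INITIAL_CHANGES_PER_HOUR, MAX_CHANGES_PER_HOUR).round
end

changes_per_hour(0)   # => 1000   (floor applies)
changes_per_hour(24)  # => 2041
changes_per_hour(168) # => 100000
```

With these defaults the quadratic term crosses the 1000-change floor at about 16.8 hours, roughly matching the "about 18 hours" point where the graph starts to climb.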
Moderators have a separately configurable rate limit, and this also introduces a new "importer" role that has a higher rate limit and which can be granted to accounts doing large imports/bulk edits. Obviously, once this is agreed, cgimap will need work to respect the rate limits.