Matchmaking balancing seems broken #975
Comments
Please post some replay IDs so we can investigate the issue.
You can look for more in my replays: longhead. Another problem related to this is when one team has a lot of rating concentrated on one player. Whether that is an advantage or a disadvantage depends on the map, but it almost always feels unbalanced, even if the balance rating is above 90%.
A way to solve this might be to require people to play 1v1 ladder and use that as a proxy for the tmm ratings when they have 0 tmm games. Then games with a balance below a threshold should not be allowed, and the distance of each player from the average rating should affect the balance rating. When there is one 2000 player and a few 500 players vs a few 1000 players, that may be an advantage or a disadvantage depending on the map. Generally, more mexes means the 2000 can expand way more and dominate the 1000 players. Fewer mexes means the 500s will block the mexes and not use their eco as efficiently as the 1000s. Either way it is unbalanced, so it should not be rated as well balanced.
We already use global rating as a baseline to initialize the tmm ratings. That is why, in the first game you linked, the rating seemed to jump from 0 to 1092. In reality the displayed 0 rating is a visual bug: it was 1092 already. That's why the match launched: the actual balance was good, even though the game reported it as low because it falsely showed that guy as 0 rated.
This is already implemented.
But that must be possible to determine during balancing?
Yes, that is obviously an issue. I see at most 20 players in the matchmaking queue at a time, so an ideal balance will not always be possible. One issue that I think should be addressed, though, is that games are rated with high balance when there is a huge imbalance within the teams. Especially on point-mirrored maps, where the symmetric spots don't actually play against each other, this can turn out extremely unfair, because the best player of each team plays against the worst of the other team. In general, the balance within the teams should be included in the rating to hopefully get somewhat more balanced games. On top of that, it might be a nice feature in the client if you could set your minimum balance rating. That way players could decide how long they are willing to wait for a better game. But that is probably a bigger change. Also, when one team is very unbalanced and the other is balanced, that creates an unfair setting too, depending on the map.
What I am getting at is that a high rated and a low rated player might be in the same game because they were premades. Trueskill doesn't take rating distribution into account in the way you describe, which is a limitation we have to work with, but in the long term it doesn't matter because these unfairnesses will balance out. Sometimes you are favoured, sometimes you are not, but in the end you will converge on an average rating that is fitting. Being able to set a minimum balance rating sounds like a nice feature on the surface, but it is infeasible and also not desirable. This feature has been discussed at length and the conclusion is that it is just not viable. I know that some matches feel suboptimal, but as I said it is a game of tradeoffs and I feel we are pretty much at a local optimum right now.
That may be true, but the issue is not really getting an inaccurate rating, it is having unfair games. It's not fun, no matter which side you are on. An unbalanced game will either be frustrating because you lose or boring because you win easily. It's primarily a game quality issue.
That is unfortunate, because I feel like that is the only issue that could really be improved. It is impossible to know how well someone will play when they have zero games, and when there are no players in queue you can't make a better game. But making unbalanced games because TrueSkill doesn't detect it seems a bit unnecessary. Possibly there are alternative balancing methods which take intra-team balance into account? Otherwise I imagine it could be simple enough to scale the balance rating by some inverse distance of each player in a team from the average rating of all players, so that more "spread out" ratings are discouraged? It seems like an option that could at least be tested.
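To make the proposal above concrete, here is a minimal sketch of what "scaling the balance rating by some inverse distance from the average" could look like. It is not the matchmaker's code: the function name, the `scale` constant, and the exact formula are all assumptions picked for illustration, using the 2000/500/500 vs 1000/1000/1000 example from earlier in the thread.

```python
from statistics import mean


def spread_penalised_balance(base_balance, team_a, team_b, scale=500.0):
    """Scale a 0..1 balance score down as ratings spread out.

    `scale` is a hypothetical tuning constant: the mean absolute distance
    from the overall average rating at which the score is halved.
    """
    ratings = list(team_a) + list(team_b)
    average = mean(ratings)
    # Mean absolute distance of every player from the average of all players.
    mean_distance = mean(abs(r - average) for r in ratings)
    return base_balance / (1.0 + mean_distance / scale)


# Same reported balance (0.9), very different rating spread.
even = spread_penalised_balance(0.9, [1000, 1000, 1000], [1000, 1000, 1000])
spread = spread_penalised_balance(0.9, [2000, 500, 500], [1000, 1000, 1000])
```

With these assumed numbers the even game keeps its 0.9 score, while the spread-out game drops to roughly 0.54, so it would rank below a tighter game with the same team totals.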
I'm not sure if we are talking about the same thing. Do you mean the rating changes after a game or how people get selected for games?
How people get selected for games. It's not really my issue what rating I end up with, I would just like to be matched into games that are as equally balanced as possible.
The matchmaker already discourages matches where the ratings are spread out.
Oh really? Okay, because some of these games were still rated very high in terms of balance.
Ignore the balance percent rating. This is the trueskill balance metric, which does not account for rating spread. It is not used by the matchmaker. For the matchmaker we use a custom algorithm to determine how balanced a game is.
Aha, understood. Maybe showing the actual balance rating in the client could be an improvement? 😅
In theory yes, but not really worth the effort in my opinion. It would be pretty complicated to get the number from the server to the client to the replay. And the single number doesn't tell you that much anyway.
20857269: again a game like this. I have a hard time believing this is discounted. Effectively that was a game of (1300, 1000, 1000) vs (1400, 1200, 1200). Just because one player had a 3v3 rating of 800, he got matched with two 1200s, one of whom had a global rating of 1400. The 800 had a global rating of 1200. My teammates were both 1000 in global and 3v3. And this was when the queue was actually quite full, right now I see 20 people there.
When looking at the 3v3 ratings this game looks pretty reasonable. What exactly is your problem there? Of course the numbers add up less well when you look at the global ratings. Imagine the 800 dude was a pure astro player. You just can't rely on the global rating, that is why people get initialized with a lower rating in the matchmaker. He gained 44 points for that game, so he will soon be at the rating that fits him better.
The problem is that an 800 rating was used to balance out two 1200s in one team. But rating doesn't work like that. A team with a 500 and two 1500s is going to be a lot better than a team of three 1200s. When rating is concentrated into one position like that, it becomes much more likely that you lose strategically and tactically. Also, the difference in skill between an 800 and a 1000 is not that big compared to the difference between a 1000 and a 1200. The 1200 is a lot better relative to the 1000 than the 1000 is relative to the 800. Ratings can't just be moved around between players to balance. I gave my proposal above: make the players of each team have ratings as close to their average as possible. Or at least match the slots up so that each slot has a similar rating. Especially when the map has an uneven distribution of mexes, concentrating a lot of rating on one spot can make a huge difference. In this game one spot had mexes for 3 spots because it was a 5v5 map, but his mirror was a 1200 and he was a 1000, just because some tiny spot in the back was filled with an 800 (who was actually almost 1400 on global). This happens all the time and it's very frustrating because you end up playing against much better teams.
I think a good model is to match each player against a mirror in the map. As the maps are symmetric, that's usually how it plays out. Then the rating can be used to estimate who wins each encounter. When the slots are not roughly equal, one team will lose on that spot pretty much guaranteed, and there is no way to really mitigate that for the other players. So when an unimportant spot is assigned to a relatively low rated player, the more important spots will be assigned to better players to balance it, and they will win the more important positions against a team of more even rating. In turn it's also possible for the single good player to be on an unimportant spot and for the low rated players to lose the important spots, simply because the enemy players have more even rating and thus higher rating on those spots. The single high rated player can't carry the lower rated players when he is on an unimportant slot. The only solution really is to match all slots individually and have them as similar as possible, and only then should the team balance matter. But simply putting high vs low rated players into mirrored spots is always unbalanced.
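The mirrored-slot model described above can be sketched as follows. This is only an illustration of the idea, not anything the server does: the function names are made up, and it assumes both teams are listed in slot order so that index i of one team mirrors index i of the other. The numbers are the "500 and two 1500s vs three 1200s" example from the previous comment.

```python
def slot_gaps(team_a, team_b):
    """Per-slot rating difference (team_a minus team_b), slots paired by index."""
    if len(team_a) != len(team_b):
        raise ValueError("teams must have the same number of slots")
    return [a - b for a, b in zip(team_a, team_b)]


def worst_slot_gap(team_a, team_b):
    """Largest absolute rating gap on any single mirrored slot."""
    return max(abs(gap) for gap in slot_gaps(team_a, team_b))


def rank_paired_gap(team_a, team_b):
    """Worst slot gap when players are paired by rank instead of by slot."""
    return worst_slot_gap(sorted(team_a), sorted(team_b))


concentrated = [1500, 1500, 500]
even = [1200, 1200, 1200]

gaps = slot_gaps(concentrated, even)            # [300, 300, -700]
total_gap = sum(concentrated) - sum(even)       # -100
worst = worst_slot_gap(concentrated, even)      # 700
```

The team totals differ by only 100 points, yet every individual slot is decided by a gap of 300 to 700 points, which is the effect the comment is describing.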
The matchmaking code is here: https://github.com/FAForever/server/blob/develop/server/matchmaker/algorithm/team_matchmaker.py
By definition the skill gap between 800 and 1000 should be the same as between 1000 and 1200, even if it might feel different. It will basically be impossible to quantify the importance of a slot, so I don't think it is feasible to go down that route.
Yes, I looked into it and saw that there is a measure of rating deviation in the teams which discounts game quality, but I wonder if that discount is strong enough. Perhaps it's just a matter of lowering Line 118 in 455912e
further? I feel like 250 rating points can make a huge difference in skill, and when this ends up as a slot matchup it can decide a game pretty quickly. Realistically it would probably only be a 125 rating difference at worst because one team would have to be average, but even that is a lot. Maybe a value of 100 is more adequate, if I understand this correctly? It's used here: server/server/matchmaker/algorithm/team_matchmaker.py Lines 294 to 301 in 455912e
Yes, this is the correct variable, but good luck convincing the community that it should be lowered significantly, because doing that would directly lead to an increase in wait times.
But the quality requirements are already being lowered over time, and this would mainly fix a skewed priority of team rating variety. The rating imbalance is currently prioritized over variety, which leads to exactly what we saw: equal cumulative ratings but imbalanced teams nonetheless. I don't think it would increase wait times by that much, but it should improve the game quality a lot. If it increases wait times too much, maybe the maximum imbalance can be increased. The cumulative ratings aren't that accurate anyway, I imagine, and currently it's Line 116 in 455912e
In fact, this should probably depend on how many players are in a team. 250 imbalance is a lot in a 2v2 game but not in a 4v4 game. Maybe it should be expressed as a value relative to the cumulative rating. 250 is a lot for matches of cumulative 2000 rating but not for 4000 cumulative rating. That might also explain why the matchmaking is worse for 4v4 matches, as the imbalance score is even more strict and the variety is increased. Edit: actually, maybe it's not a good idea to calculate imbalance relative to the cumulative rating, but rather per player, because it would probably screw with higher rated games. But expressing a maximum imbalance per spot makes more sense than for the entire team imo.
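A small sketch of the "imbalance per spot" idea from the edit above. Everything here is hypothetical: the function names and the per-player limit of 100 are placeholders for illustration, not the server's config values.

```python
def per_player_imbalance(team_a, team_b):
    """Difference in cumulative rating, divided by the number of spots per team."""
    if not team_a or len(team_a) != len(team_b):
        raise ValueError("teams must be non-empty and of equal size")
    return abs(sum(team_a) - sum(team_b)) / len(team_a)


def within_per_player_limit(team_a, team_b, max_per_player=100):
    """`max_per_player` is an assumed placeholder limit, not a real setting."""
    return per_player_imbalance(team_a, team_b) <= max_per_player


# The same 250-point cumulative gap in a 2v2 and in a 4v4.
two_v_two = per_player_imbalance([1125, 1125], [1000, 1000])
four_v_four = per_player_imbalance([1250, 1000, 1000, 1000], [1000, 1000, 1000, 1000])
```

A 250-point cumulative gap works out to 125 per spot in the 2v2 but only 62.5 per spot in the 4v4, so a per-spot limit would give larger teams more leeway on the total.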
When I ran performance tests with artificial data, the rating spread was the main determining factor for wait time. It is actually pretty easy to distribute multiple players in a way that both teams are almost equal, but it is very hard to find six or eight players of basically the same rating.
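The point about equal totals being easy to reach can be illustrated with a brute-force split. This is a toy sketch, not the performance test mentioned above: the player list is made up, and `best_split` is a hypothetical helper that simply tries every way of dividing the players into two equal teams.

```python
from itertools import combinations
from statistics import pstdev


def best_split(ratings):
    """Return (total rating difference, team_a, team_b) for the most even split."""
    n = len(ratings)
    if n == 0 or n % 2:
        raise ValueError("need a non-empty, even number of players")
    total = sum(ratings)
    indices = range(n)
    best = None
    for combo in combinations(indices, n // 2):
        if 0 not in combo:
            continue  # keep player 0 on team A so mirrored splits are not repeated
        team_a = [ratings[i] for i in combo]
        diff = abs(total - 2 * sum(team_a))
        if best is None or diff < best[0]:
            team_b = [ratings[i] for i in indices if i not in combo]
            best = (diff, team_a, team_b)
    return best


players = [2000, 500, 500, 1000, 1000, 1000]
diff, team_a, team_b = best_split(players)
spread = pstdev(players)
```

For this made-up lobby the totals can be matched exactly (difference 0), while the population standard deviation of the ratings is still 500. Tightening the spread means waiting for different players, not reshuffling the ones already in the queue.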
Okay, that makes sense. Personally I would prefer longer wait times over worse quality games, but I don't know how the community feels about it. What about the idea of calculating imbalance relative to the number of players in a team? Basically, have the imbalance apply to each spot individually, so larger teams get more leeway with the imbalance and are more likely to have less variety?
Similarly, it is way easier to mix players into comparable teams when you have four players in a team compared to only two. So while you are technically correct, it doesn't matter in practice that this balance requirement doesn't scale per player.
Okay, yeah, that's interesting. I wonder how that actually works out, I don't know. I feel like it happens quite frequently that matches turn out with a lot of variety. Maybe this can be investigated a bit more in case someone works on the balancing again.
You can search the forums for discussions about it. At some point there was an active thread arguing for higher quality matches, and at the same time different people argued for shorter wait times in another thread. Sadly they didn't connect with each other.
Thanks for your help.
Right now we seem to be at a point where both groups are roughly the same size, so we can't be too far from the optimum. And I have decided to stay a looong way away from further tuning the algorithm because the endless discussions about every single change are just too draining.
So maybe get the config from the client? Have users configure their requirements themselves? Or offer 2 presets, one for quick matching and one for better quality?
This has already been discussed at length on the forum as well.
Another idea might be to keep the rating variety in both teams roughly equal. I just had this game, 20860454, where one team was very varied and the other more balanced. The team with more variance had 2 noob players and one pro; the noob players kept feeding us and eventually lost their spots. If we had had equal variance, this would have been the case for both teams.
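One way to express "keep the rating variety in both teams roughly equal" is to compare the spread of each team. The sketch below is only an illustration: the function name is made up and the ratings are invented, not taken from the replay mentioned above.

```python
from statistics import pstdev


def spread_mismatch(team_a, team_b):
    """Absolute difference between the two teams' rating standard deviations."""
    return abs(pstdev(team_a) - pstdev(team_b))


# One pro with two newer players vs an even team: same total, very different spread.
lopsided = spread_mismatch([1600, 700, 700], [1000, 1000, 1000])

# Both teams built as one pro plus two newer players: the spreads match.
matched = spread_mismatch([1600, 700, 700], [1500, 600, 600])
```

A matchmaker could treat a large `spread_mismatch` as a penalty alongside the total rating difference, so that a team with one pro and two weak spots is only paired against a similarly shaped team.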
I regularly get into really unbalanced games in matchmaking. The last game I had was 14% balanced and the one before that was 57%. How can this be? It's really annoying when you just lose in a teamgame because of this.
The matchmaking should really only allow games above a specific boundary, or at least not allow premade teams to break balance. 14% balance is unacceptable and just a waste of time.