[WASM] Chrome timer throttling affects runtime #51041

Closed
BrennanConroy opened this issue Apr 10, 2021 · 41 comments · Fixed by #57745
Assignees
Labels
arch-wasm WebAssembly architecture area-Interop-mono
Milestone

Comments

@BrennanConroy
Member

Chrome recently turned on intensive timer throttling. A side effect is that code relying on timers firing at certain intervals may not work as expected. We first got reports of this issue with the SignalR JavaScript library (dotnet/aspnetcore#31079), as it relies on client and server pings to detect connection closures. The SignalR .NET Client library has the same issue, as it uses a timer to send the same pings.

The JavaScript library was fixed by relying on the following behavior:

> Intensive throttling is not applied to timers that are scheduled from a network response handler
>
> https://bugs.chromium.org/p/chromium/issues/detail?id=1186569#c3

The JavaScript fix is at dotnet/aspnetcore#31300 for reference.

Unfortunately, the .NET client cannot do the same thing because it uses .NET patterns (async/await) that go through the wasm runtime. The wasm runtime very likely uses timers for scheduling work, and those timers will be chained and have throttling applied to them.

Specifically, I noticed that when using System.IO.Pipelines, calling FlushAsync on a Pipe and ReadAsync from another thread, ReadAsync took more than 30 seconds to return after FlushAsync finished.

TLDR:
The wasm runtime uses timers that will be throttled on Chrome, which can result in poor behavior for apps.

A potential change that could be made is to restart timers on network handlers so they aren't throttled.
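
A minimal sketch of that idea in TypeScript, assuming a WebSocket-style connection: the keep-alive timer is re-armed from the connection's `message` handler instead of from a previous timer callback. `KeepAliveTimer` and `restartOnMessage` are hypothetical names used for illustration; they are not part of SignalR or the wasm runtime.

```typescript
// Hypothetical helper for illustration; not SignalR or runtime code.
class KeepAliveTimer {
  private handle: ReturnType<typeof setTimeout> | undefined;
  private readonly intervalMs: number;
  private readonly sendPing: () => void;

  constructor(intervalMs: number, sendPing: () => void) {
    this.intervalMs = intervalMs;
    this.sendPing = sendPing;
  }

  // Arm (or re-arm) the timer. Intended to be called from a network handler.
  restart(): void {
    this.stop();
    this.handle = setTimeout(() => {
      this.handle = undefined;
      // The callback does not schedule another timer itself; the next
      // restart() is expected to come from the handler for the server's reply.
      this.sendPing();
    }, this.intervalMs);
  }

  // Cancel the pending timer, if any.
  stop(): void {
    if (this.handle !== undefined) {
      clearTimeout(this.handle);
      this.handle = undefined;
    }
  }

  isScheduled(): boolean {
    return this.handle !== undefined;
  }
}

// Re-arm the keep-alive whenever the connection delivers a message.
function restartOnMessage(connection: EventTarget, timer: KeepAliveTimer): void {
  connection.addEventListener("message", () => timer.restart());
}
```

With a real connection this would be called as `restartOnMessage(socket, timer)`, where `socket` is a `WebSocket`. If the server stops replying, the timer is never re-armed, so a real client would still need its own timeout handling.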

cc @lewing @davidfowl @halter73

@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Apr 10, 2021
@lewing lewing added the arch-wasm WebAssembly architecture label Apr 10, 2021
@ghost

ghost commented Apr 10, 2021

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

@lewing lewing removed the untriaged New issue has not been triaged by the area owner label Apr 10, 2021
@lewing lewing added this to the 6.0.0 milestone Apr 10, 2021
@lewing
Member

lewing commented Apr 10, 2021

Thank you for the report. I'd seen some of the discussion but hadn't connected all the pieces. Yes, we certainly use timers to yield to the browser while scheduling. We have a couple of options here that might help mitigate this. I'm setting the milestone to 6 for the moment, but it might be worth servicing.

cc @kg @pavelsavara

@WayneHiller

I hope we can fix this before the 6.0 release. Anyone using Blazor Wasm with SignalR will be affected. I am ready to test a fix anytime :) Edge is affected too but it seems it has a 2 hour page idle time before the throttling kicks in.

@andrew-tevent

These stability issues are causing us problems, so a .NET 5 backport would be gratefully received.

@tofron

tofron commented Apr 21, 2021

I have the same issue. I'm using version 3.1.14.

@kg
Member

kg commented Apr 21, 2021

We are currently investigating solutions for this issue. If you have specific problem scenarios (other than the already-known SignalR issue) or reliable reproduction steps it'd be good to know about them so we can make sure whatever solution we arrive at will address the problem for you.

@andrew-tevent

@kg I added a Chrome WS messages screenshot here.

@lewing lewing assigned kg May 4, 2021
@pavelsavara
Member

I created a simple JS-only app for testing how this behaves: https://github.com/pavelsavara/timer-throttle

@jadavis42

jadavis42 commented May 28, 2021

Hi, I want to report the problem also happens on Blazor Server side applications. In fact I am surprised why WASM applications would be affected since WASM works on HTTP (not SignalR). This is severely impacting a production application I deployed at a customer site. If the user minimizes the browser to do some other work, they can lose any data they inputted. And what's worse is my application logs them out whenever the Blazor circuit handler is disconnected, which is very annoying for my customer, as they need to keep logging in throughout the day.

Is there an ETA to fix this?

@kg
Member

kg commented May 28, 2021

> Is there an ETA to fix this?

We're in the process of adding some changes to make this problem less disruptive, but ultimately browser vendors do not want designs like this to work in web pages because things like banner ads use them to bog down your PC.

Design changes to your application can mitigate this but may involve increases to network traffic (browsers currently suppress some of these timer behaviors if you're actively talking to a server.)

We'd love to provide a comprehensive solution for this but based on research so far, making existing apps Just Work is generally not possible. We're doing our best but only browser vendors can address this in the end - they've chosen to break this in order to make most users happier (in theory)

@jadavis42

Hi kg, I understand that it's not possible to control what all browser vendors are doing, but Microsoft is making a big push for developers to adopt their Blazor technology platform. If ultimately Blazor apps cannot run reliably in mainstream browsers, I'm not understanding how they expect Blazor to be successful. It also puts developers (like me) in a terrible bind having convinced my organization to develop apps with Blazor technology.

@WayneHiller

WayneHiller commented May 28, 2021

@kg Not fixing the issue (or finding a workaround) is not really an option. The issue renders using SignalR in Blazor almost useless. The issue has been fixed in the JS client, so there has to be a way to do it in the .NET client as well.

@jadavis42 Blazor Wasm does not have the issue itself but I use the SignalR client for a number of scenarios.

https://developer.chrome.com/blog/timer-throttling-in-chrome-88/

The issue seems to be mostly related to chained setTimeout calls (> 5). If I remember right the fix in the JS SignalR client was to reduce the nested setTimeout calls.
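
To make the conditions concrete, here is a rough TypeScript model of the throttling tiers as I read them in the Chrome post linked above. It is only a sketch, not a browser API: the type and function names are hypothetical, and the thresholds (5 minutes hidden, chain count of 5, 30 seconds of silence, WebRTC opt-out) are my reading of that post and should be checked against it.

```typescript
// Illustrative model only; names and thresholds are assumptions taken from
// a reading of the Chrome 88 timer-throttling post, not from a browser API.
type TimerTier = "minimal" | "throttled" | "intensive";

interface PageTimerState {
  hiddenForMs: number;  // 0 while the page is visible
  silentForMs: number;  // time since the page last played audio
  chainCount: number;   // timers scheduled from within timer callbacks, in a row
  usesWebRtc: boolean;  // an open WebRTC connection opts out of intensive throttling
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;
const THIRTY_SECONDS_MS = 30 * 1000;
const CHAIN_THRESHOLD = 5;

function classifyTimerTier(state: PageTimerState): TimerTier {
  // Visible or recently audible pages get only minimal throttling.
  if (state.hiddenForMs === 0 || state.silentForMs < THIRTY_SECONDS_MS) {
    return "minimal";
  }
  // Intensive throttling requires all of the remaining conditions at once.
  const intensive =
    state.hiddenForMs > FIVE_MINUTES_MS &&
    state.chainCount >= CHAIN_THRESHOLD &&
    !state.usesWebRtc;
  return intensive ? "intensive" : "throttled";
}

// Approximate wake-up cadence per tier (undefined means no coarse batching).
function wakeIntervalMs(tier: TimerTier): number | undefined {
  switch (tier) {
    case "minimal":
      return undefined;
    case "throttled":
      return 1000;
    case "intensive":
      return 60 * 1000;
  }
}
```

Under this model, a hidden tab whose keep-alive timer re-schedules itself from its own callback ends up in the "intensive" tier after five minutes, while the same tab with a short timer chain stays at one wake-up per second.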

@jadavis42

@WayneHiller thanks for clarifying your point regarding Wasm.

@BrennanConroy
Member Author

> I want to report the problem also happens on Blazor Server side applications

It shouldn't be a problem if you've updated to a recent version of Blazor. The only outstanding issue is using the SignalR .NET Client with WASM.

@jadavis42

Hi Brennan, my production app is running .NET 5.0.0. Is there a fix in a later version of .NET 5? Or do I need to upgrade to the latest .NET 6 preview 4?

@BrennanConroy
Member Author

BrennanConroy commented May 28, 2021

5.0.6 should contain the fix, for server-side blazor

@jadavis42

> 5.0.6 should contain the fix, for server-side blazor

Hi Brennan I tried 5.0.6 and it is working really great. 14 hours since I installed it and not a single SignalR disconnect. Thanks so much for confirming this. Nice work!

@tofron

tofron commented Jul 17, 2021

Is there any progress for Blazor WASM? I tried the latest update, but it didn't do much; it still disconnected after 5 minutes.

@WayneHiller

Is there anything going on with this issue at all? Seems to be no activity. I really need this fixed so I can dig into the source to find a solution if no one else is looking into it.

@pavelsavara
Member

pavelsavara commented Aug 18, 2021

This is a good conversation about the topic. I'm not sure we want all wasm apps to prevent the browser from sleeping or throttling.
@kg suggested that we only detect when we are throttled and log a message to the console pointing to the documentation page.
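
A detection step along those lines could be sketched in TypeScript as below. This is only an illustration under stated assumptions: the function names, the 5-second tolerance, and the `docsUrl` parameter are all hypothetical, and this is not what the runtime does today.

```typescript
// Hypothetical detection sketch; names, tolerance and docsUrl are assumptions.

// How much later than requested a timer fired, in milliseconds.
function timerLatenessMs(requestedDelayMs: number, scheduledAtMs: number, firedAtMs: number): number {
  return firedAtMs - scheduledAtMs - requestedDelayMs;
}

// Light throttling already allows about one wake-up per second, so only a
// larger lateness is treated as a sign of heavy throttling.
function looksThrottled(latenessMs: number, toleranceMs: number = 5000): boolean {
  return latenessMs > toleranceMs;
}

// Wrap setTimeout so that a badly delayed callback logs a pointer to the docs.
function scheduleWithThrottleCheck(
  callback: () => void,
  delayMs: number,
  docsUrl: string,
): ReturnType<typeof setTimeout> {
  const scheduledAtMs = Date.now();
  return setTimeout(() => {
    const lateness = timerLatenessMs(delayMs, scheduledAtMs, Date.now());
    if (looksThrottled(lateness)) {
      console.warn(
        `Timer fired ${lateness} ms late; the browser may be throttling timers. See ${docsUrl}`,
      );
    }
    callback();
  }, delayMs);
}
```

The check only reports the symptom. It cannot tell heavy throttling apart from a tab that was suspended or a main thread that was simply busy, so the message should be worded as a hint.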

@BrennanConroy
Member Author

@pavelsavara That's a different issue, it's about browsers sleeping tabs. This issue is about the timer throttling that chrome implemented. It's explained pretty well in the original comment of this issue #51041 (comment)

@pavelsavara
Member

Okay. So, should we apply the same approach here too? Just detect it and document it, instead of fighting the intention of browser vendors?

@BrennanConroy
Member Author

For browser tabs sleeping: Up to you. SignalR decided not to force a workaround on users and instead provides documentation so people can work around it themselves.

For timer throttling: SignalR changed how timers were used so that it no longer hits the timer throttling problem, since it wasn't possible for users to work around the issue. WASM can still hit the timer throttling issue, so it should probably try to mitigate it somehow, unless a workaround can be found that users can apply.

@pavelsavara
Member

Thanks, I will explore our mitigation/workaround options and report back.

@WayneHiller

As far as I have read, the issue only happens when a timer is throttled (sleeping tab, extreme throttling) and it is in a chained call 5 levels deep or more. This is what was changed (fixed) in the JS SignalR client: the keep-alive timer was calling itself, thus nesting the calls. I don't think the issue happens if the timer callbacks are not nested calls.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Aug 19, 2021
@pavelsavara
Member

pavelsavara commented Aug 20, 2021

I attached a draft of the PR.

The change prevents heavy throttling if we received a WebSocket event in the last 5 minutes.
The light throttling (one wake per second) would still be in place anyway.
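
The "WebSocket event in the last 5 minutes" condition can be pictured with a small TypeScript sketch. This is not the code from the PR; `WebSocketActivity` and `MITIGATION_WINDOW_MS` are hypothetical names, and timestamps are passed in explicitly to keep the bookkeeping easy to follow.

```typescript
// Illustrative bookkeeping only; not the implementation from the PR.
const MITIGATION_WINDOW_MS = 5 * 60 * 1000;

class WebSocketActivity {
  private lastEventAtMs: number | undefined;

  // Call from any WebSocket event handler (open, message, close, error).
  recordEvent(nowMs: number): void {
    this.lastEventAtMs = nowMs;
  }

  // True while the most recent WebSocket event is at most 5 minutes old.
  isMitigationActive(nowMs: number): boolean {
    return (
      this.lastEventAtMs !== undefined &&
      nowMs - this.lastEventAtMs <= MITIGATION_WINDOW_MS
    );
  }
}
```

A page that never opens a WebSocket, or whose connection has been silent for longer than the window, would get no mitigation under this model and would be throttled as usual.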

The automated test for this is not trivial, and I'm still working on xharness/webdriver infrastructure to cover it.

I would appreciate it if others could help test it outside of the runtime with an actual application, and possibly on a variety of browsers.

@pavelsavara
Member

I tested Chrome and Edge.
The heavy throttling in Firefox seems to be done via a budget, and it's not clear to me that WS would help.

@BrennanConroy
Member Author

> The heavy throttling in Firefox

What throttling does Firefox have? I haven't seen any before.

Curious, why is the change limited to WS?

@kg
Member

kg commented Aug 23, 2021

>> The heavy throttling in Firefox

> What throttling does Firefox have? I haven't seen any before.

> Curious, why is the change limited to WS?

It's the opposite: browser vendors are throttling everything, and having an active network connection lets you avoid the worst of it.

@BrennanConroy
Member Author

> It's the opposite, browser vendors are throttling everything and having an active network connection lets you avoid the worst of it

You misunderstood, I understand the issue, but the proposed PR is only for websocket connections. It would be better to apply the fix for any network activity.

@kg
Member

kg commented Aug 23, 2021

>> It's the opposite, browser vendors are throttling everything and having an active network connection lets you avoid the worst of it

> You misunderstood, I understand the issue, but the proposed PR is only for websocket connections. It would be better to apply the fix for any network activity.

We don't have a simple way to do that since other types of network activity don't go through our runtime layer; only WebSocket does. We could potentially install hooks on fetch, XMLHttpRequest, and WebRTC, but that's some new engineering with associated risk. It sounds like a good idea, so it might be worth filing an issue.

@lewing
Member

lewing commented Aug 24, 2021

>>> It's the opposite, browser vendors are throttling everything and having an active network connection lets you avoid the worst of it

>> You misunderstood, I understand the issue, but the proposed PR is only for websocket connections. It would be better to apply the fix for any network activity.

> We don't have a simple way to do that since other types of network activity don't go through our runtime layer, only WebSocket does. We could potentially install hooks on fetch, XMLHttpRequest, and WebRTC, but that's some new engineering with associated risk. It sounds like a good idea so it might be worth filing an issue.

All networking goes through our code. HTTP goes through our fetch hooks (we intentionally don't bind XMLHttpRequest).

@kg
Member

kg commented Aug 24, 2021

>>>> It's the opposite, browser vendors are throttling everything and having an active network connection lets you avoid the worst of it

>>> You misunderstood, I understand the issue, but the proposed PR is only for websocket connections. It would be better to apply the fix for any network activity.

>> We don't have a simple way to do that since other types of network activity don't go through our runtime layer, only WebSocket does. We could potentially install hooks on fetch, XMLHttpRequest, and WebRTC, but that's some new engineering with associated risk. It sounds like a good idea so it might be worth filing an issue.

> All networking goes through our code. Http goes through our fetch hooks (we don't bind xmlhttprequest intentionally).

I'm referring to things like third-party libraries that might be calling directly into JS.

@pavelsavara
Member

>> The heavy throttling in Firefox

> What throttling does Firefox have? I haven't seen any before.

https://superuser.com/questions/1500289/how-to-aggressively-throttle-background-tabs-in-firefox-using-dom-min-background
I guess that if we deplete the budget we would be throttled. But in my tests it didn't happen. Perhaps some CPU heavy workload could trigger that.

> Curious, why is the change limited to WS?

The documentation I found so far only mentions WebRTC as an opt-out. We know from experience that WebSocket has the same effect. Could you point me to something specific that talks about fetch or XMLHttpRequest in this context?

I will experiment with hooking the completion of the fetch promise when it is called by our BrowserHttpHandler and see if that works too.
I don't think we should monkey-patch window.fetch to hook into third-party JS.

@BrennanConroy
Member Author

Timer throttling is caused by chained timers. If you kick off a new timer from any network response, you will no longer have chained timers, thus avoiding the throttling issue (at least until they add some new heuristic).

@pavelsavara
Member

> Timer throttling is caused by chained timers. If you kick off a new timer from any network response, you will no longer have chained timers, thus avoiding the throttling issue (at least until they add some new heuristic).

The response to fetch is not an event, just a promise.
I implemented a hook into that promise and a test here: pavelsavara@425252b
But unfortunately, it does NOT prevent throttling.

@BrennanConroy could you please look at my code to verify I have not made some simple mistake? Or alternatively, could you please provide a demo in which fetch prevents throttling?

@BrennanConroy
Member Author

> The response to fetch is not an event, just a promise.

Oh I see, that is unfortunate 😞

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Aug 26, 2021
@pavelsavara pavelsavara reopened this Aug 26, 2021
@pavelsavara
Member

Keep open until the 6.0 backport is done.

@ghost ghost added in-pr There is an active PR which will close this issue when it is merged and removed in-pr There is an active PR which will close this issue when it is merged labels Aug 26, 2021
@karelz
Member

karelz commented Aug 26, 2021

Fixed in 6.0 RC1 in PR #58160.
Fixed in 7.0 in PR #57745

@ghost ghost locked as resolved and limited conversation to collaborators Sep 25, 2021