Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

H/2 stress many failure in content posting methods #55261

Closed
ManickaP opened this issue Jul 7, 2021 · 27 comments · Fixed by #58442
Closed

H/2 stress many failure in content posting methods #55261

ManickaP opened this issue Jul 7, 2021 · 27 comments · Fixed by #58442
Assignees
Labels
area-System.Net.Http disabled-test The test is disabled in source code against the issue in-pr There is an active PR which will close this issue when it is merged
Milestone

Comments

@ManickaP
Copy link
Member

ManickaP commented Jul 7, 2021

Reproduces in and out of Docker. Started happening between 6.0 Preview 4 and 6.0 Preview 6 (on 4 it ran, on 6 started failing) - tested locally.
Error:

System.Exception: Expected status code OK, got InternalServerError
   at HttpStress.ClientOperations.ValidateStatusCode(HttpResponseMessage m, HttpStatusCode expectedStatus) in /home/manicka/repositories/runtime/src/libraries/System.Net.Http/tests/StressTests/HttpStress/ClientOperations.cs:line 487
   at HttpStress.ClientOperations.<>c.<<get_Operations>b__1_12>d.MoveNext() in /home/manicka/repositories/runtime/src/libraries/System.Net.Http/tests/StressTests/HttpStress/ClientOperations.cs:line 451
--- End of stack trace from previous location ---
   at HttpStress.StressClient.<>c__DisplayClass17_0.<<StartCore>g__RunWorker|0>d.MoveNext() in /home/manicka/repositories/runtime/src/libraries/System.Net.Http/tests/StressTests/HttpStress/StressClient.cs:line 204

Potentially kestrel error, cc: @JamesNK @Tratcher @halter73

Discovered in #55098

@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Net.Http untriaged New issue has not been triaged by the area owner labels Jul 7, 2021
@ghost
Copy link

ghost commented Jul 7, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Reproduces in and out of Docker. Started happening between 6.0 Preview 4 and 6.0 Preview 6 (on 4 it ran, on 6 started failing) - tested locally.
Error:

System.Exception: Expected status code OK, got InternalServerError
   at HttpStress.ClientOperations.ValidateStatusCode(HttpResponseMessage m, HttpStatusCode expectedStatus) in /home/manicka/repositories/runtime/src/libraries/System.Net.Http/tests/StressTests/HttpStress/ClientOperations.cs:line 487
   at HttpStress.ClientOperations.<>c.<<get_Operations>b__1_12>d.MoveNext() in /home/manicka/repositories/runtime/src/libraries/System.Net.Http/tests/StressTests/HttpStress/ClientOperations.cs:line 451
--- End of stack trace from previous location ---
   at HttpStress.StressClient.<>c__DisplayClass17_0.<<StartCore>g__RunWorker|0>d.MoveNext() in /home/manicka/repositories/runtime/src/libraries/System.Net.Http/tests/StressTests/HttpStress/StressClient.cs:line 204

Potentially kestrel error, cc: @JamesNK @Tratcher @halter73

Discovered in #55098

Author: ManickaP
Assignees: -
Labels:

area-System.Net.Http, untriaged

Milestone: -

@ManickaP ManickaP mentioned this issue Jul 7, 2021
@ManickaP
Copy link
Member Author

ManickaP commented Jul 7, 2021

Some more digging:
kestrel log:

[15:25:43 ERR] Connection id "0HMA19AMUPRVO", Request id "0HMA19AMUPRVO:00003227": An unhandled exception was thrown by the application.
System.IO.IOException: The client reset the request stream.
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpRequestPipeReader.ReadAsync(CancellationToken cancellationToken) in Microsoft.AspNetCore.Server.Kestrel.Core.dll:token 0x6000db2+0x7
   at System.IO.Pipelines.PipeReader.CopyToAsyncCore[TStream](TStream destination, Func`4 writeAsync, CancellationToken cancellationToken) in System.IO.Pipelines.dll:token 0x60000b8+0x236
   at HttpStress.StressServer.<>c.<<MapRoutes>b__6_6>d.MoveNext() in /home/manicka/repositories/runtime/src/libraries/System.Net.Http/tests/StressTests/HttpStress/StressServer.cs:line 254
--- End of stack trace from previous location ---
   at Microsoft.AspNetCore.Routing.EndpointMiddleware.<Invoke>g__AwaitRequestTask|6_0(Endpoint endpoint, Task requestTask, ILogger logger) in Microsoft.AspNetCore.Routing.dll:token 0x60000ab+0xa9
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1 application) in Microsoft.AspNetCore.Server.Kestrel.Core.dll:token 0x6000d28+0x54c

Happens with slow content sending operation much more:
NO cancellation:

[7/7/2021 3:32:53 PM] Total: 7,076 Runtime: 00:00:06
	 0: GET                       Success: 469	Canceled: 0	Fail: 0
	 1: GET Partial               Success: 470	Canceled: 0	Fail: 0
	 2: GET Headers               Success: 471	Canceled: 0	Fail: 0
	 3: GET Parameters            Success: 472	Canceled: 0	Fail: 0
	 4: GET Aborted               Success: 472	Canceled: 0	Fail: 0
	 5: POST                      Success: 456	Canceled: 0	Fail: 16
	 6: POST Multipart Data       Success: 452	Canceled: 0	Fail: 21
	 7: POST Duplex               Success: 465	Canceled: 0	Fail: 9
	 8: POST Duplex Slow          Success: 459	Canceled: 2	Fail: 13
	 9: POST Duplex Dispose       Success: 456	Canceled: 0	Fail: 16
	10: POST ExpectContinue       Success: 458	Canceled: 0	Fail: 14
	11: HEAD                      Success: 472	Canceled: 0	Fail: 0
	12: PUT                       Success: 450	Canceled: 0	Fail: 22
	13: PUT Slow                  Success: 453	Canceled: 3	Fail: 16
	14: GET Slow                  Success: 468	Canceled: 1	Fail: 0
	    TOTAL                     Success: 6,943	Canceled: 6	Fail: 127

Cancellation:

[7/7/2021 3:32:18 PM] Total: 6,540 Runtime: 00:00:06
	 0: GET                       Success: 407	Canceled: 27	Fail: 0
	 1: GET Partial               Success: 403	Canceled: 32	Fail: 0
	 2: GET Headers               Success: 398	Canceled: 38	Fail: 0
	 3: GET Parameters            Success: 406	Canceled: 30	Fail: 0
	 4: GET Aborted               Success: 404	Canceled: 32	Fail: 0
	 5: POST                      Success: 409	Canceled: 20	Fail: 8
	 6: POST Multipart Data       Success: 413	Canceled: 20	Fail: 5
	 7: POST Duplex               Success: 397	Canceled: 33	Fail: 8
	 8: POST Duplex Slow          Success: 405	Canceled: 28	Fail: 5
	 9: POST Duplex Dispose       Success: 403	Canceled: 28	Fail: 5
	10: POST ExpectContinue       Success: 391	Canceled: 37	Fail: 8
	11: HEAD                      Success: 412	Canceled: 24	Fail: 0
	12: PUT                       Success: 404	Canceled: 25	Fail: 7
	13: PUT Slow                  Success: 402	Canceled: 29	Fail: 3
	14: GET Slow                  Success: 405	Canceled: 29	Fail: 0
	    TOTAL                     Success: 6,059	Canceled: 432	Fail: 49

@ManickaP ManickaP added the disabled-test The test is disabled in source code against the issue label Jul 8, 2021
@karelz
Copy link
Member

karelz commented Jul 13, 2021

@JamesNK @Tratcher @halter73 is it something you could help us with please? It seems like Kestrel regression and it blocks our stress runs :(
Should we create issue in aspnet repo?

@halter73
Copy link
Member

System.IO.IOException: The client reset the request stream.

@karelz What makes this seem like a Kestrel regression? The above exception is only thrown if Kestrel received a RST_STREAM frame from the client.

System.Exception: Expected status code OK, got InternalServerError

I'm assuming this means the client received a 500 response. This could be caused by an uncaught exception on the server, but not an exception caused by RST_STREAM since that closes the stream.

Can you collect more verbose server logs and/or collect a network trace?

@karelz karelz added this to the 6.0.0 milestone Jul 15, 2021
@karelz karelz removed the untriaged New issue has not been triaged by the area owner label Jul 15, 2021
@alnikola alnikola self-assigned this Jul 21, 2021
@alnikola
Copy link
Contributor

Client logs show that these HTTP/2 POST tests are failing only on Windows and only with timeout in one of the following 2 places:

  1. Http2Stream.WaitForDataAsync
  2. Http2Stream.WaitFor100ContinueAsync

The timeout exception message is

System.Threading.Tasks.TaskCanceledException: The request was canceled due to the configured HttpClient.Timeout of 60 seconds elapsing.

@alnikola
Copy link
Contributor

My first guess would be that the server hasn't responded in the given time.

@alnikola
Copy link
Contributor

alnikola commented Jul 23, 2021

My comment above might not be related to this issue because here the server responds with 500 to client whereas I observed timeouts. I'll continue the investigation.

@alnikola
Copy link
Contributor

alnikola commented Jul 26, 2021

It seems the issue is caused by the client sending EndOfStream without sending a RST_STREAM first when it has set the Content-Length and the server is still reading the request body when EndOfStream arrives. It's a protocol violation so the server throws the following exception and responds with 500.

[DBG] Connection id "0HMAGA52UA8E0": HTTP/2 stream error.
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2StreamErrorException: HTTP/2 stream ID 695 error (PROTOCOL_ERROR): Less data received than specified in the Content-Length header.
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2Stream.OnEndStreamReceived() in Microsoft.AspNetCore.Server.Kestrel.Core.dll:token 0x6000da7+0x39
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2Stream.OnDataAsync(Http2Frame dataFrame, ReadOnlySequence`1& payload) in Microsoft.AspNetCore.Server.Kestrel.Core.dll:token 0x6000da6+0x141
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2Connection.ProcessDataFrameAsync(ReadOnlySequence`1& payload) in Microsoft.AspNetCore.Server.Kestrel.Core.dll:token 0x6000cff+0x12b
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2Connection.ProcessFrameAsync[TContext](IHttpApplication`1 application, ReadOnlySequence`1& payload) in Microsoft.AspNetCore.Server.Kestrel.Core.dll:token 0x6000cfe+0xc0
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2Connection.ProcessRequestsAsync[TContext](IHttpApplication`1 application) in Microsoft.AspNetCore.Server.Kestrel.Core.dll:token 0x6000cfa+0x9df

@alnikola
Copy link
Contributor

It might indicate there is a bug with how Http2Stream sends EndOfStream in SendRequestBodyAsync.

@Tratcher
Copy link
Member

https://datatracker.ietf.org/doc/html/rfc7540#section-8.1.2.6

A request or response that includes a payload body can include a
content-length header field. A request or response is also malformed
if the value of a content-length header field does not equal the sum
of the DATA frame payload lengths that form the body.

@adityamandaleeka
Copy link
Member

Can someone please try to repro this failure on .NET 5? That would help us understand if this is a regression.

@antonfirsov
Copy link
Member

This seemed to cause massive failures in the CI stress-http runs in the first week of July (see here) but the CI failures disappeared after 7/8.

@CarnaViire
Copy link
Member

This seemed to cause massive failures in the CI stress-http runs in the first week of July (see here) but the CI failures disappeared after 7/8.

AFAIK it started after updating to latest Kestrel version in #55098 because we needed it for HTTP/3, but in the same PR we put "active issue" on the failures not to pollute runs. Most possibly the massive failures you've seen were for that PR only, not for main?

@karelz
Copy link
Member

karelz commented Jul 29, 2021

The test were disabled on 7/8, so that checks out ;)

@alnikola will you be able to run your repro easily against earlier version of Kestrel (either 5.0, or earlier 6.0 Preview)?

@alnikola
Copy link
Contributor

I will try to run stress tests on 6.0 Preview 4 (it's reported to be healthy) or .NET 5.

@alnikola
Copy link
Contributor

alnikola commented Jul 29, 2021

@adityamandaleeka I ran these POST tests on .NET Core: 5.0.8, ASP.NET Core: 5.0.8 and they completed successfully. I have not got any errors.

@karelz
Copy link
Member

karelz commented Jul 29, 2021

@alnikola would it be possible to use latest .NET Core libraries for client side, while using old Kestrel? That would tell us if our current client has a bug or if it is Kestrel.

@adityamandaleeka
Copy link
Member

Thanks @alnikola, that's very useful info. I'll second @karelz's question too :) .

@alnikola
Copy link
Contributor

alnikola commented Jul 30, 2021

Strangely enough, but I cannot anymore reproduce this issue on my local machine on any .NET versions. Even the 6.0.0-preview.6.21352.12 where it failed initially, now works. It's not clear what could change in the last few days. I didn't install any new SDKs or runtime versions since then.

Please, disregard the above. There was a mistake. It is reproducing.

@alnikola
Copy link
Contributor

would it be possible to use latest .NET Core libraries for client side, while using old Kestrel? That would tell us if our current client has a bug or if it is Kestrel.

I tested with HttpClient [6.0.0-preview.6.21352.12] and Asp.Net Core [5.0.8]. All runs completed successfully. No failures.

@adityamandaleeka
Copy link
Member

Thank you for confirming, @alnikola. We'll prioritize investigating this regression.

@adityamandaleeka
Copy link
Member

@alnikola Can you please share instructions for how to run the stress test that reproduces this?

@alnikola
Copy link
Contributor

alnikola commented Aug 2, 2021

Can you please share instructions for how to run the stress test that reproduces this?

First, globally install the required SDK and runtime versions because stress tests use the globally installed .NET.

Then, in runtime repo:

  1. Go to the folder src/libraries/System.Net.Http/tests/StressTests/HttpStress
  2. Open ClientOperations.cs and comment out this if block in there. This if block currently suppresses the exceptions related to this issue.
  3. Open 2 console windows in this folder
  4. In the first window run the command to start server dotnet run -- -runMode server -serverUri https://+:5001
  5. In the second window, run the command to start client dotnet run -- -runMode client -serverUri https://localhost:5001 -ops 5 6 7 8 9 10 -maxContentLength 10000
  6. Watch the client's output and wait for exceptions to be thrown and logged into the console

@ManickaP
Copy link
Member Author

Depends on dotnet/aspnetcore#34768

@ManickaP
Copy link
Member Author

Triage: punting from 6.0 since the bug is handled on the ASP.NET side so the root cause should get fix in time. We can re-enable the stress test part once the fix is published in the nightly docker image, which might land after ZBB.

@ManickaP ManickaP modified the milestones: 6.0.0, Future Aug 12, 2021
@karelz karelz modified the milestones: Future, 7.0.0 Aug 13, 2021
alnikola added a commit that referenced this issue Aug 25, 2021
Enable HTTP/2 POST stress tests because, the Kestrel's issue with an incorrect HTTP/2 stream resetting seems to be fixed by dotnet/aspnetcore#34768.

Fixes #55261
@ghost ghost added in-pr There is an active PR which will close this issue when it is merged and removed in-pr There is an active PR which will close this issue when it is merged labels Aug 25, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Aug 31, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Aug 31, 2021
@karelz
Copy link
Member

karelz commented Sep 1, 2021

Reopening to track backport to 6.0.

@karelz karelz reopened this Sep 1, 2021
@karelz karelz modified the milestones: 7.0.0, 6.0.0 Sep 1, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Sep 2, 2021
@karelz
Copy link
Member

karelz commented Sep 9, 2021

Fixed in 6.0 RC2 in PR #58541.
Fixed in 7.0 in PR #58442.

@karelz karelz closed this as completed Sep 9, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Oct 9, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Net.Http disabled-test The test is disabled in source code against the issue in-pr There is an active PR which will close this issue when it is merged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants