-
Before filing a bug I'd like to discuss the situation. I have made a little demo app which reproduces the issue:
https://github.com/ashumkin/tests-nats-fetch-batch-timeout/blob/8747438a37a0ee0eab1151c4aec068fa18ecb2ef/main.go

The issue is the following: when the app is connected to the local NATS instance, everything works as expected: no error messages are printed when there are no messages in the stream, just a bunch of lines like

... INFO batch processed count=0

The stream is created via nats stream add test --subjects test --defaults, and the app is run with

go run main.go localhost test 5s

But when the app connects to a distant NATS server, the batch.Error() function returns the "nats: timeout" error. E.g. against demo.nats.io and the stream subscription-updates:

go run main.go demo.nats.io subscription-updates 5s
... WARN failed to fetch batch err="nats: timeout" count=0
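For reference, the demo app boils down to a FetchBatch loop along these lines. This is a minimal sketch assuming the legacy nats.go JetStream API; the URL, stream subject and durable name "demo" are placeholders, and the real code is in the repository linked above:

package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// Placeholder URL; the real app takes the server, subject and interval as arguments.
	nc, err := nats.Connect("nats://localhost:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Pull consumer on the "test" stream subject; "demo" is a placeholder durable name.
	sub, err := js.PullSubscribe("test", "demo")
	if err != nil {
		log.Fatal(err)
	}

	for {
		// FetchBatch returns right away; messages and the final error arrive asynchronously.
		batch, err := sub.FetchBatch(10, nats.MaxWait(5*time.Second))
		if err != nil {
			log.Fatal(err)
		}
		count := 0
		for range batch.Messages() {
			count++
		}
		// Locally this is nil when the request simply expires with no messages;
		// over a high-RTT link it comes back as "nats: timeout".
		if err := batch.Error(); err != nil {
			log.Printf("WARN failed to fetch batch err=%q count=%d", err, count)
			continue
		}
		log.Printf("INFO batch processed count=%d", count)
	}
}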
Noteworthy, this error is also returned when I connect to the SAME LOCAL NATS server, but over a connection with an increased RTT / more TCP hops than the local one. I've done that with the help of my VPS, using SSH port forwarding to the server and back (ssh MY-VPS -L 127.0.0.1:4220:127.0.0.1:4222 -R 127.0.0.1:4222:127.0.0.1:4222 -N):

go run main.go localhost:4220 test 5s (port 4220 goes to the VPS and back to the local NATS on port 4222)

then

... WARN failed to fetch batch err="nats: timeout" count=0

The nats.go version is github.com/nats-io/nats.go v1.36.0.
I've debugged the traffic, and normally:

1. the client sends a ...CONSUMER.MSG.NEXT.. request
2. the server sends a 408 timeout response
3. steps 1-2 are repeated

but with increased RTT:

1. the client sends a ...CONSUMER.MSG.NEXT.. request
2. the client then sends the next ...CONSUMER.MSG.NEXT.. request
3. the server sends the 408 timeout response to the 1st request
4. the client receives the server's response but no longer "knows" its reqID (https://github.com/nats-io/nats.go/blob/97e6a525e9d861338e89d31e81063a726a718941/js.go#L3226), as it has already sent a new request, so it fails with the error
5. steps 2-4 repeat

A rough sketch of what a single such request/response exchange looks like on the wire is shown below.
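The sketch (the subject, stream name "test" and consumer name "cons" are illustrative placeholders; this is not how nats.go issues the request internally, where replies are multiplexed and matched by request ID):

package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://localhost:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// Each pull request is tied to its own reply inbox; a 408 that shows up after
	// the client has already moved on to the next request is no longer recognized.
	inbox := nats.NewInbox()
	sub, err := nc.SubscribeSync(inbox)
	if err != nil {
		log.Fatal(err)
	}

	// "expires" is in nanoseconds (5s here), batch size 10.
	req := []byte(`{"batch":10,"expires":5000000000}`)
	if err := nc.PublishRequest("$JS.API.CONSUMER.MSG.NEXT.test.cons", inbox, req); err != nil {
		log.Fatal(err)
	}

	// With no pending messages the server answers after ~5s (plus one-way latency)
	// with a status-only message whose header line reads "NATS/1.0 408 Request Timeout".
	msg, err := sub.NextMsg(6 * time.Second)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(msg.Header)
}

The failing case above is simply this exchange with the 408 arriving one RTT too late, after the client has already issued the next request.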
Also, I've cloned the nats.go library (tag v1.36.0) and patched the tests like this:

diff --git a/test/kv_test.go b/test/kv_test.go
index a1b5ca4..2b62b28 100644
--- a/test/kv_test.go
+++ b/test/kv_test.go
@@ -999,7 +999,7 @@ func TestKeyValueDuplicatesWindow(t *testing.T) {
 func client(t *testing.T, s *server.Server, opts ...nats.Option) *nats.Conn {
 	t.Helper()
-	nc, err := nats.Connect(s.ClientURL(), opts...)
+	nc, err := nats.Connect("nats://localhost:4222", opts...)
 	if err != nil {
 		t.Fatalf("Unexpected error: %v", err)
 	}

Run against the local server:

go test ./test -run FetchBatch

they expectedly succeed:

ok      github.com/nats-io/nats.go/test 1.576s
But when I use NATS with increased RTT (change the port to 4220 as above):

diff --git a/test/kv_test.go b/test/kv_test.go
index a1b5ca4..2b62b28 100644
--- a/test/kv_test.go
+++ b/test/kv_test.go
@@ -999,7 +999,7 @@ func TestKeyValueDuplicatesWindow(t *testing.T) {
 func client(t *testing.T, s *server.Server, opts ...nats.Option) *nats.Conn {
 	t.Helper()
-	nc, err := nats.Connect(s.ClientURL(), opts...)
+	nc, err := nats.Connect("nats://localhost:4220", opts...)
 	if err != nil {
 		t.Fatalf("Unexpected error: %v", err)
 	}

go test ./test -run FetchBatch

they fail:

--- FAIL: TestPullSubscribeFetchBatchWithHeartbeat (1.49s)
    js_test.go:1358: Unexpected error: nats: no heartbeat received
--- FAIL: TestPullSubscribeFetchBatch (39.70s)
    --- FAIL: TestPullSubscribeFetchBatch/multiple_concurrent_fetches (7.92s)
        js_test.go:1547: Timeout waiting for incoming messages
    --- FAIL: TestPullSubscribeFetchBatch/context_timeout,_no_error (1.38s)
        js_test.go:1689: Unexpected error: context deadline exceeded
    --- FAIL: TestPullSubscribeFetchBatch/request_expired (0.91s)
        js_test.go:1716: Unexpected error: nats: timeout
    --- FAIL: TestPullSubscribeFetchBatch/remove_durable_consumer_during_fetch (11.07s)
        js_test.go:1779: Expected error: nats: consumer deleted; got: %!s(<nil>)
    --- FAIL: TestPullSubscribeFetchBatch/close_subscription (11.16s)
        js_test.go:1837: Unexpected error: nats: consumer not found
FAIL
FAIL    github.com/nats-io/nats.go/test 41.194s
FAIL
I guess this is a bug in FetchBatch? I guess tests must not fail depending on the RTT to the server.
-
What are your default max wait and timeout options?
-
Hello @ashumkin. The timeout you're encountering is introduced as a safeguard in case we don't get the timeout response from the server. It has to be larger than the expiry set on the pull request, but the value needs to be somewhat arbitrary since we can't really speculate on the RTT. That being said, I see that the added time is way too small now (10ms), giving a high probability of hitting the internal (client) timeout rather than waiting for the server timeout. I'll increase the value so that it should work fine under most circumstances, but you may still get timeout errors on unusually high RTTs.
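To make that concrete, here is a back-of-the-envelope timing model of the race (illustrative only; the 10ms figure is the buffer mentioned above, and the RTT value is an arbitrary example):

package main

import (
	"fmt"
	"time"
)

func main() {
	expiry := 5 * time.Second       // expiry sent with the pull request
	buffer := 10 * time.Millisecond // client-side safeguard added on top of the expiry
	rtt := 40 * time.Millisecond    // round trip to the server (arbitrary example)

	clientGivesUpAt := expiry + buffer // when the internal client timeout fires
	serverReplyArrives := expiry + rtt // when the server's 408 actually gets back to the client

	if serverReplyArrives > clientGivesUpAt {
		fmt.Println(`client timeout fires first -> batch.Error() reports "nats: timeout"`)
	} else {
		fmt.Println("the 408 arrives in time -> the fetch ends cleanly with no messages")
	}
}

With a 10ms buffer, any RTT above roughly 10ms tips the comparison the wrong way, which matches the behavior seen over the SSH-forwarded connection.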