Kubernetes: request.body() taking a long time when targeting server through a LoadBalancer service #2473
-
Probably not. When you're awaiting `request.body()`, you're waiting on the underlying ASGI receive call, and the client may not even have started sending the body at that point, so the time you're measuring includes the actual network transfer. Here's a little Litestar app to demonstrate this:

```python
from litestar import post, Request
from litestar.testing import create_test_client


@post("/")
async def handler(request: Request) -> None:
    print("start receiving")
    await request.body()
    print("done receiving")


def generate_body():
    print("start sending")
    yield b""


with create_test_client([handler]) as client:
    client.post("/", content=generate_body())
```

Running this will output:

```
start receiving
start sending
done receiving
```
(Pure ASGI version)

```python
import asyncio

import httpx


async def generate_body():
    print("start sending")
    yield b""


async def app(scope, receive, send):
    print("start receiving")
    await receive()  # pulls the first body message from the client
    print("done receiving")
    await send({"type": "http.response.start", "status": 200, "headers": [[b"content-type", b"text/plain"]]})
    await send({"type": "http.response.body", "body": b""})


async def main():
    async with httpx.AsyncClient(app=app, base_url="http://testserver") as client:
        await client.post("/", content=generate_body())


asyncio.run(main())
```

Notice how the sending doesn't start before the receiving does.

Can I ask why you've set up your route handler this way, instead of just letting Litestar handle the deserialisation and data fetching? This should be equivalent to your setup:

```python
@post(...)
async def ranking(data: RankingRequest) -> Response[bytes]:
    ...
```
-
I have a Litestar app deployed to AKS (Azure Kubernetes Service). This app wraps an ML model and returns some time metrics alongside the actual results. Over the past few days I've been running a simple test that makes n sequential requests (no concurrency), collects time metrics, and plots the results. And something really weird happens that I can't explain. My metrics include the time taken for deserializing the request body, measured as follows:
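Roughly something of this shape (a hypothetical sketch; the route name, fields, and response shape are illustrative, and the real snippet may differ):

```python
import time

from litestar import Request, post


@post("/rank")
async def ranking(request: Request) -> dict:
    start = time.perf_counter()
    body = await request.body()  # the measured span: includes reading the body from the client
    deserialization_s = time.perf_counter() - start
    # ... run the model on the parsed body ...
    return {"deserialization_s": deserialization_s, "body_bytes": len(body)}
```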
Now, using `httpx.Client().post` as the HTTP client:

(plot omitted: body deserialization time vs. history_length)

Notes:
- Using `requests.Session` as the HTTP client, deserialization gets faster, but is still 10x slower when using the LoadBalancer service.
- […] `history_length` you see in the x axis.
- From my colleague's PC (using httpx), we get similar results but on another "scale", i.e. the overhead is similar to what I get with `requests`.

I can't wrap my mind around this; isn't the request body already on the k8s node when I start reading it with `request.body()`?
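One way to test the lazy-receive explanation from the reply above (a hypothetical sketch, not part of the original post): time the network receive and the parsing separately. If the explanation is right, only `receive_s` should change between the LoadBalancer path and a direct connection:

```python
import json
import time

from litestar import Request, post


@post("/rank")
async def ranking(request: Request) -> dict:
    t0 = time.perf_counter()
    raw = await request.body()  # waits for the client to send the body over the network
    t1 = time.perf_counter()
    data = json.loads(raw)      # pure CPU-bound parsing, no I/O
    t2 = time.perf_counter()
    return {"receive_s": t1 - t0, "parse_s": t2 - t1, "n_items": len(data)}
```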