Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak in aiohttp integration since v2.2.0 #8108

Closed
agronholm opened this issue Jan 15, 2024 · 8 comments · May be fixed by #11518
Closed

Memory leak in aiohttp integration since v2.2.0 #8108

agronholm opened this issue Jan 15, 2024 · 8 comments · May be fixed by #11518

Comments

@agronholm
Copy link

Summary of problem

We noticed a huge memory leak after upgrading the ddtrace dependency in our application from v1.9.0 to v2.4.0. With Datadog tracing middleware enabled (trace_app(...)), the memory footprint of the Python process keeps increasing as requests come in, and this keeps going and going until the process runs out of memory.

Which version of dd-trace-py are you using?

Any version since v2.2.0 seems to exhibit the issue, and v2.1.9 does not.

Which version of pip are you using?

23.2.1

Which libraries and their versions are you using?

`pip freeze` aiohttp==3.9.1 aiosignal==1.3.1 attrs==23.2.0 bytecode==0.15.1 cattrs==23.2.3 ddsketch==2.0.4 ddtrace==2.4.0 Deprecated==1.2.14 envier==0.5.0 frozenlist==1.4.1 idna==3.6 importlib-metadata==6.11.0 multidict==6.0.4 opentelemetry-api==1.22.0 protobuf==4.25.2 setuptools==69.0.3 six==1.16.0 typing_extensions==4.9.0 wrapt==1.16.0 xmltodict==0.13.0 yarl==1.9.4 zipp==3.17.0

How can we reproduce your problem?

Install aiohttp (v3.9.1) and ddtrace, and then start the following trivial app:

from aiohttp import web
from ddtrace import tracer
from ddtrace.contrib.aiohttp import trace_app


async def hello(request):
    return web.Response(text="Hello, world")

app = web.Application()
app.add_routes([web.get('/', hello)])
tracer.configure(enabled=False)
trace_app(app, tracer, service="aiohttp-web")
web.run_app(app, host="0.0.0.0", port=8800)

Then keep hitting it with a lot of requests and watch the memory footprint grow in a linear fashion without bounds. On v2.1.9 and earlier, I've observed the memory footprint to plateau at around 44 MB.

What is the result that you get?

The memory footprint of the process grows without bounds.

What is the result that you expected?

The memory footprint plateaus after a while.

@emmettbutler
Copy link
Collaborator

Thank you for reporting this issue, @agronholm. We'll look into it.

@nicolashardy
Copy link

Same problem. Huge impact : dd-trace last version is unusable for us.

@ZStriker19
Copy link
Contributor

Seems likely that the culprit PR is this one: This introduces basic tracing of streaming responses that stay open long after the on_prepare signal has been sent. since it's the only aiohttp integration change in 2.2.0. Waiting to end spans sounds like it could easily end up causing a memory leak, perhaps our callback is never getting called, or the request_span isn't in the request by the time we call the callback, and then we just return. I'll test this and will update here.

@agronholm
Copy link
Author

Then why is it that the trivial app I provided also exhibits this?

@mklokocka
Copy link

I just noticed this issue. This makes newer versions of ddtrace completely unusable for aiohttp apps. I tried to look into it and I could not find a clear reason so far. @ZStriker19 did you made any progress?

From my investigation I would say that:
a) Distinguishing between StreamResponse and other types seem superfluous and if the add_done_callback actually worked correctly, it should be enough to use that.
b) However, it is not clear to me that request.task should be used for the done callback. It refers to the asyncio task running this method. In theory I think it should work, but it looks like that's not always the case.
c) If the done callback is the problem, then I think this type check is why it also applies to standard, non-stream, responses. That is because Response is a subclass of StreamResponse (as is noted here as well).
d) Unfortunately it seems aiohttp in fact only exposes the on_response_prepare signal handling so there is no easy way built-in the library itself.
e) From my testing it does not look like the spans would remain unfinished. I tested both Response and StreamResponse and ddtrace debug logging seemingly always produced a message about finishing a span (but I only tested this with a few requests). This leads me to believe that there are some references to something being kept either in ddtrace or aiohttp itself. However I have no idea on what it could be, as the only change in 2.2.0 I consider suspicious is the add_done_callback.

@github-actions github-actions bot added the stale label May 7, 2024
Copy link
Contributor

github-actions bot commented Aug 5, 2024

This issue has been automatically closed after a period of inactivity. If it's a
feature request, it has been added to the maintainers' internal backlog and will be
included in an upcoming round of feature prioritization. Please comment or reopen
if you think this issue was closed in error.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 5, 2024
@sanchda sanchda removed the stale label Aug 5, 2024
@chamini2
Copy link

chamini2 commented Nov 7, 2024

Has this been tackled at all? this got auto closed.

seandstewart pushed a commit to seandstewart/dd-trace-py that referenced this issue Nov 23, 2024
@seandstewart
Copy link
Contributor

Hello - I've submitted #11518, which should remove the strong reference to the response object in the task callback, as well as fix the type check so we only add the callback when we have an actual stream response.

seandstewart pushed a commit to seandstewart/dd-trace-py that referenced this issue Nov 24, 2024
…g responses(DataDog#8108)

Without this change, the request and response objects are not freed from memory until the asyncio Task is freed, which can create a memory leak. This change leverages a contextvar to accomplish the same result as the previous version, while ensuring any memory is freed once the current async context is exited.
seandstewart pushed a commit to seandstewart/dd-trace-py that referenced this issue Nov 24, 2024
…g responses(DataDog#8108)

Without this change, the request and response objects are not freed from memory until the asyncio Task is freed, which can create a memory leak. This change leverages a contextvar to accomplish the same result as the previous version, while ensuring any memory is freed once the current async context is exited.
seandstewart pushed a commit to seandstewart/dd-trace-py that referenced this issue Nov 24, 2024
…g responses(DataDog#8108)

Without this change, the request and response objects are not freed from memory until the asyncio Task is freed, which can create a memory leak. This change leverages a contextvar to accomplish the same result as the previous version, while ensuring any memory is freed once the current async context is exited.
seandstewart pushed a commit to seandstewart/dd-trace-py that referenced this issue Nov 24, 2024
…g responses(DataDog#8108)

Without this change, the request and response objects are not freed from memory until the asyncio Task is freed, which can create a memory leak. This change leverages a contextvar to accomplish the same result as the previous version, while ensuring any memory is freed once the current async context is exited.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
8 participants