Recycle actor heap chunks after GC instead of returning to pool #4531

Merged
merged 2 commits into from
Oct 24, 2024

dipinhora
Contributor

Before this commit, any unused chunks after actor heap garbage collection would be destroyed and returned to the memory pool immediately for reuse by the runtime or any actor.

This commit changes things so that instead of destroying and returning the chunks immediately, we assume the actor will likely need more memory as it runs more behaviors, and we keep the recently unused chunks around in case that happens. This is generally more efficient than destroying a chunk and getting a new one from the memory pool, because both destroying a chunk and allocating a new one involve updating the pagemap to indicate which actor owns the chunk. Updating the pagemap is an expensive operation, which we can avoid if we recycle the chunks instead. The main drawback is that actors no longer return chunks to the memory pool immediately after a GC, so the overall system might end up using more memory: freed chunks can only be reused by the actor that owns them, and the runtime and other actors can no longer reuse that memory as they previously might have been able to.
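
As a rough illustration of the trade-off described above, here is a minimal sketch of an allocation path that prefers a recycled chunk over a fresh one. It is not the actual ponyc code: `chunk_t`, `heap_t`, `pagemap_set`, `chunk_acquire`, and `chunk_recycle` are hypothetical stand-ins, `malloc` stands in for the runtime memory pool, and the pagemap is reduced to a counter so the saved update is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical, simplified types; not the real ponyc definitions.
typedef struct chunk_t
{
  struct chunk_t* next;
} chunk_t;

typedef struct heap_t
{
  chunk_t* recyclable;    // chunks kept by this actor after a GC
  size_t pagemap_updates; // counts simulated pagemap writes
} heap_t;

// Stand-in for the expensive step: recording which actor owns the chunk.
void pagemap_set(heap_t* heap, chunk_t* chunk)
{
  (void)chunk;
  heap->pagemap_updates++;
}

// Get a chunk, preferring one that this actor already owns.
chunk_t* chunk_acquire(heap_t* heap)
{
  if(heap->recyclable != NULL)
  {
    // Recycled chunk: ownership is unchanged, so no pagemap update.
    chunk_t* chunk = heap->recyclable;
    heap->recyclable = chunk->next;
    chunk->next = NULL;
    return chunk;
  }

  // Fresh chunk (malloc stands in for the pool): ownership must be recorded.
  chunk_t* chunk = malloc(sizeof(chunk_t));
  if(chunk == NULL)
    return NULL;

  chunk->next = NULL;
  pagemap_set(heap, chunk);
  return chunk;
}

// Keep an unused chunk for later reuse instead of returning it to the pool.
void chunk_recycle(heap_t* heap, chunk_t* chunk)
{
  chunk->next = heap->recyclable;
  heap->recyclable = chunk;
}
```

In this sketch, acquiring a chunk, recycling it, and acquiring again returns the same chunk with only one simulated pagemap update in total.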

@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Oct 15, 2024
small_chunk_t* n = NULL;

// recycle a small chunk if available because it avoids setting the pagemap
if (NULL != heap->small_recyclable)
Member

our normal pattern in the codebase is to compare to NULL as the second item. Is there a reason for the variance in this patch?

Contributor Author

no reason except that i accidentally forgot an = and didn't notice and that's much easier to track down with the condition written this way..
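
For readers unfamiliar with the constant-first style mentioned here, the following minimal sketch shows what it guards against. The function names are hypothetical and unrelated to the patch; the second function contains an intentional bug, with `=` mistyped for `==`.

```c
#include <assert.h>
#include <stddef.h>

// Constant-first comparison. Mistyping the operator as a plain `=` here
// (`NULL = head`) would not compile, because NULL is not assignable.
int has_recyclable(const void* head)
{
  if(NULL != head)
    return 1;

  return 0;
}

// Intended to return 1 when head is NULL, but `=` was typed instead of `==`.
// This compiles: it assigns NULL to head, and the condition is always false.
int buggy_is_null(const void* head)
{
  if((head = NULL)) // BUG (intentional): assignment, not comparison
    return 1;

  return 0;
}
```

With the variable first, the typo silently changes behavior; with the constant first, the compiler rejects it.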

@SeanTAllen
Member

I see this will free any recycled chunks that aren't reused after 1 gc pass.

Can you help me work through how this will work in practice. If an actor never gets gc'd, this won't have any impact. If an actor only got gc'd once, then some unknown amount of memory would not be freed, and if the actor is gc'd more than once, the same would still apply, some amount of memory would continue to be held for recycling. Yes?

@dipinhora
Contributor Author

> I see this will free any recycled chunks that aren't reused after 1 gc pass.
>
> Can you help me work through how this will work in practice. If an actor never gets gc'd, this won't have any impact. If an actor only got gc'd once, then some unknown amount of memory would not be freed, and if the actor is gc'd more than once, the same would still apply, some amount of memory would continue to be held for recycling. Yes?

correct..

  • if an actor is never gc'd, it doesn't matter
  • when an actor is gc'd, new unused chunks get saved for recycling, old chunks saved for recycling that were never reused get returned to the runtime memory pool
  • if an actor gets destroyed (by the cycle detector or some other mechanism), any chunks saved for recycling get returned to the runtime memory pool
  • if an actor blocks, any chunks saved for recycling are "stuck" in the actor/not returned to the runtime memory pool and cannot be re-used by the rest of the runtime or another actor

that last point is where there's possibility of tweaking things by having the actor return any chunks saved for recycling to the runtime memory pool instead of continuing to hold onto them when it blocks..
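
The lifecycle in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: all names (`chunk_t`, `heap_t`, `pool_return_all`, `heap_end_gc`, `heap_destroy`) are hypothetical, and `free` stands in for returning a chunk to the runtime memory pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical, simplified types; not the real ponyc definitions.
typedef struct chunk_t
{
  struct chunk_t* next;
} chunk_t;

typedef struct heap_t
{
  chunk_t* recyclable;     // chunks saved for recycling at the last GC
  size_t returned_to_pool; // counts chunks handed back to the pool
} heap_t;

// Return every saved chunk to the pool (free stands in for the pool).
void pool_return_all(heap_t* heap)
{
  chunk_t* chunk = heap->recyclable;

  while(chunk != NULL)
  {
    chunk_t* next = chunk->next;
    free(chunk);
    heap->returned_to_pool++;
    chunk = next;
  }

  heap->recyclable = NULL;
}

// End of a GC pass: chunks saved at the previous GC that were never reused
// go back to the pool, and the newly unused chunks are saved instead.
void heap_end_gc(heap_t* heap, chunk_t* newly_unused)
{
  pool_return_all(heap);
  heap->recyclable = newly_unused;
}

// Actor destroyed: anything still saved for recycling goes back to the pool.
void heap_destroy(heap_t* heap)
{
  pool_return_all(heap);
}
```

A blocked actor corresponds to a heap on which neither `heap_end_gc` nor `heap_destroy` is called again, so its saved chunks stay where they are. The tweak suggested above would amount to also calling something like `pool_return_all` when the actor blocks.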

@SeanTAllen SeanTAllen added the changelog - changed Automatically add "Changed" CHANGELOG entry on merge label Oct 22, 2024
@ponylang-main
Contributor

Hi @dipinhora,

The changelog - changed label was added to this pull request; all PRs with a changelog label need to have release notes included as part of the PR. If you haven't added release notes already, please do.

Release notes are added by creating a uniquely named file in the .release-notes directory. We suggest you call the file 4531.md to match the number of this pull request.

The basic format of the release notes (using markdown) should be:

## Title

End user description of changes, why it's important,
problems it solves etc.

If a breaking change, make sure to include 1 or more
examples what code would look like prior to this change
and how to update it to work after this change.

Thanks.

@dipinhora
Contributor Author

release notes added

@SeanTAllen SeanTAllen merged commit 4db530e into ponylang:main Oct 24, 2024
21 checks passed
@ponylang-main ponylang-main removed the discuss during sync Should be discussed during an upcoming sync label Oct 24, 2024
github-actions bot pushed a commit that referenced this pull request Oct 24, 2024
github-actions bot pushed a commit that referenced this pull request Oct 24, 2024
@SeanTAllen
Member

@dipinhora first night after this was merged, all the stress tests failed.

I'm waiting to see what happens with tonight's.

@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Oct 26, 2024
@dipinhora
Contributor Author

> @dipinhora first night after this was merged, all the stress tests failed.
>
> I'm waiting to see what happens with tonight's.

@SeanTAllen i looked through all the logs and they all end with:

Error: The operation was canceled.

it doesn't seem like anything actionable (i.e. no crashes).

@dipinhora
Contributor Author

scratch that... looking at older stress test runs, they all finished in 30 mins to 1 hour, and the new runs seem to time out after 6 hours.. something to look into..

dipinhora added a commit to dipinhora/ponyc that referenced this pull request Oct 26, 2024
The implementation of actor heap large chunk recycling from ponylang#4531
is too naive and results in actors wasting huge amounts of time related
to large chunk recycling.

This commit effectively disables the large chunk recycling but
doesn't undo the code changes at the moment because the expectation
is that large chunk recycling will be re-enabled in the near future
with an improved implementation.
@dipinhora
Contributor Author

@SeanTAllen #4534 has been opened to resolve the stress test issue.

SeanTAllen pushed a commit that referenced this pull request Oct 26, 2024
The implementation of actor heap large chunk recycling from #4531
is too naive and results in actors wasting huge amounts of time related
to large chunk recycling.

This commit effectively disables the large chunk recycling but
doesn't undo the code changes at the moment because the expectation
is that large chunk recycling will be re-enabled in the near future
with an improved implementation.
dipinhora added a commit to dipinhora/ponyc that referenced this pull request Oct 27, 2024
The implementation of actor heap chunk recycling from ponylang#4531 had two
bugs. First, the large heap re-use logic (which was temporarily
disabled in ponylang#4534) had a bug related to how it updated the large
chunk recyclable list pointer in the heap. Second, the memory
clearing logic in the `ponyint_heap_endgc` function was clearing
more of the heap than it should have been, resulting in a memory
leak for both small and large recyclable chunks.

This commit re-enables large chunk recycling (undoing ponylang#4534) and
fixes both bugs so that both large chunk and small chunk recycling
work as expected without memory leaks.
SeanTAllen pushed a commit that referenced this pull request Oct 27, 2024
The implementation of actor heap chunk recycling from #4531 had two
bugs. First, the large heap re-use logic (which was temporarily
disabled in #4534) had a bug related to how it updated the large
chunk recyclable list pointer in the heap. Second, the memory
clearing logic in the `ponyint_heap_endgc` function was clearing
more of the heap than it should have been, resulting in a memory
leak for both small and large recyclable chunks.

This commit re-enables large chunk recycling (undoing #4534) and
fixes both bugs so that both large chunk and small chunk recycling
work as expected without memory leaks.
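
The commit message does not show the code involved. Purely as an illustration of how clearing too much heap state can leak recyclable chunks, here is a hypothetical sketch: the struct layout and both reset functions are assumptions made for the example and are not taken from `ponyint_heap_endgc`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical, simplified types; not the real ponyc definitions.
typedef struct chunk_t
{
  struct chunk_t* next;
} chunk_t;

// Assumed layout: per-GC state first, then the list that must survive a reset.
typedef struct heap_t
{
  size_t used;         // reset at each GC
  chunk_t* live;       // rebuilt at each GC
  chunk_t* recyclable; // must be preserved across the reset
} heap_t;

// Clears too much: zeroing the whole struct drops the only reference to the
// recyclable list, so those chunks can never be reused or returned.
void reset_too_much(heap_t* heap)
{
  memset(heap, 0, sizeof(heap_t));
}

// Clears only the fields that precede `recyclable`.
void reset_gc_state(heap_t* heap)
{
  memset(heap, 0, offsetof(heap_t, recyclable));
}
```

In this sketch the narrower reset leaves the recyclable list reachable, while the broader one loses it.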