Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch data to reduce JNI calls. #517

Open
CedNaru opened this issue Oct 15, 2023 · 4 comments
Open

Batch data to reduce JNI calls. #517

CedNaru opened this issue Oct 15, 2023 · 4 comments
Labels
performance Related to performance problem

Comments

@CedNaru
Copy link
Member

CedNaru commented Oct 15, 2023

There are a few places in the code where I think we can get extra performances by batching data when we know the same JNI call would need to be called several times otherwise. Such cases are:

  • At launch, we need to exchange a lot of data between Godot and the JVM, both for api and scripts calls. Currently, we still have to wait a few extra seconds to start compared to regular Godot, and it doesn't get better with the number of registered scripts (which can easily reach hundreds).
  • The memory manager, during runtime ( binding and unbind Objects), and when closing (cleaning up the JVM objectDB). Closing a simple project right now can also take up a few seconds (Not entirely the fault of JNI, Remove GodotStatic, Singleton and Leak management from MemoryManager. #508 will also help to improve performances here).
  • Converting VariantArray to Array/List and Dictionary to Map. That one is the most important as it directly affect usercode and runtime performances. There are many cases where we don't want to use the Godot containers directly because they lack features compared to Kotlin containers. In such case, we want to convert the whole thing. The issue is that currently the conversion is just calling the regular Godot get() for each element, which is terribly slow (already received feedback from a few users regarding that).

The way to implement that would be to add new "batch" methods to TransferContext (both in C++ and Kotlin) to write and read stuff to the buffer. The buffer got quite a lot of extra space with its 8 kB after the increase to 16 parameters.
The steps of a batch call would be the following:
-Provide the batch write function with an array containing all the data.

  • Evaluate the size that data would take on the buffer after conversion (not always 1:1), prefer a pessimist heuristic to make sure we never overflow the buffer.
  • If the evaluated size is bigger than the buffer, split the data in smaller batches
  • Make one JNI call per batch.
  • On the other side, store the content of each batch in a dynamically sized container.
  • The layout of the buffer will be the same as before, the exception being an additional value sent to communicate the total number of elements to store on top of the regular amount of elements to read in the current batch (so we can preallocate the container).
  • Later on the other side, a batch read can be done to flush the content of the container.

How the VariantArray conversion to a List would be done:

  • the JVM will call a new VariantArray bridge method, transferring the VariantArray pointer to C++.
  • C++ would then call TransfertContext::batch_write with the Variant Array as a parameter.
  • batch_write will make as many JNI calls to the JVM as necessary to transfer all the data.
  • the VariantArray bridge method will return to the JVM
  • the JVM will call `TransfertContext::batchRead' to get all the data and map it to a List.

The minimum cost of batching is then 2 JNI calls + an additional level of data indirection/copy (C++ Array => SharedBuffer => Kotlin Batch Container => List).
That extra work will probably be reimbursed with VariantArrays of more than 5 elements, but it's something that would need to be benchmarked.

@CedNaru CedNaru added the performance Related to performance problem label Oct 15, 2023
@MartinHaeusler
Copy link
Contributor

I need to convert kotlin.ByteArray to godot.PackedByteArray quite frequently to load images from a server. Doing that in bulk would be very useful.

@CedNaru
Copy link
Member Author

CedNaru commented Sep 12, 2024

This feature has been implemented for PackedByteArray.
Sadly, it seems harder in practice to implement it for VariantArray and Dictionary.

The main issue lies with the very first step I mentioned earlier:
Evaluate the size that data would take on the buffer after conversion (not always 1:1), prefer a pessimist heuristic to make sure we never overflow the buffer.

I don't have any way to know evaluate the required size before checking every single value stored in those containers. It would require an additional call JNI to C++ that would iterate over the containers, retrieve all the Variant in it, and compute how many buffer exchanges it would require as well as the exact content of each. It's cumbersome to implement and slow as well (even if not as slow as the current solution).

I can still implement it for typed VariantArray (and soon typed Dictionaries), because the data type will be homogenous, so I just need the space taken by that type in the buffer, it would not be as optimized as the PackedArray conversions, but should be simple enough and a decent performance boost.

But I don't think an optimized conversion for VariantArray or Dictionary<Any, Any> is coming anymore.

@MartinHaeusler
Copy link
Contributor

@CedNaru I think the conversion from Kotlin's ByteArray to Godot's PackedByteArray is the most pressing concern, and I'm happy to see it addressed :) I will be making active use of this in my game, thank you!

@CedNaru
Copy link
Member Author

CedNaru commented Sep 12, 2024

That one is already implemented and merged into master. We also use that conversion (well the int64 version of it) to exchange pointers quickly between Godot and the JVM now when managing memory, it works like a charm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Related to performance problem
Projects
None yet
Development

No branches or pull requests

2 participants