Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@ByPtrPtr array argument for const pointer needs to be copied back from native call results #776

Open
atsushieno opened this issue Sep 2, 2024 · 15 comments · May be fixed by #777
Open

Comments

@atsushieno
Copy link

atsushieno commented Sep 2, 2024

I have been trying to build a JavaCPP binding for libremidi, and had been failing to bind this function:

LIBREMIDI_EXPORT
int libremidi_midi_out_port_name(
    const libremidi_midi_out_port* port, const char** name, size_t* len);

The implementation for this function assigns the actual pointer to the string into name.

Now, when I tried to use the binding (I could manage to build it with some mappings) via byte[] (excuse my Kotlin code):

name = ByteArray(1024)
println("libremidi_midi_out_port_name returned " + library.libremidi_midi_out_port_name(port, name, size))

it failed to get the expected string. It did work when I made changes to the code to use BytePointer though:

val nameBuf = ByteBuffer.allocateDirect(1024)
val namePtr = BytePointer(nameBuf) // we should only need a space for pointer...
println(namePtr.address())
println("libremidi_midi_out_port_name returned " + library.libremidi_midi_out_port_name(port, namePtr, size))
println(namePtr.address())
println("name size: " + size.get())
name = ByteArray(size.get().toInt())
namePtr.asBuffer().get(name)

The thing is that the pointee string at const char** name can be modified, and when we pass byte[] that needs to be copied back to the binding parameter. It seems that JavaCPP works if the C function argument were char**.

The cause of the problem is at Generator.parametersAfter():

            // If const array, then use JNI_ABORT to avoid copying unmodified data back to JVM
            final String releaseArrayFlag;
            if (cast.contains(" const *") || cast.startsWith("(const ")) {
                releaseArrayFlag = "JNI_ABORT";
            } else {
                releaseArrayFlag = "0";
            }

The resulting generated code (jnilibremidi.cpp), especially for libremidi_midi_out_port_name() with byte[], looks like this.

This assumes that the const char ** argument must be used only as an input and not to alter the output. But that's not correct. It would be true only if it is const char * const *. This StackOverflow question describes it well, namely:

const char* the_string : I can change which char the_string points to, but I cannot modify the char to which it points.

const char* const the_string : I cannot change which char the_string points to, nor can I modify the char to which it points.

It is implemented for some optimization, but to ensure the implementation correctness the copying condition at parametersAfter() should rather be like:

            if (cast.contains("* const") || cast.endsWith("const)")) {
                releaseArrayFlag = "JNI_ABORT";
            } else {
                releaseArrayFlag = "0";
            }

I will create a PR based on this suggestion.

atsushieno added a commit to atsushieno/javacpp that referenced this issue Sep 2, 2024
…* const`s.

This fixes bytedeco#776

Before this change, we did not copy array contents back to the argument array
for "const **" as it is regarded as immutable. But it in fact it mutable.
This StackOverflow question describes it well.
https://stackoverflow.com/questions/4949254/const-char-const-versus-const-char

After this change, it will avoid copying if the parameter pointer itself is
const.
@saudet
Copy link
Member

saudet commented Sep 2, 2024

From what I understand of how libremidi_midi_out_port() works, the pointer of "name" itself gets returned, and since that is not memory allocated by the JVM, the mode flag has no effect anyway so that won't help:
https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html
You need to use BytePointer, not ByteBuffer or byte[].

@atsushieno
Copy link
Author

JNI ReleaseByteArrayElements() is designed to work with altered pointer as it still takes the native buffer location. In your code, the function invocation looks like this -

if (arg1 != NULL) env->ReleaseByteArrayElements(arg1, (jbyte*)ptr1, JNI_ABORT);

here (jbyte*) ptr1 could be a different pointer from what GetByteArrayElements() returned.

I created the PR as I confirmed that the change brought in the expected result with byte[] argument.

@saudet
Copy link
Member

saudet commented Sep 2, 2024

It might work with your version of the JDK, but it's not part of the JNI specs. Anyway, in the worst case it gets ignored, so could you make your change additive, such that JNI_ABORT does not get used for any case that it is not currently being used for?

@HGuillemet
Copy link
Contributor

HGuillemet commented Sep 3, 2024

I also doubt that passing to ReleaseByteArrayElements a pointer that does not "belong" to JNI is a right thing to do. At best there will be a memory leak since the original pointer returned by GetByteArrayElements will never be freed.

Anyway, you don't want to copy the data pointed by the new pointer to the Java array, do you ? We don't even know its length: it can differ from the original Java array.

So the question for me is whether something must be changed in JavaCPP.
As is, the presets author must limit the mapping of the argument to BytePointer with info maps. If [s]he does not, overloads are generated that does not do what we would expect them to do, as @atsushieno experienced.
Could we automatically avoid to generate the ByteBuffer and byte[] overloads when the library can change the pointer (something like there is more than 1 indirection and the pointer is not const) ?

@atsushieno
Copy link
Author

atsushieno commented Sep 4, 2024

@HGuillemet you're right on that JVM does not know the exact size of buffer. It does not apply to this case, (edit) but As long as ReleaseByteArrayElements() copies the expected buffer content back to the byte[] argument and I can control the expectation I would find it useful. In theory JVM can release the internal copies of the buffer when its bound JVM array is GC-ed anyway (JNI documentation does not state that ReleaseByteArrayElements() frees the memory from the pointer argument). But that is likely a corner case enhancement.

Ragardless, I am all for the right implementation. My understanding as of now is that (1) current implementation is wrong in that it is copying the buffer back to byte[] if the argument is non-const, and (2) there is no documentation that describes what kind of Java methods are created from a function that takes one or more C/C++ array-likes and no guidelines are shown.

@HGuillemet
Copy link
Contributor

(1) current implementation is wrong in that it is copying the buffer back to byte[] if the argument is non-const

Right. The following code, without infomap, causes a segmentation fault:

void libcall(char** name) {
  *name =NULL;
}
  byte[] b = new byte[] { 18 };
  libcall(b);
  System.out.println(b[0]);

While the same code with const char** works and prints 18 for no valid reason.
Note that, something like this:

static const char* a = "a";
void libcall(char** name) {
  *name = (char *) a;
}
byte[] b = new byte[] { 18, 19, 20, 21 };
libcall(b);
System.out.println(b[0]);

Also aborts and doesn't print 96: the new pointer is not copied back.

@saudet
Copy link
Member

saudet commented Sep 5, 2024

So, if I understand correctly, the changes in pull #777 would make it crash not only in the char** case, but also in the const char** case. What's the point of those changes?

@atsushieno
Copy link
Author

If I understand correctly, the latest consequence is that current code (without my change) is already problematic depending on how the invoked native code behaves (updates the pointer, with short buffer than the length of array).

I left my PR open only because depending on the consequence it might be useful to update the change.

@HGuillemet
Copy link
Contributor

I think the right logic would be to prevent the generator to generate the overloads that use primitive arrays or buffers when the C++ function can change the value of a pointer in the chain of indirections.
For instance, in the case of 3 indirections: char ***, char * const **, const char * const **, etc...
But char * const * const * is ok.

And, if the overloads are generated, the JNI_ABORT should be used iif the char themselves cannot be change by the library, as @atsushieno pointed out, for instance const char * const * const *.

@saudet
Copy link
Member

saudet commented Sep 5, 2024

Ok, sounds alright, please approve pull #777 if it looks good, thanks!

@atsushieno
Copy link
Author

atsushieno commented Sep 5, 2024

Yes. I would find it best if we can still optionally generate those extra methods if a binding author specifies some annotation in the mappings.

But that should be documented, and actually how the array -like mappings work are currently not documented. We (binding developers) would have no idea on how things work and what we are not supposed to do with those generated methods (like, I was told to not do this only on this issue).

@atsushieno
Copy link
Author

Oh, no, let's not approve #777 as is, that's not what we want.

@saudet
Copy link
Member

saudet commented Sep 5, 2024

But that should be documented, and actually how the array -like mappings are currently not documented. We (binding developers) would have no idea on how things work and what we are not supposed to do with those generated methods (like, I was told to not do this only on this issue).

There are a lot of things that are just mechanically generated with no guarantees to work. Contributions are welcome!

@HGuillemet
Copy link
Contributor

There are a lot of things that are just mechanically generated with no guarantees to work.

So we don't try to implement the first part of my last message ? Or do you see an easy way to do it ?

@saudet
Copy link
Member

saudet commented Sep 5, 2024

Sure, that sounds good. @ByPtrPtr should only map the case of 2 indirections though, so no need to think further than that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants