Constraints for vector tuple types #43

Open
nick-knight opened this issue Jun 29, 2023 · 6 comments

Comments

@nick-knight

How do we pass vector tuple types to/from extended asm templates? It seems that using the insertion/extraction intrinsics (with the first tuple element) might be unsafe.

@kito-cheng @leekillough

@kito-cheng
Collaborator

It's safe to use the vr constraint for tuple types: the compiler recognizes the type and uses the right information. This works on GCC trunk, but Clang trunk currently seems to crash on it.

#include <riscv_vector.h>

void foo(){
    vint32m1x2_t v1, v2; 
    asm volatile ("# %0 %1": "=vr"(v1) : "vr"(v2));
}

@leekillough

But how to pass certain fields of tuples as non-tuple vector registers?

If (v0,v1) is a tuple called vx, how do I pass vx.v0 or vx.v1 to inline assembly or non-segment intrinsics?

Depositing/extracting vectors from tuple aggregate types seems to defeat the purpose of segment loads/stores, unless it's just massaging for the compiler and introduces no new instructions (moves).

@kito-cheng
Collaborator

> But how to pass certain fields of tuples as non-tuple vector registers?
>
> If (v0,v1) is a tuple called vx, how do I pass vx.v0 or vx.v1 to inline assembly or non-segment intrinsics?
>
> Depositing/extracting vectors from tuple aggregate types seems to defeat the purpose of segment loads/stores, unless it's just massaging for the compiler and introduces no new instructions (moves).

Yes, use vget/vset to extract vectors from, or deposit them into, tuple types. The compiler will try to allocate the same registers so that no extra move instructions are needed. If you see a move instruction that you think is unnecessary, you can report a bug to the LLVM or GCC community, since it might be a performance regression.
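To make the vget route concrete, here is a minimal sketch (not taken from this thread) that loads a two-field tuple with a segment load, extracts each field, and passes the fields to a non-segment intrinsic. The function name `add_fields` is hypothetical, and the version guard assumes a toolchain implementing the RVV intrinsics v0.12 or later naming scheme; on other targets a scalar fallback is compiled instead. Whether the extractions compile without extra moves depends on the compiler's register allocation, as described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: tuple types need the v0.12+ intrinsics (version macro >= 12000). */
#if defined(__riscv_v_intrinsic) && __riscv_v_intrinsic >= 12000
#include <riscv_vector.h>
#define HAVE_RVV_TUPLES 1
#endif

/* Hypothetical helper: out[i] = in[2*i] + in[2*i + 1] for i in [0, n). */
void add_fields(const int32_t *in, int32_t *out, size_t n) {
#ifdef HAVE_RVV_TUPLES
    while (n > 0) {
        size_t vl = __riscv_vsetvl_e32m1(n);
        /* Segment load: field 0 gets the even elements, field 1 the odd ones. */
        vint32m1x2_t vx = __riscv_vlseg2e32_v_i32m1x2(in, vl);
        /* Extract each field as a plain (non-tuple) vector value. */
        vint32m1_t v0 = __riscv_vget_v_i32m1x2_i32m1(vx, 0);
        vint32m1_t v1 = __riscv_vget_v_i32m1x2_i32m1(vx, 1);
        /* A non-segment intrinsic consumes the extracted fields directly. */
        vint32m1_t sum = __riscv_vadd_vv_i32m1(v0, v1, vl);
        __riscv_vse32_v_i32m1(out, sum, vl);
        in += 2 * vl;
        out += vl;
        n -= vl;
    }
#else
    /* Scalar fallback so the sketch also builds on non-RVV targets. */
    for (size_t i = 0; i < n; ++i)
        out[i] = in[2 * i] + in[2 * i + 1];
#endif
}
```

Inspecting the generated assembly for this loop is one way to check whether the compiler reused the tuple's registers or inserted vector moves.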

@leekillough

leekillough commented Jul 6, 2023

The tuple intrinsic type, since it's already a type outside of C/C++ proper, could have array indexing tuple[0 .. NFIELDS-1], and this would be a lot more straightforward. It would return an lvalue of a numbered field, and it would be a constraint violation to be outside of the range 0 .. NFIELDS-1 (or to use a value which isn't a compile-time constant).

Even if array subscripting is not practical, some intrinsic like __rvv_tuple_field() to return a numbered tuple field as an lvalue, which can be assigned to or converted to an rvalue, would be more intuitive than inserting or extracting, which sometimes requires creating extra variables that hopefully the compiler will merge with the tuples'.

Porting code which used the old syntax would also be a lot easier, since you would only need to replace things like xvec_real with xvec[0] or __rvv_tuple_field(xvec, 0), and xvec_imag with xvec[1] or __rvv_tuple_field(xvec, 1). It would work whether xvec[0] and xvec[1] ended up on the LHS or RHS of an assignment, and would not need to create new temporary local variables of vector type, or require the compiler to assign them to the tuple fields' same vector registers -- it would just access them directly.
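For comparison, this is roughly what the read-modify-write pattern looks like today without field subscripting. It is only a sketch: `CPLX_GET`, `CPLX_SET`, and `conj_inplace` are hypothetical names wrapping the existing vget/vset intrinsics, not a proposed or existing API, and the guard again assumes the v0.12+ intrinsics naming. The field index must be a compile-time constant. A scalar fallback is used on non-RVV targets.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: tuple types need the v0.12+ intrinsics (version macro >= 12000). */
#if defined(__riscv_v_intrinsic) && __riscv_v_intrinsic >= 12000
#include <riscv_vector.h>
#define HAVE_RVV_TUPLES 1
/* Hypothetical wrappers: read field idx, or write v into field idx of t. */
#define CPLX_GET(t, idx)    __riscv_vget_v_i32m1x2_i32m1((t), (idx))
#define CPLX_SET(t, idx, v) ((t) = __riscv_vset_v_i32m1_i32m1x2((t), (idx), (v)))
#endif

/* Hypothetical helper: conjugate n interleaved (real, imag) integer pairs. */
void conj_inplace(int32_t *z, size_t n) {
#ifdef HAVE_RVV_TUPLES
    while (n > 0) {
        size_t vl = __riscv_vsetvl_e32m1(n);
        vint32m1x2_t xvec = __riscv_vlseg2e32_v_i32m1x2(z, vl);
        /* Read field 1 (imaginary parts), negate, and write it back. */
        vint32m1_t imag = CPLX_GET(xvec, 1);
        CPLX_SET(xvec, 1, __riscv_vneg_v_i32m1(imag, vl));
        __riscv_vsseg2e32_v_i32m1x2(z, xvec, vl);
        z += 2 * vl;
        n -= vl;
    }
#else
    /* Scalar fallback so the sketch also builds on non-RVV targets. */
    for (size_t i = 0; i < n; ++i)
        z[2 * i + 1] = -z[2 * i + 1];
#endif
}
```

The temporary `imag` and the reassignment of `xvec` inside `CPLX_SET` are the extra steps that an lvalue-returning field accessor would avoid.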

@kito-cheng
Collaborator

@leekillough honestly, we've considered adding subscripting syntax for tuple types. I can imagine it would be useful and much simpler for users, but unfortunately we lack the engineering resources to implement it :(

@leekillough

> @leekillough honestly, we've considered adding subscripting syntax for tuple types. I can imagine it would be useful and much simpler for users, but unfortunately we lack the engineering resources to implement it :(

Here's a preview of what not having such a feature would require doing, unless I'm missing something:

Use the new tuple intrinsics to get rid of build errors in X280 BLIS
