Add sifive_x280 configuration #737

Aaron-Hutchinson · 2023-03-29T19:05:11Z

This PR adds a new configuration to BLIS, called sifive_x280. This configuration is built for the RISC-V instruction set architecture and is optimized for SiFive's X280 processor. Included are implementations for most level 1, 1f, and 3 kernels, with the level 3 gemm and gemmtrsm kernels receiving the most attention.

Since this configuration targets RISC-V, compiling it and running tests on typical machines is challenging. For convenience, we've written a simple script that aims to make testing this configuration easier. The script can be found here, which has the following flow:

downloads and builds the RISC-V GNU Linux toolchain (just for the C runtime)
downloads and builds the RISC-V LLVM Linux toolchain, integrating the C runtime from GNU
downloads QEMU and builds the riscv64 Linux usermode emulator
downloads BLIS, configures it for sifive_x280, builds it, and runs make check.

Developers for the sifive_x280 implementation (in alphabetical order):

Special thanks to @fgvanzee for their assistance in debugging various issues and helping our team understand the BLIS framework.

We look forward to your feedback and are very excited to join the BLIS community.

devinamatthews · 2023-03-29T20:42:43Z

@Aaron-Hutchinson @nick-knight @myeh01 awesome work, much appreciated! Regarding steps 1-3 of the testing process, can these product be pre-built? This would really help CI build times... @angsch and @leekillough have been putting similar things here.

nick-knight · 2023-03-29T22:58:06Z

@devinamatthews Yes, absolutely. The GNU toolchain build, in particular, is substantial. But your comment touches on a larger shortcoming of our PR: we have not addressed CI. (We meant to add a comment about this when we submitted the PR.) We are hoping for some guidance from the community on the best way to go about this, since we have little experience with setting up CI, and none with BLIS CI in particular.

devinamatthews · 2023-03-30T02:08:44Z

The PR can be merged without it. Once we get at least one RISC-V configuration running reliably in Travis then adding more shouldn't be too difficult.

angsch · 2023-03-30T12:15:09Z

The PR can be merged without it. Once we get at least one RISC-V configuration running reliably in Travis then adding more shouldn't be too difficult.

I think that we can extend the CI infrastructure that @leekillough and I set up. I am happy to help here. Further, before merging the PR, it would be good to check how the x280 target interacts with the auto configure and ISA detection work that we added.

leekillough · 2023-03-30T17:09:36Z

Is there a C macro which is always defined when an X280 compiler is being used?

There is an auto-detect mechanism which auto-detects RISC-V architecture based on __riscv* macros. It is used when ./configure auto is invoked.

I want to improve it so that it can also detect X280, because with our PR, it will detect X280 as rv64iv.

If X280 is detected when configure auto is used, do you want it choose the sifive_x280 configuration?

Aaron-Hutchinson · 2023-03-31T22:10:51Z

Regarding steps 1-3 of the testing process, can these product be pre-built? This would really help CI build times... @angsch and @leekillough have been putting similar things here.

I'd be happy to upload a tarball of the prebuilt toolchain and QEMU for CI purposes. It looks like there's already a QEMU tarball in the link in your post, so I can try replacing the QEMU portion of our automation script with just downloading and unpacking that tarball. I can also do something similar with the prebuilt toolchain once it's uploaded.

I think then translating the script over to CI would be much smoother.

Is there a C macro which is always defined when an X280 compiler is being used?

Our automation script uses the upstream toolchain, so I'm not sure if there would be a way to differentiate it from rv64iv through C preprocessor macros. @nick-knight would be able to say more, but is out-of-office through next week.

devinamatthews · 2023-03-31T22:16:24Z

Is there anything like cpuid on RISC-V?

leekillough · 2023-03-31T23:26:01Z

@devinamatthews: There is no need to use a runtime cpuid on RISC-V, because there are predefined macros in the RISC-V C API. Using a cross-compiler and executing RISC-V on a different host architecture, requiring the use of a simulator in configure, would be awkward. Fortunately, the RISC-V C API provides preprocessor macros for architecture detection, so $(CC) -E can be used to autodetect the RISC-V architecture.

@devinamatthews, @Aaron-Hutchinson @nick-knight :

There are two RISC-V autodetection header files in PR693:

bli_riscv_cpuid.h, which returns one of rv32i, rv32iv, rv64i, rv64iv or generic, depending on whether one of 4 major RISC-V architectures are detected (XLEN=32 and XLEN=64, with and without V vector extension). In the configure script, if this autodetection header returns generic, then the existing BLIS autodetection mechanism is fallen back on.

bli_riscv_detect_arch.h, which returns the full detected RISC-V architecture string, such as rv64imafdcv. The result of this header is used to form the -march= option. On some versions of Clang and GCC, -march=...v needs to be specified to enable the V vector extension to be enabled, which is forced in the BLIS rv32iv and rv64iv configurations by using -DFORCE_RISCV_VECTOR when preprocessing bli_riscv_detect_arch.h, because preprocessing the header with default compiler options would not enable V.

devinamatthews · 2023-04-01T01:24:40Z

But if two companies make rv64iv chips how do you tell them apart?

leekillough · 2023-04-01T01:34:57Z

But if two companies make rv64iv chips how do you tell them apart?

Hence my question in #737 (comment).

@angsch and I have created a foundational RISC-V BLIS port which should be adaptable to all RISC-V variants. But we understand that there may be specific BLIS implementations for specific RISC-V implementations.

The BLIS RISC-V autodetection mechanism is able to identify base features of the RISC-V implementation, such as whether A, M, V extensions are available, but unless there is a C macro to autodetect x280 or other implementations, the BLIS user will need to specify ./configure sifive_x280 instead of ./configure auto in order to get the most features out of a particular RISC-V implementation.

Aaron-Hutchinson · 2023-04-03T23:24:40Z

Regarding prebuilding the toolchain for CI, I'm not sure how portable the toolchain that our script creates is. It appears it hardcodes some of the filepaths, and I fear this may cause some issues if I were to create a tarball of my local build and upload it (I have limited knowledge in this area, so correct me if I'm wrong).

Would it be possible to have one of the CI machines build the toolchain itself and save the result for future runs?

angsch · 2023-04-04T07:55:58Z

Regarding prebuilding the toolchain for CI, I'm not sure how portable the toolchain that our script creates is. It appears it hardcodes some of the filepaths, and I fear this may cause some issues if I were to create a tarball of my local build and upload it (I have limited knowledge in this area, so correct me if I'm wrong).

That concern is justified. I encountered incompatibilities when I first packaged qemu. To package qemu, I had to replicate the build environment of the CI machine. Further, the build of the toolchain was susceptible to the execution environment. I think that the incompatibilities are solely due to dismatching version of linked libraries such as glibc.

I suggest that you use the tarball of qemu and the toolchain that Lee and I use in our PR. That runs successfully on the CI machine.

angsch · 2023-04-04T07:56:58Z

Would it be possible to have one of the CI machines build the toolchain itself and save the result for future runs?

I tried this and it is not possible. The Travis runs will hit a timeout.

Aaron-Hutchinson · 2023-04-04T17:22:59Z

I tried this and it is not possible. The Travis runs will hit a timeout.

Can the timeout be increased for the steps that build the toolchain/QEMU?

angsch · 2023-04-04T18:07:58Z

Can the timeout be increased for the steps that build the toolchain/QEMU?

We were recommended to aim at a runtime of below 10 minutes for our rv[32,64]iv target. Note that make -j does not do the trick. Further, since your CI target will be triggered also when something unrelated is pushed (e.g. a non-RISC-V target), building the toolchain will burn CPU hours.

Aaron-Hutchinson · 2023-04-04T18:14:46Z

We were recommended to aim at a runtime of below 10 minutes for our rv[32,64]iv target. Note that make -j does not do the trick. Further, since your CI target will be triggered also when something unrelated is pushed (e.g. a non-RISC-V target), building the toolchain will burn CPU hours.

Again please forgive my limited experience in this area. I would think there would be a way to save the toolchain and QEMU builds for use over multiple CI invocations and only build them when they either don't already exist on the machine or the builds become out of date. This way, they're only built once on the CI machine and nearly all CI runs will skip over the build steps for the toolchain and QEMU.

devinamatthews · 2023-04-04T19:13:34Z

I think Travis also has Docker images of the CI environment which you can run locally.

leekillough · 2023-04-04T19:19:05Z

@Aaron-Hutchinson:

GitHub has a 100 MB limit on tracked files before it requires paid service.

Instead of files stored in the distribution, we would need to use released binaries, which have a 2 GB limit. That is the same 2 GB limit for Git Large File Storage in GitHub.

Travis has quotas on how much CPU, memory and disk space can be used. Once the credits run out for a billing period, they must be bought with paid-for credits, or wait until the next billing period. See this also.

According to @angsch, the dependency on linked libraries makes it a necessity to build the toolchain in an environment that is compatible with the CI machines. So you need to build on a fresh Ubuntu Focal machine / Docker container.

devinamatthews · 2023-04-04T19:20:23Z

Chunked tar.gz? From: "Field G. Van Zee" ***@***.***> Reply-To: flame/blis ***@***.***> Date: Tuesday, April 4, 2023 at 2:19 PM To: flame/blis ***@***.***> Cc: "Matthews, Devin" ***@***.***>, Mention ***@***.***> Subject: Re: [flame/blis] Add sifive_x280 configuration (PR #737) [EXTERNAL SENDER] @Aaron-Hutchinson<https://github.com/Aaron-Hutchinson>: GitHub has a 100 MB limit<https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github> on tracked files before it requires paid service. Instead of files stored in the distribution, we would need to use released binaries, which have a 2 GB limit<https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github#distributing-large-binaries>. That is the same 2 GB limit<https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-git-large-file-storage> for Git Large File Storage in GitHub. Travis has quotas<https://docs.travis-ci.com/user/billing-faq/> on how much CPU, memory and disk space can be used. Once the credits run out for a billing period, they must be bought with paid-for credits, or wait until the next billing period. See this also<https://www.jeffgeerling.com/blog/2020/travis-cis-new-pricing-plan-threw-wrench-my-open-source-works>. According to @angsch<https://github.com/angsch>, the dependency on linked libraries makes it a necessity to build the toolchain in an environment that is compatible with the CI machines. So you need to build on a fresh Ubuntu Focal machine / Docker container. — Reply to this email directly, view it on GitHub<#737 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABIAZIM57GKN75CBKFPUUALW7RX3HANCNFSM6AAAAAAWMJJWJA>. You are receiving this because you were mentioned.Message ID: ***@***.***>

Aaron-Hutchinson · 2023-04-05T18:54:00Z

@Aaron-Hutchinson:

GitHub has a 100 MB limit on tracked files before it requires paid service.

Instead of files stored in the distribution, we would need to use released binaries, which have a 2 GB limit. That is the same 2 GB limit for Git Large File Storage in GitHub.

Travis has quotas on how much CPU, memory and disk space can be used. Once the credits run out for a billing period, they must be bought with paid-for credits, or wait until the next billing period. See this also.

According to @angsch, the dependency on linked libraries makes it a necessity to build the toolchain in an environment that is compatible with the CI machines. So you need to build on a fresh Ubuntu Focal machine / Docker container.

I'm proposing that we do not track any toolchain/QEMU related files on GitHub, and just use build caching for them. It looks like Travis has built-in functionality for exactly this kind of purpose. See here and here. This line from the first link is particularly relevant:

Caches lets Travis CI store directories between builds, which is useful for storing dependencies that take longer to compile or download.

fgvanzee · 2023-04-10T20:11:36Z

@Aaron-Hutchinson Caching sounds fine to me. I read the links you provided, but I'm still not 100% certain how we would employ caching in this context. (Travis could use a few more examples in their documentation!)

Aaron-Hutchinson · 2023-04-10T21:05:47Z

@Aaron-Hutchinson Caching sounds fine to me. I read the links you provided, but I'm still not 100% certain how we would employ caching in this context. (Travis could use a few more examples in their documentation!)

I agree that Travis' documentation is not very thorough. I've read a little bit about this feature and it's something I'd like to try pursuing.

Does anyone know if there is a local version of Travis CI I can use on my own machine to test the results of changes to the .travis.yaml file? The answers I've found from searching around are greatly out of date.

devinamatthews · 2023-04-10T21:19:52Z

I believe there is a local version using Docker. At least there was a few years ago.

Aaron-Hutchinson · 2023-04-10T22:55:21Z

I haven't been able to find any official documentation on a local version, and unofficial discussions I've come across are a few years old and don't appear to work any more. It looks like they may have made this an Enterprise feature.

leekillough · 2023-04-11T01:16:55Z

Caching is not recommended for built toolchains (unless that document is outdated), and used to not be performed for Docker images, but seems to be now. See this and this too.

angsch · 2023-04-13T15:58:58Z

config/sifive_x280/make_defs.mk

+CPPROCFLAGS    :=
+CMISCFLAGS     := $(CMISCFLAGS_SIFIVE) -fdata-sections -ffunction-sections \
+                  -fdiagnostics-color=always -fno-rtti -fno-exceptions \
+                  -std=gnu++17


Should this read -std=gnu17? I think that gnu++17 is a C++-only option.

-std=gnu++17 should be removed completely since BLIS already adds std=c99.

Thanks. We just copied this from the generic make_defs.mk without really understanding what was required by the project. IIRC, a bunch of the warning flags are also redundant (generated somewhere else in the build system).

@Aaron-Hutchinson I think you forgot to update CMISCFLAGS when you rebased

Thanks for the reminder! I did indeed forget. This will be fixed in the upcoming commit.

leekillough · 2023-04-19T02:35:57Z

Since this configuration targets RISC-V, compiling it and running tests on typical machines is challenging. For convenience, we've written a simple script that aims to make testing this configuration easier. The script can be found here, which has the following flow:
* downloads and builds the RISC-V GNU Linux toolchain (just for the C runtime)

* downloads and builds the RISC-V LLVM Linux toolchain, integrating the C runtime from GNU

* downloads QEMU and builds the riscv64 Linux usermode emulator

* downloads BLIS, configures it for `sifive_x280`, builds it, and runs `make check`.

RISC-V General Toolchain Builder

The following script is used in-house @tactcomplabs:

build-riscv.txt (rename to build-riscv.sh).

It supports any valid RISC-V ARCH / ABI / VLEN combination (e.g., rv64imafdcv/lp64d, rv32imaf/ilp32f).
It supports GCC and Clang/LLVM (thanks to @cmuellner).
It supports QEMU and Spike/PK.
It supports specifying the branch/tag/commit of the riscv-gnu-toolchain to use (e.g., master, rvv-next, latest).
It clones from the repositories as needed, starting with a clean build each time.
It autodetects missing package dependencies on Debian-based platforms.
It creates a riscv.sh script to source to set all of the environment variables to cross-compile and run a software package with a simulator.
It unsets any environment variables set beforehand which could affect builds.
It uses color highlighting and interactive prompts if it's run on a terminal.
It profiles the time spent in each Bash function.

To use it, edit the variables at the top of the file, e.g.,

# Variables defining the RISC-V toolchain

# Build parameters
RISCV_ARCH=rv64imafdv
RISCV_ABI=lp64d
RISCV_VLEN=128

# gnu or llvm
COMPILER=gnu

# latest: The most recent tagged released in RISC-V toolchain
# rvv-next: An experimental RISC-V toolchain branch (stale?)
# master: The latest development branch
# <commitID>
RISCV_GNU_TAG=rvv-next

# qemu or spike
RISCV_SIM=qemu

and then run ./build-riscv.sh or bash ./build-riscv.txt.

To Build BLIS

After the toolchain is built, cd blis and type, e.g.,

source ~/riscv/rv64imafdv_lp64d_vlen128/riscv.sh
./configure rv64iv
make -j
make -j checkblis-fast

Build issues encountered with this PR

(The C++ options have been removed, and merge conflicts eliminated, in sifive#3 .)

Your script sets:

TESTSUITE_WRAPPER="$QEMU_PATH -cpu $QEMU_CPU -L $CLANG_CROSS_INSTALL_DIR/sysroot"

while also using:

BLIS_OPTIONS="--prefix=sifive_x280 --disable-shared"

... which seems to exclude shared libraries, while also specifying options to use them.

When using QEMU, our script sets:

export QEMU_LD_PREFIX=$RISCV/sysroot

... which allows QEMU to work with BLIS shared libraries.

When I build my toolchain with tag rvv-next and then attempt to build BLIS with sifive_x280, I get the following error:

Compiling obj/sifive_x280/kernels/sifive_x280/1/bli_addv_sifive_x280_intr/bli_addv_sifive_x280_intr.o ('sifive_x280' CFLAGS for kernels)
In file included from kernels/sifive_x280/1/bli_addv_sifive_x280_intr/bli_addv_sifive_x280_intr.c:40:
kernels/sifive_x280/1/bli_addv_sifive_x280_intr/./bli_addv_sifive_x280_intr_real.c: In function 'bli_saddv_sifive_x280_intr':
kernels/sifive_x280/1/bli_addv_sifive_x280_intr/../../riscv_overloaded_intrinsics.h:38:34: warning: implicit declaration of function '__riscv_vsetvl_e32m8' [-Wimplicit-function-declaration]
   38 | #define VSETVL_(PRECISION, LMUL) __riscv_vsetvl_e##PRECISION##LMUL
      |                                  ^~~~~~~~~~~~~~~~
kernels/sifive_x280/1/bli_addv_sifive_x280_intr/../../riscv_overloaded_intrinsics.h:39:33: note: in expansion of macro 'VSETVL_'
   39 | #define VSETVL(PRECISION, LMUL) VSETVL_(PRECISION, LMUL)
      |                                 ^~~~~~~
kernels/sifive_x280/1/bli_addv_sifive_x280_intr/./bli_addv_sifive_x280_intr_real.c:52:21: note: in expansion of macro 'VSETVL'
   52 |         size_t vl = VSETVL(PREC, LMUL)(avl);
      |                     ^~~~~~
kernels/sifive_x280/1/bli_addv_sifive_x280_intr/../../riscv_overloaded_intrinsics.h:43:37: warning: implicit declaration of function '__riscv_vle32_v_f32m8' [-Wimplicit-function-declaration]
   43 | #define VLE_V_F_(PRECISION, LMUL)   __riscv_vle##PRECISION##_v_f##PRECISION##LMUL
      |                                     ^~~~~~~~~~~
kernels/sifive_x280/1/bli_addv_sifive_x280_intr/../../riscv_overloaded_intrinsics.h:44:36: note: in expansion of macro 'VLE_V_F_'
   44 | #define VLE_V_F(PRECISION, LMUL)   VLE_V_F_(PRECISION, LMUL)
      |                                    ^~~~~~~~
kernels/sifive_x280/1/bli_addv_sifive_x280_intr/./bli_addv_sifive_x280_intr_real.c:56:20: note: in expansion of macro 'VLE_V_F'
   56 |             xvec = VLE_V_F(PREC, LMUL) (x, vl);
      |                    ^~~~~~~
kernels/sifive_x280/1/bli_addv_sifive_x280_intr/../../riscv_overloaded_intrinsics.h:43:37: error: incompatible types when assigning to type 'vfloat32m8_t' from type 'int'
   43 | #define VLE_V_F_(PRECISION, LMUL)   __riscv_vle##PRECISION##_v_f##PRECISION##LMUL
      |                                     ^~~~~~~~~~~
kernels/sifive_x280/1/bli_addv_sifive_x280_intr/../../riscv_overloaded_intrinsics.h:44:36: note: in expansion of macro 'VLE_V_F_'
   44 | #define VLE_V_F(PRECISION, LMUL)   VLE_V_F_(PRECISION, LMUL)
      |                                    ^~~~~~~~
kernels/sifive_x280/1/bli_addv_sifive_x280_intr/./bli_addv_sifive_x280_intr_real.c:56:20: note: in expansion of macro 'VLE_V_F'
   56 |             xvec = VLE_V_F(PREC, LMUL) (x, vl);
      |                    ^~~~~~~
compilation terminated due to -Wfatal-errors.
make: *** [Makefile:696: obj/sifive_x280/kernels/sifive_x280/1/bli_addv_sifive_x280_intr/bli_addv_sifive_x280_intr.o] Error 1

Is there a rvv-next riscv-gnu-toolchain configure option which needs to be specified, in order to enable the vector intrinsics?

@angsch @nick-knight @Aaron-Hutchinson @devinamatthews @fgvanzee @ct-clmsn

angsch · 2023-04-19T16:03:13Z

@Aaron-Hutchinson In order to avoid duplication, I tested the QEMU tarball that Lee and I use. I face the compilation problem with the vector intrinsics, too, so my test experimentally enabled all extension that x280 has. Based on these tests, I am confident that you can use the same QEMU tarball for your CI. The tarball lives in a sibling repo: https://github.com/flame/ci-utils/blob/master/riscv/qemu-riscv-2023.02.25-ubuntu-20.04.tar.gz.

nick-knight · 2023-04-19T18:15:31Z

Thanks for all the feedback, sorry we're slow to respond.

Regarding the RISC-V vector intrinsics issue, this name-mangling was introduced recently at the behest of the RISC-V Toolchains SIG, in riscv-non-isa/riscv-c-api-doc#31. It made its way into the vector intrinsics API, version 0.11 (multiple PRs, I won't try to list them all). That API change, in turn, appeared in LLVM 16.0.0. Unfortunately, I don't know the status with GCC. Historically, GCC has lagged LLVM w.r.t. chasing unratified/churning RISC-V specs, so I'm not surprised that LLVM works but GCC does not.

On that last point, in case it isn't clear, the RISC-V vector intrinsics API is a community project, sponsored by RISC-V International:

We are working towards v1.0 of the API but have not frozen yet. And it looks like we'll miss the GCC 13 window. The task group meets monthly; we'd love your company. If you have questions on GCC support for the latest intrinsics API changes, this is the right community to bring it up with.

leekillough · 2023-06-29T01:37:07Z

Ah yes, the RVV intrinsics API is still not frozen, we should be prepared for churn.

I am willing to do it for this PR, since I have been locally keeping it up to date .

…code. However, it does not get correct results for complex BLIS routines which use segment loads (or call those that do). The intrinsic types check out and make sense, but it returns wrong answers. It's probably something really simple. For historical reference, see: riscv-non-isa/riscv-c-api-doc#43 flame#737 (comment) https://reviews.llvm.org/D152134 riscv-non-isa/rvv-intrinsic-doc#139 riscv-non-isa/rvv-intrinsic-doc#198 https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/auto-generated/intrinsic_funcs/03_vector_stride_segment_load_store_instructions_zvlsseg.md https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/auto-generated/intrinsic_funcs/04_vector_indexed_segment_load_store_instructions_zvlsseg.md

Added whitespace and other formatting fixes

Restore changes from sifive-blis-private#28

Aaron-Hutchinson · 2023-10-12T17:51:02Z

Our team would like to get this PR merged soon. We have some updates coming in shortly with minor changes, such as resolving the merge conflicts and updating the RISC-V intrinsics.

What is the best way forward regarding the CI issue? From what I can tell from the comments above this is still unresolved.

angsch · 2023-10-12T19:40:08Z

What is the best way forward regarding the CI issue? From what I can tell from the comments above this is still unresolved.

When you have updated the PR, I am happy to test locally if you can reuse the binaries that are used in the current CI pipeline. I am optimistic that the CI suggestions from above still work.

* Updated RISC-V intrinsics to match LLVM 17.0.2

Aaron-Hutchinson · 2023-10-17T05:47:05Z

All of the developmental changes we planned to make are now merged into add_sifive_x280, and the RISC-V intrinsic updates and merge conflicts have been addressed. I believe our team is happy with the state of the branch.

@angsch If you're able and willing to run the CI tests locally, I think the branch should be in a stable place to do so now. Thank you!

angsch · 2023-10-19T14:06:42Z

The following should work. I think it makes sense to use the same compiler version for all RISC-V targets, so the compiler version is bumped below for the already existing targets.

diff --git a/.travis.yml b/.travis.yml
index 848cb184..bdfafb6b 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -86,6 +86,11 @@ matrix:
     env: OOT=0 TEST=FAST SDE=0 THR="none" BLD="--disable-shared" CONF="rv32iv" \
       CC=riscv32-unknown-linux-gnu-gcc \
       LDFLAGS=-static
+  - os: linux
+    compiler: clang
+    env: OOT=0 TEST=FAST SDE=0 THR="none" BLD="--disable-shared" CONF="sifive_x280" \
+      CC=clang \
+      LDFLAGS=-static
 install:
 - if [ "$CC" = "gcc"  ] && [ "$TRAVIS_OS_NAME" = "linux" ]; then export CC="gcc-9"; fi
 - if [ -n "$PACKAGES" ] && [ "$TRAVIS_OS_NAME" = "linux" ]; then sudo apt-get install -y $PACKAGES; fi
@@ -106,6 +111,12 @@ script:
     export CXX=$DIST_PATH/../toolchain/riscv/bin/riscv32-unknown-linux-gnu-g++;
     export TESTSUITE_WRAPPER="$DIST_PATH/../toolchain/qemu-riscv32 -cpu rv32,vext_spec=v1.0,v=true,vlen=128 -B 0x100000";
   fi
+- if [ "$CONF" = "sifive_x280" ]; then
+    $DIST_PATH/travis/do_riscv.sh "$CONF";
+    export CC=$DIST_PATH/../toolchain/riscv/bin/clang;
+    export CXX=$DIST_PATH/../toolchain/riscv/bin/clang++;
+    export TESTSUITE_WRAPPER="$DIST_PATH/../toolchain/qemu-riscv64 -cpu rv64,vext_spec=v1.0,v=true,vlen=512 -B 0x100000";
+  fi
 - $DIST_PATH/configure -p `pwd`/../install -t $THR $BLD CC=$CC $CONF
 - pwd
 - ls -l
diff --git a/travis/do_riscv.sh b/travis/do_riscv.sh
index a51d3306..9a114b0e 100755
--- a/travis/do_riscv.sh
+++ b/travis/do_riscv.sh
@@ -3,18 +3,21 @@
 set -e
 set -x
 
-TAG=2023.02.25
+TAG=2023.10.18
 
 # The prebuilt toolchains only support hardfloat, so we only
 # test these for now.
 case $1 in
 	"rv32iv")
-	TARBALL=riscv32-glibc-ubuntu-20.04-nightly-${TAG}-nightly.tar.gz
+	TARBALL=riscv32-glibc-ubuntu-20.04-gcc-nightly-${TAG}-nightly.tar.gz
 	;;
 	"rv64iv")
-	TARBALL=riscv64-glibc-ubuntu-20.04-nightly-${TAG}-nightly.tar.gz
+	TARBALL=riscv64-glibc-ubuntu-20.04-gcc-nightly-${TAG}-nightly.tar.gz
 	;;
+	"sifive_x280")
+	TARBALL=riscv64-glibc-ubuntu-20.04-llvm-nightly-${TAG}-nightly.tar.gz
 	*)
+	;;
 	exit 1
 	;;
 esac

I zipped the patch due to Github's constraints of what can be attached.
0001-Add-sifive_x280-to-CI.zip

Aaron-Hutchinson · 2023-10-19T22:53:35Z

Thanks @angsch. I've opened a PR here to apply the CI patch and update the make_defs.mk.

Aaron-Hutchinson · 2023-10-31T22:36:34Z

@angsch Looks like CI has failed after applying the patch due to not being able to find the compiler:

configure: user specified a C compiler via CC (./../toolchain/riscv/bin/riscv64-unknown-linux-gnu-gcc).
configure: *** Could not find the C compiler specified via CC ('./../toolchain/riscv/bin/riscv64-unknown-linux-gnu-gcc').
configure: *** A working C compiler is required. Please set CC
configure: *** to a C compiler that exists (or unset CC).
The command "$DIST_PATH/configure -p `pwd`/../install -t $THR $BLD CC=$CC $CONF" exited with 1.

Any idea what went wrong?

alexsifivetw · 2023-11-01T03:43:26Z

travis/do_riscv.sh

 	;;
+	"sifive_x280")
+	TARBALL=riscv64-glibc-ubuntu-20.04-llvm-nightly-${TAG}-nightly.tar.gz


We already have a QEMU in this tarball file. Is it necessary to get another one using the following commands?

# Once CI upgrades to jammy, the next three lines can be removed. # The qemu version installed via packages (qemu-user qemu-user-binfmt) # is sufficient. TARBALL_QEMU=qemu-riscv-2023.02.25-ubuntu-20.04.tar.gz wget https://github.com/flame/ci-utils/raw/master/riscv/${TARBALL_QEMU} tar -xf $TARBALL_QEMU

We just need to update TARBALL to riscv64-glibc-ubuntu-{JAMMY_VER}-gcc-nightly-${TAG}-nightly.tar.gz if the CI is upgraded.

Good point, I didn't notice that now both the LLVM and the GNU toolchain include qemu.

alexsifivetw · 2023-11-01T03:59:11Z

@angsch Looks like CI has failed after applying the patch due to not being able to find the compiler:

configure: user specified a C compiler via CC (./../toolchain/riscv/bin/riscv64-unknown-linux-gnu-gcc).
configure: *** Could not find the C compiler specified via CC ('./../toolchain/riscv/bin/riscv64-unknown-linux-gnu-gcc').
configure: *** A working C compiler is required. Please set CC
configure: *** to a C compiler that exists (or unset CC).
The command "$DIST_PATH/configure -p `pwd`/../install -t $THR $BLD CC=$CC $CONF" exited with 1.

Any idea what went wrong?

Does soft link work?

ln -s -f /your/path/to/clang /usr/bin/clang
CC=clang

Could you try using CC environment variable with absolutely path?

angsch · 2023-11-01T12:49:07Z

I think that a syntax error before that introduces the problem. My mistake, sorry.
Can we try do_riscv.shwith

+	"sifive_x280")
+	TARBALL=riscv64-glibc-ubuntu-20.04-llvm-nightly-${TAG}-nightly.tar.gz
+	;;
 	 *)
 	exit 1
 	;;
 esac

(The ;; and *) before exit are flipped)

In the meanwhile, I will try the qemu builds shipped with the toolchain.

Aaron-Hutchinson · 2023-11-01T17:30:13Z

Thanks for the correction @angsch. Looks like with that fix the PR has passed the CI.

fgvanzee · 2023-11-02T22:55:31Z

Thank you everyone for your contributions and engagement on this PR!

Does anyone else have any comments before I merge? 🚀

Details: - Added a new 'sifive_x280' subconfiguration for SiFive's x280 RISC-V instruction set architecture. The subconfig registers kernels from a correspondingly new kernel set, also named 'sifive_x280'. - Added the aforementioned kernel set, which includes intrinsics- and assembly-based implementations of most level-1v kernels along with level-1f kernels axpy2v dotaxpyv, packm kernels, and level-3 gemm, gemmtrsm_l, and gemmtrsm_u microkernels (plus supporting files). - Registered the 'sifive_x280' subconfig as belonging to a singleton family by the same name. - Added an entry to '.travis.yml' to test the new subconfig via qemu. - Updates to 'travis/do_riscv.sh' script to support the 'sifive_x280' subconfig and to reflect updated tarball names. - Special thanks to Lee Killough, Devin Matthews, and Angelika Schwarz for their engagement on this commit. - (cherry picked from commit 05388dd) Fixed HPX barrier synchronization (#783) Details: - Fixed hpx barrier synchronization. HPX was hanging on larger cores because blis was using non-hpx synchronization primitives. But when using hpx-runtime only hpx-synchronization primitives should be used. Hence, a C style wrapper hpx_barrier_t is introduced to perform hpx barrier operations. - Replaced hpx::for_loop with hpx::futures. Using hpx::for_loop with hpx::barrier on n_threads greater than actual hardware thread count causes synchronization issues making hpx hanging. This can be avoided by using hpx::futures, which are relatively very lightweight, robust and scalable. - (cherry picked from 7a87e57) Fixed bug in sup threshold registration. (#782) Details: - Fixed a bug that resulted in BLIS non-deterministically calling the gemmsup handler, irrespective of the thresholds that are registered via bli_cntx_set_blkszs(). - Deep dive: In bli_cntx_init_ref.c, the default values for the gemmsup thresholds (BLIS_[MNK]T blocksizes) wre being set to zero so that no operation ever matched the criteria for gemmsup (unless specific sup thresholds are registered). HOWEVER, these thresholds are set via bli_cntx_set_blkszs() which calls bli_blksz_copy_if_pos(), which was only coping the thresholds into the gks' cntx_t if the values were strictly positive. Thus, the zero values passed into bli_cntx_set_blkszs() were being ignored and those threshold slots within the gks were left uninitialized. The upshot of this is that the reference gemmsup handler was being called for gemm problems essentially at random (and as it turns out, very rarely the reference gemmsup implementation would encounter a divide-by-zero error). - The problem was fixed by changing bli_blksz_copy_if_pos() so that it copies values that are non-negative (values >= 0 instead of > 0). The function was also renamed to bli_blksz_copy_if_nonneg() - Also needed to standardize use of -1 as the sole value to embed into blksz_t structs as a signal to bli_cntx_set_blkszs() to *not* register a value for that slot (and instead let whatever existing values remain). This required updates to the bli_cntx_init_*() functions for bgq, cortexa9, knc, penryn, power7, and template subconfigs, as some of these codes were using 0 instead of -1. - Fixes #781. Thanks to Devin Matthews for identifying, diagnosing, and proposing a fix for this issue. - (cherry picked from 8fff1e3) Update zen3 subconfig to support NVHPC compilers. (#779) Details: - Parse $(CC_VENDOR) values of "nvc" in 'zen3' make_defs.mk file. - Minor refactor to accommodate above edit. - CREDITS file update. - (cherry picked from 1e264a4) Fixed brokenness when sba is disabled. (#777) Details: - Previously, disabling the sba via --disable-sba-pools resulted in a segfault due to a sanity-check-triggering abort(). The problem was that the sba, as currently used in the l3 thread decorators, did not yet (fully) support pools being disabled. The solution entailed creating wrapper function, bli_sba_array_elem(), which either calls bli_apool_array_elem() (when sba pools are enabled at configure time) or returns a NULL sba_pool pointer (when sba pools are disabled), and calling bli_sba_array_elem() in place of bli_apool_array_elem(). Note that the NULL pointer returned by bli_sba_array_elem() when the sba pools are disabled does no harm since in that situation the pointer goes unreferenced when acquiring and releasing small blocks. Thanks to John Mather for reporting this bug. - Guarded the bodies of bli_sba_init() and bli_sba_finalize() with #ifdef BLIS_ENABLE_SBA_POOLS. I don't think this was actually necessary to fix the aforementioned bug, but it seems like good practice. - Moved the code in bli_l3_thrinfo_create() that checked that the array* pointer is non-NULL before calling bli_sba_array_elem() (previously bli_apool_array_elem()) into the definition of bli_sba_array_elem(). - Renamed various instances of 'pool' variables and function parameters to 'sba_pool' to emphasize what kind of pool it represents. - Whitespace changes. - (cherry picked from c2099ed) Implemented [cz]symv_(), [cz]syr_(), [cz]rot_(). (#778) Details: - Expanded existing BLAS compatibility APIs to provide interfaces to [cz]symv_(), [cz]syr_(). This was easy since those operations were already implemented natively in BLIS; the APIs were previously omitted only because they were not formally part of the BLAS. - Implemented [cz]rot_() by feeding code from LAPACK 3.11 through f2c. - Thanks to James Foster for pointing out that LAPACK contains these additional symbols, which prompted these additions, as well as for testing the [cz]rot_() functions from Julia's test infrastructure. - CREDITS file update. - (cherry picked from 37ca4fd) Fixes to HPC runtime code path. (#773) Details: - Fixed hpx::for_each invocation and replace with hpx::for_loop. The HPX runtime was initialized using hpx::start, but the hpx::for_each function was being called on a non-hpx runtime (i.e standard BLIS runtime - single main thread). To run hpx::for_each on HPX runtime correctly, the code now uses hpx::run_as_hpx_thread(func, args...). - Replaced hpx::for_each with hpx::for_loop, which eliminates use of hpx::util::counting_iterator. - Employ hpx::execution::chunk_size(1) to make sure that a thread resides on a particular core. - Replaced hpx::apply() with updated version hpx::post(). - Initialize tdata->id = 0 in libblis.c to 0, as it is the main thread and is needed for writing results to output file. - By default, if not specified, the HPX runtime uses all N threads/cores available in the system. But, if we want to only specify n_threads out N threads, we use hpx::execution::experimental::num_cores(n_threads). - (cherry picked from a4a6329) Fixed broken link in Multithreading.md. (#774) Details: - Replaced 404'd link in docs/Multithreading.md with an archive from The Wayback Machine. - CREDITS file update. - (cherry picked from c6546c1)

leekillough mentioned this pull request Apr 12, 2023

Merge sifive_x280 with RISC-V base architecture support changes sifive/sifive-blis#3

Closed

angsch reviewed Apr 13, 2023

View reviewed changes

nick-knight mentioned this pull request Jul 27, 2023

I get "undefined reference to `__tls_get_addr'" when linking using the gnu toolchain #760

Open

Aaron-Hutchinson and others added 5 commits October 12, 2023 09:36

Initial commit for sifive_x280 configuration.

025c7e6

Added whitespace and other formatting fixes

3a105de

Restore changes from sifive-blis-private#28

a0ec386

Merge pull request #1 from sifive/pr/ahutchinson/whitespace_fixes

efd65d3

Added whitespace and other formatting fixes

Merge pull request #2 from sifive/myeh/invscalv_scrub2

8663e95

Restore changes from sifive-blis-private#28

Aaron-Hutchinson force-pushed the add_sifive_x280 branch from acd68f9 to 8663e95 Compare October 13, 2023 20:34

Aaron-Hutchinson added 2 commits October 13, 2023 19:39

Updated RISC-V intrinsics to match LLVM 17.0.2 (#5)

e0f6f4f

* Updated RISC-V intrinsics to match LLVM 17.0.2

Updated cache tile sizes and added packing kernels (#6)

9009528

Aaron-Hutchinson mentioned this pull request Oct 19, 2023

Updated CI settings for sifive_x280 sifive/sifive-blis#7

Merged

Updated CI settings for sifive_x280 (#7)

61784f1

alexsifivetw reviewed Nov 1, 2023

View reviewed changes

Fixed type in do_riscv.sh cases (#8)

752b942

Minor reorder of 'sifive_x280' kernel set.

5e2a914

nick-knight approved these changes Nov 2, 2023

View reviewed changes

CREDITS file update.

2d26d99

fgvanzee merged commit 05388dd into flame:master Nov 3, 2023
1 of 2 checks passed

Add sifive_x280 configuration #737

Add sifive_x280 configuration #737

Conversation

Aaron-Hutchinson commented Mar 29, 2023

devinamatthews commented Mar 29, 2023

nick-knight commented Mar 29, 2023

devinamatthews commented Mar 30, 2023

angsch commented Mar 30, 2023

leekillough commented Mar 30, 2023

Aaron-Hutchinson commented Mar 31, 2023

devinamatthews commented Mar 31, 2023

leekillough commented Mar 31, 2023

devinamatthews commented Apr 1, 2023

leekillough commented Apr 1, 2023

Aaron-Hutchinson commented Apr 3, 2023

angsch commented Apr 4, 2023

angsch commented Apr 4, 2023

Aaron-Hutchinson commented Apr 4, 2023

angsch commented Apr 4, 2023

Aaron-Hutchinson commented Apr 4, 2023

devinamatthews commented Apr 4, 2023

leekillough commented Apr 4, 2023

devinamatthews commented Apr 4, 2023 via email

Aaron-Hutchinson commented Apr 5, 2023

fgvanzee commented Apr 10, 2023

Aaron-Hutchinson commented Apr 10, 2023

devinamatthews commented Apr 10, 2023

Aaron-Hutchinson commented Apr 10, 2023

leekillough commented Apr 11, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

leekillough commented Apr 19, 2023

RISC-V General Toolchain Builder

To Build BLIS

Build issues encountered with this PR

angsch commented Apr 19, 2023

nick-knight commented Apr 19, 2023 • edited Loading

leekillough commented Jun 29, 2023

Aaron-Hutchinson commented Oct 12, 2023

angsch commented Oct 12, 2023

Aaron-Hutchinson commented Oct 17, 2023

angsch commented Oct 19, 2023

Aaron-Hutchinson commented Oct 19, 2023

Aaron-Hutchinson commented Oct 31, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alexsifivetw commented Nov 1, 2023

angsch commented Nov 1, 2023

Aaron-Hutchinson commented Nov 1, 2023

fgvanzee commented Nov 2, 2023 • edited Loading

nick-knight commented Apr 19, 2023 •

edited

Loading

fgvanzee commented Nov 2, 2023 •

edited

Loading