failed regression check for 3.9.1 on macos sequoia #469

Open
chenrui333 opened this issue Sep 16, 2024 · 7 comments


@chenrui333

I tried a regression build of 3.9.1 on macOS Sequoia, but ran into the following test failure:

==> /opt/homebrew/bin/gfortran -o test /opt/homebrew/Cellar/arpack/3.9.1/share/arpack/dnsimp.f /opt/homebrew/Cellar/arpack/3.9.1/share/arpack/mmio.f -L/opt/homebrew/Cellar/arpack/3.9.1/lib -larpack -L/opt/homebrew/opt/openblas/lib -lopenblas
  ==> ./test
  Error: arpack: failed
  Error: arpack: failed
  An exception occurred within a child process:
    Minitest::Assertion: Expected /reached/ to match " ** On entry to DLASCL parameter number  4 had an illegal value\n ** On entry to DLASCL parameter number  4 had an illegal value\n  \n  Error with _naupd, info =        -9999\n  Check the documentation of _naupd\n  \n".
  /opt/homebrew/Library/Homebrew/vendor/bundle/ruby/3.3.0/gems/minitest-5.25.1/lib/minitest/assertions.rb:176:in `assert'

The full build log is here: https://github.com/Homebrew/homebrew-core/actions/runs/10792126279/job/29932784189

@fghoussen
Collaborator

Not sure if I could help.
What options were passed to configure? Did your LAPACK switch to LP64 or ILP64? Did your environment change? I can't tell!

@cho-m

cho-m commented Sep 30, 2024

In Homebrew, we only see this behavior on macOS 15 Sequoia.

I can also confirm that binaries built on macOS 14 Sonoma hit the same "On entry to DLASCL parameter number 4 had an illegal value" error when installed and run on macOS 15.

The tests run on ARM64 / Apple Silicon, but we see the same behavior when testing x86_64 binaries via Rosetta.

We are running the test from https://github.com/opencollab/arpack-ng/blob/master/TESTS/dnsimp.f, and it looks like we check the output for "reached" (I guess from the maximum-iterations message?).
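As a rough illustration of that check, here is a minimal shell sketch. The helper name output_reached is hypothetical, and it only approximates the /reached/ pattern from the Minitest assertion in the log; it is not the formula's actual test code.

```shell
# Hypothetical helper: succeeds only if the text on stdin contains "reached",
# approximating the /reached/ pattern in the Minitest assertion.
output_reached() {
  grep -q 'reached'
}

# Usage sketch (assumes ./test was built from dnsimp.f and mmio.f as in the log):
#   ./test 2>&1 | output_reached && echo "pass" || echo "fail"
```

With the failing output quoted in the issue (only the DLASCL and _naupd messages), this check would report a failure, since "reached" never appears.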


For some of your questions:

What options were passed to configure?

https://github.com/Homebrew/homebrew-core/blob/master/Formula/a/arpack.rb#L30-L36, so with the variables expanded it becomes:

      --disable-dependency-tracking
      --prefix=/opt/homebrew/Cellar/arpack/3.9.1/libexec
      --with-blas="-L/opt/homebrew/opt/openblas/lib -lopenblas"
      F77=mpif77
      --enable-mpi
      --enable-icb
      --enable-eigen

Did your LAPACK switch to LP64 or ILP64?

Should still be the same one included with 32-bit integer OpenBLAS

Did your environment change?

The main difference is the macOS major version. Build variables, non-Apple libraries, etc. should be similar across runners.

@fghoussen
Collaborator

Should still be the same one included with 32-bit integer OpenBLAS

arpack depends on a BLAS / LAPACK implementation (netlib, mkl, ...), so you could try an older version of OpenBLAS. If the problem disappears, it may be due to the version of OpenBLAS you use.

@cho-m

cho-m commented Sep 30, 2024

I did try some non-OpenBLAS implementations, and they all hit an error, so I would guess it isn't OpenBLAS-specific:

  • Accelerate linked using LIBS=-framework Accelerate
    BLAS                :
    LAPACK              :
    EIGEN               : -I/opt/homebrew/Cellar/eigen/3.4.0_1/include/eigen3
    LIBS                : -framework Accelerate
    LDADD               :
    
    ** On entry to DLASCL, parameter number  4 had an illegal value
    ** On entry to DLASCL, parameter number  4 had an illegal value
    
      Error with _naupd, info =        -9999
      Check the documentation of _naupd
    
  • NETLIB reference
    BLAS                : -L/opt/homebrew/opt/lapack/lib -lblas
    LAPACK              : -L/opt/homebrew/opt/lapack/lib -llapack
    EIGEN               : -I/opt/homebrew/Cellar/eigen/3.4.0_1/include/eigen3
    LIBS                :
    LDADD               :
    
     ** On entry to DLASCL parameter number  4 had an illegal value
    Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG
    

@fghoussen
Collaborator

Do you export these variables?

export FC=mpif90 # Uses gfortran.
export FFLAGS="-ff2c -fno-second-underscore"
export CC=mpicc # Uses clang.
export CFLAGS="-Qunused-arguments"
export CXX=mpic++ # Uses clang++.
export CXXFLAGS="-Qunused-arguments"
cmake -DBLA_VENDOR=Apple -DEXAMPLES=ON -DICB=ON -DEIGEN=ON -DMPI=ON ..

@cho-m

cho-m commented Oct 1, 2024

One thing I found is that running ctest in the build directory passes on what I think is the same test.

dnsimp_tst
test 1
      Start  1: dnsimp_tst

1: Test command: /tmp/arpack-20240930-10914-5rwi80/arpack-ng-3.9.1/build/TESTS/dnsimp
1: Working Directory: /tmp/arpack-20240930-10914-5rwi80/arpack-ng-3.9.1/build
1: Test timeout computed to be: 10000000
1: 
1:  Ritz values (Real, Imag) and residual residuals
1:  -----------------------------------------------
1:                Col   1       Col   2       Col   3
1:   Row   1:   -1.96023D+00   2.40614D-01   5.62865D-15
1:   Row   2:   -1.96023D+00  -2.40614D-01   5.62865D-15
1:   Row   3:   -1.28819D+00   1.49056D+00   5.23259D-15
1:   Row   4:   -1.28819D+00  -1.49056D+00   5.23259D-15
1:   Row   5:   -1.66676D+00   0.00000D+00   6.47624D-15
1:   Row   6:   -1.38893D+00   8.11056D-01   4.79491D-15
1:   Row   7:   -1.38893D+00  -8.11056D-01   4.79491D-15
1:   
1:   
1:   Maximum number of iterations reached.
1:   
1:   
1:   _NSIMP 
1:   ====== 
1:   
1:   Size of the matrix is         2500
1:   The number of Ritz values requested is           11
1:   The number of Arnoldi vectors generated (NCV) is           20
1:   What portion of the spectrum: SR
1:   The number of converged Ritz values is            7
1:   The number of Implicit Arnoldi update iterations taken is           31
1:   The number of OP*x is          180
1:   The convergence criterion is    1.1102230246251565E-016
1:   
 1/13 Test  #1: dnsimp_tst .......................   Passed    0.37 sec

Comparing the compilation commands for the test, it looks like the difference is the optimization level.

In Homebrew, we build the test without optimization flags. I have now confirmed that:

  • -O0 and -O1 fail
  • -O2 and -O3 pass

gfortran -O0 -o test dnsimp.f mmio.f -L/opt/homebrew/opt/arpack/lib -larpack -L/opt/homebrew/opt/openblas/lib -lopenblas
./test
 ** On entry to DLASCL parameter number  4 had an illegal value
 ** On entry to DLASCL parameter number  4 had an illegal value

  Error with _naupd, info =        -9999
  Check the documentation of _naupd

gfortran -O1 -o test dnsimp.f mmio.f -L/opt/homebrew/opt/arpack/lib -larpack -L/opt/homebrew/opt/openblas/lib -lopenblas
./test
 ** On entry to DLASCL parameter number  4 had an illegal value
 ** On entry to DLASCL parameter number  4 had an illegal value

  Error with _naupd, info =        -9999
  Check the documentation of _naupd

gfortran -O2 -o test dnsimp.f mmio.f -L/opt/homebrew/opt/arpack/lib -larpack -L/opt/homebrew/opt/openblas/lib -lopenblas
./test

 _naupd: Number of update iterations taken
 -----------------------------------------
    1 -    1:    31


 _naupd: Number of wanted "converged" Ritz values
 ------------------------------------------------
    1 -    1:     7
...
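
For anyone who wants to repeat this comparison, the following is a minimal sketch of a loop over the optimization levels. The function names (classify_output, try_opt_levels) and the ARPACK_PREFIX / OPENBLAS_PREFIX variables are made up for illustration; the defaults assume the Homebrew paths used in the commands above, and the pass criterion assumes that "reached" in the output means success, as in the formula test.

```shell
# Assumed install locations (Homebrew defaults from the commands above);
# override these variables for other setups.
ARPACK_PREFIX="${ARPACK_PREFIX:-/opt/homebrew/opt/arpack}"
OPENBLAS_PREFIX="${OPENBLAS_PREFIX:-/opt/homebrew/opt/openblas}"

# Classify one run's output read from stdin:
# "pass" if it mentions "reached", otherwise "fail".
classify_output() {
  if grep -q 'reached'; then echo pass; else echo fail; fi
}

# Build and run the test driver once per optimization level.
try_opt_levels() {
  for opt in -O0 -O1 -O2 -O3; do
    if gfortran "$opt" -o "test$opt" dnsimp.f mmio.f \
         -L"$ARPACK_PREFIX/lib" -larpack \
         -L"$OPENBLAS_PREFIX/lib" -lopenblas; then
      printf '%s: %s\n' "$opt" "$("./test$opt" 2>&1 | classify_output)"
    else
      printf '%s: build error\n' "$opt"
    fi
  done
}

# Run from the directory containing dnsimp.f and mmio.f:
#   try_opt_levels
```

Note that only the test driver is rebuilt at each level here; the installed libarpack stays the same, which matches how the commands above were run.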

@fghoussen
Collaborator

With -O0, does -Wall -Werror give any hints? Does adding -ffpe-trap=zero,overflow,underflow,invalid -fcheck=all give more info? You may be hitting an overflow here:

d(j,3) = d(j,3) / abs(d(j,1))
or
d(j,3) = d(j,3) / dlapy2(d(j,1),d(j,2))
(potential division by zero?)
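
A debug build along those lines could look like the sketch below. The helper names (debug_flags, build_debug_test) and the prefix variables are hypothetical, the paths assume the Homebrew layout from earlier in the thread, and -g and -fbacktrace are extra gfortran options added here on the assumption that a backtrace helps locate the trapping statement.

```shell
# Assumed install locations (Homebrew defaults); override as needed.
ARPACK_PREFIX="${ARPACK_PREFIX:-/opt/homebrew/opt/arpack}"
OPENBLAS_PREFIX="${OPENBLAS_PREFIX:-/opt/homebrew/opt/openblas}"

# Print the diagnostic flags: no optimization, debug info, backtraces,
# warnings, runtime checks, and traps on the listed FP exceptions.
debug_flags() {
  echo "-O0 -g -fbacktrace -Wall -fcheck=all -ffpe-trap=zero,overflow,underflow,invalid"
}

# Build the test driver with the diagnostic flags.
build_debug_test() {
  # Word splitting of the flag string is intentional here.
  gfortran $(debug_flags) -o test_debug dnsimp.f mmio.f \
    -L"$ARPACK_PREFIX/lib" -larpack \
    -L"$OPENBLAS_PREFIX/lib" -lopenblas
}

# Run from the directory containing dnsimp.f and mmio.f:
#   build_debug_test && ./test_debug
```

-Werror is left out of the sketch so that warnings in the example sources do not stop the build; add it to debug_flags if you want warnings to be fatal. Also keep in mind that -fcheck=all only instruments the files compiled here, not the prebuilt libarpack.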
