If this is a bug report, please use the following template.
Otherwise, please delete the rest of the template.
Where does this bug appear?
Check all that apply:
MacOS
Linux
Cray
GCC
Clang
Intel compiler
MPICH and derivatives (MVAPICH2, Intel MPI, Cray MPI, etc.)
Open-MPI
Operating system
What is the output of uname -a?
Linux l0 4.18.0-17-generic #18~18.04.1-Ubuntu SMP Fri Mar 15 15:27:12 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Compiler
gcc
What is the output of ${COMPILER} -v or ${COMPILER} --version?
gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
PRK build information
Please attach or inline make.defs.
#
# This file shows the GCC toolchain options for PRKs using
# OpenMP, MPI and/or Fortran coarrays only.
#
# Base compilers and language options
#
VERSION=-7
# C99 is required in some implementations.
CC=gcc${VERSION} -std=c11 -pthread
#EXTRA_CLIBS=-lrt
# All of the Fortran code is written for the 2008 standard and requires preprocessing.
FC=gfortran${VERSION} -std=f2008 -cpp
# C++11 may not be required but does no harm here.
CXX=g++${VERSION} -std=gnu++17 -pthread
#
# Compiler flags
#
# -mtune=native is appropriate for most cases and keeps binaries portable.
# -march=native is appropriate only if the binaries will run on the same CPU type as the build machine.
DEFAULT_OPT_FLAGS=-O3 -mtune=native -ffast-math
#DEFAULT_OPT_FLAGS=-O0
DEFAULT_OPT_FLAGS+=-g3
#DEFAULT_OPT_FLAGS+=-fsanitize=undefined
#DEFAULT_OPT_FLAGS+=-fsanitize=undefined,leak
#DEFAULT_OPT_FLAGS+=-fsanitize=address
#DEFAULT_OPT_FLAGS+=-fsanitize=thread
# If you are compiling for KNL on a Xeon login node, use the following:
# DEFAULT_OPT_FLAGS=-g -O3 -march=knl
# See https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html for details.
#
#DEFAULT_OPT_FLAGS+=-fopt-info-vec-missed
DEFAULT_OPT_FLAGS+=-Wall #-Werror
DEFAULT_OPT_FLAGS+=-Wno-ignored-attributes -Wno-deprecated-declarations
#DEFAULT_OPT_FLAGS+=-mavx -mfma
#
# OpenMP flags
#
OPENMPFLAG=-fopenmp
OPENMPSIMDFLAG=-fopenmp-simd
OFFLOADFLAG=-foffload="-O3 -v"
ORNLACCFLAG=-fopenacc
#
# OpenCL flags
#
# MacOS
OPENCLFLAG=-framework OpenCL
# Linux
#OPENCLDIR=/etc/alternatives/opencl-intel-tools
#OPENCLFLAG=-I${OPENCLDIR} -L${OPENCLDIR}/lib64 -lOpenCL
OPENCLFLAG+=-Wno-ignored-attributes -Wno-deprecated-declarations
METALFLAG=-framework MetalPerformanceShaders
#
# SYCL flags
#
# triSYCL
# https://github.com/triSYCL/triSYCL is header-only so just clone in Cxx11 directory...
SYCLDIR=./triSYCL
SYCLCXX=${CXX} -std=c++17 ${OPENMPFLAG}
SYCLFLAG=-I$(SYCLDIR)/include
# ProGTX
# https://github.com/ProGTX/sycl-gtx
#SYCLDIR=${HOME}/Work/OpenCL/sycl-gtx
#SYCLCXX=${CXX} ${OPENMPFLAG}
#SYCLFLAG=-DUSE_SYCL -I${SYCLDIR}/sycl-gtx/include -L${SYCLDIR}/build/sycl-gtx -lsycl-gtx ${OPENCLFLAG}
METALFLAG=-framework MetalPerformanceShaders
#
# OCCA
#
#OCCADIR=${HOME}/prk-repo/Cxx11/occa
#
# Cilk
#
#CILKFLAG=-fcilkplus
#
# TBB
#
TBBDIR=/usr/local/Cellar/tbb/2019_U5_1
TBBFLAG=-I${TBBDIR}/include -L${TBBDIR}/lib -ltbb
#
# Parallel STL, Boost, etc.
#
BOOSTFLAG=-I/usr/local/Cellar/boost/1.69.0_2/include
RANGEFLAG=-DUSE_BOOST_IRANGE ${BOOSTFLAG}
#RANGEFLAG=-DUSE_RANGES_TS -I./range-v3/include
PSTLFLAG=${OPENMPSIMDFLAG} ${TBBFLAG} ${RANGEFLAG}
#PSTLFLAG=${OPENMPSIMDFLAG} ${TBBFLAG} -DUSE_INTEL_PSTL -I./pstl/include ${RANGEFLAG}
KOKKOSDIR=/opt/kokkos/gcc
KOKKOSFLAG=-I${KOKKOSDIR}/include -L${KOKKOSDIR}/lib -lkokkos ${OPENMPFLAG}
RAJADIR=/opt/raja/gcc
RAJAFLAG=-I${RAJADIR}/include -L${RAJADIR}/lib -lRAJA ${OPENMPFLAG} ${TBBFLAG}
THRUSTDIR=/Users/jrhammon/Work/NVIDIA/thrust
THRUSTFLAG=-I${THRUSTDIR} ${RANGEFLAG}
#
# SYCL flags
#
# triSYCL
# https://github.com/triSYCL/triSYCL is header-only so just clone in Cxx11 directory...
SYCLDIR=./triSYCL
SYCLCXX=${CXX} -O3 -Wall -std=c++17 ${OPENMPFLAG}
SYCLFLAG=-I${SYCLDIR}/include ${BOOSTFLAG} -DTRISYCL
# ProGTX
# https://github.com/ProGTX/sycl-gtx
#SYCLDIR=${HOME}/Work/OpenCL/sycl-gtx
#SYCLCXX=${CXX} ${OPENMPFLAG}
#SYCLFLAG=-I${SYCLDIR}/sycl-gtx/include -L${SYCLDIR}/build/sycl-gtx -lsycl-gtx ${OPENCLFLAG}
SYCLFLAG+=${RANGEFLAG}
#
# SYCL flags
#
# triSYCL
# https://github.com/triSYCL/triSYCL is header-only so just clone in Cxx11 directory...
SYCLDIR=./triSYCL
SYCLCXX=${CXX} -std=c++17 ${OPENMPFLAG}
SYCLFLAG=-I${SYCLDIR}/include ${BOOSTFLAG}
# ProGTX
# https://github.com/ProGTX/sycl-gtx
#SYCLDIR=${HOME}/Work/OpenCL/sycl-gtx
#SYCLCXX=${CXX} ${OPENMPFLAG}
#SYCLFLAG=-DUSE_SYCL -I${SYCLDIR}/sycl-gtx/include -L${SYCLDIR}/build/sycl-gtx -lsycl-gtx ${OPENCLFLAG}
#
# CBLAS for C++ DGEMM
#
BLASFLAG=-DACCELERATE -framework Accelerate
CBLASFLAG=-DACCELERATE -framework Accelerate -flax-vector-conversions
#
# CUDA flags
#
# Mac w/ CUDA emulation via https://github.com/hughperkins/coriander
#NVCC=/opt/llvm/cocl/bin/cocl
# Linux w/ NVIDIA CUDA
NVCC=nvcc
CUDAFLAGS=-g -O3 -std=c++11 -arch=sm_50
# https://github.com/tensorflow/tensorflow/issues/1066#issuecomment-200574233
CUDAFLAGS+=-D_MWAITXINTRIN_H_INCLUDED
#
# ISPC
#
ISPC=ispc
ISPCFLAG=-O3 --target=host --opt=fast-math
#
# MPI
#
# We assume you have installed an implementation of MPI-3 that is in your path.
MPICC=mpicc -std=c99
#
# Fortran 2008 coarrays
#
# see https://github.com/ParRes/Kernels/blob/master/FORTRAN/README.md for details
# single-node
COARRAYFLAG=-fcoarray=single -lcaf_single
# multi-node
# COARRAYFLAG=-fcoarray=lib -lcaf_mpi
MEMKINDDIR=/home/parallels/PRK/deps
MEMKINDFLAGS=-I${MEMKINDDIR}/include -L${MEMKINDDIR}/lib -lmemkind -Wl,-rpath=${MEMKINDDIR}/lib
Output showing problem
When the sparse OpenMP benchmark is run with ./sparse 12 2 11 10, the program tries to write data to memory which has not been allocated. To find this error, please comment out lines 262 and 263 of sparse.c, and then outside the for loop, on line 265, add printf("nent: %llu elm: %llu \n", nent, elm+4);. This will show that the length of col_index, nent, is 171966464, and the program tries to write to 171966467. Due to the parallel nature of this program, there is a strong chance that each thread is writing outside the boundaries of its own array. Furthermore, if the program is compiled with icc --check-pointers=rw on Linux, the solution fails to validate.
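A suspicion like this can also be tested without commenting lines in and out, by routing index writes through a bounds-checked helper. The following is only a minimal sketch: `checked_store` and its parameters are hypothetical names chosen for illustration and are not part of sparse.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not taken from sparse.c.
 * Stores `value` at index[pos] only when pos < nent.
 * Returns true if the store happened; otherwise reports the
 * offending position on stderr and returns false without writing. */
bool checked_store(unsigned long long *index, unsigned long long nent,
                   unsigned long long pos, unsigned long long value)
{
    if (pos >= nent) {
        fprintf(stderr, "out-of-bounds write: pos %llu, nent %llu\n", pos, nent);
        return false;
    }
    index[pos] = value;
    return true;
}
```

Replacing a direct assignment with a call such as `checked_store(col_index, nent, elm + 3, col)` (names assumed here) would report any position at or beyond nent at the moment it is written, rather than relying on a value printed after the loop.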
If the output is short, please inline it here.
Otherwise, please pipe it to a plain text file and attach that file.
Note that you may need to use $command 2>&1 $log to capture the error messages.
Please do not attach screenshots of your terminal.
I believe the boundary indices are valid. The printed value 171966467 is elm+4 (elm=171966463). If lines 262 and 263 are not commented out, elm is reassigned at line 262 and ends up being 171966464 after the loop exits. Since the print statement at line 265 is outside the matrix loop (lines 249-264), elm is never used again, so there should be no out-of-bounds errors. (If you run it with icc, where the solution validates, you will get the same output.)
But validation does fail when compiled with gcc, and I think the source of the error is at line 285: temp should be declared private as well to avoid a race condition. I still need to run more cases and check whether there are other problems, but I believe this is the one.
For some reason icc seems to be immune to this bug: I couldn't replicate the issue even with -check-pointers=rw, which is quite curious.
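To illustrate the kind of fix meant here, below is a minimal sketch of a sparse matrix-vector product in which the accumulator is private to each thread. It is not the PRK source: the function name `spmv_fixed_width`, its parameters, and the fixed number of nonzeros per row are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical sketch, not the code in sparse.c.
 * Computes y += A*x for a matrix that stores exactly nnz_per_row
 * nonzeros per row, row after row, in `values`, with the matching
 * column numbers in `col_index`. */
void spmv_fixed_width(size_t nrows, size_t nnz_per_row,
                      const double *values, const size_t *col_index,
                      const double *x, double *y)
{
    #pragma omp parallel for
    for (long row = 0; row < (long)nrows; row++) {
        /* Declared inside the loop body, so every thread gets its own
         * copy. A shared accumulator declared outside the parallel
         * region would be updated by all threads at once. */
        double temp = 0.0;
        size_t first = (size_t)row * nnz_per_row;
        for (size_t k = first; k < first + nnz_per_row; k++) {
            temp += values[k] * x[col_index[k]];
        }
        /* Each row is handled by exactly one thread, so this update
         * does not conflict with other threads. */
        y[row] += temp;
    }
}
```

If the accumulator has to stay declared outside the loop, as it appears to be in the code under discussion, listing it in a `private(temp)` clause on the OpenMP directive should have the same effect. Built without -fopenmp, the pragma is ignored and the loop simply runs serially.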