Output of the OSU bandwidth test for 2 processes within the same NUMA node:
    [satishk@tcn643 ~]$ export UCX_TLS=self,sm,rc
    [satishk@tcn643 ~]$ module list

    Currently Loaded Modules:
      1) 2024                            5) GCC/13.3.0                       9) libpciaccess/0.18.1-GCCcore-13.3.0  13) UCX/1.16.0-GCCcore-13.3.0        17) UCC/1.3.0-GCCcore-13.3.0
      2) GCCcore/13.3.0                  6) numactl/2.0.18-GCCcore-13.3.0   10) hwloc/2.10.0-GCCcore-13.3.0         14) libfabric/1.21.0-GCCcore-13.3.0  18) OpenMPI/5.0.3-GCC-13.3.0
      3) zlib/1.3.1-GCCcore-13.3.0       7) XZ/5.4.5-GCCcore-13.3.0         11) OpenSSL/3                           15) PMIx/5.0.2-GCCcore-13.3.0        19) gompi/2024a
      4) binutils/2.42-GCCcore-13.3.0    8) libxml2/2.12.7-GCCcore-13.3.0   12) libevent/2.1.12-GCCcore-13.3.0      16) PRRTE/3.0.5-GCCcore-13.3.0       20) OSU-Micro-Benchmarks/7.4-gompi-2024a

    [satishk@tcn643 ~]$ mpirun -np 2 osu_bw
    # OSU MPI Bandwidth Test v7.4
    # Datatype: MPI_CHAR.
    # Size      Bandwidth (MB/s)
    1                      14.83
    2                      28.95
    4                      59.76
    8                     117.89
    16                    233.75
    32                    473.54
    64                    959.85
    128                  1345.67
    256                  2256.81
    512                  4174.19
    1024                 7200.82
    2048                 9516.23
    4096                 7852.14
    8192                16860.11
    16384               20182.35
    32768               22114.76
    65536               23987.90
    131072              25063.84
    262144              25373.04
    524288              25517.19
    1048576             25532.16
    2097152             25554.54
    4194304             25544.87
    [satishk@tcn643 ~]$
Output of the OSU bandwidth test for 2 processes (1 process per node) on 2 different nodes connected via InfiniBand:
    [satishk@tcn643 ~]$ mpirun --report-bindings -np 2 -npernode 1 osu_bw
    [tcn643.local.snellius.surf.nl:434708] Rank 0 bound to package[0][core:0]
    [tcn645.local.snellius.surf.nl:1363413] Rank 1 bound to package[0][core:0]
    # OSU MPI Bandwidth Test v7.4
    # Datatype: MPI_CHAR.
    # Size      Bandwidth (MB/s)
    1                       4.15
    2                       8.31
    4                      16.51
    8                      32.63
    16                     65.81
    32                    131.58
    64                    252.30
    128                   496.55
    256                   919.54
    512                  1758.46
    1024                 3038.37
    2048                 4922.13
    4096                 8093.16
    8192                11333.75
    16384               20285.03
    32768               23012.06
    65536               25481.33
    131072              26875.02
    262144              27416.41
    524288              27707.10
    1048576             27845.91
    2097152             27940.24
    4194304             27963.20
    [satishk@tcn643 ~]$
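To make the two runs easier to compare, here is a minimal Python sketch that parses `osu_bw` output and reports the intra-node/inter-node ratio per message size, plus any size at which bandwidth drops relative to the previous size. The function names (`parse_osu_bw`, `compare`, `find_dips`) are hypothetical helpers written for this illustration, and the embedded samples are a subset of the rows from the two runs above.

```python
# Sketch only: helper names are illustrative, not part of OSU or UCX.

INTRA_NODE = """
# OSU MPI Bandwidth Test v7.4
# Size      Bandwidth (MB/s)
2048                 9516.23
4096                 7852.14
8192                16860.11
1048576             25532.16
4194304             25544.87
"""

INTER_NODE = """
# OSU MPI Bandwidth Test v7.4
# Size      Bandwidth (MB/s)
2048                 4922.13
4096                 8093.16
8192                11333.75
1048576             27845.91
4194304             27963.20
"""


def parse_osu_bw(text):
    """Return {message_size_bytes: bandwidth_MB_per_s} from osu_bw output."""
    results = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#' header lines
        size, bandwidth = line.split()
        results[int(size)] = float(bandwidth)
    return results


def compare(intra, inter):
    """Return (size, intra_bw, inter_bw, intra/inter) for sizes in both runs."""
    common = sorted(intra.keys() & inter.keys())
    return [(s, intra[s], inter[s], intra[s] / inter[s]) for s in common]


def find_dips(results):
    """Return sizes whose bandwidth is lower than that of the previous size."""
    sizes = sorted(results)
    return [cur for prev, cur in zip(sizes, sizes[1:])
            if results[cur] < results[prev]]


if __name__ == "__main__":
    intra = parse_osu_bw(INTRA_NODE)
    inter = parse_osu_bw(INTER_NODE)
    for size, a, b, ratio in compare(intra, inter):
        print(f"{size:>8}  intra={a:>9.2f}  inter={b:>9.2f}  ratio={ratio:.2f}")
    print("intra-node dips at sizes:", find_dips(intra))
```

Feeding the full output of both runs into `parse_osu_bw` instead of the excerpts gives the complete comparison; a ratio below 1 marks sizes where the inter-node run was faster than the intra-node run.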
ucx_info -v
    [satishk@tcn643 ~]$ ucx_info -v
    # Library version: 1.16.0
    # Library path: /sw/arch/RHEL9/EB_production/2024/software/UCX/1.16.0-GCCcore-13.3.0/lib/libucs.so.0
    # API headers version: 1.16.0
    # Git branch '', revision e4bb802
    # Configured with: --prefix=/sw/arch/RHEL9/EB_production/2024/software/UCX/1.16.0-GCCcore-13.3.0 --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --enable-optimizations --enable-cma --enable-mt --with-verbs --without-java --without-go --disable-doxygen-doc --disable-logging --disable-debug --disable-assertions --disable-params-check
    [satishk@tcn643 ~]$
    [satishk@tcn643 ~]$ echo $UCX_TLS
    self,sm,rc

    [satishk@tcn643 ~]$ uname -a
    Linux tcn643.local.snellius.surf.nl 5.14.0-427.31.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Aug 9 14:06:03 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux
    [satishk@tcn643 ~]$ cat /etc/redhat-release
    Red Hat Enterprise Linux release 9.4 (Plow)
    [satishk@tcn643 ~]$
ibv_devinfo -vv
hca_id: mlx5_0 transport: InfiniBand (0) fw_ver: 26.39.2048 node_guid: 387c:7603:007c:ab4e sys_image_guid: 387c:7603:007c:ab4e vendor_id: 0x02c9 vendor_part_id: 4127 hw_ver: 0x0 board_id: LNV0000000049 phys_port_cnt: 1 max_mr_size: 0xffffffffffffffff page_size_cap: 0xfffffffffffff000 max_qp: 131072 max_qp_wr: 32768 device_cap_flags: 0x25321c36 BAD_PKEY_CNTR BAD_QKEY_CNTR AUTO_PATH_MIG CHANGE_PHY_PORT PORT_ACTIVE_EVENT SYS_IMAGE_GUID RC_RNR_NAK_GEN MEM_WINDOW XRC MEM_MGT_EXTENSIONS MEM_WINDOW_TYPE_2B RAW_IP_CSUM MANAGED_FLOW_STEERING max_sge: 30 max_sge_rd: 30 max_cq: 16777216 max_cqe: 4194303 max_mr: 16777216 max_pd: 8388608 max_qp_rd_atom: 16 max_ee_rd_atom: 0 max_res_rd_atom: 2097152 max_qp_init_rd_atom: 16 max_ee_init_rd_atom: 0 atomic_cap: ATOMIC_HCA (1) max_ee: 0 max_rdd: 0 max_mw: 16777216 max_raw_ipv6_qp: 0 max_raw_ethy_qp: 0 max_mcast_grp: 2097152 max_mcast_qp_attach: 240 max_total_mcast_qp_attach: 503316480 max_ah: 2147483647 max_fmr: 0 max_srq: 8388608 max_srq_wr: 32767 max_srq_sge: 31 max_pkeys: 128 local_ca_ack_delay: 16 general_odp_caps: ODP_SUPPORT ODP_SUPPORT_IMPLICIT rc_odp_caps: SUPPORT_SEND SUPPORT_RECV SUPPORT_WRITE SUPPORT_READ SUPPORT_ATOMIC SUPPORT_SRQ uc_odp_caps: NO SUPPORT ud_odp_caps: SUPPORT_SEND xrc_odp_caps: SUPPORT_SEND SUPPORT_WRITE SUPPORT_READ SUPPORT_ATOMIC SUPPORT_SRQ completion timestamp_mask: 0x7fffffffffffffff hca_core_clock: 1000000kHZ raw packet caps: C-VLAN stripping offload Scatter FCS offload IP csum offload Delay drop device_cap_flags_ex: 0x3000001425321C36 RAW_SCATTER_FCS PCI_WRITE_END_PADDING Unknown flags: 0x3000000000000000 tso_caps: max_tso: 262144 supported_qp: SUPPORT_RAW_PACKET rss_caps: max_rwq_indirection_tables: 524288 max_rwq_indirection_table_size: 2048 rx_hash_function: 0x1 rx_hash_fields_mask: 0x800000FF supported_qp: SUPPORT_RAW_PACKET max_wq_type_rq: 8388608 packet_pacing_caps: qp_rate_limit_min: 1kbps qp_rate_limit_max: 25000000kbps supported_qp: SUPPORT_RAW_PACKET tag matching not supported cq 
moderation caps: max_cq_count: 65535 max_cq_period: 4095 us maximum available device memory: 131072Bytes num_comp_vectors: 63 port: 1 state: PORT_ACTIVE (4) max_mtu: 4096 (5) active_mtu: 1024 (3) sm_lid: 0 port_lid: 0 port_lmc: 0x00 link_layer: Ethernet max_msg_sz: 0x40000000 port_cap_flags: 0x04010000 port_cap_flags2: 0x0000 max_vl_num: invalid value (0) bad_pkey_cntr: 0x0 qkey_viol_cntr: 0x0 sm_sl: 0 pkey_tbl_len: 1 gid_tbl_len: 255 subnet_timeout: 0 init_type_reply: 0 active_width: 1X (1) active_speed: 25.0 Gbps (32) phys_state: LINK_UP (5) GID[ 0]: fe80:0000:0000:0000:3a7c:76ff:fe7c:ab4e, RoCE v1 GID[ 1]: fe80::3a7c:76ff:fe7c:ab4e, RoCE v2 GID[ 2]: 0000:0000:0000:0000:0000:ffff:ac12:3a8d, RoCE v1 GID[ 3]: ::ffff:172.18.58.141, RoCE v2 GID[ 4]: 0000:0000:0000:0000:0000:ffff:9188:3a8d, RoCE v1 GID[ 5]: ::ffff:145.136.58.141, RoCE v2 GID[ 6]: fe80:0000:0000:0000:93b1:9349:73b1:4b4a, RoCE v1 GID[ 7]: fe80::93b1:9349:73b1:4b4a, RoCE v2 hca_id: mlx5_1 transport: InfiniBand (0) fw_ver: 26.39.2048 node_guid: 387c:7603:007c:ab4f sys_image_guid: 387c:7603:007c:ab4e vendor_id: 0x02c9 vendor_part_id: 4127 hw_ver: 0x0 board_id: LNV0000000049 phys_port_cnt: 1 max_mr_size: 0xffffffffffffffff page_size_cap: 0xfffffffffffff000 max_qp: 131072 max_qp_wr: 32768 device_cap_flags: 0x25321c36 BAD_PKEY_CNTR BAD_QKEY_CNTR AUTO_PATH_MIG CHANGE_PHY_PORT PORT_ACTIVE_EVENT SYS_IMAGE_GUID RC_RNR_NAK_GEN MEM_WINDOW XRC MEM_MGT_EXTENSIONS MEM_WINDOW_TYPE_2B RAW_IP_CSUM MANAGED_FLOW_STEERING max_sge: 30 max_sge_rd: 30 max_cq: 16777216 max_cqe: 4194303 max_mr: 16777216 max_pd: 8388608 max_qp_rd_atom: 16 max_ee_rd_atom: 0 max_res_rd_atom: 2097152 max_qp_init_rd_atom: 16 max_ee_init_rd_atom: 0 atomic_cap: ATOMIC_HCA (1) max_ee: 0 max_rdd: 0 max_mw: 16777216 max_raw_ipv6_qp: 0 max_raw_ethy_qp: 0 max_mcast_grp: 2097152 max_mcast_qp_attach: 240 max_total_mcast_qp_attach: 503316480 max_ah: 2147483647 max_fmr: 0 max_srq: 8388608 max_srq_wr: 32767 max_srq_sge: 31 max_pkeys: 128 local_ca_ack_delay: 16 
general_odp_caps: ODP_SUPPORT ODP_SUPPORT_IMPLICIT rc_odp_caps: SUPPORT_SEND SUPPORT_RECV SUPPORT_WRITE SUPPORT_READ SUPPORT_ATOMIC SUPPORT_SRQ uc_odp_caps: NO SUPPORT ud_odp_caps: SUPPORT_SEND xrc_odp_caps: SUPPORT_SEND SUPPORT_WRITE SUPPORT_READ SUPPORT_ATOMIC SUPPORT_SRQ completion timestamp_mask: 0x7fffffffffffffff hca_core_clock: 1000000kHZ raw packet caps: C-VLAN stripping offload Scatter FCS offload IP csum offload Delay drop device_cap_flags_ex: 0x3000001425321C36 RAW_SCATTER_FCS PCI_WRITE_END_PADDING Unknown flags: 0x3000000000000000 tso_caps: max_tso: 262144 supported_qp: SUPPORT_RAW_PACKET rss_caps: max_rwq_indirection_tables: 524288 max_rwq_indirection_table_size: 2048 rx_hash_function: 0x1 rx_hash_fields_mask: 0x800000FF supported_qp: SUPPORT_RAW_PACKET max_wq_type_rq: 8388608 packet_pacing_caps: qp_rate_limit_min: 1kbps qp_rate_limit_max: 25000000kbps supported_qp: SUPPORT_RAW_PACKET tag matching not supported cq moderation caps: max_cq_count: 65535 max_cq_period: 4095 us maximum available device memory: 131072Bytes num_comp_vectors: 63 port: 1 state: PORT_DOWN (1) max_mtu: 4096 (5) active_mtu: 1024 (3) sm_lid: 0 port_lid: 0 port_lmc: 0x00 link_layer: Ethernet max_msg_sz: 0x40000000 port_cap_flags: 0x04010000 port_cap_flags2: 0x0000 max_vl_num: invalid value (0) bad_pkey_cntr: 0x0 qkey_viol_cntr: 0x0 sm_sl: 0 pkey_tbl_len: 1 gid_tbl_len: 255 subnet_timeout: 0 init_type_reply: 0 active_width: 4X (2) active_speed: 10.0 Gbps (4) phys_state: DISABLED (3) GID[ 0]: fe80:0000:0000:0000:3a7c:76ff:fe7c:ab4f, RoCE v1 GID[ 1]: fe80::3a7c:76ff:fe7c:ab4f, RoCE v2 hca_id: mlx5_2 transport: InfiniBand (0) fw_ver: 28.39.2048 node_guid: 946d:ae03:0054:2b46 sys_image_guid: 946d:ae03:0054:2b46 vendor_id: 0x02c9 vendor_part_id: 4129 hw_ver: 0x0 board_id: LNV0000000056 phys_port_cnt: 1 max_mr_size: 0xffffffffffffffff page_size_cap: 0xfffffffffffff000 max_qp: 131072 max_qp_wr: 32768 device_cap_flags: 0x21361c36 BAD_PKEY_CNTR BAD_QKEY_CNTR AUTO_PATH_MIG CHANGE_PHY_PORT 
PORT_ACTIVE_EVENT SYS_IMAGE_GUID RC_RNR_NAK_GEN MEM_WINDOW UD_IP_CSUM XRC MEM_MGT_EXTENSIONS MEM_WINDOW_TYPE_2B MANAGED_FLOW_STEERING max_sge: 30 max_sge_rd: 30 max_cq: 16777216 max_cqe: 4194303 max_mr: 16777216 max_pd: 8388608 max_qp_rd_atom: 16 max_ee_rd_atom: 0 max_res_rd_atom: 2097152 max_qp_init_rd_atom: 16 max_ee_init_rd_atom: 0 atomic_cap: ATOMIC_HCA (1) max_ee: 0 max_rdd: 0 max_mw: 16777216 max_raw_ipv6_qp: 0 max_raw_ethy_qp: 0 max_mcast_grp: 2097152 max_mcast_qp_attach: 240 max_total_mcast_qp_attach: 503316480 max_ah: 2147483647 max_fmr: 0 max_srq: 8388608 max_srq_wr: 32767 max_srq_sge: 31 max_pkeys: 128 local_ca_ack_delay: 16 general_odp_caps: ODP_SUPPORT ODP_SUPPORT_IMPLICIT rc_odp_caps: SUPPORT_SEND SUPPORT_RECV SUPPORT_WRITE SUPPORT_READ SUPPORT_ATOMIC SUPPORT_SRQ uc_odp_caps: NO SUPPORT ud_odp_caps: SUPPORT_SEND xrc_odp_caps: SUPPORT_SEND SUPPORT_WRITE SUPPORT_READ SUPPORT_ATOMIC SUPPORT_SRQ completion timestamp_mask: 0x7fffffffffffffff hca_core_clock: 1000000kHZ device_cap_flags_ex: 0x3000005021361C36 PCI_WRITE_END_PADDING Unknown flags: 0x3000004000000000 tso_caps: max_tso: 0 rss_caps: max_rwq_indirection_tables: 0 max_rwq_indirection_table_size: 0 rx_hash_function: 0x0 rx_hash_fields_mask: 0x0 max_wq_type_rq: 0 packet_pacing_caps: qp_rate_limit_min: 0kbps qp_rate_limit_max: 0kbps max_rndv_hdr_size: 64 max_num_tags: 127 max_ops: 32768 max_sge: 1 flags: IBV_TM_CAP_RC cq moderation caps: max_cq_count: 65535 max_cq_period: 4095 us maximum available device memory: 131072Bytes num_comp_vectors: 63 port: 1 state: PORT_ACTIVE (4) max_mtu: 4096 (5) active_mtu: 4096 (5) sm_lid: 98 port_lid: 1164 port_lmc: 0x00 link_layer: InfiniBand max_msg_sz: 0x40000000 port_cap_flags: 0xa351e848 port_cap_flags2: 0x0432 max_vl_num: 4 (3) bad_pkey_cntr: 0x0 qkey_viol_cntr: 0x0 sm_sl: 0 pkey_tbl_len: 128 gid_tbl_len: 8 subnet_timeout: 18 init_type_reply: 0 active_width: 4X (2) active_speed: 100.0 Gbps (128) phys_state: LINK_UP (5) GID[ 0]: 
fe80:0000:0000:0000:946d:ae03:0054:2b46
ucx_info -d
[satishk@tcn643 ~]$ ucx_info -d # # Memory domain: self # Component: self # register: unlimited, cost: 0 nsec # remote key: 0 bytes # rkey_ptr is supported # memory types: host (access,reg_nonblock,reg,cache) # # Transport: self # Device: memory # Type: loopback # System device: <unknown> # # capabilities: # bandwidth: 0.00/ppn + 19360.00 MB/sec # latency: 0 nsec # overhead: 10 nsec # put_short: <= 4294967295 # put_bcopy: unlimited # get_bcopy: unlimited # am_short: <= 8K # am_bcopy: <= 8K # domain: cpu # atomic_add: 32, 64 bit # atomic_and: 32, 64 bit # atomic_or: 32, 64 bit # atomic_xor: 32, 64 bit # atomic_fadd: 32, 64 bit # atomic_fand: 32, 64 bit # atomic_for: 32, 64 bit # atomic_fxor: 32, 64 bit # atomic_swap: 32, 64 bit # atomic_cswap: 32, 64 bit # connection: to iface # device priority: 0 # device num paths: 1 # max eps: inf # device address: 0 bytes # iface address: 8 bytes # error handling: ep_check # # # Memory domain: tcp # Component: tcp # register: unlimited, cost: 0 nsec # remote key: 0 bytes # memory types: host (access,reg_nonblock,reg,cache) # # Transport: tcp # Device: eno2np0 # Type: network # System device: eno2np0 (0) # # capabilities: # bandwidth: 2200.00/ppn + 0.00 MB/sec # latency: 5223 nsec # overhead: 50000 nsec # put_zcopy: <= 18446744073709551590, up to 6 iov # put_opt_zcopy_align: <= 1 # put_align_mtu: <= 0 # am_short: <= 8K # am_bcopy: <= 8K # am_zcopy: <= 64K, up to 6 iov # am_opt_zcopy_align: <= 1 # am_align_mtu: <= 0 # am header: <= 8037 # connection: to ep, to iface # device priority: 1 # device num paths: 1 # max eps: 256 # device address: 6 bytes # iface address: 2 bytes # ep address: 10 bytes # error handling: peer failure, ep_check, keepalive # # Transport: tcp # Device: enp2s0f4u1u6 # Type: network # System device: <unknown> # # capabilities: # bandwidth: 48.09/ppn + 0.00 MB/sec # latency: 6555 nsec # overhead: 50000 nsec # put_zcopy: <= 18446744073709551590, up to 6 iov # put_opt_zcopy_align: <= 1 # put_align_mtu: <= 0 # 
am_short: <= 8K # am_bcopy: <= 8K # am_zcopy: <= 64K, up to 6 iov # am_opt_zcopy_align: <= 1 # am_align_mtu: <= 0 # am header: <= 8037 # connection: to ep, to iface # device priority: 1 # device num paths: 1 # max eps: 256 # device address: 6 bytes # iface address: 2 bytes # ep address: 10 bytes # error handling: peer failure, ep_check, keepalive # # Transport: tcp # Device: ib0 # Type: network # System device: ib0 (1) # # capabilities: # bandwidth: 2200.00/ppn + 0.00 MB/sec # latency: 5201 nsec # overhead: 50000 nsec # put_zcopy: <= 18446744073709551590, up to 6 iov # put_opt_zcopy_align: <= 1 # put_align_mtu: <= 0 # am_short: <= 8K # am_bcopy: <= 8K # am_zcopy: <= 64K, up to 6 iov # am_opt_zcopy_align: <= 1 # am_align_mtu: <= 0 # am header: <= 8037 # connection: to ep, to iface # device priority: 1 # device num paths: 1 # max eps: 256 # device address: 6 bytes # iface address: 2 bytes # ep address: 10 bytes # error handling: peer failure, ep_check, keepalive # # Transport: tcp # Device: lo # Type: network # System device: <unknown> # # capabilities: # bandwidth: 11.91/ppn + 0.00 MB/sec # latency: 10960 nsec # overhead: 50000 nsec # put_zcopy: <= 18446744073709551590, up to 6 iov # put_opt_zcopy_align: <= 1 # put_align_mtu: <= 0 # am_short: <= 8K # am_bcopy: <= 8K # am_zcopy: <= 64K, up to 6 iov # am_opt_zcopy_align: <= 1 # am_align_mtu: <= 0 # am header: <= 8037 # connection: to ep, to iface # device priority: 1 # device num paths: 1 # max eps: 256 # device address: 18 bytes # iface address: 2 bytes # ep address: 10 bytes # error handling: peer failure, ep_check, keepalive # # Transport: tcp # Device: vlan-pub.136 # Type: network # System device: eno2np0 (0) # # capabilities: # bandwidth: 2200.00/ppn + 0.00 MB/sec # latency: 5223 nsec # overhead: 50000 nsec # put_zcopy: <= 18446744073709551590, up to 6 iov # put_opt_zcopy_align: <= 1 # put_align_mtu: <= 0 # am_short: <= 8K # am_bcopy: <= 8K # am_zcopy: <= 64K, up to 6 iov # am_opt_zcopy_align: <= 1 # 
am_align_mtu: <= 0 # am header: <= 8037 # connection: to ep, to iface # device priority: 0 # device num paths: 1 # max eps: 256 # device address: 6 bytes # iface address: 2 bytes # ep address: 10 bytes # error handling: peer failure, ep_check, keepalive # # # Connection manager: tcp # max_conn_priv: 2064 bytes # # Memory domain: sysv # Component: sysv # allocate: unlimited # remote key: 12 bytes # rkey_ptr is supported # memory types: host (access,alloc,cache) # # Transport: sysv # Device: memory # Type: intra-node # System device: <unknown> # # capabilities: # bandwidth: 0.00/ppn + 15360.00 MB/sec # latency: 80 nsec # overhead: 10 nsec # put_short: <= 4294967295 # put_bcopy: unlimited # get_bcopy: unlimited # am_short: <= 100 # am_bcopy: <= 8256 # domain: cpu # atomic_add: 32, 64 bit # atomic_and: 32, 64 bit # atomic_or: 32, 64 bit # atomic_xor: 32, 64 bit # atomic_fadd: 32, 64 bit # atomic_fand: 32, 64 bit # atomic_for: 32, 64 bit # atomic_fxor: 32, 64 bit # atomic_swap: 32, 64 bit # atomic_cswap: 32, 64 bit # connection: to iface # device priority: 0 # device num paths: 1 # max eps: inf # device address: 8 bytes # iface address: 8 bytes # error handling: ep_check # # # Memory domain: posix # Component: posix # allocate: <= 197940984K # remote key: 24 bytes # rkey_ptr is supported # memory types: host (access,alloc,cache) # # Transport: posix # Device: memory # Type: intra-node # System device: <unknown> # # capabilities: # bandwidth: 0.00/ppn + 15360.00 MB/sec # latency: 80 nsec # overhead: 10 nsec # put_short: <= 4294967295 # put_bcopy: unlimited # get_bcopy: unlimited # am_short: <= 100 # am_bcopy: <= 8256 # domain: cpu # atomic_add: 32, 64 bit # atomic_and: 32, 64 bit # atomic_or: 32, 64 bit # atomic_xor: 32, 64 bit # atomic_fadd: 32, 64 bit # atomic_fand: 32, 64 bit # atomic_for: 32, 64 bit # atomic_fxor: 32, 64 bit # atomic_swap: 32, 64 bit # atomic_cswap: 32, 64 bit # connection: to iface # device priority: 0 # device num paths: 1 # max eps: inf # device 
address: 8 bytes # iface address: 8 bytes # error handling: ep_check # # # Memory domain: mlx5_0 # Component: ib # register: unlimited, dmabuf, cost: 16000 + 0.060 * N nsec # remote key: 8 bytes # local memory handle is required for zcopy # memory invalidation is supported # memory types: host (access,reg,cache) # # Transport: dc_mlx5 # Device: mlx5_0:1 # Type: network # System device: mlx5_0 (0) # # capabilities: # bandwidth: 2739.46/ppn + 0.00 MB/sec # latency: 860 nsec # overhead: 40 nsec # put_short: <= 2K # put_bcopy: <= 8256 # put_zcopy: <= 1G, up to 11 iov # put_opt_zcopy_align: <= 512 # put_align_mtu: <= 1K # get_bcopy: <= 8256 # get_zcopy: 65..1G, up to 11 iov # get_opt_zcopy_align: <= 512 # get_align_mtu: <= 1K # am_short: <= 2046 # am_bcopy: <= 8254 # am_zcopy: <= 8254, up to 3 iov # am_opt_zcopy_align: <= 512 # am_align_mtu: <= 1K # am header: <= 138 # domain: device # atomic_add: 32, 64 bit # atomic_and: 32, 64 bit # atomic_or: 32, 64 bit # atomic_xor: 32, 64 bit # atomic_fadd: 32, 64 bit # atomic_fand: 32, 64 bit # atomic_for: 32, 64 bit # atomic_fxor: 32, 64 bit # atomic_swap: 32, 64 bit # atomic_cswap: 32, 64 bit # connection: to iface # device priority: 45 # device num paths: 1 # max eps: inf # device address: 18 bytes # iface address: 7 bytes # error handling: buffer (zcopy), remote access, peer failure, ep_check # # # Transport: rc_verbs # Device: mlx5_0:1 # Type: network # System device: mlx5_0 (0) # # capabilities: # bandwidth: 2739.46/ppn + 0.00 MB/sec # latency: 800 + 1.000 * N nsec # overhead: 75 nsec # put_short: <= 124 # put_bcopy: <= 8256 # put_zcopy: <= 1G, up to 5 iov # put_opt_zcopy_align: <= 512 # put_align_mtu: <= 1K # get_bcopy: <= 8256 # get_zcopy: 65..1G, up to 5 iov # get_opt_zcopy_align: <= 512 # get_align_mtu: <= 1K # am_short: <= 123 # am_bcopy: <= 8255 # am_zcopy: <= 8255, up to 4 iov # am_opt_zcopy_align: <= 512 # am_align_mtu: <= 1K # am header: <= 127 # domain: device # atomic_add: 64 bit # atomic_fadd: 64 bit # 
atomic_cswap: 64 bit # connection: to ep # device priority: 45 # device num paths: 1 # max eps: 256 # device address: 18 bytes # ep address: 7 bytes # error handling: peer failure, ep_check # # # Transport: rc_mlx5 # Device: mlx5_0:1 # Type: network # System device: mlx5_0 (0) # # capabilities: # bandwidth: 2739.46/ppn + 0.00 MB/sec # latency: 800 + 1.000 * N nsec # overhead: 40 nsec # put_short: <= 2K # put_bcopy: <= 8256 # put_zcopy: <= 1G, up to 14 iov # put_opt_zcopy_align: <= 512 # put_align_mtu: <= 1K # get_bcopy: <= 8256 # get_zcopy: 65..1G, up to 14 iov # get_opt_zcopy_align: <= 512 # get_align_mtu: <= 1K # am_short: <= 2046 # am_bcopy: <= 8254 # am_zcopy: <= 8254, up to 3 iov # am_opt_zcopy_align: <= 512 # am_align_mtu: <= 1K # am header: <= 186 # domain: device # atomic_add: 32, 64 bit # atomic_and: 32, 64 bit # atomic_or: 32, 64 bit # atomic_xor: 32, 64 bit # atomic_fadd: 32, 64 bit # atomic_fand: 32, 64 bit # atomic_for: 32, 64 bit # atomic_fxor: 32, 64 bit # atomic_swap: 32, 64 bit # atomic_cswap: 32, 64 bit # connection: to ep # device priority: 45 # device num paths: 1 # max eps: 256 # device address: 18 bytes # ep address: 10 bytes # error handling: buffer (zcopy), remote access, peer failure, ep_check # # # Transport: ud_verbs # Device: mlx5_0:1 # Type: network # System device: mlx5_0 (0) # # capabilities: # bandwidth: 2739.46/ppn + 0.00 MB/sec # latency: 830 nsec # overhead: 105 nsec # am_short: <= 116 # am_bcopy: <= 1016 # am_zcopy: <= 1016, up to 5 iov # am_opt_zcopy_align: <= 512 # am_align_mtu: <= 1K # am header: <= 920 # connection: to ep, to iface # device priority: 45 # device num paths: 1 # max eps: inf # device address: 18 bytes # iface address: 3 bytes # ep address: 6 bytes # error handling: peer failure, ep_check # # # Transport: ud_mlx5 # Device: mlx5_0:1 # Type: network # System device: mlx5_0 (0) # # capabilities: # bandwidth: 2739.46/ppn + 0.00 MB/sec # latency: 830 nsec # overhead: 80 nsec # am_short: <= 180 # am_bcopy: <= 1016 # 
am_zcopy: <= 1016, up to 3 iov # am_opt_zcopy_align: <= 512 # am_align_mtu: <= 1K # am header: <= 132 # connection: to ep, to iface # device priority: 45 # device num paths: 1 # max eps: inf # device address: 18 bytes # iface address: 3 bytes # ep address: 6 bytes # error handling: peer failure, ep_check # # # Memory domain: mlx5_1 # Component: ib # register: unlimited, dmabuf, cost: 16000 + 0.060 * N nsec # remote key: 8 bytes # local memory handle is required for zcopy # memory invalidation is supported # memory types: host (access,reg,cache) # < no supported devices found > # # Memory domain: mlx5_2 # Component: ib # register: unlimited, dmabuf, cost: 16000 + 0.060 * N nsec # remote key: 8 bytes # local memory handle is required for zcopy # memory invalidation is supported # memory types: host (access,reg,cache) # # Transport: dc_mlx5 # Device: mlx5_2:1 # Type: network # System device: mlx5_2 (1) # # capabilities: # bandwidth: 26896.18/ppn + 0.00 MB/sec # latency: 660 nsec # overhead: 40 nsec # put_short: <= 2K # put_bcopy: <= 8256 # put_zcopy: <= 1G, up to 11 iov # put_opt_zcopy_align: <= 512 # put_align_mtu: <= 4K # get_bcopy: <= 8256 # get_zcopy: 65..1G, up to 11 iov # get_opt_zcopy_align: <= 512 # get_align_mtu: <= 4K # am_short: <= 2046 # am_bcopy: <= 8254 # am_zcopy: <= 8254, up to 3 iov # am_opt_zcopy_align: <= 512 # am_align_mtu: <= 4K # am header: <= 138 # domain: device # atomic_add: 32, 64 bit # atomic_and: 32, 64 bit # atomic_or: 32, 64 bit # atomic_xor: 32, 64 bit # atomic_fadd: 32, 64 bit # atomic_fand: 32, 64 bit # atomic_for: 32, 64 bit # atomic_fxor: 32, 64 bit # atomic_swap: 32, 64 bit # atomic_cswap: 32, 64 bit # connection: to iface # device priority: 70 # device num paths: 2 # max eps: inf # device address: 3 bytes # iface address: 7 bytes # error handling: buffer (zcopy), remote access, peer failure, ep_check # # # Transport: rc_verbs # Device: mlx5_2:1 # Type: network # System device: mlx5_2 (1) # # capabilities: # bandwidth: 26896.18/ppn 
+ 0.00 MB/sec # latency: 600 + 1.000 * N nsec # overhead: 75 nsec # put_short: <= 124 # put_bcopy: <= 8256 # put_zcopy: <= 1G, up to 5 iov # put_opt_zcopy_align: <= 512 # put_align_mtu: <= 4K # get_bcopy: <= 8256 # get_zcopy: 65..1G, up to 5 iov # get_opt_zcopy_align: <= 512 # get_align_mtu: <= 4K # am_short: <= 123 # am_bcopy: <= 8255 # am_zcopy: <= 8255, up to 4 iov # am_opt_zcopy_align: <= 512 # am_align_mtu: <= 4K # am header: <= 127 # domain: device # atomic_add: 64 bit # atomic_fadd: 64 bit # atomic_cswap: 64 bit # connection: to ep # device priority: 70 # device num paths: 2 # max eps: 256 # device address: 3 bytes # ep address: 7 bytes # error handling: peer failure, ep_check # # # Transport: rc_mlx5 # Device: mlx5_2:1 # Type: network # System device: mlx5_2 (1) # # capabilities: # bandwidth: 26896.18/ppn + 0.00 MB/sec # latency: 600 + 1.000 * N nsec # overhead: 40 nsec # put_short: <= 2K # put_bcopy: <= 8256 # put_zcopy: <= 1G, up to 14 iov # put_opt_zcopy_align: <= 512 # put_align_mtu: <= 4K # get_bcopy: <= 8256 # get_zcopy: 65..1G, up to 14 iov # get_opt_zcopy_align: <= 512 # get_align_mtu: <= 4K # am_short: <= 2046 # am_bcopy: <= 8254 # am_zcopy: <= 8254, up to 3 iov # am_opt_zcopy_align: <= 512 # am_align_mtu: <= 4K # am header: <= 186 # domain: device # atomic_add: 32, 64 bit # atomic_and: 32, 64 bit # atomic_or: 32, 64 bit # atomic_xor: 32, 64 bit # atomic_fadd: 32, 64 bit # atomic_fand: 32, 64 bit # atomic_for: 32, 64 bit # atomic_fxor: 32, 64 bit # atomic_swap: 32, 64 bit # atomic_cswap: 32, 64 bit # connection: to ep # device priority: 70 # device num paths: 2 # max eps: 256 # device address: 3 bytes # ep address: 10 bytes # error handling: buffer (zcopy), remote access, peer failure, ep_check # # # Transport: ud_verbs # Device: mlx5_2:1 # Type: network # System device: mlx5_2 (1) # # capabilities: # bandwidth: 26896.18/ppn + 0.00 MB/sec # latency: 630 nsec # overhead: 105 nsec # am_short: <= 116 # am_bcopy: <= 4088 # am_zcopy: <= 4088, up to 5 
iov # am_opt_zcopy_align: <= 512 # am_align_mtu: <= 4K # am header: <= 3992 # connection: to ep, to iface # device priority: 70 # device num paths: 2 # max eps: inf # device address: 3 bytes # iface address: 3 bytes # ep address: 6 bytes # error handling: peer failure, ep_check # # # Transport: ud_mlx5 # Device: mlx5_2:1 # Type: network # System device: mlx5_2 (1) # # capabilities: # bandwidth: 26896.18/ppn + 0.00 MB/sec # latency: 630 nsec # overhead: 80 nsec # am_short: <= 180 # am_bcopy: <= 4088 # am_zcopy: <= 4088, up to 3 iov # am_opt_zcopy_align: <= 512 # am_align_mtu: <= 4K # am header: <= 132 # connection: to ep, to iface # device priority: 70 # device num paths: 2 # max eps: inf # device address: 3 bytes # iface address: 3 bytes # ep address: 6 bytes # error handling: peer failure, ep_check # # # Connection manager: rdmacm # max_conn_priv: 54 bytes # # Memory domain: cma # Component: cma # register: unlimited, cost: 9 nsec # memory types: host (access,reg_nonblock,reg,cache) # # Transport: cma # Device: memory # Type: intra-node # System device: <unknown> # # capabilities: # bandwidth: 0.00/ppn + 11145.00 MB/sec # latency: 80 nsec # overhead: 2000 nsec # put_zcopy: unlimited, up to 16 iov # put_opt_zcopy_align: <= 1 # put_align_mtu: <= 1 # get_zcopy: unlimited, up to 16 iov # get_opt_zcopy_align: <= 1 # get_align_mtu: <= 1 # connection: to iface # device priority: 0 # device num paths: 1 # max eps: inf # device address: 8 bytes # iface address: 4 bytes # error handling: peer failure, ep_check # [satishk@tcn643 ~]$
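The `ucx_info -d` dump above is long, so a short Python sketch can pull out just the transport, device, and bandwidth figures. The regular expression is written to tolerate both the normal one-field-per-line layout and the flattened single-line form shown here. The helper name `parse_transports` is hypothetical, and the `shared`/`dedicated` labels reflect my understanding of the `X/ppn + Y MB/sec` format (a per-node shared component divided among processes, plus a dedicated component); treat that reading as an assumption. These are UCX's own estimates, not measured values.

```python
import re

# Excerpt of the `ucx_info -d` output from this report (flattened form).
SAMPLE = (
    "# Transport: sysv # Device: memory # Type: intra-node "
    "# System device: <unknown> # # capabilities: "
    "# bandwidth: 0.00/ppn + 15360.00 MB/sec # latency: 80 nsec "
    "# Transport: rc_mlx5 # Device: mlx5_2:1 # Type: network "
    "# System device: mlx5_2 (1) # # capabilities: "
    "# bandwidth: 26896.18/ppn + 0.00 MB/sec # latency: 600 + 1.000 * N nsec "
    "# Transport: cma # Device: memory # Type: intra-node "
    "# System device: <unknown> # # capabilities: "
    "# bandwidth: 0.00/ppn + 11145.00 MB/sec # latency: 80 nsec"
)

# Transport name, then its Device field, then the first bandwidth line after it.
TRANSPORT_RE = re.compile(
    r"Transport:\s*(?P<transport>\S+)\s+#\s+Device:\s*(?P<device>\S+)"
    r".*?bandwidth:\s*(?P<shared>[\d.]+)/ppn \+ (?P<dedicated>[\d.]+) MB/sec",
    re.DOTALL,
)


def parse_transports(text):
    """Return a list of (transport, device, shared_MBps, dedicated_MBps)."""
    return [
        (m["transport"], m["device"], float(m["shared"]), float(m["dedicated"]))
        for m in TRANSPORT_RE.finditer(text)
    ]


if __name__ == "__main__":
    for transport, device, shared, dedicated in parse_transports(SAMPLE):
        print(f"{transport:<10} {device:<10} "
              f"shared={shared:>9.2f}  dedicated={dedicated:>9.2f}")
```

Running `parse_transports` on the complete dump lists every transport UCX detected, which makes it easier to see which shared-memory and InfiniBand transports are candidates under `UCX_TLS=self,sm,rc`.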