NEIGH_TABLE not populated with VXLAN routes #3384

Open
bradh352 opened this issue Nov 20, 2024 · 0 comments

bradh352 commented Nov 20, 2024

Observed on master and 202405 (with PR #3383 applied to make VXLANs actually work).

The basic architecture is VXLAN EVPN, with an L3 IRB/VNI interface on each switch participating in the VXLAN fabric.

|------------| 10.0.0.50             |---------------------------------|
|    host    |-----------------------|             sonic1              |
|------------|             Ethernet8 | Loopback0: 172.16.0.1           |
                           Untagged  | VLAN 2/VNI 10002 irb: 10.0.0.71 |
                           VLAN2     |---------------------------------|
                           VNI 10002                 | Ethernet54
                                                     | BGP Unnumbered
                                                     |
                                                     |
                                          Ethernet54 |
                                      BGP Unnumbered |
                                     |---------------------------------|
                                     |              sonic2             |
                                     | Loopback0: 172.16.0.2           |
                                     | VLAN 2/VNI 10002 irb: 10.0.0.72 |
                                     |---------------------------------|

On sw2 (sonic2 in the diagram above, Loopback0 172.16.0.2), I've noticed log entries like:

2024 Nov 20 21:42:55.482102 sw2 WARNING swss#arp_update[900]: 108 MAC mismatch for 10.0.0.50 on Vlan2 - kernel: 18:5a:58:2a:e8:20, APPL_DB:

When I investigate, NEIGH_TABLE in APPL_DB has no neighbors listed for Vlan2:

# sonic-db-dump -n APPL_DB -y -k "NEIGH_TABLE:Vlan2:*"
{}

However, the kernel does have the neighbor, installed by BGP/zebra:

root@sw2:~# ip neigh show dev Vlan2
10.0.0.71 lladdr 74:86:e2:43:33:05 extern_learn NOARP proto zebra 
10.0.0.50 lladdr 18:5a:58:2a:e8:20 extern_learn NOARP proto zebra 
fe80::7686:e2ff:fe43:3305 lladdr 74:86:e2:43:33:05 extern_learn NOARP proto zebra 
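To make the kernel/APPL_DB discrepancy concrete, here is a minimal Python sketch of the kind of comparison the `arp_update` warning implies. It assumes the kernel side is the text printed by `ip neigh show dev Vlan2` and the APPL_DB side is a plain `{ip: mac}` dict; `parse_ip_neigh` and `find_mismatches` are hypothetical helpers, not the actual `arp_update` logic. The flags are kept so the zebra-installed entries (`extern_learn`, `proto zebra`) stay visible.

```python
import re

# Hypothetical pattern for lines printed by `ip neigh show dev <dev>`:
# "<ip> lladdr <mac> <flags...>"
IP_NEIGH_LINE = re.compile(r"^(?P<ip>\S+)\s+lladdr\s+(?P<mac>[0-9a-f:]{17})\b(?P<flags>.*)$")


def parse_ip_neigh(text):
    """Hypothetical helper: return {ip: (mac, [flags])} for entries with an lladdr."""
    neighbors = {}
    for line in text.splitlines():
        m = IP_NEIGH_LINE.match(line.strip())
        if m:
            neighbors[m["ip"]] = (m["mac"], m["flags"].split())
    return neighbors


def find_mismatches(kernel, appl_db):
    """Hypothetical helper: compare kernel neighbors to APPL_DB ({ip: mac}).

    Returns (ip, kernel_mac, appl_db_mac) tuples; appl_db_mac is "" when missing.
    """
    return [
        (ip, mac, appl_db.get(ip, ""))
        for ip, (mac, _flags) in sorted(kernel.items())
        if appl_db.get(ip, "").lower() != mac.lower()
    ]


# Kernel output captured on sw2, as shown above.
SW2_KERNEL = """\
10.0.0.71 lladdr 74:86:e2:43:33:05 extern_learn NOARP proto zebra
10.0.0.50 lladdr 18:5a:58:2a:e8:20 extern_learn NOARP proto zebra
fe80::7686:e2ff:fe43:3305 lladdr 74:86:e2:43:33:05 extern_learn NOARP proto zebra
"""

kernel = parse_ip_neigh(SW2_KERNEL)
appl_db = {}  # NEIGH_TABLE:Vlan2:* returned {} on sw2
for ip, kernel_mac, db_mac in find_mismatches(kernel, appl_db):
    print(f"{ip}: kernel {kernel_mac}, APPL_DB {db_mac or '<missing>'}")
```

With an empty APPL_DB every zebra-learned entry is reported, which matches the empty `APPL_DB:` field in the warning for 10.0.0.50.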

And the type-2 routes look good:

# vtysh -c "show bgp l2vpn evpn"
BGP table version is 7, local router ID is 172.16.0.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[EthTag]:[ESI]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 172.16.0.1:2
 *> [2]:[0]:[48]:[18:5a:58:2a:e8:20]
                    172.16.0.1                             0 4210000001 i
                    RT:32897:10002 ET:8
 *> [2]:[0]:[48]:[18:5a:58:2a:e8:20]:[32]:[10.0.0.50]
                    172.16.0.1                             0 4210000001 i
                    RT:32897:10002 ET:8
 *> [2]:[0]:[48]:[74:86:e2:43:33:05]:[32]:[10.0.0.71]
                    172.16.0.1                             0 4210000001 i
                    RT:32897:10002 ET:8
 *> [2]:[0]:[48]:[74:86:e2:43:33:05]:[128]:[fe80::7686:e2ff:fe43:3305]
                    172.16.0.1                             0 4210000001 i
                    RT:32897:10002 ET:8
 *> [3]:[0]:[32]:[172.16.0.1]
                    172.16.0.1                             0 4210000001 i
                    RT:32897:10002 ET:8
Route Distinguisher: 172.16.0.2:2
 *> [2]:[0]:[48]:[74:86:e2:43:28:05]:[32]:[10.0.0.72]
                    172.16.0.2                         32768 i
                    ET:8 RT:32898:10002
 *> [2]:[0]:[48]:[74:86:e2:43:28:05]:[128]:[fe80::7686:e2ff:fe43:2805]
                    172.16.0.2                         32768 i
                    ET:8 RT:32898:10002
 *> [3]:[0]:[32]:[172.16.0.2]
                    172.16.0.2                         32768 i
                    ET:8 RT:32898:10002

Displayed 8 out of 8 total prefixes
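The type-2 prefix legend above can be turned into a small parser. This is a hypothetical Python sketch that assumes the bracketed text format shown in the `vtysh` output; splitting on `]:[` rather than `:` keeps the colons inside MAC and IPv6 addresses intact. It separates MAC-only routes from MAC+IP routes; the IPs carried by the MAC+IP routes from RD 172.16.0.1:2 are exactly the ones present in sw2's kernel neighbor table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Type2Route:
    eth_tag: int
    mac: str
    ip: Optional[str]  # None for MAC-only advertisements


def split_prefix(prefix: str) -> List[str]:
    """Split "[a]:[b]:[c]" into ["a", "b", "c"].

    Splitting on "]:[" instead of ":" keeps colons inside MAC/IPv6 fields intact.
    """
    if not (prefix.startswith("[") and prefix.endswith("]")):
        raise ValueError(f"not a bracketed EVPN prefix: {prefix!r}")
    return prefix[1:-1].split("]:[")


def parse_type2(prefix: str) -> Type2Route:
    """Parse [2]:[EthTag]:[MAClen]:[MAC], optionally followed by :[IPlen]:[IP]."""
    fields = split_prefix(prefix)
    if fields[0] != "2" or len(fields) not in (4, 6):
        raise ValueError(f"not a type-2 prefix: {prefix!r}")
    ip = fields[5] if len(fields) == 6 else None
    return Type2Route(eth_tag=int(fields[1]), mac=fields[3], ip=ip)


# Type-2 routes received from sw1 (RD 172.16.0.1:2) in the output above.
FROM_SW1 = [
    "[2]:[0]:[48]:[18:5a:58:2a:e8:20]",
    "[2]:[0]:[48]:[18:5a:58:2a:e8:20]:[32]:[10.0.0.50]",
    "[2]:[0]:[48]:[74:86:e2:43:33:05]:[32]:[10.0.0.71]",
    "[2]:[0]:[48]:[74:86:e2:43:33:05]:[128]:[fe80::7686:e2ff:fe43:3305]",
]

for route in map(parse_type2, FROM_SW1):
    kind = "MAC+IP" if route.ip else "MAC-only"
    print(f"{kind:8} {route.mac} {route.ip or ''}")
```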

On the originating VTEP (sw1), where the host is directly connected, NEIGH_TABLE is populated as expected:

# sonic-db-dump -n APPL_DB -y -k "NEIGH_TABLE:Vlan2:*"
{
  "NEIGH_TABLE:Vlan2:10.0.0.50": {
    "expireat": 1732144228.8517292,
    "ttl": -0.001,
    "type": "hash",
    "value": {
      "family": "IPv4",
      "neigh": "18:5a:58:2a:e8:20"
    }
  }
}
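For side-by-side comparisons of sw1 and sw2, the JSON printed by `sonic-db-dump` can be loaded directly. A minimal Python sketch, assuming the dump text is saved exactly as shown; `neigh_table_from_dump` is a hypothetical helper. Keys are split with `maxsplit=2` so IPv6 neighbor addresses, which contain colons, stay intact.

```python
import json


def neigh_table_from_dump(dump_text):
    """Hypothetical helper: map (interface, ip) -> MAC from sonic-db-dump JSON output."""
    result = {}
    for key, entry in json.loads(dump_text).items():
        # Keys look like NEIGH_TABLE:<ifname>:<ip>; maxsplit=2 keeps IPv6 colons in <ip>.
        table, ifname, ip = key.split(":", 2)
        if table == "NEIGH_TABLE":
            result[(ifname, ip)] = entry["value"]["neigh"]
    return result


# Dump text as shown above for sw1, and the empty result seen on sw2.
SW1_DUMP = """
{
  "NEIGH_TABLE:Vlan2:10.0.0.50": {
    "expireat": 1732144228.8517292,
    "ttl": -0.001,
    "type": "hash",
    "value": {
      "family": "IPv4",
      "neigh": "18:5a:58:2a:e8:20"
    }
  }
}
"""
SW2_DUMP = "{}"

sw1 = neigh_table_from_dump(SW1_DUMP)
sw2 = neigh_table_from_dump(SW2_DUMP)
for (ifname, ip), mac in sorted(sw1.items()):
    if (ifname, ip) not in sw2:
        print(f"{ifname} {ip} ({mac}) is in sw1's NEIGH_TABLE but not sw2's")
```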

The following log entries also appear on sw1:

2024 Nov 20 22:13:42.369623 sw1 NOTICE swss#orchagent: :- addNeighbor: Created neighbor ip 10.0.0.50, 18:5a:58:2a:e8:20 on Vlan2
2024 Nov 20 22:13:42.370310 sw1 NOTICE syncd#syncd: [none] SAI_API_NEXT_HOP:brcm_sai_create_next_hop:334 nhid 3 vr_id 0 ip af:v4 addr:10.0.0.50 rif-id 1 tunnel-id 0 vni 0
2024 Nov 20 22:13:42.370474 sw1 NOTICE syncd#syncd: [none] SAI_API_NEXT_HOP:_brcm_sai_xgs_create_ip_nexthop:554 nhid 3 eg-if 400004 rif 0 vid 0 port/tid(0x0) is_trunk(0)
2024 Nov 20 22:13:42.371069 sw1 NOTICE swss#orchagent: :- addNextHop: Created next hop 10.0.0.50 on Vlan2
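One way to check whether orchagent on sw2 ever created the neighbor is to scan its syslog for the same messages. Below is a hypothetical Python sketch; the regexes are derived only from the sw1 lines above and assume the message text is identical on sw2, so they may need adjusting for other SONiC releases.

```python
import re

# Patterns derived only from the sw1 log lines above (hypothetical; adjust per release).
ADD_NEIGHBOR = re.compile(
    r"addNeighbor: Created neighbor ip (?P<ip>[^,\s]+), "
    r"(?P<mac>[0-9a-fA-F:]{17}) on (?P<ifname>\S+)"
)
ADD_NEXT_HOP = re.compile(r"addNextHop: Created next hop (?P<ip>\S+) on (?P<ifname>\S+)")


def orchagent_events(lines):
    """Yield (event, ifname, ip, mac) for neighbor/next-hop creation messages."""
    for line in lines:
        m = ADD_NEIGHBOR.search(line)
        if m:
            yield ("neighbor", m["ifname"], m["ip"], m["mac"])
            continue
        m = ADD_NEXT_HOP.search(line)
        if m:
            yield ("nexthop", m["ifname"], m["ip"], None)


SW1_LOG = """\
2024 Nov 20 22:13:42.369623 sw1 NOTICE swss#orchagent: :- addNeighbor: Created neighbor ip 10.0.0.50, 18:5a:58:2a:e8:20 on Vlan2
2024 Nov 20 22:13:42.370310 sw1 NOTICE syncd#syncd: [none] SAI_API_NEXT_HOP:brcm_sai_create_next_hop:334 nhid 3 vr_id 0 ip af:v4 addr:10.0.0.50 rif-id 1 tunnel-id 0 vni 0
2024 Nov 20 22:13:42.371069 sw1 NOTICE swss#orchagent: :- addNextHop: Created next hop 10.0.0.50 on Vlan2
"""

for event in orchagent_events(SW1_LOG.splitlines()):
    print(event)
```

Passing the lines of sw2's syslog instead of `SW1_LOG` would show whether a `neighbor` event for 10.0.0.50 is ever logged there.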

I'm assuming there is some event that should populate NEIGH_TABLE on sw2, and that it would likely also trigger programming of the neighbor into the ASIC.

The observed behavior is that pings between 10.0.0.50 and 10.0.0.72 succeed for about 10s, then stop for 300s (which appears to match the MAC aging timer), then succeed again for about 10s before stopping for another 300s, and the cycle repeats. I'm assuming this is due to some slow-path (during learning) vs. fast-path logic in the ASIC.

bradh352 changed the title from "NEIGH_TABLE not populated with VXLAN routes leading to WARNING" to "NEIGH_TABLE not populated with VXLAN routes" on Dec 2, 2024