
Context deadline exceeded during vppapi call #802

Open
glazychev-art opened this issue Feb 12, 2024 · 2 comments

Comments

@glazychev-art
Contributor

Description

Suppose the request context expires during a vppapi call after VPP has already processed the request. govpp then returns a context deadline error to our application, even though VPP has applied the change.
As a result, VPP resources are leaked.

For example:

  1. Create a tap interface with an almost expired context (ctx):
    tapCreateV2 := &tapv2.TapCreateV2{
        ID:               ^uint32(0), // ~0: let VPP choose the tap ID
        UseRandomMac:     true,
        NumRxQueues:      1,
        TxRingSz:         1024,
        RxRingSz:         1024,
        HostIfNameSet:    true,
        HostIfName:       mechanism.GetInterfaceName(),
        HostNamespaceSet: true,
        HostNamespace:    nsFilename,
        TapFlags:         tapv2.TAP_API_FLAG_TUN,
    }
    // For Ethernet payloads, toggle (clear) the TUN flag set above.
    if conn.GetPayload() == payload.Ethernet {
        tapCreateV2.TapFlags ^= tapv2.TAP_API_FLAG_TUN
    }
    rsp, err := tapv2.NewServiceClient(vppConn).TapCreateV2(ctx, tapCreateV2)
    if err != nil {
        return errors.Wrap(err, "vppapi TapCreateV2 returned error")
    }
  2. The call returns a "context deadline exceeded" error.
  3. However, VPP has already processed the &tapv2.TapCreateV2{...} request and created the corresponding interface, with the requested name, in the client namespace.
  4. Because TapCreateV2 returned an error, the application never receives the reply with the new interface's index, so it cannot manage this interface; for example, it cannot delete it.
  5. The result is a resource leak.

Logs:

...
Feb  6 11:27:50.854 [ERRO] [id:34655f24-cc23-4a94-84c3-94a0e317d25d] [type:networkService] (32.3) vppapi TapCreateV2 returned error: context deadline exceeded
...
vpp# show int addr
tap6 (up):
  L2 xconnect vxlan_tunnel2
tap7 (dn):
tap8 (up):
  L2 xconnect vxlan_tunnel3

Possible solutions

  1. Make vppapi calls atomic, for example by implementing transactions.
  2. Check the remaining context timeout before each vppapi call:
    https://github.com/networkservicemesh/vpphelper/blob/e2b961f768b67dfe0687f5aa90696ffdeffba203/connection.go#L100
  3. Create a chain element that checks the remaining context timeout for each endpoint (NSMgr, Forwarder, NSE...).
@ljkiraly
Contributor

I am facing a packet drop issue. Judging by the VPP state, the symptoms look very similar to this issue.

An NSC interface (nsm-3) drops packets:

60: nsm-3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 02:fe:e6:a4:f6:48 brd ff:ff:ff:ff:ff:ff
    alias server-eric-dsc-fdr-7bd9967b59-62478-proxy.vpn1-b.sc-diam.deascmqv01-0
    RX:  bytes packets errors dropped  missed   mcast
      55181402  491133      0       0       0       0
    TX:  bytes packets errors dropped carrier collsns
      50193628  433318      0  300475       0       0

Based on the vppctl command output, the 'tap22' interface is a leftover, not connected to any other tap or vxlan interface:

tap10 (up):
  L2 xconnect tap14
tap22 (dn):
tap14 (up):
  L2 xconnect tap10
Interface: tap10 (ifindex 2)
  name "nsm-3"
  host-ns "/proc/3154179/fd/100"
  host-mac-addr: 02:fe:e6:a4:f6:48
--
Interface: tap22 (ifindex 35)
  name "nsm-3"
  host-ns "/proc/3154179/fd/93"
  host-mac-addr: 02:fe:c6:0e:de:4b
--
Interface: tap14 (ifindex 31)
  name "nse1-3969"
  host-ns "/proc/3154179/fd/121"
  host-mac-addr: 02:fe:4a:b2:9e:77

The kernel link details show that nsm-3 uses two queues (numqueues 2):

60: nsm-3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 02:fe:e6:a4:f6:48 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65521
    tun type tap pi off vnet_hdr on multi_queue numqueues 2 numdisabled 0 persist off addrgenmode eui64 numtxqueues 256 numrxqueues 256 gso_max_size 65536 gso_max_segs 65535
    alias server-eric-dsc-fdr-7bd9967b59-62478-proxy.vpn1-b.sc-diam.deascmqv01-0

By default, NSM creates the tap with a single queue, using just one fd to read and write packets. Creating a second tap with the same interface name 'nsm-3' in VPP does not allocate a new device; it just opens a new fd and uses it as a second queue of the existing tun device ('nsm-3'). Since 'tap22' and its fd are not used, the kernel drops the packets placed on that queue.

@denis-tingaikin, @Ex4amp1e: Could PR #867 solve this resource leak issue?

@denis-tingaikin
Member

@ljkiraly Yes, the solution for networkservicemesh/cmd-forwarder-vpp#1133 will resolve this issue, as will #378.
