Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve resiliency on ip context policy routes #513

Merged
merged 2 commits into from
Sep 20, 2022

Conversation

LionelJouin
Copy link
Member

@LionelJouin LionelJouin commented Aug 23, 2022

When a forwarder was restarting, the handled table IDs by the forwarded
were forgotten which caused duplicated tables/rules/routes in the pod
and potentially left routes which could cause traffic disturbances in
some cases.

Now, the forwarder is checking if the tables which exist in the pod are
corresponding to the requested policies before creating any other ones.
In case such tables exist, the forwarder will consider it as its own
table.

#426

@LionelJouin LionelJouin added the bug Something isn't working label Aug 23, 2022
@LionelJouin LionelJouin force-pushed the policy-resiliency branch 3 times, most recently from 45d41ad to c9eaf56 Compare August 23, 2022 14:32
When a forwarder was restarting, the handled table IDs by the forwarded
were forgotten which caused duplicated tables/rules/routes in the pod
and potentially left routes which could cause traffic disturbances in
some cases.

Now, the forwarder is checking if the tables which exist in the pod are
corresponding to the requested policies before creating any other ones.
In case such tables exist, the forwarder will consider it as its own
table.

Signed-off-by: Lionel Jouin <[email protected]>
Copy link
Contributor

@glazychev-art glazychev-art left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

I have a couple of suggestions, what do you think about this?

}
ps = make(map[int]*networkservice.PolicyRoute)
tableIDs.Store(conn.GetId(), ps)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will be here only once for each client, so if the map is not empty, then nothing needs to be restored.

else {
    return nil
}

missingPolicies, _ := getPolicyDifferences(ps, conn.Context.IpContext.Policies)

// try to find the corresponding missing policies in the network namespace of the pod
for _, policy := range missingPolicies {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need missingPolicies. Because if forwarder was restarted then ps is always empty and we need to add all of conn.Context.IpContext.Policies

for _, policy := range conn.Context.IpContext.Policies {
...
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, both comments come together. I was thinking maybe it would be possible the rule to be already existing in the pod while not registered in the map. But you are right, that could probably not happen. I will fix it

* Fix based on comment in PR/513
* After a forwarder restart, the routes could have been removed, so
  instead of trying to recover the rule/route, the heal part will delete
  them and they will be recreated.

Signed-off-by: Lionel Jouin <[email protected]>
@LionelJouin
Copy link
Member Author

@glazychev-art , in the new commit I changed the way to handle the healing. In case of a forwarder restart, It could happen that the routes are removed, the most simple way I found is to flush the rule/route, so they will be recreated in the common.go.

@glazychev-art
Copy link
Contributor

LGTM

@edwarnicke edwarnicke merged commit 0a1f4be into networkservicemesh:main Sep 20, 2022
nsmbot pushed a commit to networkservicemesh/cmd-nse-l7-proxy that referenced this pull request Sep 20, 2022
…k-kernel@main

PR link: networkservicemesh/sdk-kernel#513

Commit: 0a1f4be
Author: Ed Warnicke
Date: 2022-09-20 09:53:28 -0500
Message:
  - Merge pull request #513 from Nordix/policy-resiliency
Signed-off-by: NSMBot <[email protected]>
nsmbot pushed a commit to networkservicemesh/sdk-sriov that referenced this pull request Sep 20, 2022
…k-kernel@main

PR link: networkservicemesh/sdk-kernel#513

Commit: 0a1f4be
Author: Ed Warnicke
Date: 2022-09-20 09:53:28 -0500
Message:
  - Merge pull request #513 from Nordix/policy-resiliency
Signed-off-by: NSMBot <[email protected]>
nsmbot pushed a commit to networkservicemesh/sdk-vpp that referenced this pull request Sep 20, 2022
…k-kernel@main

PR link: networkservicemesh/sdk-kernel#513

Commit: 0a1f4be
Author: Ed Warnicke
Date: 2022-09-20 09:53:28 -0500
Message:
  - Merge pull request #513 from Nordix/policy-resiliency
Signed-off-by: NSMBot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants