Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests/e2e: Migrate e2e test for s390x to GHA #295

Merged
merged 1 commit into from
Jan 17, 2024

Conversation

BbolroC
Copy link
Member

@BbolroC BbolroC commented Nov 28, 2023

This PR is to migrate the following e2e tests for s390x in Jenkins to GHA:

  • confidential-containers-operator-main-ubuntu-20.04-s390x-containerd_kata-qemu-baseline
  • confidential-containers-operator-main-ubuntu-20.04-s390x-containerd_kata-qemu-PR
  • confidential-containers-operator-main-ubuntu-20.04-s390x-SE-daily

It will be staying in a work-in-progress state and the workflow structure will be changed based on that of x86_64. (i.e. a decision could be made when #260 is finalised.)

The workflow itself has been tested several times

The rationale for the {pre,post} action in the workflow is as follows:

  1. A self-hosted runner for s390x cannot be instantiated and used instantly on request, rather should be ready/running 24/7. It is inevitable to institute a hook for the management over test runs.
  2. The management script could be committed to the repo, but it wouldn't be flexible enough to tackle any environmental issues on the runner.

Of course, an introduction of the actions makes it hard to incorporate workflows for different architectures into one via matrix because if: for the selective run for an architecture (e.g. if: ${{ matrix.arch == 's390x' }}) conflicts with if: always() which is used for the post action. I look forward to any feedback for those two actions.

There are 2 things which must be prepared before the merge:

  1. Registration for a self-hosted runner for s390x
  2. Configuration of a secret named REGISTRY_CREDENTIAL_ENCODED.

Thanks.

Fixes: #294

Signed-off-by: Hyounggyu Choi [email protected]

@portersrc
Copy link
Member

Hey @BbolroC, looks good to me.
Regarding the pre- and post-action scripts and if: ${{ matrix.arch == 's390x' }} vs. if: always(): You're already using runs-on: s390x and a self-hosted runner for this workflow, so does that not solve the problem?

@BbolroC
Copy link
Member Author

BbolroC commented Dec 4, 2023

Hey @BbolroC, looks good to me. Regarding the pre- and post-action scripts and if: ${{ matrix.arch == 's390x' }} vs. if: always(): You're already using runs-on: s390x and a self-hosted runner for this workflow, so does that not solve the problem?

Hi, thank you for your comment. I'd like to add more information to further clarify the conflict I previously explained:

Let's say we would like to incorporate 2 different architectures by using 2 runners named x86_64 and s390x, respectively.

name: Example Workflow

jobs:
  job1:
    matrix:
      os: [x86_64, s390x]
    runs-on: {{ matrix.os }}

    steps:
      - name: Pre Step
        run: echo "Running Pre Step"
        if: {{ matrix.os == 's390x' }}

      - name: Main Step
        run: echo "Running Main Step"

      - name: Post Step
        if: {{ matrix.os == 's390x' }}
        run: echo "Running Post Step"

Thinking that the {pre,post} steps are only required for s390x, the conditional if: {{ matrix.os == 's390x' }} at the post step prevents the step from achieving what if: always() did before because it is not 100% true. Consequently, the post step fails to execute when other steps encounter an error.

Upon further consideration, a potential resolution could involve passing a matrix element as an environment variable into a step, like:

      - name: Post Step
        if: always()
        run: echo "Dealing with a step selectively based on an environment variable `OS`"
        env:
          OS: {{ matrix.os }}

I can confidently state that the introduction of the {pre,post} actions wouldn't pose any issues for incorporating workflows across multiple architectures. Thanks.

steps:
- name: Adjust a permission for repo
run: |
sudo chown -R $USER:$USER $GITHUB_WORKSPACE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @BbolroC should this chown call be part of the pre-action too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are totally right. Thanks for pointing it out. I will update the PR.

FYI: this optimisation will also apply to kata GHA (kata-containers/kata-containers#8649) soon. 😉

@wainersm
Copy link
Member

Hey @BbolroC, looks good to me. Regarding the pre- and post-action scripts and if: ${{ matrix.arch == 's390x' }} vs. if: always(): You're already using runs-on: s390x and a self-hosted runner for this workflow, so does that not solve the problem?

Hi, thank you for your comment. I'd like to add more information to further clarify the conflict I previously explained:

Let's say we would like to incorporate 2 different architectures by using 2 runners named x86_64 and s390x, respectively.

name: Example Workflow

jobs:
  job1:
    matrix:
      os: [x86_64, s390x]
    runs-on: {{ matrix.os }}

    steps:
      - name: Pre Step
        run: echo "Running Pre Step"
        if: {{ matrix.os == 's390x' }}

      - name: Main Step
        run: echo "Running Main Step"

      - name: Post Step
        if: {{ matrix.os == 's390x' }}
        run: echo "Running Post Step"

Thinking that the {pre,post} steps are only required for s390x, the conditional if: {{ matrix.os == 's390x' }} at the post step prevents the step from achieving what if: always() did before because it is not 100% true. Consequently, the post step fails to execute when other steps encounter an error.

Upon further consideration, a potential resolution could involve passing a matrix element as an environment variable into a step, like:

      - name: Post Step
        if: always()
        run: echo "Dealing with a step selectively based on an environment variable `OS`"
        env:
          OS: {{ matrix.os }}

Indeed it is a possible solution. You can use an environment variable as shown above and/or you can check the existence of a post-step script:

      - name: Take a post-action
        if: always()
        run: |
          if [ -f "${HOME}/script/post_action.sh" ];then
            ${HOME}/script/post_action.sh cc-operator-pr
          fi

Likewise the pre-step.

Another solution would be to decouple the post-step from this workflow. If you use the https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#workflow_run event than you can get a 2nd workflow ran afterwards which could run the post-step actions. However I think this solution kind of overkill.

I can confidently state that the introduction of the {pre,post} actions wouldn't pose any issues for incorporating workflows across multiple architectures. Thanks.

@wainersm
Copy link
Member

@ryansavino could you also jump in this discussion? Essentially @BbolroC, @portersrc and I have figured out ways to have a multi-arch and non-TEE/TEE integrated workflow for running the e2e tests. I do believe that for SEV/SNP you will need to have pre/post actions as well.

@BbolroC BbolroC changed the title WIP tests/e2e: Migrate e2e test for s390x to GHA tests/e2e: Migrate e2e test for s390x to GHA Jan 11, 2024
@BbolroC
Copy link
Member Author

BbolroC commented Jan 11, 2024

As discussed above, the workflows introduced in this PR are incorporated into the existing x86_64 workflow ccruntime_e2e.yaml. So the workflows of interest are triggered not only by PR, but also nightly.

The whole workflow looks like https://github.com/BbolroC/cc-operator/actions/runs/7492328011/job/20395717523.

@wainersm @stevenhorsman @fidencio please take a look at this and give me a feedback. Thanks!

./run-local.sh -r "${{ matrix.runtimeclass }}" -u
args="-u"
if [ $RUNNING_INSTANCE = "s390x" ]; then
args=""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BbolroC so s390x is not going to run the undo (-u) operations from the e2e framework. Will it be done (undo) by the post-action then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was concluded that the option -u is not suitable for a self-hosted runner context. The functionality of the option is delegated to the {pre,post}-action.


jobs:
e2e-on-push:
if: ${{ contains(github.event.pull_request.labels.*.name, 'ok-to-test') }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice that you are protecting the pipeline with labels. We even talked today on #confidentia-containers-ci about securing our pipelines and this is one measurement.

@@ -0,0 +1,17 @@
name: ccruntime e2e test on-push
on:
pull_request_target:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing from pull_request to pull_request_target will change the behavior of actions/checkout@v4, i.e., it will check out the pull request tree which is not rebased with main. We will need to rebase the check'ed code ourselves to emulate the same behavior of pull_request. This is a well-known problem, we can fix it as kata CI and in a follow up PR if you prefer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, then I will borrow the script and place it in this PR rather than instituting it in the next PR.

@@ -0,0 +1,17 @@
name: ccruntime e2e test on-push
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: not use on-push on the workflow name, file and jobs as this workflow is triggered on pull request.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will rename ones relevant to this comment. Thanks for the comment.

Copy link
Contributor

@ldoktor ldoktor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @wainersm said it all, it looks almost ready, just the little naming issue and (if you have time) the pull_request_target rebase it should be ready to go.

@BbolroC BbolroC force-pushed the migrate-to-gha-s390x branch 3 times, most recently from 27a0adf to 71b3f69 Compare January 16, 2024 12:34
@BbolroC
Copy link
Member Author

BbolroC commented Jan 16, 2024

@wainersm @ldoktor I've updated the PR based on your feedback. Looking forward to the next step. Thanks!

@BbolroC BbolroC force-pushed the migrate-to-gha-s390x branch 4 times, most recently from 8436322 to 4b8f594 Compare January 16, 2024 13:04
This PR is to migrate the following e2e tests for s390x to GHA:

- confidential-containers-operator-main-ubuntu-20.04-s390x-containerd_kata-qemu-baseline
- confidential-containers-operator-main-ubuntu-20.04-s390x-containerd_kata-qemu-PR
- confidential-containers-operator-main-ubuntu-20.04-s390x-SE-daily

They are incorporated into the existing x86_64 test workflow `ccruntime_e2e.yaml`.

The workflows incorparated are triggered not only by PR, but also nightly at 2:00 UTC.

Fixes: confidential-containers#294

Signed-off-by: Hyounggyu Choi <[email protected]>
Copy link
Member

@wainersm wainersm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is ready to go. Thanks @BbolroC !

# Recover from any previous rebase left halfway
git rebase --abort 2> /dev/null || true
if ! git rebase origin/${TARGET_BRANCH}; then
>&2 echo "Rebase failed, exiting"
Copy link
Member

@stevenhorsman stevenhorsman Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Choi, in the kata-containers version there was a different path for self-hosted runner than deleted the workspace directory to avoid merge conflicts impacting runs IIRC - is that needed here too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally removed those lines here, wanting to see if that is also the case for this repo. Even if that is the case again, that could be handled by {pre,post}-action, I guess.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for the explanation!

Copy link
Member

@stevenhorsman stevenhorsman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@BbolroC
Copy link
Member Author

BbolroC commented Jan 16, 2024

Now, we are facing the pull_request_target problem (the old checks marked as required are not triggered any more). Please help me out!

@stevenhorsman
Copy link
Member

Now, we are facing the pull_request_target problem (the old checks marked as required are not triggered any more). Please help me out!

Just to confirm, this is just a one off where we need to remove them as required and then after this merge we've have new jobs to make required that superceded them?

@BbolroC
Copy link
Member Author

BbolroC commented Jan 16, 2024

Now, we are facing the pull_request_target problem (the old checks marked as required are not triggered any more). Please help me out!

Just to confirm, this is just a one off where we need to remove them as required and then after this merge we've have new jobs to make required that superceded them?

Yes, I think so, unfortunately. If we don't like this approach, let's stop proceeding things and start re-discussing what could be a better way to trigger the tests on the expected github events for a PR in a protective (resource) way.

@stevenhorsman
Copy link
Member

Yes, I think so, unfortunately. If we don't like this approach, let's stop proceeding things and start re-discussing what could be a better way to trigger the tests on the expected github events for a PR in a protective (resource) way.

I think that makes sense to take the hit now, especially given we are far away from releases. @wainersm are you happy with this?

Copy link
Contributor

@ldoktor ldoktor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes, what is the problem with the required tests? I remember the discussion but still new to this repo. In different repos when GH CI did not scale we simply tested the PR locally and pushed things via cmdline (where the required conditions are not checked). Worst case we can always revert if the GH runners find the main version broken.

@wainersm
Copy link
Member

Yes, I think so, unfortunately. If we don't like this approach, let's stop proceeding things and start re-discussing what could be a better way to trigger the tests on the expected github events for a PR in a protective (resource) way.

I think that makes sense to take the hit now, especially given we are far away from releases. @wainersm are you happy with this?

Yes @stevenhorsman , I'm happy with that.

@BbolroC I added an entry on the agenda of our meeting today to discuss the methods we have used to protect the self-hosted runners.

@wainersm
Copy link
Member

Thanks for the changes, what is the problem with the required tests? I remember the discussion but still new to this repo. In different repos when GH CI did not scale we simply tested the PR locally and pushed things via cmdline (where the required conditions are not checked). Worst case we can always revert if the GH runners find the main version broken.

Hi @ldoktor , let me see if I can explain accurately...

This PR change the current jobs so that they are triggered on a different event (pull_request_target) and now requires the ok-to-test label. If we add the ok-to-test label on this PR the new jobs will be triggered but will fail because on a pull_request_target it reads the workflows from the target branch (main) which doesn't exist until this PR is merged. I.e. it is chicken-and-eggs situation. So we will need to make the current jobs non-required so that we can merge the PR first.

@wainersm wainersm merged commit 24fc700 into confidential-containers:main Jan 17, 2024
5 checks passed
@BbolroC BbolroC deleted the migrate-to-gha-s390x branch January 17, 2024 16:17
BbolroC added a commit to BbolroC/cc-operator that referenced this pull request Jan 17, 2024
This is to fix 2 issues identified during the follow-up test for confidential-containers#295.

 - git-helper.sh misses an execution permission
 - actions/checkout@v4 does not fetch the code from the git ref

This commit is to fix these issues.

Signed-off-by: Hyounggyu Choi <[email protected]>
AdithyaKrishnan added a commit to AdithyaKrishnan/operator that referenced this pull request Mar 1, 2024
This PR is to migrate the AMD SEV and SNP e2e tests from Jenkins to GHA.

Builds on top of PR confidential-containers#295

Signed-off-by: Adithya Krishnan Kannan<[email protected]>
AdithyaKrishnan added a commit to AdithyaKrishnan/operator that referenced this pull request Mar 1, 2024
This PR is to migrate the AMD SEV and SNP e2e tests from Jenkins to GHA.

Builds on top of PR confidential-containers#295

Signed-off-by: Adithya Krishnan Kannan<[email protected]>
AdithyaKrishnan added a commit to AdithyaKrishnan/operator that referenced this pull request Mar 1, 2024
This PR is to migrate the AMD SEV and SNP e2e tests from Jenkins to GHA.

Builds on top of PR confidential-containers#295

Signed-off-by: Adithya Krishnan Kannan <[email protected]>
AdithyaKrishnan added a commit to AdithyaKrishnan/operator that referenced this pull request Mar 1, 2024
This PR is to migrate the AMD SEV and SNP e2e tests from Jenkins to GHA.

Builds on top of PR confidential-containers#295

Signed-off-by: Adithya Krishnan Kannan <[email protected]>
AdithyaKrishnan added a commit to AdithyaKrishnan/operator that referenced this pull request Mar 8, 2024
This PR is to migrate the AMD SEV and SNP e2e tests from Jenkins to GHA.
Includes some fixes suggested by ldoktor and fitzthum.
Builds on top of PR confidential-containers#295

Signed-off-by: Adithya Krishnan Kannan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

E2e test for s390x should be migrated to GHA
5 participants