
Split the GitHub workflow into CI and CD #1063

Merged
janfb merged 34 commits into main from feat/split_ci_cd on Apr 3, 2024
Conversation

famura
Contributor

@famura famura commented Mar 20, 2024

What does this implement/fix? Explain your changes

The general idea is that we want a CI workflow that is more lightweight and hence allows for faster turnaround, and a CD workflow that runs more tests, computes the coverage, and maybe, in the future, directly pushes the latest package version to PyPI.

Does this close any currently open issues?

No.

Any relevant code examples, logs, error output, etc?

See the actions.

Any other comments?

I chose to run the pre-commit hooks differently than in test.yml, so that we only need to specify one container and do everything in it.

For the initial checks, the CD workflow will also run on draft PRs. This should be changed once this PR is close to acceptance.

Let's discuss whether the CD workflow should run the slow tests.
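
For concreteness, here is a minimal sketch of how the two workflows could be split. The file names ci.yml and cd.yml match this PR, but the step details (Python version, install command) are illustrative assumptions, not the final configuration:

```yaml
# .github/workflows/ci.yml: lightweight, fast turnaround on PRs (sketch)
name: CI
on:
  pull_request:
    branches: [main]

jobs:
  fast-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.8"
      - run: pip install -e ".[dev]"  # install command is an assumption
      # Fast suite only; slow and GPU tests are excluded.
      - run: pytest -m "not slow and not gpu"
```

```yaml
# .github/workflows/cd.yml: heavier checks on pushes to main (sketch)
name: CD
on:
  push:
    branches: [main]

jobs:
  all-cpu-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.8"
      - run: pip install -e ".[dev]"  # install command is an assumption
      # Everything except GPU tests, including the slow suite.
      - run: pytest -m "not gpu"
```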

Checklist

Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask. We're here to
help! This is simply a reminder of what we are going to look for before merging
your code.

  • I have read and understood the contribution
    guidelines
  • I agree with re-licensing my contribution from AGPLv3 to Apache-2.0.
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have reported how long the new tests run and potentially marked them
    with pytest.mark.slow.
  • New and existing unit tests pass locally with my changes
  • I performed linting and formatting as described in the contribution
    guidelines
  • I rebased on main (or there are no conflicts with main)

@famura famura added the enhancement, architecture, less-urgent, and hackathon labels Mar 20, 2024
@famura famura self-assigned this Mar 20, 2024
@Baschdl
Contributor

Baschdl commented Mar 20, 2024

I like the idea of running the slow tests during CD to make sure they are run at all.
Did you test how big the impact of running with/without coverage is? The downside of running CI without coverage would be that we don't get a notification if a PR reduces test coverage significantly.

@famura famura marked this pull request as ready for review March 20, 2024 17:17
@famura
Contributor Author

famura commented Mar 20, 2024

Did you test how big the impact of running with/without coverage is? The downside of running CI without coverage would be that we don't get a notification if a PR reduces test coverage significantly.

I very much agree. I am thinking of putting the coverage back in for CI, especially since the CD will only run once a PR is merged (if it stays as I intended).

I did not check the difference in coverage with the slow tests included.

@famura
Contributor Author

famura commented Mar 20, 2024

Even though this PR is no longer marked as a draft, it effectively still is one. The reason is that I want to debug the if clause for ignoring draft PRs, but first I need to make sure that the respective workflow triggers at all xD

(resolved review comment on .github/workflows/cd.yml, now outdated)
@famura famura marked this pull request as draft March 21, 2024 09:22
@famura
Contributor Author

famura commented Mar 21, 2024

Ignoring draft PRs works
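
For reference, skipping draft PRs boils down to a condition on the pull_request event payload; github.event.pull_request.draft is the real event field, while the job name and steps below are placeholders:

```yaml
on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]

jobs:
  ci:
    # Skip the whole job while the PR is still a draft.
    if: github.event.pull_request.draft == false
    runs-on: ubuntu-latest
    steps:
      - run: echo "Running CI for a non-draft PR"
```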

@famura
Contributor Author

famura commented Mar 25, 2024

@tomMoral @janfb @Baschdl I now configured it to run only the fast tests for every push and the fast+slow tests for every push on a PR. Note that this will lead to double execution of tests. I don't have a strong opinion on this and can easily revert it. As I said before:

I don't think we should do that in the CD workflow, as its purpose is a bit different (computing max coverage, docs, etc.). Moreover, I think it is very unlikely that a failure introduced by a Python version incompatibility only occurs in the slow tests and not the others.

I don't believe that we will catch many errors with this, but the coverage estimate will be better for PRs.
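
A sketch of what this double-trigger setup could look like; the marker expressions are the ones discussed in this thread, while the job layout is an illustrative assumption:

```yaml
name: Tests
on: [push, pull_request]

jobs:
  fast-tests:
    # Every push runs only the fast suite.
    if: github.event_name == 'push'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # (environment setup steps omitted)
      - run: pytest -m "not slow and not gpu"

  fast-and-slow-tests:
    # A push to a PR branch also fires a pull_request event, so the slow
    # suite runs on top of the fast one; hence the double execution.
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # (environment setup steps omitted)
      - run: pytest -m "not gpu"
```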

@Baschdl
Contributor

Baschdl commented Mar 25, 2024

To quote myself (#1063 (comment)):

I think @michaeldeistler had a strong opinion against increasing the runtime of our CI.

and increasing it to over 30 min is quite a lot. I would be in favor of running the slow tests, as well as tests on other Python versions, somewhere; but even running them only once a day, etc., would already be an improvement over the current state.
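
If the once-a-day option is preferred, a scheduled trigger would do it; a minimal sketch (the workflow name and cron time are arbitrary):

```yaml
name: Nightly slow tests
on:
  schedule:
    - cron: "0 3 * * *"  # once a day at 03:00 UTC; time chosen arbitrarily
  workflow_dispatch:  # keep a manual escape hatch

jobs:
  slow-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # (environment setup steps omitted)
      - run: pytest -m "slow and not gpu"
```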

@famura
Contributor Author

famura commented Mar 25, 2024

Interesting test failure. I think it is unrelated to this PR. What do you think, @Baschdl?

Note: it happened during the slow tests, which are newly integrated in this PR.

I think it might be due to test parallelism.

@famura
Contributor Author

famura commented Mar 25, 2024

To quote myself (#1063 (comment)):

I think @michaeldeistler had a strong opinion against increasing the runtime of our CI.

and increasing it to over 30 min is quite a lot. I would be in favor of running the slow tests, as well as tests on other Python versions, somewhere; but even running them only once a day, etc., would already be an improvement over the current state.

Yeah sure, I mean my original proposal was to only run them once a PR is merged (not ideal, but minimally invasive compared to the current state).

@Baschdl
Contributor

Baschdl commented Mar 25, 2024

Interesting test failure. I think it is unrelated to this PR. What do you think, @Baschdl?

Quick summary for everyone who doesn't want to look at the CI logs: this test fails for sim_batch_size=1, num_workers=10 but works for all other configurations:

```python
@pytest.mark.slow
@pytest.mark.parametrize("num_workers", [10, -2])
@pytest.mark.parametrize("sim_batch_size", ((1, 10, 100)))
def test_benchmarking_parallel_simulation(sim_batch_size, num_workers):
```

I also had a look and wanted to open an issue for it. I could imagine multiple reasons: the overhead of creating 10 workers (on a machine with only 2 cores) is not worth it if we only have 1 sample per batch, or problems with running this test with -n auto. I think the first is more likely because all other variants work.

@famura
Contributor Author

famura commented Mar 25, 2024

I also had a look and wanted to open an issue for it. I could imagine multiple reasons for it: overhead of creating 10 workers (on a machine with only 2 cores) is not worth it if we only have 1 sample per batch or problems with running this test with -n auto. I think the first is more likely because all other variants work.

It is a bit unintuitive, though, that it fails for the smallest batch size.
Anyhow, I think one could safely use 2 or 3 workers instead of 10. It proves the same point.

@janfb
Contributor

janfb commented Mar 25, 2024

A bit late to the party but here is my input:

  • The slow tests used to be really slow, like 8 h or so. Therefore, we really wanted to run them separately, usually before every new release.
  • They are much faster now, but I still like the separation and the possibility to add more extensive functional tests without having to worry so much about CI time.
  • Thus, I think we should stick with -m "not slow and not gpu" during the "CI" phase.
  • But it would be good to run all tests before merging into main, or at least before each new release. Thus, I really like the separation into CI and CD.
  • However, we would still need to run the GPU tests locally, right?

@Baschdl
Contributor

Baschdl commented Mar 25, 2024

However, we would still need to run the GPU tests locally, right?

Yes, they aren't run anywhere currently. We could set up a self-hosted runner with a GPU, but we would need to think about long-term maintenance.
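
Should a GPU machine become available, the GPU-marked tests could be picked up by a self-hosted runner; a minimal sketch, assuming a runner registered with the labels self-hosted and gpu:

```yaml
name: GPU tests
on:
  workflow_dispatch:  # manual trigger while long-term maintenance is open

jobs:
  gpu-tests:
    # Requires a self-hosted runner registered with these labels (assumption).
    runs-on: [self-hosted, gpu]
    steps:
      - uses: actions/checkout@v4
      # (environment setup steps omitted)
      - run: pytest -m "gpu"
```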

@famura
Contributor Author

famura commented Mar 26, 2024

  • They are much faster now, but I still like the separation and the possibility to add more extensive functional tests without having to worry so much about CI time.

  • Thus, I think we should stick with -m "not slow and not gpu" during the "CI" phase.

  • But it would be good to run all tests before merging into main, or at least before each new release. Thus, I really like the separation into CI and CD.

OK, so this PR is almost ready then. Just a few questions/points from my side. I added the verbose -v flag; however, I am not sure if we really want this in the CI tests, since we only care whether one fails. The -x (exit on first failure) flag seems to be ineffective with parallelism. Finally, the new workflow, which just runs the tests, cannot be tested before it is merged (quite ironic). The idea here is to run the slow tests manually if desired. I can also make CD run the slow non-GPU tests, but this workflow will only run once a PR is merged, i.e., when the damage has already been done.
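
For the "run the slow tests manually" part, GitHub's workflow_dispatch trigger is the standard mechanism; a minimal sketch (the flags mirror the discussion above, the workflow name and remaining steps are assumed):

```yaml
name: Manual slow tests
on:
  workflow_dispatch:

jobs:
  slow-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # (environment setup steps omitted)
      # -n auto (pytest-xdist) parallelizes across cores; note that -x
      # (exit on first failure) is unreliable under this parallelism.
      - run: pytest -v -n auto -m "slow and not gpu"
```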

@famura
Contributor Author

famura commented Mar 28, 2024

So, what is left to do here? The PR is still missing a review... 🙏 @Baschdl @janfb

Contributor

@Baschdl Baschdl left a comment


I would say it's done, but we should first fix #1111; otherwise we'll get a failing main.

Contributor

@janfb janfb left a comment


Great! Thanks for adding this and discussing it in detail!
One question: the main difference between CI and CD is the slow tests and building the docs, right? I.e., codecov is checked in both?

I suggest that we merge this now, see how it works in practice, and then adapt. The CI-CD separation gives a lot of flexibility.

@janfb janfb merged commit 6bbb446 into main Apr 3, 2024
5 checks passed
@janfb janfb deleted the feat/split_ci_cd branch April 3, 2024 06:23
@famura
Contributor Author

famura commented Apr 3, 2024

The main difference between CI and CD is the slow tests and building the docs, right? I.e., codecov is checked in both?

Yes, but there is also a difference in when they are triggered:

  • CI: as soon as people commit to a non-draft PR
  • CD: as soon as a PR is merged (actually, on every push to main)

Both compute and upload the coverage, but under different (new) names, i.e., codecov-sbi-fast-cpu and codecov-sbi-all-cpu.

Moreover, CI runs the OS-PyTorch matrix, while CD only runs Python 3.8 with pip's choice for the torch version.

CD also builds the docs to check that the process runs without errors. However, it does not upload them.
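
A sketch of how the two coverage uploads could be told apart on Codecov; the flag names are the ones quoted above, while the action version and the pytest-cov invocation are assumptions:

```yaml
# In ci.yml: fast CPU tests, uploaded under the fast-cpu flag.
- run: pytest -m "not slow and not gpu" --cov=sbi --cov-report=xml
- uses: codecov/codecov-action@v4
  with:
    flags: codecov-sbi-fast-cpu

# In cd.yml: all CPU tests, uploaded under the all-cpu flag.
- run: pytest -m "not gpu" --cov=sbi --cov-report=xml
- uses: codecov/codecov-action@v4
  with:
    flags: codecov-sbi-all-cpu
```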
