Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialize DB during image build #457

Merged
merged 34 commits into from
Jun 11, 2024
Merged

Initialize DB during image build #457

merged 34 commits into from
Jun 11, 2024

Conversation

danielhollas
Copy link
Contributor

@danielhollas danielhollas commented May 21, 2024

Speed up container startup by initialized Posgresql DB during image build.

Turns out we were already doing that in the base-with-services image, but this initialized DB had a wrong name, inconsistent with 20_start-postgresql.sh. To prevent this mismatch in the future, we now set PGDATA envvar in the image, which is automatically picked up by initdb and pg_ctl commands. See e.g. https://www.postgresql.org/docs/current/app-initdb.html#APP-INITDB-OPTION-PGDATA

This PR is build on top of #456. Taking these changes all together, the full-stack image now starts up in 9s on my machine.

@danielhollas danielhollas changed the title WIP: Initialize DB during image build Initialize DB during image build May 24, 2024
@danielhollas danielhollas marked this pull request as ready for review May 26, 2024 19:34
@@ -54,22 +54,8 @@ group "default" {
targets = "${TARGETS}"
}

target "base-meta" {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated cleanup

@danielhollas danielhollas requested a review from unkcpz May 26, 2024 20:31
@unkcpz
Copy link
Member

unkcpz commented Jun 4, 2024

Are you sure this is the correct way to go? The purpose of initialize DB during the runtime phase is to having the DB in home and therefore in the persistent volume.

@danielhollas
Copy link
Contributor Author

I am not sure what you mean @unkcpz, the DB here is initialized in the home directory. So this should work when using aiidalab-launch and normal Docker. In Kubernetes, you'll have the same issue as with everything else in the home directory.

@unkcpz
Copy link
Member

unkcpz commented Jun 4, 2024

In Kubernetes, you'll have the same issue as with everything else in the home directory.

For everything (not everything but something) else, we do "home prepare" with https://github.com/aiidalab/aiidalab-docker-stack/blob/main/stack/base/before-notebook.d/10_prepare-home-config.sh. This not happened to the DB folder.
Either we adapt the same strategy as we are planing for the qeapp image to have a home some where else and load it back after. Otherwise, with your changes here, the full-stack will fail on k8s deployment.

@danielhollas
Copy link
Contributor Author

For everything (not everything but something) else, we do "home prepare" with https://github.com/aiidalab/aiidalab-docker-stack/blob/main/stack/base/before-notebook.d/10_prepare-home-config.sh. This not happened to the DB folder.

Yes, but that's because of k8s right?

Either we adapt the same strategy as we are planing for the qeapp image to have a home some where else and load it back after. Otherwise, with your changes here, the full-stack will fail on k8s deployment.

I don't think so, I didn't actually changed the logic in stack/base-with-services/before-notebook.d/20_start-postgresql.sh. If the DB folder doesn't exist, it will get created at startup, as was done till now, so k8s should work I believe.

Base automatically changed from speedup-startup to main June 4, 2024 13:24
@unkcpz
Copy link
Member

unkcpz commented Jun 4, 2024

I didn't actually changed the logic in stack/base-with-services/before-notebook.d/20_start-postgresql.sh

I see, I didn't take close look at the PR, sorry. Then all make sense.

Copy link
Member

@unkcpz unkcpz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @danielhollas! Sorry for the late review.

The changes are all good, just some minor questions.

Since we are in the place, it is worth to discuss, do we really need a environment aiida-core-services? But for sure it will be another PR if we decide to move the conda env into base.

@@ -74,6 +74,6 @@ jobs:
- name: Set output variables
id: bake_metadata
run: |
.github/workflows/extract-image-names.sh | tee -a "${GITHUB_OUTPUT}"
.github/workflows/extract-image-names.sh | tee -a "${GITHUB_OUTPUT}" | awk -F'=' '{print $2}' | jq
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This I don't understand, what different it will make?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this just pretty prints the json output so that it is easier to read in the Github Actions logs. Previously it was on a single line, and it was a bit annoying when I needed to copy the image name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how it looks now in the amd64 image build

image

@@ -58,4 +65,4 @@ USER ${NB_USER}
WORKDIR "/home/${NB_USER}"

# Initialize the database
RUN mamba run -n aiida-core-services initdb -D aiida_db -U aiida
RUN mamba run -n aiida-core-services initdb
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the DB name used when create aiida profile after this change? We set it explicitly in config-quick-setup.yaml as aiida_db.

BTW, we need keep in mind to adapt the change to use verdi presto after we move to 2.6.0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. -D parameter doesn't set the db name, it sets the DB location.

https://www.postgresql.org/docs/current/app-initdb.html#APP-INITDB-OPTION-PGDATA

in this case it would set it to ~/aiida_db. But this was broken anyway, since the location in stack/base-with-services/before-notebook.d/20_start-postgresql.sh was set to ~/.postgresql, so essentially this initdb command was being ignored. Now the location is set via the PGDATA variable and thus is guaranteed to be the same everywhere to prevent these kind of mismatches in the future.

Note that the the -U parameter sets the so-called bootstrap superuser I am not sure if this would have any effect, but we are not using this parameter in stack/base-with-services/before-notebook.d/20_start-postgresql.sh so I deleted it here as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, we need keep in mind to adapt the change to use verdi presto after we move to 2.6.0

I don't think we'll want to use verdi presto since we'd loose some config options. Instead I believe we should migrate from verdi quicksetup to the full verdi profile setup. But that's not that urgent, quicksetup has been deprecated but not removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks!

@danielhollas
Copy link
Contributor Author

Since we are in the place, it is worth to discuss, do we really need a environment aiida-core-services? But for sure it will be another PR if we decide to move the conda env into base.

Why would you want to move it to the base environment? base environment contains literally everything else so I quite like that we can have at least these services in a separate one. If you disagree, could you please open an issue with your reasoning?

@danielhollas danielhollas requested a review from unkcpz June 10, 2024 11:53
@unkcpz
Copy link
Member

unkcpz commented Jun 11, 2024

Why would you want to move it to the base environment? base environment contains literally everything else so I quite like that we can have at least these services in a separate one.

The main reason is of this mlocate which is in the aiida-core-services environment and required to run pytest. It just over complex to have an extra conda environment that we gain little from it. I open the issue for the discussion #467

Copy link
Member

@unkcpz unkcpz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! All good to me. @danielhollas I'll let you merge and write up for the commit message.

@danielhollas danielhollas merged commit 3208794 into main Jun 11, 2024
27 checks passed
@danielhollas danielhollas deleted the initdb branch June 11, 2024 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants