Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set up a database at CSD3 #2

Open
nsevilla opened this issue Oct 29, 2024 · 3 comments
Open

Set up a database at CSD3 #2

nsevilla opened this issue Oct 29, 2024 · 3 comments

Comments

@nsevilla
Copy link
Contributor

According to George Beckett:

CSD3 recommend we set up a PostgreSQL database in a service call ARCAS: this is a cloud platform that sits next to CSD3 (I think it is in the same racks). The only limitation they note is that ARCAS VMs cannot mount the (Lustre) work file system of CSD3. I'm assuming this isn't an issue for our case?
Another option, which we have used when running on CSD3, is to use an SQLite database, though this may/ may not be capable enough for what you have planned.
10:17
Assuming the ARCAS platform sounds suitable, could you let me know roughly what specs we would need for the database (number of cores, memory, and disk space).

@nsevilla
Copy link
Contributor Author

nsevilla commented Nov 7, 2024

After some discussion, we think that the PostgresSQL database to be set up at the ARCAS cloud system does not need to see the data directly.

Jim: I’m not sure that the db itself needs to be able to access the files on disk. The LSST software that queries the db does certainly.
George: Yes, I wasn't sure if PostgreSQL had an API that let it source table data from a remote file system so, perhaps, the key is that the Pipeline (running on CSD3 compute nodes) can see both the PostgreSQL database and the Lustre file system and mediates between the two in terms of writing metadata to the Butler registry or reading metadata from the Butler registry?
Heather: If Jim is correct (and I believe he is) - then the LSST Sci Pipelines code would handle that mediation. The Postgres db is just a registry that provides the necessary information for the LSST Sci Pipelines to find the data in the "data store".

@heather999
Copy link

Jim reminded us we should try to use Postgresql 16 to match the version Rubin is using

@heather999
Copy link

dropping some notes here
Rubin seems to be on postgres 16.3, I noticed some chatter about potentially moving to 17 due to the desire for improved support for logical replication. Going to ignore that for now and stick with postgres 16

One extension is needed: CREATE EXTENSION IF NOT EXISTS btree_gist;

Once set up, we run nightly pgdumps to backup to database contents.

We slightly update the postgres configuration at NERSC on Rancher2 using a ConfigMap, these settings may be too conservative given the resources we may have available here, but just noting what we have for now:
max_connections = 500
listen_addresses = '*'
shared_buffers = 1GB
temp_buffers = 16MB
work_mem = 16MB
effective_cache_size = 4GB
maintenance_work_mem = 1GB
dynamic_shared_memory_type = posix
effective_io_concurrency = 2
checkpoint_timeout = 15min
max_wal_size = 1GB
min_wal_size = 80MB
autovacuum = on
autovacuum_naptime = 1h
autovacuum_vacuum_threshold = 500
autovacuum_analyze_threshold = 250
datestyle = 'iso, mdy'
timezone = 'US/Pacific'
lc_messages = 'en_US.utf8'
lc_monetary = 'en_US.utf8'
lc_numeric = 'en_US.utf8'
lc_time = 'en_US.utf8'
default_text_search_config = 'pg_catalog.english'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants