Set up a database at CSD3 #2

nsevilla · 2024-10-29T11:26:21Z

According to George Beckett:

CSD3 recommend we set up a PostgreSQL database in a service call ARCAS: this is a cloud platform that sits next to CSD3 (I think it is in the same racks). The only limitation they note is that ARCAS VMs cannot mount the (Lustre) work file system of CSD3. I'm assuming this isn't an issue for our case?
Another option, which we have used when running on CSD3, is to use an SQLite database, though this may/ may not be capable enough for what you have planned.
10:17
Assuming the ARCAS platform sounds suitable, could you let me know roughly what specs we would need for the database (number of cores, memory, and disk space).

nsevilla · 2024-11-07T13:01:52Z

After some discussion, we think that the PostgresSQL database to be set up at the ARCAS cloud system does not need to see the data directly.

Jim: I’m not sure that the db itself needs to be able to access the files on disk. The LSST software that queries the db does certainly.
George: Yes, I wasn't sure if PostgreSQL had an API that let it source table data from a remote file system so, perhaps, the key is that the Pipeline (running on CSD3 compute nodes) can see both the PostgreSQL database and the Lustre file system and mediates between the two in terms of writing metadata to the Butler registry or reading metadata from the Butler registry?
Heather: If Jim is correct (and I believe he is) - then the LSST Sci Pipelines code would handle that mediation. The Postgres db is just a registry that provides the necessary information for the LSST Sci Pipelines to find the data in the "data store".

heather999 · 2024-11-12T16:01:00Z

Jim reminded us we should try to use Postgresql 16 to match the version Rubin is using

heather999 · 2024-11-26T21:43:31Z

dropping some notes here
Rubin seems to be on postgres 16.3, I noticed some chatter about potentially moving to 17 due to the desire for improved support for logical replication. Going to ignore that for now and stick with postgres 16

One extension is needed: CREATE EXTENSION IF NOT EXISTS btree_gist;

Once set up, we run nightly pgdumps to backup to database contents.

We slightly update the postgres configuration at NERSC on Rancher2 using a ConfigMap, these settings may be too conservative given the resources we may have available here, but just noting what we have for now:
max_connections = 500
listen_addresses = '*'
shared_buffers = 1GB
temp_buffers = 16MB
work_mem = 16MB
effective_cache_size = 4GB
maintenance_work_mem = 1GB
dynamic_shared_memory_type = posix
effective_io_concurrency = 2
checkpoint_timeout = 15min
max_wal_size = 1GB
min_wal_size = 80MB
autovacuum = on
autovacuum_naptime = 1h
autovacuum_vacuum_threshold = 500
autovacuum_analyze_threshold = 250
datestyle = 'iso, mdy'
timezone = 'US/Pacific'
lc_messages = 'en_US.utf8'
lc_monetary = 'en_US.utf8'
lc_numeric = 'en_US.utf8'
lc_time = 'en_US.utf8'
default_text_search_config = 'pg_catalog.english'

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set up a database at CSD3 #2

Set up a database at CSD3 #2

nsevilla commented Oct 29, 2024

nsevilla commented Nov 7, 2024

heather999 commented Nov 12, 2024

heather999 commented Nov 26, 2024

Set up a database at CSD3 #2

Set up a database at CSD3 #2

Comments

nsevilla commented Oct 29, 2024

nsevilla commented Nov 7, 2024

heather999 commented Nov 12, 2024

heather999 commented Nov 26, 2024