Hello everyone. First, I want to thank you for your PostgreSQL HA solutions and the Kubernetes operator. Unfortunately, we have an issue with restoring (cloning) from WAL-E backups in GCS.
2023-10-10 16:24:59,174 INFO: No PostgreSQL configuration items changed, nothing to reload.
2023-10-10 16:24:59,180 INFO: Lock owner: None; I am test
2023-10-10 16:24:59,213 INFO: trying to bootstrap a new cluster
2023-10-10 16:24:59,213 INFO: Running custom bootstrap script: envdir "/run/etc/wal-e.d/env-clone-prod" python3 /scripts/clone_with_wale.py --recovery-target-time="2023-10-05T18:06:52+00:00"
2023-10-10 16:24:59,422 INFO: Trying gs://somebucket/spilo/prod/<UID>/wal/15/ for clone
wal_e.main INFO MSG: starting WAL-E
DETAIL: The subcommand is "backup-list".
STRUCTURED: time=2023-10-10T16:24:59.724227-00 pid=204
2023-10-10 16:25:00,304 ERROR: Clone failed
Traceback (most recent call last):
  File "/scripts/clone_with_wale.py", line 185, in main
    run_clone_from_s3(options)
  File "/scripts/clone_with_wale.py", line 166, in run_clone_from_s3
    backup_name, update_envdir = find_backup(options.recovery_target_time, env)
  File "/scripts/clone_with_wale.py", line 153, in find_backup
    backup = choose_backup(backup_list, recovery_target_time)
  File "/scripts/clone_with_wale.py", line 74, in choose_backup
    if last_modified < recovery_target_time:
TypeError: can't compare offset-naive and offset-aware datetimes
We analyzed the Spilo source code and found the root cause.
The script clone_with_wale.py runs the wal-e backup-list command and parses its output to get each backup's timestamp.
The timestamp in that output should be read as 2021-06-23 01:00:14.498000+00:00, but it contains a space, so splitting each line on whitespace keeps only the first part (2021-06-23) for the comparison with the recovery target time. That date-only value has no timezone, while the recovery target time does, so the comparison fails with:
TypeError: can't compare offset-naive and offset-aware datetimes
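The failure can be reproduced outside Spilo with a few lines of Python. This is only a sketch: the backup name and size in the sample row are invented, the row layout is assumed to be tab-separated (as the patch below implies), and the standard library's datetime.fromisoformat stands in for whatever parser the script actually uses.

```python
from datetime import datetime

# Hypothetical backup-list row: tab-separated columns, with a space inside
# the timestamp column. The backup name and size are made up.
row = "base_example\t2021-06-23 01:00:14.498000+00:00\t1024"

# Splitting on any whitespace cuts the timestamp in two, so the second
# column keeps only the date.
by_whitespace = row.split()
# Splitting on tabs keeps the timestamp whole.
by_tab = row.split("\t")

naive = datetime.fromisoformat(by_whitespace[1])  # date only, no tzinfo
aware = datetime.fromisoformat(by_tab[1])         # full timestamp with offset
target = datetime.fromisoformat("2023-10-05T18:06:52+00:00")


def try_compare(last_modified, recovery_target_time):
    """Return the comparison result, or the error message if it fails."""
    try:
        return last_modified < recovery_target_time
    except TypeError as exc:
        return str(exc)


print(by_whitespace[1])
print(try_compare(naive, target))
print(try_compare(aware, target))
```

With the whitespace split the comparison raises the same TypeError as in the log; with the tab split both values carry a timezone and the comparison goes through.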
We fixed this issue by building a custom Spilo image with this patch applied:
diff --git a/postgres-appliance/bootstrap/clone_with_wale.py b/postgres-appliance/bootstrap/clone_with_wale.py
index e8d3196..e6c6b12 100755
--- a/postgres-appliance/bootstrap/clone_with_wale.py
+++ b/postgres-appliance/bootstrap/clone_with_wale.py
@@ -62,7 +62,7 @@ def fix_output(output):
             if started:
                 line = line.replace(' modified ', ' last_modified ')
         if started:
-            yield '\t'.join(line.split())
+            yield '\t'.join(line.split('\t'))
 
 
 def choose_backup(backup_list, recovery_target_time):
We can open a PR to fix the issue in the original image, but we are not sure that:
- this repo is still maintained
- this will not break the S3 WAL-E backups (I guess there should be tests in CI that check that)
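A quick local check of the second concern might look like the sketch below. It only mimics the one changed line of the patch, not the whole fix_output function, and every sample row is an assumption: the backup names and sizes are invented, and the S3 timestamp shape (ISO 8601 with a trailing Z and no space) is what we would expect rather than something we captured from a real listing.

```python
def normalise(line, patched):
    """Mimic the changed line of the patch: split into columns, re-join with tabs."""
    columns = line.split("\t") if patched else line.split()
    return "\t".join(columns)


# Assumed sample rows; only the timestamp shapes and separators matter.
samples = {
    "gcs": "base_a\t2021-06-23 01:00:14.498000+00:00\t1024",
    "s3": "base_b\t2021-06-23T01:00:14.000Z\t1024",
    # Risk case: a listing padded with spaces instead of tabs.
    "space_padded": "base_c  2021-06-23T01:00:14.000Z  1024",
}

results = {}
for label, row in samples.items():
    old_columns = normalise(row, patched=False).split("\t")
    new_columns = normalise(row, patched=True).split("\t")
    results[label] = (len(old_columns), len(new_columns))
    print(label, len(old_columns), len(new_columns))
```

Under these assumptions the patched split gives three columns for both the GCS and the S3 row, while the old split gives four for GCS. The space-padded row shows where the patch could regress: it collapses to a single column, so the change is only safe if backup-list output is tab-separated for every storage backend.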
Environment
Spilo image - ghcr.io/zalando/spilo-15:3.0-p1
Postgres operator - registry.opensource.zalan.do/acid/postgres-operator:v1.10.1
Postgres CRD
When the container starts, its logs contain the errors shown above.