[BUG] VSS backup runs but does not update the backup log. Backup stays pending #8

Open
PeterStrackx opened this issue May 30, 2023 · 7 comments
Labels
bug Something isn't working

Comments

@PeterStrackx

HCL Product Version
V12.0.2FP1

Describe the bug
Snapshot backup runs but stays in "pending" state. In the log.nsf I see that the job is fully completed but that the backup result log cannot be updated.

The first 3 times it worked. For the last 5 days we have been receiving the error.

Error Message
30/05/2023 12:24:33 Backup: Domino Database Backup

30/05/2023 12:24:33 Backup: Started

30/05/2023 12:24:33 Backup: Pruning backups

30/05/2023 12:24:33 Backup: BackupNode: [NSADI2], BackupName: [default], Translog Mode: [CIRCULAR], Backup Mode: [SNAP]

30/05/2023 12:24:33 Backup: LastBackupTime: 30/05/2023 10:01:23

30/05/2023 12:24:33 BackupVSS: Domino Agent VSS Freeze Event received

30/05/2023 12:24:33 Backup: Starting backup for 333 database(s) [SnapShotMode]

30/05/2023 12:24:35 Backup: Ready for VSS Snapshot

30/05/2023 12:24:35 BackupVSS: DominoVSSWriter::OnFreeze Domino in Freeze Mode after 1 seconds!

30/05/2023 12:24:35 Backup: VSS Post Snapshot status reached after 0 seconds!

30/05/2023 12:24:35 Backup: Capturing changes in 333 database(s) [SnapShotMode]

30/05/2023 12:24:35 BackupVSS: Sending unfreeze to Domino Backup
30/05/2023 12:24:35 Backup: VSS Snapshot Backup finalized

30/05/2023 12:24:36 BackupVSS: DominoVSSWriter::OnPostSnapshot Post backup changes applied after 0 seconds.
30/05/2023 12:24:37 BackupVSS: DominoVSSWriter::OnBackupComplete Updating backup completed time

30/05/2023 12:24:37 BackupVSSCompletedTimeUpdate: Error opening backup result log document: Entry not found in index
30/05/2023 12:24:37 BackupVSS: Cannot update backup result log with SnapshotBackupCompletedTime!: Entry not found in index
30/05/2023 12:24:42 Backup:

30/05/2023 12:24:42 Backup: --- Backup Summary ---

30/05/2023 12:24:42 Backup: Previous Backup : 30/05/2023 10:01:23

30/05/2023 12:24:42 Backup: Start Time : 30/05/2023 12:24:33

30/05/2023 12:24:42 Backup: End Time : 30/05/2023 12:24:35

30/05/2023 12:24:42 Backup: Runtime : 00:00:02.578

30/05/2023 12:24:42 Backup:

30/05/2023 12:24:42 Backup: All : 333

30/05/2023 12:24:42 Backup: Processed : 225

30/05/2023 12:24:42 Backup: Excluded : 108

30/05/2023 12:24:42 Backup: Pending Compact : 0

30/05/2023 12:24:42 Backup: Compact Retries : 0

30/05/2023 12:24:42 Backup: Backup Errors : 0

30/05/2023 12:24:42 Backup: Not Modified : 0

30/05/2023 12:24:42 Backup: Delta Files : 0

30/05/2023 12:24:42 Backup: Delta applied : 0

30/05/2023 12:24:42 Backup:

30/05/2023 12:24:42 Backup: Total DB Size : 85,6 GB

30/05/2023 12:24:42 Backup: Total DeltaSize : 0,0 Bytes

30/05/2023 12:24:42 Backup: Data Rate : 33.999,7 MB/sec

30/05/2023 12:24:42 Backup: --- Backup Summary ---

30/05/2023 12:24:42 Backup:

30/05/2023 12:24:42 Backup: Finished

To Reproduce
Steps to reproduce the behavior:
Run the VSS backup :-). We use Rubrik backup

Screenshots
image

@PeterStrackx PeterStrackx added the bug Something isn't working label May 30, 2023
@Daniel-Nashed
Collaborator

Thanks for this issue. We looked into the logs, and they are very helpful for understanding what is going on.

The underlying problem is a timing issue: the backup is completed before the Domino side has finished its remaining backup operations.
In this case the gap is around 6 seconds.

The log document for the backup is saved when the backup on the Domino side completes.
Saving it earlier would not help, because it would cause a replication/save conflict.
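
The ordering can be read directly from the timestamps in the log above. A minimal Python sketch (the helper name `log_time` is made up for illustration) that measures how long after the writer's `OnBackupComplete` the Domino backup task reports `Finished`:

```python
from datetime import datetime

FMT = "%d/%m/%Y %H:%M:%S"

def log_time(line: str) -> datetime:
    """Parse the leading 'dd/mm/yyyy HH:MM:SS' timestamp of a log line."""
    return datetime.strptime(line[:19], FMT)

# Two lines copied from the log in this issue.
writer_update = "30/05/2023 12:24:37 BackupVSS: DominoVSSWriter::OnBackupComplete Updating backup completed time"
domino_done = "30/05/2023 12:24:42 Backup: Finished"

gap = log_time(domino_done) - log_time(writer_update)
print(gap.total_seconds())  # 5.0
```

In this run the writer tries to update the result log document several seconds before the Domino backup task has finished, which matches the "Entry not found in index" error.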

We think the best way would be to let backupvss wait until the document with the UNID stored in notes.ini is created.

There is no workaround on the backupvss side, and it is not clear why Domino needs 3 seconds to complete the backup operations.
In any case, a backup-complete event arriving 1 second after the snapshot operations is problematic from a timing point of view.

We are looking into this and have a potential workaround. Can you check with your backup provider whether there is a way to delay the backup-complete event by a couple of seconds?

Here is the SPR for reference:
SPR #DNADCSBHWZ Domino Backup VSS backup end date is not updated if snapshot is too fast

Can you please open a ticket and reference this issue and the SPR?

I also noticed that you are excluding databases. In a snapshot backup all databases are included in the snapshot anyhow. Excluding databases is not really helpful and could also delay operations, but I don't see a direct root cause here.
I just want to understand why you are excluding databases. The excluded databases will not show up in the inventory and you can't restore them, but they still take up space in the snapshot.

@PeterStrackx
Author

PeterStrackx commented May 31, 2023 via email

@Daniel-Nashed
Collaborator

The VSS operation is complex on the Domino side, but we have an idea.
We will probably let backupvss wait when the UNID for the log document is not yet found.
A delay of up to 1 minute should be enough to get the timing right.
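
The waiting idea could look roughly like the following Python sketch. It is not the backupvss implementation: `wait_for_document` and the `lookup` callback are hypothetical names, and the callback stands in for "open the log document by the UNID from notes.ini".

```python
import time
from typing import Callable, Optional

def wait_for_document(lookup: Callable[[], Optional[str]],
                      timeout_s: float = 60.0,
                      interval_s: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> Optional[str]:
    """Call lookup() until it returns a value or timeout_s has elapsed.

    Returns the value from lookup(), or None if the document never showed up.
    """
    deadline = clock() + timeout_s
    while True:
        result = lookup()
        if result is not None:
            return result
        if clock() >= deadline:
            return None
        sleep(interval_s)
```

The 60 second default mirrors the "up to 1 minute" above. The lookup is tried first and the sleep only happens on a miss, so a backup where the document already exists is not delayed.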

The exclusions make sense in your case. I was just surprised.
Because backup is critical, Domino backup checks in snapshot mode whether the databases are on the same disk and reports an error if they are not.
Great to hear you are getting the error message and linked it to the data being on another disk.
The check also knows about junctions, not just directory and nsf links.
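
As an illustration only (this is not how Domino implements the check), a same-disk test that sees through junctions and symbolic links can be sketched in Python. The function name `on_same_volume` is made up, and Domino directory and database links are not handled here; they would need to be resolved to their targets first.

```python
import os

def on_same_volume(data_dir: str, db_path: str) -> bool:
    """Return True if both paths resolve to the same volume/device.

    os.path.realpath resolves symbolic links (and junctions on Windows),
    so a database reached through a link is compared by its real location.
    """
    real_data = os.path.realpath(data_dir)
    real_db = os.path.realpath(db_path)
    return os.stat(real_data).st_dev == os.stat(real_db).st_dev
```

A database that only appears to live in the data directory but really sits on another disk would return False here, which is the situation the error message reports.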

IBM Spectrum Protect is supported by Domino 12.0.2.
You probably mean that the customer dropped support for it?

For archives a solution would be to have a separate archive server.
Many customers use clusters for active data and archive servers for the long term, slow moving data.

Also, if most of your data is in DAOS, for example, the backup should be incremental.
A DAOS store can be backed up with a simple open-file backup or a snapshot.

Let us know what you find out about the backup and let us know about your ticket number.
The SPR references your GitHub issue.

-- Daniel

@PeterStrackx
Author

PeterStrackx commented May 31, 2023 via email

@Daniel-Nashed
Collaborator

Great to hear you liked the session and the information. Backup is a complex topic and it really starts with storage optimization.
The backup itself is completed; only the final backup time is not stored. So it should all still work.

Still, we want to fix it and will look into the SPR and the ticket.

Thanks for your detailed info, which helped us to understand the timing issue.

@Daniel-Nashed
Collaborator

@PeterStrackx looks like we found a solution. It's currently under test and the plan is to submit it to the next feature release and also include it in 14.0 FP1.

It turned out that registering components does not stop VSS from calling the Domino VSS Writer instance for a freeze.
But we found a way to get the information about the volumes which are included in the snapshot.

This allows the Domino VSS Writer to "ignore" the VSS snapshot request if the volume on which the Domino databases are located is not requested.
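
The decision itself is a simple membership test. A hedged Python sketch, assuming the volume names have already been normalized to one spelling; `writer_should_freeze` is a hypothetical name and not part of the Domino VSS Writer:

```python
from typing import Iterable

def writer_should_freeze(domino_volume: str, snapshot_volumes: Iterable[str]) -> bool:
    """Return True only when the Domino data volume is part of the snapshot set."""
    return domino_volume in set(snapshot_volumes)

# Snapshot of C: only, Domino data on D: -> the writer can ignore the request.
print(writer_should_freeze("D:", ["C:"]))        # False
# Snapshot includes D: -> the writer has to freeze Domino.
print(writer_should_freeze("D:", ["C:", "D:"]))  # True
```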

If really needed, we could check whether the SPR can also be back-ported to 12.0.2 FP4.
For an official fixpack schedule, check the official fixlist database.
Both fixpacks will take a while to ship, because 14.0 and 12.0.2 FP3 have just been released.

-- Daniel

@Daniel-Nashed
Collaborator

There is an update from Rubrik. We are working with them directly.
It turned out that they are not creating auto recovery snapshots, but the team is working on a solution.
For a full integration on Windows, auto recovery snapshots are highly recommended. Otherwise you would create a snapshot and also have to store the delta files in a separate backup.

Without the delta files applied, the data in the snapshot would not be consistent. So auto recovery snapshots are really what we want.

Domino 14.0 FP1 Backup had a regression: the fix that processes VSS operations only for the requested Domino volume compared volume names case-sensitively, while a related Windows API did not always return the same casing.
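
The casing pitfall is easy to reproduce outside Domino. A small Python illustration, not the actual fix; `same_volume_name` is a made-up helper and the drive strings are example values:

```python
import ntpath

def same_volume_name(a: str, b: str) -> bool:
    """Compare two Windows volume names without regard to letter case."""
    return ntpath.normcase(a) == ntpath.normcase(b)

# A plain string comparison treats differently cased names as different volumes.
print("D:\\" == "d:\\")                   # False
# Normalizing the case first makes both spellings match.
print(same_volume_name("D:\\", "d:\\"))   # True
```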

If you are running on 14.0 please install the latest FP and IF, which is currently Domino 14.0 FP2IF1.

Still, for Rubrik support you will need to wait for their VSS changes.

I will update this post once we have all details and a solution.

There is an integration for automated restore on the way, which will be fully published once this VSS backup integration limitation has been addressed.

Thanks

Daniel
