Handle expiration of transactions between operator checkpoint and global checkpoint commit #5
Comments
In addition to all of the above, I can imagine several possible solutions:
For applications like Flink that guarantee the triggering of [...] For example, during the [...] Going by the current defaults, the default transaction timeout will be 10 seconds times the multiplier 1000, i.e., 10 seconds * 1000 = 10000 seconds (about 2.8 hours), or a maximum of 24 hours (if a custom configuration value exceeds the default minimum), according to this calculation. So we are unlikely to hit the timeout, but it remains a possibility.
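The arithmetic in this comment can be checked with a short sketch. The class name, the parameter names, and the rule that the scaled value is simply capped at the maximum are assumptions made for illustration; this is not Pravega's configuration API, and the real calculation may differ.

```java
import java.time.Duration;

// Hypothetical helper: names and the capping rule are assumptions, not Pravega's API.
final class TxnTimeoutEstimate {

    // Scales the base value by the multiplier and caps the result at `cap`.
    static Duration effectiveTimeout(Duration base, long multiplier, Duration cap) {
        Duration scaled = base.multipliedBy(multiplier);
        return scaled.compareTo(cap) > 0 ? cap : scaled;
    }

    public static void main(String[] args) {
        // Defaults quoted in the comment: 10 seconds, multiplier 1000, 24-hour maximum.
        Duration t = effectiveTimeout(Duration.ofSeconds(10), 1000, Duration.ofHours(24));
        System.out.println(t.getSeconds() + " seconds, roughly "
                + (t.getSeconds() / 3600.0) + " hours");
    }
}
```

With the quoted defaults the scaled value stays below the 24-hour cap, so the cap only matters for larger custom values.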
Problem description
The FlinkExactlyOncePravegaWriter works similarly to a two-phase commit (2PC) protocol. That model assumes that transactions can be committed once the checkpoint's completeness notification is received (step 3). If a transaction times out between step (2) and step (3), there will be data loss.
This is an inherent fragility that seems hard to circumvent with the current primitive, and has to do with transaction timeouts. Given sufficiently long timeouts, this may never be a problem in most setups, but it is not nice to have this weak point in the long run.
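The failure window can be illustrated with a toy timeline. This is a minimal sketch, not the connector's code: SimTxn and TwoPhaseTimeline are hypothetical types, time is a plain millisecond counter, and the three steps are assumed to be (1) open a transaction, (2) flush it at the operator checkpoint, and (3) commit it when the checkpoint completion notification arrives.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: SimTxn is hypothetical and does not mirror the Pravega client API.
final class SimTxn {
    private final long openedAtMs;
    private final long timeoutMs;
    private final List<String> events = new ArrayList<>();

    SimTxn(long openedAtMs, long timeoutMs) {
        this.openedAtMs = openedAtMs;
        this.timeoutMs = timeoutMs;
    }

    void write(String event) {
        events.add(event);
    }

    // Returns the number of events published; 0 if the transaction had already expired.
    int commit(long nowMs) {
        if (nowMs - openedAtMs > timeoutMs) {
            events.clear(); // the expired transaction's data is discarded
            return 0;
        }
        return events.size();
    }
}

final class TwoPhaseTimeline {
    // Runs one checkpoint cycle and returns how many events became visible.
    static int run(long timeoutMs, long notifyDelayMs) {
        SimTxn txn = new SimTxn(0, timeoutMs);          // step 1: open at t = 0
        txn.write("a");
        txn.write("b");
        long checkpointAt = 1_000;                      // step 2: operator checkpoint flushes the txn
        long notifiedAt = checkpointAt + notifyDelayMs; // step 3: completion notification arrives
        return txn.commit(notifiedAt);
    }
}
```

In this model, `TwoPhaseTimeline.run(10_000, 2_000)` publishes both events, while `TwoPhaseTimeline.run(10_000, 20_000)` publishes none even though the checkpoint itself succeeded, which is the data loss described above.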
Problem location
The problem is in the use of Pravega Transactions in the io.pravega.connectors.flink.FlinkExactlyOncePravegaWriter.
Suggestions for an improvement
Off the top of my head, there are three types of solutions to this issue:
1. Pravega offers a pre-commit / full-commit distinction in transactions, where a pre-commit means the transaction becomes immutable and the usual timeouts do not apply (except possibly a garbage-prevention timeout, which could be very long). The full commit publishes the temporary segment.
2. Flink makes sure it can recover transactions, for example by also persisting the data in the checkpoints. If a transaction timed out, it can open a new transaction and re-write the data. The disadvantage is that, effectively, everything is persisted in two distinct systems (each with its own durability/replication).
3. Flink adds a way for tasks to complete a checkpoint independently of other tasks. The transaction could then be committed immediately when the checkpoint is triggered. That would require a guarantee that the sinks are never affected by a recovery of other tasks. To keep the current consistency guarantees, this would require persisting the result from the input to the sink operation (similar to how, for example, Samza persists every shuffle), which is in some sense not too different from approach (2).
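As a rough illustration of approach (2), the sketch below keeps a copy of the written events alongside the checkpoint and replays them into a fresh transaction when the original one has expired. All types here (ReplayableSink, Stream, Txn) are hypothetical stand-ins; nothing in this block is an existing Flink or Pravega API, and expiry is simulated with a flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of approach (2); none of these types exist in Flink or Pravega.
final class ReplayableSink {

    // Minimal stand-in for the stream that readers see.
    static final class Stream {
        final List<String> visible = new ArrayList<>();
    }

    // Minimal stand-in for a transaction; `expired` simulates a timeout.
    static final class Txn {
        final List<String> buffered = new ArrayList<>();
        boolean expired;

        // Publishes the buffered events unless the transaction has expired.
        boolean commitTo(Stream stream) {
            if (expired) {
                return false;
            }
            stream.visible.addAll(buffered);
            return true;
        }
    }

    // `checkpointedEvents` is the copy of the data stored in the checkpoint state.
    static void commitOrReplay(Txn txn, List<String> checkpointedEvents, Stream stream) {
        if (txn.commitTo(stream)) {
            return; // normal path: the original transaction was still alive
        }
        // Recovery path: re-write the checkpointed data into a new transaction.
        Txn retry = new Txn();
        retry.buffered.addAll(checkpointedEvents);
        if (!retry.commitTo(stream)) {
            throw new IllegalStateException("replayed transaction expired as well");
        }
    }
}
```

The cost named in the suggestion shows up directly: the events live both in the transaction and in `checkpointedEvents`, so every record is stored twice until the commit succeeds.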