
Chunk Data Model supports service event indices #6744

Open · wants to merge 15 commits into base: feature/efm-recovery
Conversation

@jordanschalm (Member) commented Nov 20, 2024

This PR adds support for specifying which service events were emitted in which chunk, by modifying the ChunkBody data structure in a backward-compatible manner. Addresses #6622.

Changes

  • Adds ServiceEventIndices field to ChunkBody:
    • This field creates an explicit mapping, committed to by Execution Nodes, from service events to chunks. This allows Verification Nodes to know which service events to expect when validating any given chunk.
    • This field is defined to be backward-compatible with prior data model versions:
      • Existing serializations of ChunkBody will unmarshal into a struct with a nil ServiceEventIndices. We define a chunk with nil ServiceEventIndices to have the same semantics as before the field existed: if any service events were emitted, then they were emitted from the system chunk.
      • Post-upgrade, all new (honest) serializations of ChunkBody will always have a non-nil ServiceEventIndices field (see the sketch below).
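
For illustration, a minimal sketch of the modified structure. The index element type (uint32 here) and the surrounding struct layout are assumptions, not copied from this PR:

```go
package flow

// ChunkBody describes the computation of one chunk within an execution result.
type ChunkBody struct {
	// ... pre-existing fields elided ...

	// ServiceEventIndices enumerates, for each service event emitted by this
	// chunk, its index within ExecutionResult.ServiceEvents.
	//
	// Backward compatibility: a nil value is interpreted with pre-upgrade
	// semantics (any service events in the result were emitted by the system
	// chunk). Post-upgrade, honest Execution Nodes always populate the field,
	// using an empty (non-nil) slice for chunks emitting no service events.
	ServiceEventIndices []uint32
}
```

The key property is that nil and empty remain distinguishable after (de)serialization, which is what lets the nil case serve as the legacy marker.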

Upgrade Notes

Currently, no service events are emitted outside of the system chunk. Let's call the first block at which this happens block X.

  • The upgrade must be completed across all Verification and Execution Nodes before block X.
  • Prior to block X, the upgrade can be done using a rolling upgrade.

This PR replaces two prior approaches, partially implemented in #6629 and #6730.

@codecov-commenter commented Nov 20, 2024

Codecov Report

Attention: Patch coverage is 70.14925% with 20 lines in your changes missing coverage. Please review.

Project coverage is 41.73%. Comparing base (d0c9695) to head (718a5ee).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| utils/slices/slices.go | 0.00% | 9 Missing ⚠️ |
| utils/unittest/encoding.go | 0.00% | 5 Missing ⚠️ |
| model/flow/chunk.go | 81.25% | 2 Missing and 1 partial ⚠️ |
| module/chunks/chunkVerifier.go | 57.14% | 2 Missing and 1 partial ⚠️ |
Additional details and impacted files

```
@@                   Coverage Diff                    @@
##           feature/efm-recovery    #6744      +/-   ##
========================================================
+ Coverage                 41.70%   41.73%   +0.02%
========================================================
  Files                      2030     2031       +1
  Lines                    180462   180519      +57
========================================================
+ Hits                      75261    75331      +70
+ Misses                    99003    98996       -7
+ Partials                   6198     6192       -6
```

| Flag | Coverage Δ |
|---|---|
| unittests | 41.73% <70.14%> (+0.02%) ⬆️ |


@jordanschalm changed the title from "DRAFT: Chunk Data Model supports service event indices" to "Chunk Data Model supports service event indices" on Nov 21, 2024
@AlexHentschel (Member) commented:
strategic / conceptual thoughts

  1. I am thinking about enforcing activation of the protocol extension (specifically, the usage of the Chunk.ServiceEventIndices field). For example, it would be nice if consensus nodes could drop Execution Results with the deprecated format, avoid incorporating them into blocks, and reject blocks that still include them.

  2. To ensure consensus incorporates only execution receipts following the new convention after a certain height, it would be great if we could also include a consistency check in the receiptValidator (somewhere around here).

  3. I was wondering if your plan is still to remove the Chunk.ServiceEventIndices field in the long term, for Byzantine Fault Tolerance? I think we had talked about turning ExecutionResult.ServiceEventList into an indexed list. Or are you thinking of keeping the ServiceEventIndices field as the long-term solution -- just with the backwards-compatibility case removed?

    In general, my preference would be to also allow Chunk.ServiceEventIndices = nil when no service events are generated in the chunk. That way, the final solution becomes a lot more intuitive:

    • if ExecutionResult.ServiceEvents is not empty (nil counts as empty), then the new convention requires the following consistency condition:

      $$\sum_{\texttt{chunk}} \texttt{len}(\texttt{chunk.ServiceEventIndices}) \;=\; \texttt{len}(\texttt{ExecutionResult.ServiceEventList}) \qquad (1)$$

      I feel this check is very similar to other checks that consensus nodes already perform on an Execution Receipt (see the ReceiptValidator implementation); a sketch follows the list below.

    • We would temporarily relax the new convention, eq. $(1)$, for backwards compatibility as follows: the ServiceEventIndices fields of all chunks may be nil despite there being service events.

    • As service events are rare, ExecutionResult.ServiceEventList is empty in the majority of cases. Then both the deprecated and the new convention allow ChunkBody.ServiceEventIndices to be nil (which is the most intuitive convention anyway). Also, for individual chunks that don't produce any service events, their ChunkBody.ServiceEventIndices could be nil or empty under the new convention. So the new convention is very self-consistent in my opinion, and the deprecation condition is only an add-on that can be removed later.
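
A minimal sketch of check $(1)$, including the temporary relaxation, in the style of existing receiptValidator checks. The function name and package placement are my assumptions, not part of this PR:

```go
package validation

import (
	"fmt"

	"github.com/onflow/flow-go/model/flow"
)

// checkServiceEventIndices enforces consistency condition (1): the per-chunk
// index lists must jointly cover the result's service event list. As a
// temporary backwards-compatibility relaxation, a result whose chunks all
// carry nil ServiceEventIndices is accepted as the deprecated format.
func checkServiceEventIndices(result *flow.ExecutionResult) error {
	allNil := true
	total := 0
	for _, chunk := range result.Chunks {
		// ServiceEventIndices is the field proposed in this PR (type assumed).
		if chunk.ServiceEventIndices != nil {
			allNil = false
		}
		total += len(chunk.ServiceEventIndices)
	}
	if allNil {
		// Deprecated format: service events, if any, are attributed to the system chunk.
		return nil
	}
	if total != len(result.ServiceEvents) {
		return fmt.Errorf("chunks index %d service events, but result lists %d",
			total, len(result.ServiceEvents))
	}
	return nil
}
```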

That said, if this is only a temporary solution, I am happy, and we can skip most of question 3.

@jordanschalm (Member, Author) commented Nov 25, 2024

Summary of Discussion with @AlexHentschel

Change of ServiceEventIndices structure

  • Replaces the index list with ServiceEventsNumber, a uint16 holding the count of service events emitted in that chunk:
    • Since VNs have access to all chunks, they can easily compute each chunk's index range from this field alone (see the sketch after this list)
    • This is a more compact, fixed-size representation
    • Structural validation is simpler: we require only that sum(chunk.ServiceEventsNumber for chunk in chunks) == len(ServiceEvents)
  • Backward compatibility:
    • If any service events are emitted in a result, and all ServiceEventsNumber fields are 0, then this is interpreted as a v0 model version: all service events must have been emitted in the system chunk.
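
A minimal sketch of the revised scheme, assuming a ChunkBody.ServiceEventsNumber field as discussed above (the helper names are hypothetical):

```go
package chunks

import "github.com/onflow/flow-go/model/flow"

// serviceEventRange returns the half-open range [start, end) into
// ExecutionResult.ServiceEvents covered by the chunk at chunkIndex,
// computed by summing the counts of all preceding chunks.
func serviceEventRange(chunks flow.ChunkList, chunkIndex int) (start, end int) {
	for i := 0; i < chunkIndex; i++ {
		start += int(chunks[i].ServiceEventsNumber)
	}
	end = start + int(chunks[chunkIndex].ServiceEventsNumber)
	return start, end
}

// countsConsistent implements the structural validation rule:
// sum(chunk.ServiceEventsNumber for chunk in chunks) == len(ServiceEvents).
func countsConsistent(result *flow.ExecutionResult) bool {
	total := 0
	for _, chunk := range result.Chunks {
		total += int(chunk.ServiceEventsNumber)
	}
	return total == len(result.ServiceEvents)
}
```

Because each count is a fixed-size integer, the chunk encoding stays fixed-width regardless of how many service events a chunk emits, which is what makes this representation more compact than the index list.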

Removing backward-compatibility at next spork

We plan to keep the overall structure as the long-term solution, and only remove the backward-compatibility support at the next spork. We do not plan to switch to a different representation (i.e. the Chunk.ServiceEventIndices field).

Upgrade Comments

  • Protocol HCUs are triggered by a ProtocolVersionUpgrade service event. This service event is emitted outside the system chunk, meaning that the first Protocol HCU must take place after the service event validation fix has been deployed.
  • We plan to incorporate all changes under feature/efm-recovery in one Protocol HCU.

Rough Outline of Process

  1. Do a manual rolling upgrade of all node roles, to a version including feature/efm-recovery.
  2. Manually verify that all ENs are upgraded (this is the only correctness-critical step!)
    • this is necessary prior to emitting the first service event outside the system chunk
    • ideally, VNs are also updated, but we can rely on emergency sealing as a fallback if necessary
  3. Emit ProtocolVersionUpgrade service event, scheduling the protocol upgrade at view V.
    • Nodes which are not upgraded when we enter view V will halt.
    • Before entering view V, a substantial majority (>> supermajority) of SNs and LNs must be upgraded for the network to remain live (this is the only liveness-critical step!)
