Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for yielded execution #519

Merged
merged 33 commits into from
Jun 18, 2024
Merged

Support for yielded execution #519

merged 33 commits into from
Jun 18, 2024

Conversation

akhi3030
Copy link
Contributor

No description provided.

Copy link

render bot commented Nov 17, 2023

@akhi3030 akhi3030 mentioned this pull request Nov 17, 2023
@frol frol added WG-protocol Protocol Standards Work Group should be accountable S-draft/needs-author-revision A NEP in the DRAFT stage that needs an author revision. A-NEP A NEAR Enhancement Proposal (NEP). labels Nov 17, 2023
Copy link
Contributor

@DavidM-D DavidM-D left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a really good start and I'm broadly happy with it in it's current form. The biggest thing I'd change is the changing promise ID each time resume is called, everything else is nits.

/// either respond to the caller or create another promise.
///
/// If the contract fails to call `yield_resume()` within `yield_num_blocks`,
/// then the protocol will call the method on the smart contract that is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// then the protocol will call the method on the smart contract that is
/// then the protocol may call the method on the smart contract that is

I don't think this needs to be required in the protocol, rather it's an option the protocol has.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I would've thought that this must happen at some point otherwise the smart contract might leak memory. If the contract created some state and then called yield_create which it then wanted to free in the callback, that might never happen.

Also I see this as analogous to the case where contract A calls B and B fails to respond to A. Then the protocol generates a response for A.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I agree with this, it makes life easier for contract developers if there is always a response eventually.

/// `method_name_len` and `method_name_ptr` and this method will be expected to
/// either respond to the caller or create another promise.
///
/// If the contract fails to call `yield_resume()` within `yield_num_blocks`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// If the contract fails to call `yield_resume()` within `yield_num_blocks`,
/// If the contract fails to, in a `yield_resume()`, call `value_return`, `panic` or `abort` within `yield_num_blocks`,

This makes the API much more flexible, but does it make it harder to implement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood the idea properly, then I think we need to think about the semantics some more. Let's say the following has happened:

  • contract A called the signer to get a signature and the signer has called yield create.
  • contract B called the signer to get a signature and the signer has called yield create.
  • now some other contract C calls the signer and the signer panics, which one of the above two outstanding executions should we terminate? I don't think we can really pick between them so the only sensible choice seems to be to terminate both. It doesn't seem like that that is a good choice either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if contract C calls the signer and that call panics before it calls yield_create then the call from contract C should panic and none of the others.

This refers to a situation where contract D calls yield_create, the signer calls yield_resume and then the resumed call panics, ending contract D's suspended call. It shouldn't effect any other outstanding calls.

neps/nep-519-yield-execution.md Outdated Show resolved Hide resolved
neps/nep-519-yield-execution.md Show resolved Hide resolved
Copy link
Collaborator

@bowenwang1996 bowenwang1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akhi3030 @saketh-are I'd like to see an end-to-end example of how this is going to be used in a smart contract. While there is some high-level description on the execution flow, I think some important details are missing. For example, I don't think the API works well as is because the caller of the yield_resume is expected to pass in a receipt id, not a promise index, whereas for runtime, it needs the index instead of receipt id. Also, it is not clear from the existing API how an argument would be passed to the callback, which is crucial for chain signatures

Copy link
Contributor Author

@akhi3030 akhi3030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @DavidM-D. I've added some comments. I am going to make some changes to the NEP to remove the concept of yield_num_blocks. Unfortunately, given the crummy github UI, it might become challenging to continue the comment threads after the changes. We will have to stumble along somehow.

/// either respond to the caller or create another promise.
///
/// If the contract fails to call `yield_resume()` within `yield_num_blocks`,
/// then the protocol will call the method on the smart contract that is
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I would've thought that this must happen at some point otherwise the smart contract might leak memory. If the contract created some state and then called yield_create which it then wanted to free in the callback, that might never happen.

Also I see this as analogous to the case where contract A calls B and B fails to respond to A. Then the protocol generates a response for A.

neps/nep-519-yield-execution.md Show resolved Hide resolved
/// `method_name_len` and `method_name_ptr` and this method will be expected to
/// either respond to the caller or create another promise.
///
/// If the contract fails to call `yield_resume()` within `yield_num_blocks`,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood the idea properly, then I think we need to think about the semantics some more. Let's say the following has happened:

  • contract A called the signer to get a signature and the signer has called yield create.
  • contract B called the signer to get a signature and the signer has called yield create.
  • now some other contract C calls the signer and the signer panics, which one of the above two outstanding executions should we terminate? I don't think we can really pick between them so the only sensible choice seems to be to terminate both. It doesn't seem like that that is a good choice either.

neps/nep-519-yield-execution.md Outdated Show resolved Hide resolved
@bowenwang1996
Copy link
Collaborator

At the protocol working group meeting yesterday, different members expressed concerns on these new runtime APIs. I'll attempt the summarize them below:

  • There is nothing that stops someone from emulating the API in the sdk by saving the promise in the state and triggering it again once it is ready to be resumed. The call stack is discontiguous but that could potentially be fixed on the tooling level and it seems that Aurora already does this.
  • Whether or not the delayed promise has an expiration time, the introduction of this API could lead to a new type of failure that does not previously exist where the contract execution fails with no way to clean up the state. It could be fine for the chain signature use case, but if some (potentially malicious) developer uses this in a defi contract for oracles (for example), then it is possible that the contract call stalls/fails in the middle, which results in users losing access to their fund.
  • The API does not specify who can call "resume". This means that a malicious attacker can just call resume with invalid data to make a transaction fail. While the attacker does need to spend gas themselves, it is quite cheap. This allows for a relatively easy DoS attack, especially given that the gas that a user needs to spend may be larger than what an attacker needs to spend to cancel the transaction. In addition, failure in a chain of calls A -> B -> C may also lead to inconsistent state in one of the smart contract involved.

cc @mm-near @birchmd @mfornet

@akhi3030
Copy link
Contributor Author

akhi3030 commented Jan 8, 2024

* There is nothing that stops someone from emulating the API in the sdk by saving the promise in the state and triggering it again once it is ready to be resumed. The call stack is discontiguous but that could potentially be fixed on the tooling level and it seems that Aurora already does this.

This main motivation for this work is the MPC project. If we can satisfy the requirements without having to change the protocol, that much better.

Can you provide some links to how aurora does this? Maybe then @DavidM-D and @itegulov can review them.

* Whether or not the delayed promise has an expiration time, the introduction of this API could lead to a new type of failure that does not previously exist where the contract execution fails with no way to clean up the state. It could be fine for the chain signature use case, but if some (potentially malicious) developer uses this in a defi contract for oracles (for example), then it is possible that the contract call stalls/fails in the middle, which results in users losing access to their fund.

I am not sure I completely understand this point. Especially about not having a way to clean up the state. If things expire, the protocol will call the contract. So is the contract may not handle that case properly? Isn't that similar to how the current Promise API works?

* The API does not specify who can call "resume". This means that a malicious attacker can just call resume with invalid data to make a transaction fail. While the attacker does need to spend gas themselves, it is quite cheap. This allows for a relatively easy DoS attack, especially given that the gas that a user needs to spend may be larger than what an attacker needs to spend to cancel the transaction. 

If contract A called yield_create, only contract A can call resume on that promise. No other contract can.

In addition, failure in a chain of calls A -> B -> C may also lead to inconsistent state in one of the smart contract involved.

Can you please expand on this? If A is calling B, then B can fail for any number of reasons today, (it can trap, run out of gas, have invalid state). Is the problem that we are introducing a new failure mode now that B has to handle?

@bowenwang1996
Copy link
Collaborator

I am not sure I completely understand this point. Especially about not having a way to clean up the state. If things expire, the protocol will call the contract. So is the contract may not handle that case properly? Isn't that similar to how the current Promise API works?

I misread the proposal and therefore thought incorrectly that the execution would just fail at that point.

If contract A called yield_create, only contract A can call resume on that promise. No other contract can.

Yes, I guess the concern here is that a contract developer can shoot themselves in the foot if they are not careful with how they valid the computation result that is passed to yield_resume because while only the contract itself can resume the computation. Anyone can call the method (signature_available in this case) that provides the computation result. If there is no proper validation of the result, then a malicious attacker can easily submit something that causes the original transaction to fail. I don't think this is a concern for MPC signature per se since that contract will be carefully written and audited, but still worth noting for the general use case.

@akhi3030
Copy link
Contributor Author

Yes, I guess the concern here is that a contract developer can shoot themselves in the foot if they are not careful with how they valid the computation result that is passed to yield_resume because while only the contract itself can resume the computation. Anyone can call the method (signature_available in this case) that provides the computation result. If there is no proper validation of the result, then a malicious attacker can easily submit something that causes the original transaction to fail. I don't think this is a concern for MPC signature per se since that contract will be carefully written and audited, but still worth noting for the general use case.

Yes, this is always possible. Luckily, we have some nice tools (https://github.com/aurora-is-near/near-plugins/blob/master/near-plugins/src/access_controllable.rs) available that could be used.

/// should be resumed.
///
/// `payload_len` and `payload_ptr`: the smart contract can provide an optional
/// list of arguments that should be passed to the method that will be resumed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This is not really a list of arguments from the yield/resume perspective, just an arbitrary byte sequence. The rust sdk makes it easy to jsonify a tuple of objects and parse them back out, but that is not relevant at the level of the host function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully improved now.


## Summary

Today, when a smart contract is called by a user or another contract, it has no sensible way to delay responding to the caller. There exist use cases where contracts would benefit from being able to delay responding till some arbitrary time in the future. This proposal introduces such possibility into the NEAR protocol.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

till some arbitrary time

This feels potentially misleading to me given that we enforce a timeout. Could it be clearer to say something like "await a future transaction" rather than "delay responding"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully improved now.


Examples include when a smart contract (`S`) provides MPC signing capabilities parties external to the NEAR protocol are computing the signature. The rough steps are:
1. Signer contract provides a function `fn sign_payload(Payload, ...)`.
2. When called, the contract updates some contract state which is being monitored by external indexers to indicate that a new signing request has been received. It also defers replying to the caller.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice I think the signers will directly index the transactions on the account and observe the receipts generated by yield (no need for any storage in contract state). But perhaps it is easier to explain things with this example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully improved now. I'll add the example in a subsequent commit.

@akhi3030
Copy link
Contributor Author

One thing I have a hard time wrapping my head around it the necessity for the "Reference Implementation" section

I agree, I also found it a bit difficult to figure out how much details to present here. Inspired by this NEP I think it is fine to just provide a link to the PR. I will make the change to this PR now. And let the SMEs decide if they are happy with the change or not.

@akhi3030
Copy link
Contributor Author

@birchmd, @nagisa: thank you very much for the review. I believe I have addressed all the feedback that you provided. There is an ongoing discussion on the edge case around when a timeout has happened but the callback has not executed and the contract calls resume. Once we resolve the discussion, I will update the spec. But this should be a minor change. So I believe that this should be ready for you to look once again.

@victorchimakanu
Copy link

victorchimakanu commented Feb 29, 2024

NEP Status (Updated by NEP Moderators)

Status: Approved

SME reviews:

Protocol Work Group voting indications (❔ | 👍 | 👎 ):

Copy link
Contributor

@birchmd birchmd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank for updating the NEP @akhi3030 ! It looks good to me now.

As both an SME and working group member I approve this NEP and look forward to seeing the feature go live on testnet in the future.

@victorchimakanu
Copy link

As the moderator, I would like to thank the author @akhi3030 for submitting this NEP extension, and the work group members for your review. Based on the voting indications above, it seems like this proposal is close to a decision. We are therefore scheduling the 8th Protocol Work group meeting next week.

Anyone can discuss the technical details by adding your comments to this discussion. And join the call to learn about the final decision and how to get more involved.

Meeting Info:

📅 Date: Friday, March 8th, 4 PM UTC
✏️ Register here

@bowenwang1996
Copy link
Collaborator

As a working group member, I lean towards accepting this proposal. It is a great step towards enabling smart contracts to interact with offchain computation while still preserving the call stack.

@mfornet
Copy link
Member

mfornet commented Mar 7, 2024

As a working group member, I lean toward approving this NEP.

This NEP solves a problem that can't be solved today, which is having simultaneously:

  • Delayed responses
  • Guaranteed execution of callback
  • Abstraction from the caller about the delayed response.

Delayed responses are particularly useful to seamlessly use off-chain computation on-chain. This could involve for example oracle queries (eg fetch_tweet(tweet_id)) or privacy protocol execution such as signing some data offchain.

@mm-near
Copy link

mm-near commented Mar 7, 2024

As a working group member, I lean toward approving this NEP.

  • one small comment: in comments you say:
/// `data_id_len` and `data_it_ptr`: Used to pass the unique resumption token
/// that was returned to the smart contract in the `promise_yield_create()`
/// function.

But promise_yield_create seems to be returning u64

@akhi3030
Copy link
Contributor Author

akhi3030 commented Mar 7, 2024

* one small comment: in comments you say:
/// `data_id_len` and `data_it_ptr`: Used to pass the unique resumption token
/// that was returned to the smart contract in the `promise_yield_create()`
/// function.

But promise_yield_create seems to be returning u64

Yeah, the comment is referring to the resumption token returned in the register. I'll update the comment to address this.

@walnut-the-cat
Copy link
Contributor

can we merge this into master?

@akhi3030
Copy link
Contributor Author

@near/nep-moderators: please review and approve the PR. The work was just released to the mainnet.

@akhi3030 akhi3030 merged commit 48c9f16 into master Jun 18, 2024
5 of 6 checks passed
@akhi3030 akhi3030 deleted the yield-execution branch June 18, 2024 14:29
@flmel flmel added S-approved A NEP that was approved by a working group. and removed S-review/needs-wg-to-assign-sme A NEP that needs working group to assign two SMEs. labels Oct 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-NEP A NEAR Enhancement Proposal (NEP). S-approved A NEP that was approved by a working group. WG-protocol Protocol Standards Work Group should be accountable
Projects
Status: APPROVED NEPs
Development

Successfully merging this pull request may close these issues.