Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make core Error enums less specific #270

Closed
3 tasks done
Tracked by #554
plafer opened this issue Nov 29, 2022 · 7 comments · Fixed by #1350
Closed
3 tasks done
Tracked by #554

Make core Error enums less specific #270

plafer opened this issue Nov 29, 2022 · 7 comments · Fixed by #1350
Assignees
Labels
O: maintainability Objective: cause to ease modification, fault corrections and improve code understanding S: errors Scope: related to error handlings
Milestone

Comments

@plafer
Copy link
Contributor

plafer commented Nov 29, 2022

Follow-up to #255.

Our Error enums have too many variants. We should drastically condense them following the ideas described here.

Tasks

  1. O: code-hygiene O: maintainability O: usability S: errors
    seanchen1991
  2. S: errors rust
  3. S: errors
    seanchen1991
@plafer plafer added the O: new-feature Objective: aims to add new feature label Nov 29, 2022
@Farhad-Shabani Farhad-Shabani added S: errors Scope: related to error handlings O: maintainability Objective: cause to ease modification, fault corrections and improve code understanding and removed O: new-feature Objective: aims to add new feature labels Jan 5, 2023
@plafer
Copy link
Contributor Author

plafer commented Feb 8, 2023

A thought: ideally, internal functions that do use an Error enum would return an Error enum where the variants capture only the possible errors for that function. For example, verify_delay_passed()'s Error would only have the TimestampOverflow,NotEnoughTimeElapsed, and NotEnoughBlocksElapsed variants (or less if we don't care to be that specific). This provides better documentation of what can go wrong, and callers won't write code for errors that cannot happen.

@plafer plafer added this to the v0.30.0 milestone Feb 22, 2023
@Farhad-Shabani Farhad-Shabani modified the milestones: v0.30.0, v0.31.0 Feb 27, 2023
@plafer plafer added this to ibc-rs Feb 27, 2023
@plafer plafer self-assigned this Feb 27, 2023
@plafer plafer moved this to 🏗️ In Progress in ibc-rs Feb 27, 2023
@plafer
Copy link
Contributor Author

plafer commented Feb 27, 2023

In #164, there was some push-back against using anyhow for our errors. However, looking at projects that use ibc-rs, none of them actually make use of our Error enum:

In other words, programmatic logic based on which error occurred is non-existent in all these projects. This tells me that the most important use case for our errors is aiding in debugging; either host chain implementations, or relayers such as Hermes.

Therefore, I feel like our error system should focus on producing great human-readable error diagnostics, which is exactly what libraries such as anyhow are designed to do. It seems to me like the best way forward is to ignore the conventional wisdom that libraries shouldn't use anyhow, and use anyhow::Error as our main Error type.

I would love the hear people's thoughts on this!

cc @Farhad-Shabani @romac @yito88 @keppel

@Farhad-Shabani Farhad-Shabani modified the milestones: v0.31.0, v0.32.0 Feb 28, 2023
@Farhad-Shabani
Copy link
Member

My take, Anyhow has a concise syntax for propagating errors, cleaning up the code look, and is easier to implement since it doesn't care what error types your functions return. Though, this approach (freeform as mentioned) comes at a cost of not having a finely tuned error-handling system. We would be limited in the places where we might need to perform specific actions based on the type of error. It can sometimes even make debugging harder by providing unnecessary info/backtrace and representing all kinds of errors making it tricky to identify the exact causes.

Maybe you have a way of implementing anyhow in your mind? Perhaps not a bad idea to demo a part of our codebase if we think it works.

And, It would be great if you elaborate more on the limitation we face with the current error enum. Couldn’t we refactor it for a better one?

I see the fact you mentioned that IBC-rs errors are not used well in projects, but I think the main reason is they are still in development (e.g. bunch of unwrap() in those codebases) and don't have a refined error-handling system yet. Though, that's true for us too - we're still figuring out some of the designs, so our error enum isn't perfect either, but I don't think it necessarily is the reason why our current errors are not widely used.

Actually, I am a bit concerned, as IIRC we just revamped our error system. If we switch to Anyhow, that would be the second change. For me, It sounds like a more detailed study/proposal is required to make sure! Maybe other folks have excellent insights and ideas to make it an easy choice.

@romac
Copy link
Member

romac commented Mar 1, 2023

I agree with Fahrad, and believe he has a good point wrt codebases not handling errors properly when under heavy development.

Moreover, I too am not sure that I see how anyhow can improve the quality of the errors reported over custom error enums. Perhaps you could into more details about the problems you are seeing?

@plafer
Copy link
Contributor Author

plafer commented Mar 15, 2023

I believe I found a good strategy for a better error systems (discussion with @romac heavily influenced the design).

The new system recognizes 2 "error sources":

  • Host errors: the errors controlled by the host, coming from methods Validation/ExecutionContext
  • Protocol errors: the errors that come form IBC handlers' validation (e.g. channel not in the expected state)

Host errors

The host error type will be defined by the host. Specifically, we'd make it an associated type of ValidationContext:

trait ValidationContext {
  type Error;

  fn host_timestamp(&self) -> Result<Time, Self::Error>;
}

Protocol errors

The type for protocol errors would roughly be a cleaned up version of our current ContextError. The crucial differences for ContextError in this system compared to the current one are:

  1. we no longer need error variants for host errors in ContextError
  2. its purpose is solely to generate clear debugging error messages
  • that is, we don't expect users to handle these errors

Top-level error type

Our top-level error type, returned by dispatch(), would be:

enum Error<E> {
    Host(E),
    Protocol(CleanedUpContextError),
}

As an example, I would expect the user code to look like:

match dispatch(...) {
    Host(err) => /* optionally handle my errors */,
    Protocol(err) => /* log errors */
}

Internal logic based on errors from the host

This section borrows the great idea presented in this sled.rs's blog post.

In a few instances we need to perform some logic based on the specific error returned from the host. For example, when the packet commitment is not present during acknowledgement processing, we want to return early without an error. We currently return early for all errors, but ideally we'd distinguish the "packet commitment not present" error from all other errors, and only return early for the "packet commitment not present" error. Here's where we'd use sled's idea. get_packet_commitment would look like:

fn get_packet_commitment(
    &self,
    commitment_path: &CommitmentPath,
) -> Result<Result<PacketCommitment, CommitmentError>, Self::Error>;

enum CommitmentError {
    NotFound,
}

The benefit of this signature is that our internal code would look like:

// note how the outer `Self::Error` gets propagated out, and we 
// match on the inner error that we care about
if matches!(ctx_a.get_packet_commitment(&commitment_path_on_a)?, CommitmentError::NotFound) {
    return Ok(());
};

And that's it! I also expect cleaning up ContextError to solve #342.

@Farhad-Shabani
Copy link
Member

I liked the idea of separating the error from the sources. So, we only have to take care of ours without assuming anything about the hosts! That’s perfect in my view.

There are just a few questions it may help to clarify some of the details:

  • Given that hosts will be free to introduce their desired errors through the associated Error type under the ValidationContext, why is it needed to include the Host(E) variant, and basically having a top-level error type?

  • Regarding the "not present" state, I have a different perspective. In similar cases, unavailability shouldn't be treated as an error. Instead, it should be represented by None as a legitimate response from a storage call. Accordingly, users won’t have to deal with the additional overhead of introducing errors like the CommitmentError. Since there is a single acceptable error variant, we can automate this process in our boundary.
    I believe Result<Option<Output>, Error> is the correct signature, and I've expanded my view on this in [Context] Issues caused by improper output signatures of provable store readers #607

And a suggestion:

  • Some of the confusing error variants we already have are due to how we implemented certain logic. Prioritizing to study of error sources and the resolution of issues like 603, 607, 536 will streamline the error variants and reduce the burden of redesigning the system.

@plafer
Copy link
Contributor Author

plafer commented Apr 10, 2023

Given that hosts will be free to introduce their desired errors through the associated Error type under the ValidationContext, why is it needed to include the Host(E) variant, and basically having a top-level error type?

The "top-level" type is just the type that validate(), execute() and dispatch() return. We have 2 variants because 2 separates types of errors can occur (ValidationContext::Error and CleanedUpContextError), and both can be bubbled up and returned from one of the 3 calls. If you don't have a Host(E) variant, then how do you return Host errors?

Accordingly, users won’t have to deal with the additional overhead of introducing errors like the CommitmentError. Since there is a single acceptable error variant, we can automate this process in our boundary.

Errors such as CommitmentError are defined by us, not the user. The user only defines ValidationContext::Error; we define all the others.

In similar cases, unavailability shouldn't be treated as an error. Instead, it should be represented by None as a legitimate response from a storage call.

Returning an error doesn't mean it's "illegitimate". Result<Option<Output>, Error> is mathematically equivalent to Result<Result<PacketCommitment, CommitmentError>, Self::Error> (where CommitmentError is defined with just one variant); they're isomorphic. Choosing one over the other is a subjective decision; a matter of taste. I prefer Result<Result<PacketCommitment, CommitmentError>, Self::Error>, since it effectively avoids form of boolean blindness (which we could call "Option blindness"). Basically, reading

if matches!(ctx_a.get_packet_commitment(&commitment_path_on_a)?, CommitmentError::NotFound) {
    return Ok(());
};

tells me a ton more about what's going on; I see that the commitment was not found directly at the call site. Compare to your suggested signature:

if ctx_a.get_packet_commitment(&commitment_path_on_a)?.is_none() {
    return Ok(());
};

is_none() doesn't tell me the semantics of None in this case; I need to go read the docstring of the method to remember what None means in this case.

So using a Result doesn't mean the error is "illegitimate"; it's just an equivalent but more readable way of encoding the scenario where the packet commitment is not found.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O: maintainability Objective: cause to ease modification, fault corrections and improve code understanding S: errors Scope: related to error handlings
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

4 participants