Dapr Error Handling/Codes

  • Author(s): Roberto J. Rojas
  • State: Draft
  • Updated: 5/11/2023

Overview

Across Dapr, errors are surfaced for different conditions without consistent messages, error details, or a standard format, and with no clear indication of what caused the error or where it originated.

This makes troubleshooting and debugging quite difficult and requires a deep understanding of the parts of Dapr and how those parts interact with each other.

To help with the issues raised above, it would be ideal if a solution could provide:

  • Greater detail about errors that occurred.
  • Error details in a structured format.
  • Consistency in the error details.
  • An indication of where within the Dapr execution (Init, Runtime, Components, SDKs, etc.) the error occurred.

Background

Related Items

Related proposals

Related issues

dapr/dapr#6068

Expectations and alternatives

Implementation Details

Design

Solution

Adopt the gRPC richer error model and the Google API error model described in the Google API Design Guide.

Error Code Standard

The Google API Error Model has the following Protobuf format:

package google.rpc;

// The `Status` type defines a logical error model that is suitable for
// different programming environments, including REST APIs and RPC APIs.
message Status {
  // A simple error code that can be easily handled by the client. The
  // actual error code is defined by `google.rpc.Code`.
  int32 code = 1;

  // A developer-facing human-readable error message in English. It should
  // both explain the error and offer an actionable resolution to it.
  string message = 2;

  // Additional error information that the client code can use to handle
  // the error, such as retry info or a help link.
  repeated google.protobuf.Any details = 3;
}

One of the detail types that can be added to the structure above is ErrorInfo, defined in error_details.proto:

message ErrorInfo {
  // The reason of the error. This is a constant value that identifies the
  // proximate cause of the error. Error reasons are unique within a particular
  // domain of errors. This should be at most 63 characters and match a
  // regular expression of `[A-Z][A-Z0-9_]+[A-Z0-9]`, which represents
  // UPPER_SNAKE_CASE.
  string reason = 1;

  // The logical grouping to which the "reason" belongs. The error domain
  // is typically the registered service name of the tool or product that
  // generates the error. Example: "pubsub.googleapis.com". If the error is
  // generated by some common infrastructure, the error domain must be a
  // globally unique value that identifies the infrastructure. For Google API
  // infrastructure, the error domain is "googleapis.com".
  string domain = 2;

  // Additional structured details about this error.
  //
  // Keys should match /[a-zA-Z0-9-_]/ and be limited to 64 characters in
  // length. When identifying the current value of an exceeded limit, the units
  // should be contained in the key, not the value.  For example, rather than
  // {"instanceLimit": "100/request"}, should be returned as,
  // {"instanceLimitPerRequest": "100"}, if the client exceeds the number of
  // instances that can be created in a single (batch) request.
  map<string, string> metadata = 3;
}
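The constraints in the ErrorInfo comments (reason at most 63 characters in UPPER_SNAKE_CASE, metadata keys limited to letters, digits, '-' and '_' and at most 64 characters) can be checked before an error is returned. The following is a minimal Go sketch, assuming a hypothetical plain-Go errorInfo struct that mirrors the protobuf message instead of the generated errdetails.ErrorInfo type:

```go
package main

import (
	"fmt"
	"regexp"
)

// errorInfo is a hypothetical plain-Go mirror of google.rpc.ErrorInfo,
// used only to keep this sketch free of generated protobuf code.
type errorInfo struct {
	Reason   string
	Domain   string
	Metadata map[string]string
}

var (
	// UPPER_SNAKE_CASE, as required by the ErrorInfo.reason comment.
	reasonPattern = regexp.MustCompile(`^[A-Z][A-Z0-9_]+[A-Z0-9]$`)
	// Letters, digits, '_' and '-' ('-' placed last so it is a literal).
	metadataKeyPattern = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)
)

// validate reports the first violated ErrorInfo constraint, or nil.
func (e errorInfo) validate() error {
	if len(e.Reason) > 63 || !reasonPattern.MatchString(e.Reason) {
		return fmt.Errorf("invalid reason %q", e.Reason)
	}
	for k := range e.Metadata {
		if len(k) > 64 || !metadataKeyPattern.MatchString(k) {
			return fmt.Errorf("invalid metadata key %q", k)
		}
	}
	return nil
}

func main() {
	good := errorInfo{Reason: "DAPR_STATE_ETAG_MISMATCH", Domain: "dapr.io",
		Metadata: map[string]string{"key": "myapp||name"}}
	bad := errorInfo{Reason: "dapr_state_etag_mismatch", Domain: "dapr.io"}
	fmt.Println(good.validate()) // <nil>
	fmt.Println(bad.validate())  // invalid reason "dapr_state_etag_mismatch"
}
```

Only the keys are checked against the pattern; the ErrorInfo comment places no such restriction on metadata values.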

Error Status

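The properties of google.rpc.Status will be populated as follows: code holds the gRPC status code, message holds a developer-facing description of the error, and details holds the ErrorInfo and ResourceInfo entries described below.

Below is a partial table of the standard error codes provided by gRPC and how they map to HTTP status codes. The complete list can be found in the gRPC and Google API error documentation.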

| HTTP | gRPC | Description |
|------|------|-------------|
| 200 | OK | No error. |
| 400 | INVALID_ARGUMENT | Client specified an invalid argument. Check error message and error details for more information. |
| 400 | FAILED_PRECONDITION | Request can not be executed in the current system state, such as deleting a non-empty directory. |
| 400 | OUT_OF_RANGE | Client specified an invalid range. |
| 401 | UNAUTHENTICATED | Request not authenticated due to missing, invalid, or expired authorization credentials. |
| 403 | PERMISSION_DENIED | Client does not have sufficient permission. |
| 404 | NOT_FOUND | A specified resource is not found. |
| 409 | ABORTED | Concurrency conflict, such as read-modify-write conflict. |
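For the HTTP API, the same gRPC code has to be translated into an HTTP status. The sketch below encodes the partial table above as a Go lookup, assuming the numeric gRPC codes that appear in the code field of google.rpc.Status (for example, 3 for INVALID_ARGUMENT, as in the JSON example later in this document). The httpStatusFor helper is a hypothetical name, and codes outside the table are reported as unmapped rather than guessed:

```go
package main

import (
	"fmt"
	"net/http"
)

// Numeric values of the google.rpc.Code entries used in the table.
const (
	codeOK                 int32 = 0
	codeInvalidArgument    int32 = 3
	codeNotFound           int32 = 5
	codePermissionDenied   int32 = 7
	codeFailedPrecondition int32 = 9
	codeAborted            int32 = 10
	codeOutOfRange         int32 = 11
	codeUnauthenticated    int32 = 16
)

// grpcToHTTP holds only the partial table above, not the full mapping.
var grpcToHTTP = map[int32]int{
	codeOK:                 http.StatusOK,
	codeInvalidArgument:    http.StatusBadRequest,
	codeFailedPrecondition: http.StatusBadRequest,
	codeOutOfRange:         http.StatusBadRequest,
	codeUnauthenticated:    http.StatusUnauthorized,
	codePermissionDenied:   http.StatusForbidden,
	codeNotFound:           http.StatusNotFound,
	codeAborted:            http.StatusConflict,
}

// httpStatusFor returns the HTTP status for a gRPC code and whether the
// code is covered by the table.
func httpStatusFor(code int32) (int, bool) {
	s, ok := grpcToHTTP[code]
	return s, ok
}

func main() {
	s, _ := httpStatusFor(codeInvalidArgument)
	fmt.Println(s) // 400
}
```

In real Dapr code the hand-written constants would be replaced by the codes.Code values from google.golang.org/grpc/codes; they are spelled out here only to keep the sketch dependency-free.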

ErrorInfo (Required)

The properties of type.googleapis.com/google.rpc.ErrorInfo will be populated as follows:

  • Reason - The prefix for the Dapr module (from the tables below) followed by the error condition code.

    Example: "DAPR_STATE_" + "ETAG_MISMATCH"

  • Domain - Always set to dapr.io.

  • Metadata - A key/value map of data relevant to the error condition.

**Note:** The metadata map must include a retriable key whose value is one of "true", "false", "True", "False", "TRUE", "FALSE", "1", or "0".
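Taken together, these rules mean a Dapr module sets the domain to dapr.io, concatenates its prefix with the condition code, and always includes retriable. A minimal Go sketch, assuming a hypothetical plain-struct errorInfo and hypothetical helper names (newDaprErrorInfo, parseRetriable); the parser accepts exactly the eight spellings listed in the note:

```go
package main

import (
	"fmt"
	"strconv"
)

// errorInfo is a hypothetical plain-Go mirror of google.rpc.ErrorInfo.
type errorInfo struct {
	Reason   string
	Domain   string
	Metadata map[string]string
}

// newDaprErrorInfo applies the rules above: reason = prefix + condition,
// domain = "dapr.io", and a mandatory "retriable" metadata entry.
func newDaprErrorInfo(prefix, condition string, retriable bool, extra map[string]string) errorInfo {
	md := make(map[string]string, len(extra)+1)
	for k, v := range extra {
		md[k] = v
	}
	// Set last so a caller-supplied "retriable" cannot override the flag.
	md["retriable"] = strconv.FormatBool(retriable) // "true" or "false"
	return errorInfo{Reason: prefix + condition, Domain: "dapr.io", Metadata: md}
}

// parseRetriable accepts only the values listed in the note. It is stricter
// than strconv.ParseBool, which also accepts "t", "T", "f" and "F".
func parseRetriable(v string) (bool, error) {
	switch v {
	case "true", "True", "TRUE", "1":
		return true, nil
	case "false", "False", "FALSE", "0":
		return false, nil
	}
	return false, fmt.Errorf("invalid retriable value %q", v)
}

func main() {
	ei := newDaprErrorInfo("DAPR_STATE_", "ETAG_MISMATCH", false,
		map[string]string{"key": "myapp||name"})
	r, err := parseRetriable(ei.Metadata["retriable"])
	fmt.Println(ei.Reason, ei.Domain, r, err) // DAPR_STATE_ETAG_MISMATCH dapr.io false <nil>
}
```

Producers only ever emit "true" or "false" here; the wider set of spellings matters on the consumer side, where values may come from SDKs in other languages.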

ResourceInfo (Optional)

The properties of type.googleapis.com/google.rpc.ResourceInfo will be populated as follows:

  • ResourceType - The building block type with version.

    Example: "state.redis/v1"

  • ResourceName - The component name.

    Example: "my-component-name"

  • Owner - The owner of the component.

  • Description - A description of the resource or details of the error.
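These fields line up with a Dapr component definition, whose spec.type (for example, state.redis) and spec.version (for example, v1) together form the ResourceType string, while metadata.name gives the ResourceName. A minimal sketch, assuming a hypothetical componentRef struct holding those three values and a hypothetical resourceInfoFor helper:

```go
package main

import "fmt"

// resourceInfo is a hypothetical plain-Go mirror of google.rpc.ResourceInfo.
type resourceInfo struct {
	ResourceType string
	ResourceName string
	Owner        string
	Description  string
}

// componentRef holds the parts of a Dapr component definition used here.
type componentRef struct {
	Name    string // metadata.name, e.g. "my-component"
	Type    string // spec.type, e.g. "state.redis"
	Version string // spec.version, e.g. "v1"
}

// resourceInfoFor derives ResourceInfo as described above:
// ResourceType = "<type>/<version>", ResourceName = component name.
func resourceInfoFor(c componentRef, owner, description string) resourceInfo {
	return resourceInfo{
		ResourceType: c.Type + "/" + c.Version,
		ResourceName: c.Name,
		Owner:        owner,
		Description:  description,
	}
}

func main() {
	ri := resourceInfoFor(
		componentRef{Name: "my-component", Type: "state.redis", Version: "v1"},
		"", "possible etag mismatch. error from state store")
	fmt.Println(ri.ResourceType, ri.ResourceName) // state.redis/v1 my-component
}
```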

Error Details Prefixes

The following tables show the proposed error code prefixes used in the reason field of google.rpc.ErrorInfo for the various Dapr modules:

INIT

| Dapr Module | Prefix |
|-------------|--------|
| CLI | DAPR_CLI_INIT_* |
| Self-hosted | DAPR_SELF_HOSTED_INIT_* |
| K8S | DAPR_K8S_INIT_* |
| Invoke | DAPR_INVOKE_INIT_* |

RUNTIME

| Dapr Module | Prefix |
|-------------|--------|
| CLI | DAPR_RUNTIME_CLI_* |
| Self-hosted | DAPR_SELF_HOSTED_* |
| dapr-2-dapr (gRPC) | DAPR_RUNTIME_GRPC_* |

COMPONENTS

| Dapr Module | Prefix |
|-------------|--------|
| PubSub | DAPR_PUBSUB_* |
| StateStore | DAPR_STATE_* |
| Bindings | DAPR_BINDING_* |
| SecretStore | DAPR_SECRET_* |
| ConfigurationStore | DAPR_CONFIGURATION_* |
| Lock | DAPR_LOCK_* |
| NameResolution | DAPR_NAME_RESOLUTION_* |
| Middleware | DAPR_MIDDLEWARE_* |
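Because reasons are built by concatenation, keeping the prefixes in one place helps avoid typos. The sketch below encodes the COMPONENTS table as a Go map and reads the trailing * as "followed by the error condition code". The module keys and the reasonFor helper are hypothetical names, and the result is checked against the ErrorInfo reason rules:

```go
package main

import (
	"fmt"
	"regexp"
)

// componentPrefixes mirrors the COMPONENTS table (keys are hypothetical).
var componentPrefixes = map[string]string{
	"pubsub":         "DAPR_PUBSUB_",
	"state":          "DAPR_STATE_",
	"bindings":       "DAPR_BINDING_",
	"secretstores":   "DAPR_SECRET_",
	"configuration":  "DAPR_CONFIGURATION_",
	"lock":           "DAPR_LOCK_",
	"nameresolution": "DAPR_NAME_RESOLUTION_",
	"middleware":     "DAPR_MIDDLEWARE_",
}

// Same UPPER_SNAKE_CASE rule as google.rpc.ErrorInfo.reason.
var reasonPattern = regexp.MustCompile(`^[A-Z][A-Z0-9_]+[A-Z0-9]$`)

// reasonFor returns prefix + condition for a module, rejecting unknown
// modules and results that break the ErrorInfo reason constraints.
func reasonFor(module, condition string) (string, error) {
	prefix, ok := componentPrefixes[module]
	if !ok {
		return "", fmt.Errorf("unknown module %q", module)
	}
	reason := prefix + condition
	if len(reason) > 63 || !reasonPattern.MatchString(reason) {
		return "", fmt.Errorf("invalid reason %q", reason)
	}
	return reason, nil
}

func main() {
	r, err := reasonFor("state", "ETAG_MISMATCH")
	fmt.Println(r, err) // DAPR_STATE_ETAG_MISMATCH <nil>
}
```

The 63-character limit applies to the whole reason, so long prefixes such as DAPR_NAME_RESOLUTION_ leave less room for the condition code.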

The following snippet shows the error status returned for an ETAG_MISMATCH error condition. The reason is populated as PREFIX + ERROR_CONDITION:

{
  "code": 3,
  "message":  "possible etag mismatch. error from state store",
  "details": [
    {
      "@type": "type.googleapis.com/google.rpc.ErrorInfo",
      "reason": "DAPR_STATE_ETAG_MISMATCH",
      "domain": "dapr.io",
      "metadata": {
        "key": "myapp||name"
      }
    },
    {
      "@type": "type.googleapis.com/google.rpc.ResourceInfo",
      "resource_type": "state.redis/v1",
      "resource_name": "my-component",
      "owner": "",
      "description": "possible etag mismatch. error from state store"
    }
  ]
}
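A client that receives this status as JSON can locate the ErrorInfo entry by its @type and branch on reason, without generated protobuf types. A minimal standard-library sketch, assuming the payload has exactly the shape shown above; the rpcStatus type and the findErrorInfo helper are hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
)

const errorInfoType = "type.googleapis.com/google.rpc.ErrorInfo"

// rpcStatus mirrors the JSON shape above; details stay raw so that each
// entry can be decoded according to its "@type".
type rpcStatus struct {
	Code    int32             `json:"code"`
	Message string            `json:"message"`
	Details []json.RawMessage `json:"details"`
}

type errorInfo struct {
	Type     string            `json:"@type"`
	Reason   string            `json:"reason"`
	Domain   string            `json:"domain"`
	Metadata map[string]string `json:"metadata"`
}

// findErrorInfo returns the first ErrorInfo entry in the status details.
// Other detail types (e.g. ResourceInfo) decode harmlessly and are skipped.
func findErrorInfo(body []byte) (errorInfo, bool, error) {
	var st rpcStatus
	if err := json.Unmarshal(body, &st); err != nil {
		return errorInfo{}, false, err
	}
	for _, raw := range st.Details {
		var ei errorInfo
		if err := json.Unmarshal(raw, &ei); err != nil {
			return errorInfo{}, false, err
		}
		if ei.Type == errorInfoType {
			return ei, true, nil
		}
	}
	return errorInfo{}, false, nil
}

func main() {
	body := []byte(`{"code": 3, "message": "possible etag mismatch. error from state store",
	 "details": [{"@type": "type.googleapis.com/google.rpc.ErrorInfo",
	 "reason": "DAPR_STATE_ETAG_MISMATCH", "domain": "dapr.io",
	 "metadata": {"key": "myapp||name"}}]}`)
	ei, found, err := findErrorInfo(body)
	fmt.Println(ei.Reason, ei.Domain, found, err) // DAPR_STATE_ETAG_MISMATCH dapr.io true <nil>
}
```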

Sample Code Snippet (Go)

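import (
	...
	"google.golang.org/genproto/googleapis/rpc/errdetails"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	...
)
...
if req.ETag != nil {
	...
	// Base status: the gRPC code plus a developer-facing message.
	ste := status.Newf(codes.InvalidArgument, messages.ErrStateGet, in.Key, in.StoreName, err.Error())

	// Required detail: reason (prefix + condition), domain and metadata.
	ei := &errdetails.ErrorInfo{
		Domain: "dapr.io",
		Reason: "DAPR_STATE_ETAG_MISMATCH",
		Metadata: map[string]string{
			"storeName": in.StoreName,
		},
	}

	// Optional detail: the component that produced the error.
	ri := &errdetails.ResourceInfo{
		ResourceType: "state.redis/v1",
		ResourceName: "my-redis-component",
		Owner:        "user",
		Description:  "possible etag mismatch. error from state store",
	}

	// WithDetails returns a new Status. On failure it returns a nil Status,
	// whose Err() is nil, so fall back to the status without details.
	steWithDetails, detailsErr := ste.WithDetails(ei, ri)
	if detailsErr != nil {
		return ste.Err()
	}
	...
	return steWithDetails.Err()
}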

Pros

  • Since the Dapr runtime already uses Protocol Buffers as its data format, and most gRPC implementations already include support for the richer error model, little new infrastructure is needed.
  • This helps minimize changes across the Dapr ecosystem.
  • Clients can react to errors programmatically, because errors and their details follow a standard structure.
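As an illustration of reacting to errors programmatically, a caller could branch on reason and the required retriable metadata instead of parsing message strings. A minimal sketch, assuming a hypothetical nextAction policy; which reasons map to which action is an application decision, not something this proposal defines:

```go
package main

import "fmt"

// errorInfo is a hypothetical plain-Go mirror of google.rpc.ErrorInfo.
type errorInfo struct {
	Reason   string
	Domain   string
	Metadata map[string]string
}

type action string

const (
	actionRetry  action = "retry"
	actionReload action = "reload-and-retry"
	actionFail   action = "fail"
)

// nextAction is a hypothetical client policy built on ErrorInfo.
func nextAction(ei errorInfo) action {
	if ei.Domain != "dapr.io" {
		return actionFail // not a Dapr error; do not guess
	}
	if ei.Reason == "DAPR_STATE_ETAG_MISMATCH" {
		// Another writer won: re-read to get the current ETag, then retry.
		return actionReload
	}
	switch ei.Metadata["retriable"] {
	case "true", "True", "TRUE", "1":
		return actionRetry
	}
	return actionFail
}

func main() {
	fmt.Println(nextAction(errorInfo{Reason: "DAPR_STATE_ETAG_MISMATCH", Domain: "dapr.io"})) // reload-and-retry
}
```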

Cons

  • Dependency on the gRPC richer error model.
  • Support for the richer error model must be tested in the gRPC implementation used by each Dapr SDK.

gRPC Richer Error Model POC

For the POC, I've made changes to several Dapr modules. The POC code can be found in my GitHub repositories under the branch error-codes-poc.

These are the gRPC imports used:

import (
  ...
  "google.golang.org/genproto/googleapis/rpc/errdetails"
  "google.golang.org/grpc/codes"
  "google.golang.org/grpc/status"
  ...
)

The files changed for this POC:

https://github.com/robertojrojas/components-contrib/tree/error-codes-poc

  • state/redis/redis.go
  • state/store.go

https://github.com/robertojrojas/dapr-kit/tree/error-codes-poc

  • pkg/proto/customerrors/v1/customerrors.pb.go
  • proto/customerrors/v1/customerrors.proto
  • status/customerrorcodes.go
  • status/status.go

https://github.com/robertojrojas/dapr/tree/error-codes-poc

  • pkg/diagnostics/grpc_tracing.go
  • pkg/grpc/api.go
  • pkg/http/api.go
  • pkg/http/responses.go

https://github.com/robertojrojas/dapr-go-sdk/tree/error-codes-poc

  • client/state.go

https://github.com/robertojrojas/dapr-cli/tree/error-codes-poc

  • pkg/standalone/invoke.go

https://github.com/robertojrojas/dapr-dotnet-sdk/tree/error-codes-poc

  • src/Dapr.Client/DaprClientGrpc.cs

Feature lifecycle outline

Acceptance Criteria

Completion Checklist