Home

Overview

The Durable Task Framework lets developers write code orchestrations in C# using the .NET Task framework and the async/await keywords introduced with .NET 4.5.

Here are the key features of the Durable Task Framework:

Definition of code orchestrations in simple C# code
Automatic persistence and check-pointing of program state
Versioning of orchestrations and activities
Async timers, orchestration composition, user-aided checkpointing

The framework itself is lightweight and requires only an Azure Service Bus namespace and, optionally, an Azure Storage account. Running instances of the orchestration and the worker nodes are hosted entirely by the user. No user code executes ‘inside’ Service Bus.

Problem Statement

Many scenarios involve updating state or executing actions in multiple places in a transactional manner. For example, debiting an amount of money from an account in database A and crediting it to another account in database B must be done atomically. This consistency can be achieved with a distributed transaction that spans the debit and credit operations against databases A and B respectively.

However, strictly consistent transactions imply locks, and locks are detrimental to scale because subsequent operations that require the same lock are blocked until it is released. This becomes a major scale bottleneck for cloud services, which are designed to be highly available as well as consistent. Furthermore, even if we decided to accept the cost of a distributed transaction, we would find that almost none of the cloud services actually support distributed transactions (or even simple locking, for that matter).

The alternative model for achieving consistency is to execute the business logic for the debit and credit within a durable workflow. In this case the workflow does something like the following pseudo-code:

1. debit from some account in DB A
2. if the debit was successful then:
   a. credit to some account in DB B
   b. if the above failed then keep retrying until some threshold
   c. if the credit still failed then undo the debit from DB A as well and send notification email
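
The pseudo-code above can be sketched as ordinary code. The following is a minimal, language-neutral illustration in Python, not the framework's C# API. The `Account` class, the `transfer` function, and the `max_attempts` and `notify` parameters are all hypothetical names introduced only for this example, and the two "databases" are plain in-memory objects.

```python
class Account:
    """In-memory stand-in for an account held in one database (hypothetical)."""

    def __init__(self, balance):
        self.balance = balance

    def debit(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

    def credit(self, amount):
        self.balance += amount


def transfer(source, target, amount, max_attempts=3, notify=print):
    """Follow the debit/credit pseudo-code; return True if the transfer completed."""
    # Debit from the account in DB A.
    try:
        source.debit(amount)
    except Exception:
        return False  # The debit did not happen, so there is nothing to undo.

    # Credit to the account in DB B, retrying up to max_attempts times.
    for _ in range(max_attempts):
        try:
            target.credit(amount)
            return True
        except Exception:
            continue

    # The credit still failed: undo the debit and send a notification.
    source.credit(amount)
    notify(f"transfer of {amount} failed after {max_attempts} attempts; debit undone")
    return False
```

Calling `transfer(Account(100), Account(0), 30)` moves the amount when both operations succeed. Note that this sketch keeps its progress only in local variables, so it offers no protection if the process running it crashes partway through.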

In the happy path, this gives us ‘eventual’ consistency. That is, after step (1) the overall system state is inconsistent, but it becomes consistent once the workflow completes. In the unhappy path, however, a number of things can go wrong: the node executing the pseudo-code can crash at an arbitrary point, the debit from DB A can fail, or the credit to DB B can fail. In these cases, to maintain consistency we must ensure the following:

1. The debit and credit operations are idempotent, i.e. re-executing the same debit or credit operation is a no-op.
2. If the executing node crashes, it restarts from the last place where a durable operation completed successfully (e.g. #1 or #2a above).
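
One common way to meet the idempotency requirement is to tag every operation with a unique ID and remember which IDs have already been applied. The sketch below assumes this approach. The `IdempotentAccount` class and its `operation_id` parameter are hypothetical names for illustration, and the in-memory set stands in for a table that a real implementation would store alongside the balance.

```python
class IdempotentAccount:
    """Account whose debit and credit are no-ops when an operation ID repeats."""

    def __init__(self, balance):
        self.balance = balance
        # IDs of operations already applied. In a real database this record
        # and the balance update would be committed in one local transaction.
        self._applied = set()

    def debit(self, operation_id, amount):
        """Apply the debit once; return False if this ID was already applied."""
        if operation_id in self._applied:
            return False  # Re-execution: change nothing.
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        self._applied.add(operation_id)
        return True

    def credit(self, operation_id, amount):
        """Apply the credit once; return False if this ID was already applied."""
        if operation_id in self._applied:
            return False  # Re-execution: change nothing.
        self.balance += amount
        self._applied.add(operation_id)
        return True
```

With this shape, a workflow that restarts and replays `debit("op-1", 30)` leaves the balance unchanged the second time, so replaying after a crash is safe.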

Of these two items, (1) can only be supplied by the debit/credit activity implementation. Item (2) can also be handled in code by keeping track of the current position in some database, but this state management becomes a hassle, especially as the number of durable operations grows. This is where a framework that does automatic state management would greatly simplify the experience of building a code-based workflow.
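
To make the cost of hand-written state management concrete, here is a minimal sketch of tracking the current position manually. It is an illustration in Python under stated assumptions, not how the framework itself persists state: a local JSON file stands in for "some database", and `load_checkpoint`, `save_checkpoint`, and `run_workflow` are hypothetical helper names.

```python
import json
import os


def load_checkpoint(path):
    """Return the names of steps already completed (empty on the first run)."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return json.load(f)


def save_checkpoint(path, completed):
    """Persist the completed steps; write-then-rename avoids a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(completed, f)
    os.replace(tmp, path)


def run_workflow(steps, path):
    """Run (name, function) steps in order, skipping those already checkpointed."""
    completed = load_checkpoint(path)
    for name, step in steps:
        if name in completed:
            continue  # This step finished before an earlier crash.
        # A crash after step() but before save_checkpoint() means the step
        # runs again on restart, which is why each step must be idempotent.
        step()
        completed.append(name)
        save_checkpoint(path, completed)
    return completed
```

Every durable step needs a stable name, a checkpoint write, and a skip check on restart. That bookkeeping grows with the workflow, which is the burden an automatic state management framework is meant to remove.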