Releases: dotnet/orleans
v9.0.1
What's Changed
- Change EventCounterIntervalSec to 1 sec to fix issue caused in dotnet-counters by @ntovas in #9235
- Downgrade dependencies to .NET 8.0 group by @ReubenBond in #9246
Full Changelog: v9.0.0...v9.0.1
v9.0.0
What's Changed since v8.2.0
- Switch to pre-built Cassandra Docker image by @rkargMsft in #9076
- Update build for dnceng by @benjaminpetit in #9086
- Enable nightly on dnceng by @benjaminpetit in #9091
- Set GDN_SUPPRESS_FORKED_BUILD_WARNING to true by @benjaminpetit in #9088
- F#: Fixing code generation for enum style discriminated unions and for discriminated unions with 4 of more cases by @gfix in #9095
- Update nightly feed to point to dnceng by @ReubenBond in #9098
- Bump Aspire and Azure.Core versions by @adityamandaleeka in #9099
- Update Azure nuget packages by @benjaminpetit in #9089
- Throw exception on null/empty/whitespace key for `IGrainWithStringKey` by @ledjon-behluli in #9111
- Fix potential `ArgumentOutOfRangeException` in `LeaseBasedQueueBalancer` by @ledjon-behluli in #9112
- Clustering: ensure snapshots are globally uniform by @ReubenBond in #9115
- Fix race condition in PersistentStreamPullingAgent by @benjaminpetit in #9109
- Allow deactivation to be triggered during activation, clean up deactivation flow by @ReubenBond in #9116
- Support triggering migration during deactivation by @ReubenBond in #9117
- Replace custom task extensions with built-in counterparts by @ReubenBond in #9118
- Simplify Silo lifecycle logic by @ReubenBond in #9120
- InMemoryTransportConnection: graceful tear down by @ReubenBond in #9122
- Add BitArray serialization codec by @ReubenBond in #9121
- Fix NRE in SiloControl introduced by #9120 by @ReubenBond in #9123
- Avoid mutating `IPEndPoint` in `RedisGatewayListProvider` by @ReubenBond in #9146
- Update documentation in IReminderTable by @rubenwe in #9132
- When not being to find a gateway, changed error to warning closes 9133 by @rbuergi in #9134
- Add IClusterConnectionStatusObserver support by @galvesribeiro in #9145
- Revert "Add IClusterConnectionStatusObserver support" until InProcessTestCluster is merged by @ReubenBond in #9157
- Add in-process test cluster by @ReubenBond in #9148
- Client connection status observer by @galvesribeiro in #9158
- Avoid exceptions accessing nonexistent etags in RedisGrainStorage by @hendrikdevloed in #9147
- Fix support for externally declared records by @kzu in #9104
- Support MessagePack union types by @wassim-k in #9151
- Allow empty IdSpan by @ReubenBond in #9175
- Log argument types instead of values by @ReubenBond in #9176
- Upgrade MicroBuild task by @ReubenBond in #9173
- Fix JSON serialization of `GrainReference`s with null or empty `GrainInterfaceType` by @ReubenBond in #9178
- [Cassandra] Safer consistency settings by @rkargMsft in #9171
- Azure DevOps: upload logs, blame/crash dumps, and publish to nuget by @ReubenBond in #9181
- Auto-generated baselines by 1ES Pipeline Templates by @dotnet-policy-service in #9183
- Strong consistency, distributed, in-memory grain directory by @ReubenBond in #9103
- Remove chatty trace logging from transaction tests by @ReubenBond in #9187
- Gracefully dispose IAsyncEnumerable requests by @ReubenBond in #9186
- Activation Rebalancing by @ledjon-behluli in #9140
- Fix stateless worker race condition causing activation directory leak by @EdeMeijer in #9190
- Ignore 404 when deleting defunct K8S pod by @tomachristian in #9194
- Do not override `SendingSilo` when sending messages to clients by @ReubenBond in #9189
- Improve timely shutdown of directory partitions when snapshot transfer has been abandoned by @ReubenBond in #9197
- Enable TSA upload by @wtgodbe in #9210
- Correctly route undeserializable response messages from external clients by @ReubenBond in #9212
- Dispose `ClientClusterManifestProvider` when `OutsideRuntimeClient` is stopping by @ledjon-behluli in #9211
- Update distributed tests to use AAD auth by @benjaminpetit in #9207
- Implement IBaseCodec on CollectionCodec by @Chris-Eckhardt in #9209
- Randomize CorrelationId generation per host by @ReubenBond in #9213
- Evict silos from cluster if they remain in the Joining or Created state for longer than MaxJoinAttemptTime by @Chris-Eckhardt in #9201
- Describe the design of the grain directory by @ReubenBond in #9223
- Enable to activate Grain after clearing its state by @scalalang2 in #9165
- Makes `ImmovableAttribute` configurable by @ledjon-behluli in #9205
- ADO.NET `IHashPicker` customization API + Orleans v3-compatible `IHashPicker` implementation by @vladislav-prishchepa in #9217
- Reduce log noise and improve formatting by @ReubenBond in #9227
- Improve ActivationMigrationManager shutdown resilience and responsiveness by @ReubenBond in #9229
- CI: Use correct variable syntax for Azure DevOps by @ReubenBond in #9230
- Update for .NET 9.0 by @ReubenBond in #9232
New Contributors
- @rubenwe made their first contribution in #9132
- @rbuergi made their first contribution in #9134
- @hendrikdevloed made their first contribution in #9147
- @kzu made their first contribution in #9104
- @wassim-k made their first contribution in #9151
- @tomachristian made their first contribution in #9194
- @wtgodbe made their first contribution in #9210
- @Chris-Eckhardt made their first contribution in #9209
- @scalalang2 made their first contribution in #9165
- @vladislav-prishchepa made their first contribution in #9217
Full Changelog: v8.2.0...v9.0.0
v7.2.7
What's Changed
- Fix potential grain timer deadlock during disposal by @ReubenBond in #8951
- [7.x] Cherry-picked commits from [main] by @ReubenBond in #8995
- [7.x] Fix build + signing by @ReubenBond in #9174
- Log argument types instead of values by @ReubenBond in #9177
- [7.x] Azure DevOps: upload logs, blame/crash dumps, and publish to nuget by @ReubenBond in #9180
- [7.x] Fix SourceLink repository by @ReubenBond in #9182
Full Changelog: v7.2.6...v7.2.7
v8.2.0
New features
Activation repartitioning
ActivationRepartitioning.mp4
Above: a demonstration showing Activation Repartitioning in action. The red lines represent cross-silo communication. As the red lines are eliminated by the partitioning algorithm, throughput improves to over 2x the initial throughput.
Ledjon Behluli and @ReubenBond implemented activation repartitioning in #8877. When enabled, activation repartitioning collocates grains based on observed communication patterns to improve performance while keeping load balanced across your cluster. In initial benchmarks, we observe throughput improvements in the range of 30% to 110%. The following paragraphs provide more background and implementation details for those who are interested. The feature is currently experimental and to enable it you need to opt-in on every silo in your cluster using the `ISiloBuilder.AddActivationRepartitioner()` extension method, suppressing the experimental feature warning:
#pragma warning disable ORLEANSEXP001 // Type is for evaluation purposes only and is subject to change or removal in future updates. Suppress this diagnostic to proceed.
siloBuilder.AddActivationRepartitioner();
#pragma warning restore ORLEANSEXP001 // Type is for evaluation purposes only and is subject to change or removal in future updates. Suppress this diagnostic to proceed.
The fastest and cheapest grain calls are ones which don't cross process boundaries. These grain calls do not need to be serialized and do not need to incur network transmission costs. For that reason, collocating related grains within the same host can significantly improve the performance of your application. On the other hand, if all grains were placed in a single host, that host may become overloaded and crash, and you would not be able to scale your application across multiple hosts. How can we maximize collocation of related grains while keeping load across your hosts balanced? Before describing our solution, we need to provide some background.
Grain placement in Orleans is flexible: Orleans executes a user-defined function when deciding where in a cluster to place each grain, providing your function with a list of the compatible silos in your cluster, that is, the silos which support the grain type and interface version which triggered placement. Grain calls are location-transparent, so callers do not need to know where a grain is located, allowing grains to be placed anywhere across your cluster of hosts. Each grain's current location is stored in a distributed directory and lookups to the directory are cached for performance.
Resource-optimized placement was implemented by @ledjon-behluli in #8815. Resource-optimized placement uses runtime statistics such as total and available memory, CPU usage, and grain count, collected from all hosts in the cluster, smooths them, and combines them to calculate a load score. It selects the least-loaded silo from a subset of hosts to balance load evenly across the cluster[^4]. If the load score of the local silo is within some configured range of the best candidate's load score, the local silo is chosen preferentially. This improves grain locality by leveraging the knowledge that the local silo initiated a call to the grain and therefore has some relation to that grain.
Ledjon wrote more about Resource-optimized placement in this blog post.
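The selection rule described above can be sketched in plain C#. This is only an illustration under stated assumptions, not the Orleans implementation: the `SiloStats` record, the weights, and the `localMargin` parameter are all hypothetical names, statistics are assumed to be already smoothed and normalized to [0, 1], and the candidate list stands in for the subset of hosts mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-silo statistics; names are illustrative, not Orleans types.
public sealed record SiloStats(string Name, double CpuUsage, double MemoryUsage, int ActivationCount);

public static class LoadScoreSketch
{
    // Assumed weights for this sketch only.
    private const double CpuWeight = 0.4, MemoryWeight = 0.4, ActivationWeight = 0.2;

    // Combines the statistics into a single score; lower means less loaded.
    public static double Score(SiloStats s, int maxActivations) =>
        CpuWeight * s.CpuUsage
        + MemoryWeight * s.MemoryUsage
        + ActivationWeight * (maxActivations == 0 ? 0 : (double)s.ActivationCount / maxActivations);

    // Picks the least-loaded candidate, but prefers the local silo
    // when its score is within 'localMargin' of the best candidate's score.
    public static SiloStats Pick(IReadOnlyList<SiloStats> candidates, SiloStats local, double localMargin)
    {
        int maxActivations = Math.Max(candidates.Max(c => c.ActivationCount), local.ActivationCount);
        SiloStats best = candidates.MinBy(c => Score(c, maxActivations))!;
        double bestScore = Score(best, maxActivations);
        double localScore = Score(local, maxActivations);
        return localScore - bestScore <= localMargin ? local : best;
    }
}
```

With a generous margin the local silo wins even when another silo is slightly less loaded; with a margin near zero the sketch degenerates to plain least-loaded selection.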
Originally, there was no straightforward way to move an active grain from one host to another without needing to fully deactivate the grain, unregister it from the grain directory, contend with concurrent callers on where to place the new activation, and reload its state from the database when the new activation is created. Live grain migration was introduced in #8452, allowing grains to transparently migrate from one silo to another on-demand without needing to reload state from the database, and without affecting pending requests. Live grain migration introduced two new lifecycle stages: dehydration and rehydration. The grain's in-memory state (application state, enqueued messages, metadata) is dehydrated into a migration packet which is sent to the destination silo where it's rehydrated. Live grain migration provided the mechanism for grains to migrate across hosts, but did not provide any out-of-the-box policies to automate migration. Users trigger grain migration by calling `this.MigrateOnIdle()` from within a grain, optionally providing a placement hint which the grain's configured placement director can use to select a destination host for the grain activation.
Finally, we have the pieces in place for activation repartitioning: grain activations are load-balanced across the cluster, and they are able to migrate from host to host quickly. While live grain migration gives developers a mechanism to migrate grain activations from one host to another, it does not provide any automated policy to do so. Remember, we want grains to be balanced across the cluster and collocated with related grains to reduce networking and serialization cost. This is a difficult challenge since:
- An application can have millions of in-memory grains spread across tens or hundreds of silos.
- Each grain can message any other grain.
- The set of grains which each grain communicates with can change from minute to minute. For example, in an online game, player grains may join one match and communicate with each other for some time and then join a different match with an entirely different set of players afterwards.
- Computing the minimum edge-cut for an arbitrary graph is NP-hard.
- No single host has full knowledge of which grains are hosted on which other host and which grains they communicate with: the graph is distributed across the cluster and changes dynamically.
- Storing the entire communication graph in memory could be prohibitively expensive.
Folks at Microsoft Research studied this problem and proposed a solution in a paper titled Optimizing Distributed Actor Systems for Dynamic Interactive Services. The paper, dubbed ActOp, proposes a decentralized approximate solution which achieves good results in their benchmarks. Their implementation was never merged into Orleans and we were unable to find the original implementation on Microsoft's internal network. So, after first implementing resource-optimized placement, community contributor @ledjon-behluli set out to implement activation repartitioning from scratch based on the ActOp paper. The following paragraphs describe the algorithm and the enhancements we made along the way.
The activation repartitioning algorithm involves pair-wise exchange of grains between two hosts at a time. Silos compute a candidate set of grains to send to a peer, then the peer does similarly, and uses a greedy algorithm to determine a final exchange set which minimizes cost while keeping silos balanced.
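To make the pair-wise exchange concrete, here is a simplified greedy planner in plain C#. It is a sketch under assumptions, not the algorithm shipped in Orleans: the `Candidate` record, the `Benefit` estimate, and the `tolerance` parameter are hypothetical, and benefits are treated as independent of each other, which a real implementation cannot assume.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical candidate: moving 'Grain' to the other silo is estimated to save 'Benefit' remote messages.
public sealed record Candidate(string Grain, long Benefit);

public static class ExchangeSketch
{
    // Greedily builds both transfer lists. Each step takes the most beneficial remaining candidate
    // whose move keeps the activation-count difference within 'tolerance' (or at least shrinks it).
    public static (List<Candidate> ToPeer, List<Candidate> ToLocal) Plan(
        IEnumerable<Candidate> localCandidates, IEnumerable<Candidate> peerCandidates,
        int localCount, int peerCount, int tolerance)
    {
        var fromLocal = new Queue<Candidate>(localCandidates.Where(c => c.Benefit > 0).OrderByDescending(c => c.Benefit));
        var fromPeer = new Queue<Candidate>(peerCandidates.Where(c => c.Benefit > 0).OrderByDescending(c => c.Benefit));
        var toPeer = new List<Candidate>();
        var toLocal = new List<Candidate>();

        while (true)
        {
            int diff = localCount - peerCount;
            bool Allowed(int newDiff) => Math.Abs(newDiff) <= tolerance || Math.Abs(newDiff) < Math.Abs(diff);

            // Sending one grain changes the difference by -2; receiving one changes it by +2.
            bool canSend = fromLocal.Count > 0 && Allowed(diff - 2);
            bool canReceive = fromPeer.Count > 0 && Allowed(diff + 2);
            if (!canSend && !canReceive) break;

            bool send = canSend && (!canReceive || fromLocal.Peek().Benefit >= fromPeer.Peek().Benefit);
            if (send) { toPeer.Add(fromLocal.Dequeue()); localCount--; peerCount++; }
            else { toLocal.Add(fromPeer.Dequeue()); localCount++; peerCount--; }
        }

        return (toPeer, toLocal);
    }
}
```

Because every iteration removes one candidate from a queue, the loop always terminates. The balance check is what prevents the planner from simply moving every grain to one side.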
To compute the candidate sets, silos track which grains communicate with which other grains and how frequently. The whole graph would be unwieldy, so we only maintain the top-K communication edges using a variant of the Space-Saving[^1] algorithm. Messages are sampled via a multi-producer, single consumer ring buffer which drops messages if the partition is full. They are then processed by a single thread, which yields frequently to give other threads CPU time. When the distribution has low skew and the K parameter is fairly small, Space-Saving can require a lot of costly shuffling at the bottom of its max-heap (we use the heap variant to reduce memory). To address this, we use Filtered Space-Saving[^2] instead of Space-Saving. Filtered Space-Saving involves putting a 'sketch' data structure at the bottom of the max heap for the lower end of the distribution, which can greatly reduce churn at the bottom and improve performance by up to ~2x in our tests.
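For readers unfamiliar with Space-Saving, the following is a minimal sketch of the plain algorithm in C#. It is deliberately not the heap variant or Filtered Space-Saving described above: it finds the minimum with a linear scan for brevity, and the class name `SpaceSavingSketch` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Plain Space-Saving with at most 'capacity' counters (linear-scan eviction, for illustration only).
public sealed class SpaceSavingSketch<T> where T : notnull
{
    private readonly int _capacity;
    private readonly Dictionary<T, long> _counts = new();

    public SpaceSavingSketch(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    public void Add(T item)
    {
        if (_counts.TryGetValue(item, out long count))
        {
            _counts[item] = count + 1;      // already tracked: increment
        }
        else if (_counts.Count < _capacity)
        {
            _counts[item] = 1;              // free slot: start tracking
        }
        else
        {
            // Full: evict the current minimum; the newcomer inherits its count plus one.
            var min = _counts.MinBy(kv => kv.Value);
            _counts.Remove(min.Key);
            _counts[item] = min.Value + 1;
        }
    }

    // Tracked items ordered by estimated count. Estimates can overcount but never undercount.
    public IReadOnlyList<(T Item, long Count)> Top() =>
        _counts.OrderByDescending(kv => kv.Value).Select(kv => (kv.Key, kv.Value)).ToList();
}
```

In this setting an item would be a communication edge, for example a `(caller, callee)` tuple. The eviction step is where the churn mentioned above comes from: with low skew, many items share the minimum count and keep replacing each other.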
If the top-K communication edges are all internal (eg, because the algorithm has already optimized partitioning somewhat), silos won't find many good transfer candidates. We need to track internal edges to work out which grains should/shouldn't be transferred (cost vs benefit). To address this, we introduced a bloom filter to track grains where the cost of movement is greater than the benefit, removing them from the top-K data structure. From our experiments, this works very well with even a 10x smaller K. This performance improvement will come with a reduced ability to handle dynamic graphs, so in the future we may need to implement a decay strategy to address this as the bloom filter becomes saturated. To improve lookup performance, @ledjon-behluli implemented a blocked bloom filter[^3], which is used inste...
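As background for the filter mentioned above, here is a minimal classic Bloom filter in C#. It is a sketch only and is not the blocked variant: the class name `BloomFilterSketch`, the use of `ulong` keys to stand in for grain identifiers, and the hash mixing function are assumptions made for illustration.

```csharp
using System;
using System.Collections;

// Classic Bloom filter using double hashing to derive k probe positions (not the blocked variant).
public sealed class BloomFilterSketch
{
    private readonly BitArray _bits;
    private readonly int _hashCount;

    public BloomFilterSketch(int bitCount, int hashCount)
    {
        if (bitCount <= 0) throw new ArgumentOutOfRangeException(nameof(bitCount));
        if (hashCount <= 0) throw new ArgumentOutOfRangeException(nameof(hashCount));
        _bits = new BitArray(bitCount);
        _hashCount = hashCount;
    }

    public void Add(ulong key)
    {
        for (int i = 0; i < _hashCount; i++) _bits[Index(key, i)] = true;
    }

    // False means definitely absent; true means possibly present (false positives can occur).
    public bool MightContain(ulong key)
    {
        for (int i = 0; i < _hashCount; i++)
            if (!_bits[Index(key, i)]) return false;
        return true;
    }

    private int Index(ulong key, int i)
    {
        unchecked
        {
            ulong h1 = Mix(key);
            ulong h2 = Mix(key ^ 0x9E3779B97F4A7C15UL) | 1UL; // odd step for the probe sequence
            return (int)((h1 + (ulong)i * h2) % (ulong)_bits.Length);
        }
    }

    // SplitMix64 finalizer, used here as a simple 64-bit mixing function.
    private static ulong Mix(ulong x)
    {
        unchecked
        {
            x ^= x >> 30; x *= 0xBF58476D1CE4E5B9UL;
            x ^= x >> 27; x *= 0x94D049BB133111EBUL;
            x ^= x >> 31;
            return x;
        }
    }
}
```

A classic filter like this touches `hashCount` scattered bits per lookup. Because bits are never cleared, the filter saturates over time, which is the reason a decay strategy is mentioned above.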
v8.2.0-preview1
What's Changed
- Fix potential grain timer deadlock during disposal by @ReubenBond in #8950
- Add missing description node to XML docs by @scottaddie in #8959
- Clean up `SafeTimer` usage, replace with `PeriodicTimer` where possible by @ReubenBond in #8953
- Fix capitalization of 'MachineName' structured logging parameter by @ReubenBond in #8980
- Ensure PeriodicTimer period >= 1ms by @ReubenBond in #8981
- Ensure reminder table is initialized before access by @ReubenBond in #8982
- Update Npgsql by @ReubenBond in #8994
- Fix serialization of types inheriting from `Dictionary<K,V>` which add values in their constructor by @ReubenBond in #8993
- Prevent generated types from appearing in IDE by @ReubenBond in #8987
- Use dotnet-public instead of nuget.org by @benjaminpetit in #8931
- Add Orleans.Runtime to implicit usings by @ReubenBond in #8996
- Stop watchdog when container is disposed by @ReubenBond in #8998
- Stop silo on Dispose by @ReubenBond in #9000
- Dispose all activations when host is disposed by @ReubenBond in #9001
- Dispose cluster & silo health monitors are disposed when host is disposed, and clean up code by @ReubenBond in #8999
- Unsubscribe `ConsistentRingProvider` & `VirtualBucketsRingProvider` from `ISiloStatusOracle` on shutdown by @ReubenBond in #8997
- Avoid unnecessary `Interlocked.Or` in `SingleWaiterAutoResetEvent` by @ReubenBond in #9003
- test(codegen): add derived from list by @claylaut in #8858
- Add serialization support for types derived from `List<T>` and `HashSet<T>` by @ReubenBond in #9005
- Use `PeriodicTimer` instead of `GrainTimer` in `LeaseBasedQueueBalancer` by @ReubenBond in #9002
- Updatable grain timers by @ReubenBond in #8954
- Fix streaming config validator registration by @benjaminpetit in #8876
- Update samples README.md to point to samples repo & explorer by @ReubenBond in #9010
- [CodeGen] Always specify grain extension interface for grain extension calls by @ReubenBond in #9009
- Fix silo shutdown logging when silo is already shutting down. by @ReubenBond in #9013
- Fix perf of PooledBufferTests by @ReubenBond in #9015
- Fix termination condition in ActivationMigrationManager.AcceptMigratingGrains by @ReubenBond in #9017
- Improve `ActivationData` shutdown process by @ReubenBond in #9018
- Exclude explicitly implemented interface methods from proxy by @alrz in #8992
- Consider interface method accessibility when generating the invoker by @alrz in #9019
New Contributors
- @scottaddie made their first contribution in #8959
- @alrz made their first contribution in #8992
Full Changelog: v8.1.0...v8.2.0-preview1
v3.7.2
What's Changed
- [3.x] Fix directory/cache validation for defunct silos by @ReubenBond in #8498
- [3.x] Fix potential grain timer deadlock during disposal by @ReubenBond in #8949
- [3.x] Ensure reminder service is initialized before access by @ReubenBond in #8983
Full Changelog: v3.7.1...v3.7.2
v8.1.0
New features
Integration with Aspire
This release includes initial integration with .NET Aspire, allowing you to configure an Orleans cluster in your Aspire app host, specifying the resources the cluster uses. For example, you can specify that an Azure Table will be used for cluster membership, an Azure Redis resource will be used for the grain directory, and an Azure Blob Storage resource will be used to store grain state. The integration currently supports Redis and Azure Table & Blob storage resources. Support for other resources will be added later.
In the app host project, an Orleans cluster can be declared using the `AddOrleans` method, and then configured with clustering, grain storage, grain directory, and other providers using methods on the returned builder:
var storage = builder.AddAzureStorage("storage");
var clusteringTable = storage.AddTables("clustering");
var defaultStorage = storage.AddBlobs("grainstate");
var cartStorage = builder.AddRedis("redis-cart");
var orleans = builder.AddOrleans("my-app")
.WithClustering(clusteringTable)
.WithGrainStorage("Default", defaultStorage)
.WithGrainStorage("cart", cartStorage);
// Add a server project (also called "silo")
builder.AddProject<Projects.OrleansServer>("silo")
.WithReference(orleans);
// Add a project with a reference to the Orleans client
builder.AddProject<Projects.FrontEnd>("frontend")
.WithReference(orleans);
In the client and server projects, add Orleans to the host builder as usual.
// For an Orleans server:
builder.UseOrleans();
// Or, for an Orleans client:
builder.UseOrleansClient();
Orleans will read configuration created by your Aspire app host project and configure the providers specified therein. To allow Orleans to access the configured resources, add them as keyed services using the corresponding Aspire component:
builder.AddKeyedAzureTableService("clustering");
builder.AddKeyedAzureBlobService("grainstate");
builder.AddKeyedRedis("redis-cart");
Resource-optimized placement
Resource-optimized placement, enabled via the `[ResourceOptimizedPlacement]` attribute on a grain class, balances grains across hosts based on available memory and CPU usage. For more details, see the PR: #8815.
What's Changed
Since 8.1.0-preview3
- Migrate to build template by @benjaminpetit in #8905
- Remove mention of build.sh from README.md by @ardrabczyk in #8901
- Fix build template for nightly by @benjaminpetit in #8909
- Add mirror pipeline by @benjaminpetit in #8918
- Fix nightly nuget publishing by @benjaminpetit in #8919
- Fix mirror pipeline by @benjaminpetit in #8923
- Do not trigger mirror for PR by @benjaminpetit in #8924
- Enable TCP keep-alive on all sockets by default by @ReubenBond in #8927
- Fix typos in SQSStorage.cs by @Malpp in #8933
- Add more tests for [Alias("x")] attribute by @ReubenBond in #8926
- Update Azure.Identity by @ReubenBond in #8942
- Serialization doc fixes, add missing tests, fix PooledBufferCodec by @ReubenBond in #8943
- Fix some reference documentation warnings by @ReubenBond in #8941
- Always read grain state during activation if it has not been rehydrated by @ReubenBond in #8944
Additional changes since 8.0.0
- Clean up WorkItemGroup and ActivationTaskScheduler logic by @ReubenBond in #8865
- Orleans is now officially supported by Microsoft by @ReubenBond in #8882
- Avoid over-counting stateless worker activations by @ReubenBond in #8886
- Move `ResourceOptimizedPlacementOptions` to `Orleans.Configuration` by @ledjon-behluli in #8892
- Guard against null grain context in test code by @ReubenBond in #8897
- Use GrainDirectoryCacheFactory to construct a IGrainDirectoryCache by @Mostafa-Goher in #8844
- Ensure `StatelessWorkerAttribute.MaxLocal` property is accounted for by @ReubenBond in #8885
- Make EventHub tests more reliable by @benjaminpetit in #8889
- Fix `PooledBuffer` serialization by @ReubenBond in #8852
- Prepare the FabricBot config for migration to Policy Service by @jeffhandley in #8855
- Address IDE0038. Use pattern matching. by @IEvangelist in #8619
- Always reset `RuntimeContext` to previous value after use by @ReubenBond in #8864
- Distributed Tracing: Use recommended conventions by @ReubenBond in #8856
- FabricBot: Onboarding to GitOps.ResourceManagement because of FabricBot decommissioning by @dotnet-policy-service in #8869
- Clarify [AlwaysInterleave] interleaves with anything, including itself by @ReubenBond in #8804
- Adds `LinearBackoffClientConnectionRetryFilter` in the default client services by @ledjon-behluli in #8793
- Microsoft.Extensions.Configuration support by @ReubenBond in #8764
- Avoid constant try/catch on a non-existing gateway for external cluster client by @ledjon-behluli in #8792
- Add Analyzer and CodeFix for duplicate method aliases (ORLEANS0011) by @ledjon-behluli in #8662
- Adds code fixer for: Report error on duplicate [Id(x)] (ORLEANS0012) by @ledjon-behluli in #8808
- Make repeatable the execution of SQLServer Ado scripts without errors by @m3nax in #8799
- `GenerateAliasAttribtuesAnalyzer` needs to account for file-scoped namespaces by @ledjon-behluli in #8809
- Add nightly feed publishing by @ReubenBond in #8810
- Update README.md to include new nightly build feed details by @ReubenBond in #8814
- Resource optimized placement strategy by @ledjon-behluli in #8815
- Upgrade `System.Data.SqlClient` by @ledjon-behluli in #8821
- Provide cross-platform environment statistics collection + Modify `OverloadDetector` to account for memory too by @ledjon-behluli in #8820
- Centralize environment statistics filtering by @ledjon-behluli in #8827
- Add support for enabling distributed tracing via configuration switch by @ReubenBond in #8829
- Add default Redis options when Redis is configured via keyed service by @ReubenBond in #8847
- Fix insert condition check in `RedisMembershipTable` by @ReubenBond in #8848
- Downgrade Microsoft.CodeAnalyis to v4.5.0 by @ReubenBond in #8849
New Contributors
- @ardrabczyk made their first contribution in #8901
- @Malpp made their first contribution in #8933
- @Mostafa-Goher made their first contribution in #8844
- @dotnet-policy-service made their first contribution in #8869
- @m3nax made their first contribution in #8799
Full Changelog: v8.0.0...v8.1.0
v7.2.6
What's Changed
- Fix StatelessWorker MaxLocal for Orleans 7.x by @isaachili in #8887
- [7.x] Avoid over-counting stateless worker activations (#8886) by @ReubenBond in #8890
- [7.x] Use GrainDirectoryCacheFactory to construct a IGrainDirectoryCache (#8844) by @ReubenBond in #8898
New Contributors
- @isaachili made their first contribution in #8887
Full Changelog: v7.2.5...v7.2.6
v8.1.0-preview3
What's Changed
- Clean up WorkItemGroup and ActivationTaskScheduler logic by @ReubenBond in #8865
- Orleans is now officially supported by Microsoft by @ReubenBond in #8882
- Avoid over-counting stateless worker activations by @ReubenBond in #8886
- Move `ResourceOptimizedPlacementOptions` to `Orleans.Configuration` by @ledjon-behluli in #8892
- Guard against null grain context in test code by @ReubenBond in #8897
- Use GrainDirectoryCacheFactory to construct a IGrainDirectoryCache by @Mostafa-Goher in #8844
- Ensure `StatelessWorkerAttribute.MaxLocal` property is accounted for by @ReubenBond in #8885
- Make EventHub tests more reliable by @benjaminpetit in #8889
New Contributors
- @Mostafa-Goher made their first contribution in #8844
Full Changelog: v8.1.0-preview2...v8.1.0-preview3
v8.1.0-preview2
What's Changed
- Fix `PooledBuffer` serialization by @ReubenBond in #8852
- Prepare the FabricBot config for migration to Policy Service by @jeffhandley in #8855
- Address IDE0038. Use pattern matching. by @IEvangelist in #8619
- Always reset `RuntimeContext` to previous value after use by @ReubenBond in #8864
- Distributed Tracing: Use recommended conventions by @ReubenBond in #8856
- FabricBot: Onboarding to GitOps.ResourceManagement because of FabricBot decommissioning by @dotnet-policy-service in #8869
New Contributors
- @dotnet-policy-service made their first contribution in #8869
Full Changelog: v8.1.0-preview1...v8.1.0-preview2