Issue with pruning in Remember Entities of Akka Cluster Sharding

Problem

In a 3-node Akka cluster, during a rolling restart of the nodes, some newly created entity entries overwrite the old values for certain keys in the Remember Entities ORSet.
The issue appears when pruning occurs after the restart. The shards in the cluster are completely rebalanced, and all entity actors are successfully re-created.
However, when the values are retrieved with the Replicator Get operation, the entries for some keys differ before and after the restart.

Cluster Configurations

akka.cluster.sharding.remember-entities = on
akka.cluster.sharding.remember-entities-store = ddata
akka.cluster.sharding.distributed-data {
  durable.keys = []
  gossip-interval = 500 ms
  notify-subscribers-interval = 100 ms
}
akka.remote.artery {
  enabled = on
  transport = tcp
}

When an entity is created, its ID is stored in an ORSet held by the Replicator actor and replicated to the other nodes. Each shard has 5 keys, and each entity ID is hashed to one of these 5 keys.
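
To make the key layout concrete, here is a minimal sketch of how an entity ID could be mapped to one of a shard's 5 keys. It is plain Java with no Akka dependency. The class name, the method names, and the `Shard_2-1` style key naming (borrowed from the example further down) are assumptions for illustration only, not the actual sharding internals.

```java
// Hypothetical illustration only: not the actual Akka implementation.
class RememberEntityKeys {
    // Number of ORSet keys per shard, as described in the post.
    static final int NUMBER_OF_KEYS = 5;

    // Map an entity ID to a key index in the range 0..4.
    // The remainder is in -4..4, so Math.abs cannot overflow here.
    static int keyIndex(String entityId) {
        return Math.abs(entityId.hashCode() % NUMBER_OF_KEYS);
    }

    // Build a key name in the "Shard_2-1" style used in this post (assumed format).
    static String keyFor(String shardId, String entityId) {
        return shardId + "-" + keyIndex(entityId);
    }

    public static void main(String[] args) {
        String[] ids = {"Actor_3", "Actor_8", "Actor_14", "Actor_19", "Actor_23"};
        for (String id : ids) {
            System.out.println(id + " -> " + keyFor("Shard_2", id));
        }
    }
}
```

The indices this sketch prints will not necessarily match the keys in the example below, since the real hash and key naming may differ. The point is only that several entities share one ORSet key, so a single damaged key loses several entries at once.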

We can retrieve the current remember entities stored in the Replicator using the Replicator Get operation.
Everything seems to be working fine until the nodes are restarted.

The issue here is

  • We perform a sequential rolling restart of all 3 nodes.
  • After restarting and joining the cluster, all the shards are rebalanced, and the entities are recreated correctly.
  • New entities are continuously created from another proxy node to the cluster.
  • Some time after the rolling restart, around 5 minutes (max-pruning-dissemination = 300 s), the Remember Entities count is incorrect for some keys.
  • Further investigation shows that, during pruning, some of the newly created entities overwrite the existing contents of their respective keys in the ORSet.

Example:

Node 1 restarted 18:58
Node 2 restarted 18:59
Node 3 restarted 19:00

  • At 19:06:xx - The live actor count for Shard_2 is 25.
  • At 19:06:xx - The remember entities count for Shard_2 is 25 (across 5 keys).
  • At 19:07:35 - Actor_23 is created, which belongs to Shard_2.
  • At 19:08:xx - The live actor count for Shard_2 is 26.
  • At 19:08:xx - The remember entities count for Shard_2 is 22 (across 5 keys).
    Here, when Actor_23 is created, it overwrites the key Shard_2-1: the value changes from Shard_2-1[Actor_3,Actor_8,Actor_14,Actor_19] to Shard_2-1[Actor_23].
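
The numbers in the example can be checked with a small snapshot comparison. The sketch below is plain Java with no Akka dependency. The class and method names are hypothetical, and the two sets are simply the before and after contents of Shard_2-1 copied from the example, as they would be read back with a Replicator Get.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for comparing two snapshots of one remember-entities key.
class RememberEntitiesDiff {
    // Entries present before but missing afterwards.
    static Set<String> lost(Set<String> before, Set<String> after) {
        Set<String> result = new TreeSet<>(before);
        result.removeAll(after);
        return result;
    }

    // Entries present afterwards that were not there before.
    static Set<String> added(Set<String> before, Set<String> after) {
        Set<String> result = new TreeSet<>(after);
        result.removeAll(before);
        return result;
    }

    public static void main(String[] args) {
        // Contents of key Shard_2-1 taken from the example in this post.
        Set<String> before = new TreeSet<>(
                Arrays.asList("Actor_3", "Actor_8", "Actor_14", "Actor_19"));
        Set<String> after = new TreeSet<>(Arrays.asList("Actor_23"));

        Set<String> lost = lost(before, after);
        Set<String> added = added(before, after);

        // 25 remembered entries before, plus the new one, minus the lost ones.
        int countAfter = 25 + added.size() - lost.size();

        System.out.println("lost entries:  " + lost.size());
        System.out.println("added entries: " + added.size());
        System.out.println("count after:   " + countAfter);
    }
}
```

With the data from the example this gives 4 lost entries and 1 added entry, which accounts for the observed count of 22 instead of the expected 26.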

Suspects

  1. Pruning of ddata happens on Node 1 only. Pruning of the shard data is performed at 19:05:46.
  2. Actor_23 is created before Shard_2 is pruned. After the pruning is done (19:05:46), only the new actor's value is present for the key: it changes from Shard_2-1[Actor_3,Actor_8,Actor_14,Actor_19] to Shard_2-1[Actor_23].
  3. However, this issue occurs only for some of the keys.

What could be causing this issue?

Akka version: 2.5.32.
If there are known bugs or issues in this old version, kindly share the bug link so that we can understand them before upgrading.

Are there any configurable properties that can be adjusted to resolve this problem?

@patriknw @leviramsey @davidogren @johanandren kindly share your thoughts on this.

Maybe I’m being pessimistic, but the reality is that you’ve been asking repeated questions about remember entities on this and other forums and the answer you’ve gotten over and over again for months is “there are known improvements in newer releases, you are on an unsupported version, try to reproduce on a supported version”.

I don’t want to be negative, but you are asking internet strangers (such as myself) to debug a very complicated problem on a version that hasn’t been supported for years. Please create a reproduction on a recent Akka release. You are running a six year old version of Akka. I know you want bug links to prove that upgrading will help you, but the reality is that you are asking for free help. The onus for showing that this wouldn’t be solved on a newer version of Akka is on you, not on me (or Lightbend).

I agree with David, we will not investigate something in Akka 2.5.32.